jamal writes:
> I didn't follow that discussion; I archived it for later entertaining reading.
> My take on it was that it is 2.6.x-related, and in particular that the
> misbehavior observed has to do with the use of RCU in the route cache.
>
> > It appears this problem became worse in 2.6 with HZ=1000, because now
> > the NAPI RX softirq work is being done 10x as often on return from the
> > timer interrupt. I'm not sure if a solution was reached.
>
> Robert?
Well, it's a general problem of controlling softirq vs. user execution.
The RCU locking put it on our agenda because the dst hash was among the
first users of RCU locking. RCU, in turn, had trouble making progress
under heavy softirq load, which is exactly what happens during a route
cache DoS.
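A toy single-CPU model of why RCU stalls here, assuming the classic RCU rule of that era that user-mode execution, a context switch or the idle loop counts as a quiescent state while time spent in softirq does not. The event names, grace_period_end() and the "each softirq event queues one freed dst entry" rule are hypothetical simplifications; real RCU tracks quiescent states on every CPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event types observed on one CPU in this toy model. */
enum cpu_event { EV_SOFTIRQ, EV_USER, EV_CTXSW, EV_IDLE };

/* Assumed classic-RCU rule: user mode, a context switch or the idle
 * loop is a quiescent state; softirq execution is not. */
static bool is_quiescent(enum cpu_event ev)
{
    return ev == EV_USER || ev == EV_CTXSW || ev == EV_IDLE;
}

/* Returns the index of the event at which a grace period that started
 * before events[0] completes, or -1 if it never completes within the
 * trace. Each EV_SOFTIRQ event is assumed to queue one call_rcu()-style
 * free (e.g. a route cache entry); *pending receives how many such
 * frees accumulated while the grace period was still open. */
static int grace_period_end(const enum cpu_event *events, size_t n,
                            size_t *pending)
{
    assert(events != NULL && pending != NULL);
    *pending = 0;
    for (size_t i = 0; i < n; i++) {
        if (is_quiescent(events[i]))
            return (int)i;
        (*pending)++;   /* softirq work keeps queueing frees */
    }
    return -1;
}
```

With a trace made only of EV_SOFTIRQ events, grace_period_end() returns -1 and the pending count grows with the trace length, which mirrors freed route cache entries piling up during a DoS.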
NAPI is part of RX_SOFTIRQ, which is well-behaved. NAPI addresses only
the irq/softirq problem and is entirely innocent of do_softirq() being
run from other parts of the kernel and causing userland starvation.
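A toy sketch of the irq/softirq split NAPI provides, assuming a simplified device: the first packet raises an interrupt, the handler masks RX interrupts and schedules the device for polling, and the poll routine, run from RX_SOFTIRQ, handles at most a quota per call. struct toy_dev, toy_packet_arrives() and toy_poll() are hypothetical stand-ins, not the kernel's net_device or its poll interface.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy device state, not the kernel's struct net_device. */
struct toy_dev {
    int  rx_ring;        /* packets waiting in the RX ring */
    bool rx_irq_enabled; /* is the RX interrupt unmasked? */
    bool on_poll_list;   /* scheduled for polling from RX_SOFTIRQ */
    int  irqs_taken;     /* interrupts actually raised */
};

/* Packet arrival: only the first packet after the device left polling
 * mode raises an interrupt; the handler masks the RX irq and puts the
 * device on the poll list. */
static void toy_packet_arrives(struct toy_dev *d)
{
    d->rx_ring++;
    if (d->rx_irq_enabled) {
        d->irqs_taken++;
        d->rx_irq_enabled = false;
        d->on_poll_list = true;
    }
}

/* One poll from RX_SOFTIRQ: process up to quota packets. If the ring
 * is drained, leave polling mode and unmask the interrupt; otherwise
 * stay on the poll list. Returns the number of packets processed. */
static int toy_poll(struct toy_dev *d, int quota)
{
    assert(quota > 0);
    if (!d->on_poll_list)
        return 0;
    int done = d->rx_ring < quota ? d->rx_ring : quota;
    d->rx_ring -= done;
    if (d->rx_ring == 0) {
        d->on_poll_list = false;
        d->rx_irq_enabled = true;
    }
    return done;
}
```

Under a flood only one interrupt is taken and the rest is bounded softirq polling. Nothing in this model decides whether that softirq work competes fairly with user processes, which is the part NAPI does not address.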
Under normal high-load conditions, RX_SOFTIRQ reschedules itself once its
netdev_max_backlog budget is used up. do_softirq() sees this and defers
execution to ksoftirqd, and things come under (scheduler) control.
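A toy model of that hand-off, assuming a net_rx_action()-like function whose budget plays the role of netdev_max_backlog, and a do_softirq()-like loop that restarts a bounded number of times before waking ksoftirqd. TOY_MAX_RESTART = 10 is modelled on MAX_SOFTIRQ_RESTART in 2.6-era kernel/softirq.c but should be treated as an assumption; all toy_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed restart limit (modelled on MAX_SOFTIRQ_RESTART). */
#define TOY_MAX_RESTART 10

/* Hypothetical per-CPU state for the toy model. */
struct toy_softnet {
    int  backlog;         /* packets waiting */
    bool rx_pending;      /* NET_RX_SOFTIRQ raised */
    bool ksoftirqd_woken; /* work handed to the scheduled thread */
};

/* Toy net_rx_action(): handle at most 'budget' packets; if work
 * remains, re-raise itself. */
static void toy_net_rx_action(struct toy_softnet *s, int budget)
{
    int n = s->backlog < budget ? s->backlog : budget;
    s->backlog -= n;
    s->rx_pending = s->backlog > 0;
}

/* Toy do_softirq(): keep running while the softirq re-raises itself,
 * but only TOY_MAX_RESTART passes; after that the remainder is deferred
 * to ksoftirqd, which runs under the scheduler. Returns the number of
 * passes done inline. */
static int toy_do_softirq(struct toy_softnet *s, int budget)
{
    assert(budget > 0);
    int passes = 0;
    int restarts = TOY_MAX_RESTART;
    while (s->rx_pending) {
        toy_net_rx_action(s, budget);
        passes++;
        if (s->rx_pending && --restarts == 0) {
            s->ksoftirqd_woken = true;
            break;
        }
    }
    return passes;
}
```

With a backlog that fits in one budget, a single pass clears it. With a flood, the loop gives up after ten passes and hands the rest to ksoftirqd, where the scheduler can weigh it against user tasks.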
During a route DoS, the hash and FIB lookup, GC, etc. paths run code that
calls do_softirq() a lot. The effect is that ksoftirqd is more or less
bypassed.
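A toy accounting model of the bypass, assuming that each bottom-half-disabled section in the lookup and GC paths ends by running pending softirqs inline (as local_bh_enable() does when work is pending outside interrupt context), and that ksoftirqd only gets whatever is left when the scheduler runs it. The struct and all functions are hypothetical.

```c
#include <assert.h>

/* Hypothetical accounting of where softirq work actually runs. */
struct toy_accounting {
    int pending;        /* softirq work items waiting */
    int inline_work;    /* items run from an inline do_softirq() call */
    int ksoftirqd_work; /* items run by ksoftirqd under the scheduler */
};

/* A lookup/GC path re-enabling bottom halves: whatever is pending runs
 * right here, charged to whoever happens to be executing. */
static void toy_inline_do_softirq(struct toy_accounting *a)
{
    a->inline_work += a->pending;
    a->pending = 0;
}

/* ksoftirqd runs only when scheduled and sees only the leftovers. */
static void toy_ksoftirqd_run(struct toy_accounting *a)
{
    a->ksoftirqd_work += a->pending;
    a->pending = 0;
}

/* Each round: 'arrivals' items are raised, the route-cache paths make
 * 'inline_calls' inline do_softirq() calls, then ksoftirqd runs once. */
static void toy_simulate(struct toy_accounting *a, int rounds,
                         int arrivals, int inline_calls)
{
    assert(rounds >= 0 && arrivals >= 0 && inline_calls >= 0);
    for (int r = 0; r < rounds; r++) {
        a->pending += arrivals;
        for (int c = 0; c < inline_calls; c++)
            toy_inline_do_softirq(a);
        toy_ksoftirqd_run(a);
    }
}
```

With no inline call sites every item reaches ksoftirqd; with even one inline call per round, all work is done inline and ksoftirqd's counter stays at zero, so the scheduler never gets a say.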
Again, it's a general problem... We are just the unlucky guys who ran
into it.
I don't know whether the packet capture tests done by Luca ran into these
problems. A profile could have helped...
Cheers.
--ro