On Tue, 30 Sep 2003 10:27:08 -0700
"Feldman, Scott" <scott.feldman@xxxxxxxxx> wrote:
> At this point, I'm leaning towards removing the offending code in the
> timer callback now, and taking a step back to solve the bigger problem,
> either with a better locking scheme, or a new plan on how to flush the
> "stuck" work. We don't need kernel panics when you trip over the
> Ethernet cable! Sound like a plan?
Why do you even need to use IRQ locking here?
Your e1000 netdev->hard_start_xmit method doesn't need to do anything
special, so why does this timer code? I suppose you need to synchronize
with e1000_clean_tx_irq() in the non-NAPI case, right? If so, that's
not being accomplished by what your code is doing. If nobody else
takes that xmit_lock in an IRQ-disabling manner, the e1000 timer code
doing so doesn't make any difference.
I have an idea for attacking the problem, once you figure out what
kind of locking you really need. Do whatever you need to do to
synchronize on the hardware side, but instead of freeing each SKB
directly, add it to a list. A pointer to the head of this list
is stored on the stack of the timer routine, and passed down into
the TX purger.
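A rough userspace sketch of this deferred-free pattern follows. It is an analogue, not driver code. A pthread mutex stands in for whatever driver locks turn out to be needed, and free() stands in for dev_kfree_skb_irq(). The names fake_skb, tx_ring, tx_purge(), watchdog_timer() and queue_fake_skb() are all hypothetical. The list head lives on the timer routine's stack, the purger only unhooks buffers while the lock is held, and nothing is freed until the lock has been dropped.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for an SKB: only the list link matters here. */
struct fake_skb {
    struct fake_skb *next;
    int id;
};

#define RING_SIZE 8

/* Hypothetical TX ring of buffers still owned by the "hardware". */
static struct fake_skb *tx_ring[RING_SIZE];
static pthread_mutex_t tx_lock = PTHREAD_MUTEX_INITIALIZER;

/* Must be called with tx_lock held: unhook every buffer from the ring
 * and push it onto the caller's list instead of freeing it here. */
void tx_purge(struct fake_skb **deferred)
{
    assert(deferred != NULL);
    for (int i = 0; i < RING_SIZE; i++) {
        struct fake_skb *skb = tx_ring[i];
        if (skb == NULL)
            continue;
        tx_ring[i] = NULL;
        skb->next = *deferred;
        *deferred = skb;
    }
}

/* Stand-in for the timer routine. Returns how many buffers it freed. */
int watchdog_timer(void)
{
    struct fake_skb *deferred = NULL; /* list head lives on this stack frame */
    int freed = 0;

    pthread_mutex_lock(&tx_lock);
    tx_purge(&deferred);              /* synchronization happens under the lock */
    pthread_mutex_unlock(&tx_lock);

    /* All locks dropped: now release the buffers. In the kernel this loop
     * would call dev_kfree_skb_irq(); free() is only the userspace analogue. */
    while (deferred != NULL) {
        struct fake_skb *next = deferred->next;
        free(deferred);
        deferred = next;
        freed++;
    }
    return freed;
}

/* Hypothetical helper: place a freshly allocated buffer in slot i.
 * The slot is assumed to be empty. Returns 0 on success, -1 on failure. */
int queue_fake_skb(int i, int id)
{
    struct fake_skb *skb = malloc(sizeof(*skb));
    if (skb == NULL)
        return -1;
    skb->next = NULL;
    skb->id = id;
    pthread_mutex_lock(&tx_lock);
    tx_ring[i] = skb;
    pthread_mutex_unlock(&tx_lock);
    return 0;
}
```

The point of the split is that tx_purge() never calls a free routine while the lock is held, so the freeing step is free to run in whatever context and with whatever locking the final design allows.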
Then at the top level you can drop all your locks, re-enable hardware
IRQs, and do whatever else you need to do, then pass the SKBs in the list off
to dev_kfree_skb_irq() (this is the appropriate routine to call to
free an SKB from a timer handler, which runs in soft interrupt