netdev
[Top] [All Lists]

Re: NAPI note

To: jamal <hadi@xxxxxxxxxx>
Subject: Re: NAPI note
From: Jeff Garzik <jgarzik@xxxxxxxxx>
Date: Tue, 18 Feb 2003 22:14:55 -0500
Cc: Manfred Spraul <manfred@xxxxxxxxxxxxxxxx>, "David S. Miller" <davem@xxxxxxxxxx>, zaitcev@xxxxxxxxxx, jbourne@xxxxxxxxxxxxx, netdev@xxxxxxxxxxx
In-reply-to: <20030218212441.M25195@xxxxxxxxxxxxxxxx>
Organization: none
References: <3E4D66DF.3040800@xxxxxxxxxxxxxxxx> <3E4D8295.2050400@xxxxxxxxx> <20030217.185719.28797590.davem@xxxxxxxxxx> <3E525FD8.1060009@xxxxxxxxxxxxxxxx> <20030218212441.M25195@xxxxxxxxxxxxxxxx>
Sender: netdev-bounce@xxxxxxxxxxx
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20021213 Debian/1.2.1-2.bunk
jamal wrote:

On Tue, 18 Feb 2003, Manfred Spraul wrote:


David S. Miller wrote:


 From: Jeff Garzik <jgarzik@xxxxxxxxx>
 Date: Fri, 14 Feb 2003 18:58:13 -0500

 Manfred Spraul wrote:
 > It seems to be a generic NAPI restriction:
 > The caller of netif_receive_skb() must not own a spinlock that is
 > acquired from an interrupt handler.

 Thanks much for noticing this, Manfred.

I think this logic is buggy.

In the example I've seen posted, only a NAPI implementation bug
could cause the situation to occur.

If cpu1 is in ->poll() for the driver, then by definition the
device shall not cause interrupts.  The device's interrupts
are disabled before we enter the ->poll() handler, and as such
the "cpu2 take device interrupt and takes driver->lock" cannot
occur.



No. I think the rule is that drivers that use the NAPI interface must
not cause interrupts for packet receive and out-of-rx-buffers conditions.


Ah, but that is only one of two rules.
Theres other drivers which dont follow this rule and just shutdown
all interupt sources. I know that the e1000 for example does this.
I am not sure about the tg3. I think the doc says this but may not
emphasize it as strongly.
So if tg3 uses method 2 then its as Dave says - a bug.

tg3 shuts down all interrupt sources, and handles all interrupt events in dev->poll().

David and I hashed it out a bit on IRC. The problem is that deliver_to_old_ones() waits, and thus the deadlock that Manfred described. For 2.4.x, the solution is simply to avoid the deadlock in the driver. For 2.5.x, David hinted that deliver_to_old_ones() may be going away.


But what about media error interrupts, or tx interrupts? Or MIB counter
overflow, etc. What about shared pci interrupts?


Shared interupts should be interesting actually.
However if you are in poll mode and you receive an interupt you should be
able to quickly determine its not yours without much effect on shared
locks, no?

Normally, yes. However tg3 grabs a lock just about anytime it does anything. ;-) A long term project of mine is to slowly remove these locks, but that must wait until the driver stabilizes, and is overall a long process. Most of the locks _are_ removeable, but we keep hit deadlock bugs like this, and hardware bugs which need workarounds, so those come first.


All of them could occur, and if they take a spinlock that is held across
netif_receive_skb(), then it can deadlock.



yes this could happen with method 1 of programming the driver; however,
tx, receive, link are essentially separate threads and would hardly share
locks.

They do in tg3's case.

The locks can be removed eventually, but such is the state of life right now.

        Jeff




<Prev in Thread] Current Thread [Next in Thread>