serious netpoll bug w/NAPI

To: netdev@xxxxxxxxxxx
Subject: serious netpoll bug w/NAPI
From: "David S. Miller" <davem@xxxxxxxxxxxxx>
Date: Tue, 8 Feb 2005 20:16:34 -0800
Cc: mpm@xxxxxxxxxxx
Sender: netdev-bounce@xxxxxxxxxxx

Consider a NAPI device currently executing its poll function,
pushing SKBs into the networking stack.

Some of these will generate response packets etc.

If for some reason a printk() is generated by the packet processing
and:

1) the netconsole output device is the same as the NAPI device
   processing packets

2) netif_queue_stopped() is true because the tx queue is full

the netpoll code will recurse back into the driver's poll function.
This is incredibly illegal and results in all kinds of driver state
corruption.  ->poll() must execute only once at a time.

This situation is actually quite common, via the ipt_LOG.c packet
logging module.

What the netpoll code appears to be trying to do is get the TX
queue to make forward progress by invoking ->poll() if pending.
The trouble is that ->poll() at the top level will not clear the
__LINK_STATE_RX_SCHED bit and delete itself from the poll list
until it is done with ->poll() processing.
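
To make the re-entry concrete, here is a small user-space sketch in C. It is
only a toy model and not kernel code: struct toy_dev, toy_poll(),
toy_netpoll_send(), toy_run() and the in_poll flag are all hypothetical names
invented for illustration. The rx_sched field stands in for
__LINK_STATE_RX_SCHED and tx_stopped for netif_queue_stopped(). The guard
shown is just one possible way to refuse the nested call, not a proposed
patch.

```c
#include <stdbool.h>

/* Toy user-space model; none of these names are kernel APIs. */
struct toy_dev {
    bool rx_sched;    /* stands in for __LINK_STATE_RX_SCHED */
    bool tx_stopped;  /* stands in for netif_queue_stopped() */
    bool in_poll;     /* hypothetical "already polling" marker */
    int  depth;       /* current nesting level of toy_poll() */
    int  max_depth;   /* deepest nesting observed */
    int  rx_pending;  /* packets still waiting in the RX ring */
};

static void toy_poll(struct toy_dev *dev, bool guard);

/* Models a printk() that netconsole sends out through the same device. */
static void toy_netpoll_send(struct toy_dev *dev, bool guard)
{
    if (!dev->tx_stopped)
        return;                 /* TX queue has room, no polling needed */
    if (guard && dev->in_poll)
        return;                 /* already inside poll: refuse to recurse */
    if (dev->rx_sched)
        toy_poll(dev, guard);   /* the problematic re-entry */
}

/* Models the driver's ->poll(): each received packet triggers a log line. */
static void toy_poll(struct toy_dev *dev, bool guard)
{
    dev->in_poll = true;
    dev->depth++;
    if (dev->depth > dev->max_depth)
        dev->max_depth = dev->depth;

    while (dev->rx_pending > 0) {
        dev->rx_pending--;
        toy_netpoll_send(dev, guard);
    }

    dev->depth--;
    if (dev->depth == 0) {
        /* Only the outermost call clears the scheduled state. */
        dev->in_poll = false;
        dev->rx_sched = false;
    }
}

/* Runs one poll cycle and reports the deepest nesting of toy_poll(). */
static int toy_run(bool guard)
{
    struct toy_dev dev = {
        .rx_sched = true, .tx_stopped = true, .rx_pending = 3,
    };

    toy_poll(&dev, guard);
    return dev.max_depth;
}
```

Calling toy_run(false) reports a nesting depth greater than 1, because
rx_sched is still set while the outer toy_poll() is running, so every log
line re-enters it. Calling toy_run(true) keeps the depth at 1.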

So we get backtraces like:

tg3_rx()
tg3_poll()
poll_napi()
netpoll_poll()
write_msg()
..
printk()
...
ip_rcv()
...
netif_receive_skb()
tg3_rx()
tg3_poll()
net_rx_action()
__do_softirq()
do_softirq()

resulting in RX queue corruption in the driver and usually
NULL skb pointer dereferences.
