Received: with ECARTIS (v1.0.0; list netdev); Sun, 30 Jan 2005 18:24:25 -0800 (PST) Received: from cheetah.davemloft.net (mail@adsl-63-197-226-105.dsl.snfc21.pacbell.net [63.197.226.105]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j0V2OIaH027125 for ; Sun, 30 Jan 2005 18:24:18 -0800 Received: from localhost ([127.0.0.1] helo=cheetah.davemloft.net ident=davem) by cheetah.davemloft.net with smtp (Exim 3.36 #1 (Debian)) id 1CvR9S-0003NP-00 for ; Sun, 30 Jan 2005 18:18:54 -0800 Date: Sun, 30 Jan 2005 18:18:54 -0800 From: "David S. Miller" To: netdev@oss.sgi.com Subject: LLTX fix proposal Message-Id: <20050130181854.7f088a95.davem@davemloft.net> X-Mailer: Sylpheed version 1.0.0 (GTK+ 1.2.10; sparc-unknown-linux-gnu) X-Face: "_;p5u5aPsO,_Vsx"^v-pEq09'CU4&Dc1$fQExov$62l60cgCc%FnIwD=.UF^a>?5'9Kn[;433QFVV9M..2eN.@4ZWPGbdi<=?[:T>y?SD(R*-3It"Vj:)"dP Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV 0.80/650/Sun Jan 2 19:00:02 2005 clamav-milter version 0.80j on 127.0.0.1 X-Virus-Status: Clean X-archive-position: 1029 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: davem@davemloft.net Precedence: bulk X-list: netdev Content-Length: 1382 Lines: 44 Ok, how about we do this for 2.6.11 before it goes out and do something more sophisticated (if necessary) later. The BUG() we're trying to catch in the ->hard_start_xmit() routines is the illegal state: driver_tx_queue_empty() && !netif_queue_stopped(dev) Therefore we could handle the race, and avoid the printk() in the race case but not in the BUG() case above. The test in tg3.c is currently: /* This is a hard error, log it. */ if (unlikely(TX_BUFFS_AVAIL(tp) <= (skb_shinfo(skb)->nr_frags + 1))) { netif_stop_queue(dev); spin_unlock_irqrestore(&tp->tx_lock, flags); printk(KERN_ERR PFX "%s: BUG! Tx Ring full when queue awake!\n", dev->name); return NETDEV_TX_BUSY; } and I'm proposing we change it to something like: if (unlikely(TX_BUFFS_AVAIL(tp) <= (skb_shinfo(skb)->nr_frags + 1))) { /* We can race with queue processing on another * cpu due to LLTX. If the queue is not stopped, * that is a hard error, log it. */ if (!netif_queue_stopped(dev)) { netif_stop_queue(dev); printk(KERN_ERR PFX "%s: BUG! Tx Ring full when " "queue awake!\n", dev->name); } spin_unlock_irqrestore(&tp->tx_lock, flags); return NETDEV_TX_BUSY; } Any objections? Again, I'm not saying %100 this is what we should do long-term. It's meant to be correct and eliminate the bogus log messages when the LLTX race is hit.