Received: with ECARTIS (v1.0.0; list netdev); Mon, 31 Jan 2005 03:34:29 -0800 (PST) Received: from arnor.apana.org.au (mail@arnor.apana.org.au [203.14.152.115]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id j0VBYJfB006739 for ; Mon, 31 Jan 2005 03:34:20 -0800 Received: from gondolin.me.apana.org.au ([192.168.0.6] ident=mail) by arnor.apana.org.au with esmtp (Exim 3.35 #1 (Debian)) id 1CvZoX-0004jR-00; Mon, 31 Jan 2005 22:33:53 +1100 Received: from herbert by gondolin.me.apana.org.au with local (Exim 3.36 #1 (Debian)) id 1CvZo6-0001Bz-00; Mon, 31 Jan 2005 22:33:26 +1100 From: Herbert Xu To: okir@suse.de (Olaf Kirch) Subject: Re: [PATCH] arp_queue: serializing unlink + kfree_skb Cc: netdev@oss.sgi.com, linux-kernel@vger.kernel.org Organization: Core In-Reply-To: <20050131102920.GC4170@suse.de> X-Newsgroups: apana.lists.os.linux.netdev User-Agent: tin/1.7.4-20040225 ("Benbecula") (UNIX) (Linux/2.4.27-hx-1-686-smp (i686)) Message-Id: Date: Mon, 31 Jan 2005 22:33:26 +1100 X-Virus-Scanned: ClamAV 0.80/650/Sun Jan 2 19:00:02 2005 clamav-milter version 0.80j on 127.0.0.1 X-Virus-Status: Clean X-archive-position: 1070 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: herbert@gondor.apana.org.au Precedence: bulk X-list: netdev Content-Length: 1855 Lines: 49 Olaf Kirch wrote: > > The problem is that IBM testing was hitting the assertion in kfree_skb > that checks that the skb has been removed from any list it was on > ("kfree_skb passed an skb still on a list"). That must've be some testing to catch this :) > One possible fix here would be to add an smp_wmb after the __skb_unlink > and a smp_rmb before the assertion in __kfree_skb, as in the attached > patch. > > Does this make sense? It makes sense. However, I'm not sure whether we want to add a read barrier to the common path in kfree_skb just for a debugging test. If it was only for the skb->list test we could move the read barrier inside the if and reread skb->list if it were non-NULL. What you've done is expose a much bigger problem in Linux. We're using atomic integers to signal that we're done with an object. The object is usually represented by a piece of memory. The problem is that in most of the places where we do this (and that's not just in the networking systems), there are no memory barriers between the last reference to that object and the decrease on the atomic counter. For example, in this particular case, a more sinister (but probably impossible for sk_buff objects) problem would be for the list removal itself to be delayed until after the the kfree_skb. This could potentially mean that we're reading/writing memory that's already been freed. Perhaps we should always add a barrier to such operations. So kfree_skb would become if (atomic_read(&skb->users) != 1) { smp_mb__before_atomic_dec(); if (!atomic_dec_and_test(&skb->users)) return; } __kfree_skb(skb); Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt