Date: Thu, 18 Aug 2005 16:40:36 -0500
From: Ryan Harper
To: shemminger@osdl.org
Cc: netdev@oss.sgi.com
Subject: Possible race with br_del_if()
Message-ID: <20050818214036.GH10593@us.ibm.com>

Hello,

I've encountered several oopses when adding and removing interfaces from
bridges while using Xen. Most of the details are available here [1].
The short of it is the following sequence:

CPU0                                    CPU1
add_del_if()                            unregister_netdevice()
br_del_if()                             notifier_call_chain(NETDEV_UNREGISTER)
del_nbp()
br_stp_disable_port()
// port->state == BR_STATE_DISABLED
                                        br_device_event()
                                        // dev->br_port != NULL yet
                                        // event is NETDEV_UNREGISTER
                                        br_del_if()
                                        sysfs_remove_dir(p)
                                        kobject_del()
                                        dget(dentry)
                                        BUG_ON(!atomic_read(&dentry->d_count))

This sequence doesn't happen every time. In many cases, CPU0 moves
straight into destroy_nbp(), which sets dev->br_port = NULL, so the
(p == NULL) check in br_device_event() hits and a second br_del_if()
isn't called.

The attached patch is a workaround for the double call, but I'm not sure
whether it is the right way to deal with this issue, or whether it is an
issue at all.

1. http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=90

-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
(512) 838-9253   T/L: 678-9253
ryanh@us.ibm.com


diffstat output:
 br_if.c |    2 +-
 1 files changed, 1 insertion(+), 1 deletion(-)

Signed-off-by: Ryan Harper
---
Simple workaround for double call to br_del_if().

Signed-off-by: Ryan Harper

--- linux-2.6.12/net/bridge/br_if.c	2005-06-17 14:48:29.000000000 -0500
+++ linux-2.6.12-xen0-smp/net/bridge/br_if.c	2005-08-18 15:17:27.302615846 -0500
@@ -382,7 +382,7 @@
 {
 	struct net_bridge_port *p = dev->br_port;
 
-	if (!p || p->br != br)
+	if (!p || p->br != br || p->state == BR_STATE_DISABLED)
 		return -EINVAL;
 
 	br_sysfs_removeif(p);
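
For anyone who wants to poke at the guard outside the kernel, here is a rough userspace sketch of the check the patch adds. It replays the interleaving above sequentially in a single thread, so it only shows what the extra state test does once the port has already been disabled; it says nothing about whether the test is race-free. All of the names (model_del_if(), struct port, sysfs_refs, and so on) are made-up stand-ins, not the real bridge code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; NOT the real kernel definitions. */
enum port_state { PORT_FORWARDING, PORT_DISABLED };

struct bridge { int id; };

struct port {
	struct bridge *br;
	enum port_state state;
	int sysfs_refs;		/* models dentry->d_count */
};

struct device { struct port *br_port; };

/* Models the sysfs teardown; the assert mirrors the BUG_ON in the trace. */
static void model_sysfs_remove(struct port *p)
{
	assert(p->sysfs_refs > 0);
	p->sysfs_refs--;
}

/* Models br_del_if() with the extra state check from the patch. */
static int model_del_if(struct bridge *br, struct device *dev)
{
	struct port *p = dev->br_port;

	if (!p || p->br != br || p->state == PORT_DISABLED)
		return -EINVAL;

	model_sysfs_remove(p);
	/* Stands in for del_nbp() -> br_stp_disable_port(). */
	p->state = PORT_DISABLED;
	return 0;
}

/*
 * Replays the double call: the first delete succeeds, and the second one
 * arrives while dev->br_port is still set (the window before destroy_nbp()
 * would clear it). Returns the result of the second call.
 */
static int model_double_delete(void)
{
	struct bridge br = { 0 };
	struct port p = { &br, PORT_FORWARDING, 1 };
	struct device dev = { &p };

	if (model_del_if(&br, &dev) != 0)
		return 1;	/* first delete unexpectedly failed */

	return model_del_if(&br, &dev);
}
```

With the state check in place, the second call backs out with -EINVAL before it reaches the sysfs teardown. Without it, the second call would hit the assert, which is the analogue of the BUG_ON in the trace.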