Date: Thu, 18 Aug 2005 15:12:02 -0700
From: Stephen Hemminger
To: Ryan Harper
Cc: netdev@oss.sgi.com
Subject: Re: Possible race with br_del_if()
Message-ID: <20050818151202.6fe6ded4@dxpl.pdx.osdl.net>
In-Reply-To: <20050818214036.GH10593@us.ibm.com>
References: <20050818214036.GH10593@us.ibm.com>

On Thu, 18 Aug 2005 16:40:36 -0500 Ryan Harper wrote:
> Hello,
>
> I've encountered several oopses when adding and removing interfaces from
> bridges while using Xen. Most of the details are available here [1].
> The short of it is the following sequence:

Doesn't the mutex in RTNL work right? Or are you calling routines without
asserting it?
> CPU0                          CPU1
> add_del_if()                  unregister_netdevice()
>   br_del_if()                   notifier_call_chain(NETDEV_UNREGISTER)
>     del_nbp()
>       br_stp_disable_port()
>       // port->state == BR_STATE_DISABLED
>                                   br_device_event()
>                                   // dev->br_port != NULL yet
>                                   // event is NETDEV_UNREGISTER
>                                     br_del_if()
>                                       sysfs_remove_dir(p)
>                                       kobject_del()
>                                       dget(dentry)
>                                       BUG_ON(!atomic_read(&dentry->d_count))
>
> This sequence doesn't happen all of the time. In many cases, CPU0 moves
> along right into destroy_nbp(), which sets dev->br_port = NULL; the
> br_device_event() check (p == NULL) then hits, and a second br_del_if()
> isn't called.
>
> The attached patch is a workaround for the double case, but I'm not sure
> if it is the right way to deal with this issue, or if it is an issue at all.
>
> [1] http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=90
>