On Fri, 19 Aug 2005 14:10:52 -0500
Ryan Harper <ryanh@xxxxxxxxxx> wrote:
> * Stephen Hemminger <shemminger@xxxxxxxx> [2005-08-18 17:36]:
> > On Thu, 18 Aug 2005 17:23:23 -0500
> > Ryan Harper <ryanh@xxxxxxxxxx> wrote:
> >
> > > * Stephen Hemminger <shemminger@xxxxxxxx> [2005-08-18 17:11]:
> > > > On Thu, 18 Aug 2005 16:40:36 -0500
> > > > Ryan Harper <ryanh@xxxxxxxxxx> wrote:
> > > >
> > > > > Hello,
> > > > >
> > > > > I've encountered several oopses when adding and removing interfaces
> > > > > from bridges while using Xen. Most of the details are available here.
> > > > > The short of it is the following sequence:
> > > >
> > > > Doesn't the RTNL mutex work right? Or are you calling
> > > > routines without asserting it?
> > >
> > > unregister_netdevice() asserts RTNL; add_del_if() in br_ioctl.c doesn't
> > > seem to do so. I don't see it down the dev_get_by_index() path either.
> > > It looks like no caller of add_del_if() asserts RTNL. The two
> > > callers I see are:
> > >
> > > br_dev_ioctl() in br_ioctl.c
> > > old_dev_ioctl() in br_ioctl.c
> >
> > But the path to br_dev_ioctl() is via the socket ioctl, and that
> > should already have taken RTNL.
> >
> >
> > dev_ioctl()
> >   rtnl_lock()
> >   dev_ifsioc()
> >     dev->do_ioctl --> br_dev_ioctl()
>
>
> Just to follow up, the issue was a race between the call_rcu() callback
> for destroy_nbp() and an unregister_netdev() call. Sometimes the
> br_device_event() routine was triggered before destroy_nbp() had run,
> leaving dev->br_port non-NULL. Seeing a non-NULL br_port,
> br_device_event() then correctly called br_del_if() on a port that was
> already being removed.
>
> We caused this by issuing a brctl delif from userspace scripts and
> having an in-kernel handler call unregister_netdev().
>
> Our fix is to not bother calling brctl delif, because the
> unregister_netdev() call will automatically remove the device from the
> bridge when notifier_call_chain() kicks in from
> unregister_netdevice().

I'll get back to you; this needs some review. I have a bunch of old
test suites to dig up for it.
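
For anyone following along, the ordering problem described in the quoted
text can be sketched in plain user-space C. This is only a single-threaded
model under stated assumptions, not kernel code: every sim_* name is
hypothetical, the one-slot "pending" pointer merely stands in for a
call_rcu() callback that has been queued but not yet run, and the real
br_del_if()/destroy_nbp() do far more than bump a counter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel objects; not the real structs. */
struct sim_dev;

struct sim_port {
    struct sim_dev *dev;
    int del_calls;              /* how many times teardown was started */
};

struct sim_dev {
    struct sim_port *br_port;   /* non-NULL while the device looks enslaved */
};

/* Stands in for a call_rcu() callback that is queued but has not run. */
static struct sim_port *pending;

/* Models destroy_nbp(): only here is dev->br_port cleared. */
static void sim_run_pending(void)
{
    if (pending) {
        pending->dev->br_port = NULL;
        pending = NULL;
    }
}

/* Models br_del_if(): start teardown now, defer the final cleanup. */
static void sim_del_if(struct sim_port *p)
{
    assert(p != NULL);
    p->del_calls++;
    pending = p;
}

/* Models br_device_event() on unregister: it trusts dev->br_port. */
static void sim_unregister(struct sim_dev *dev)
{
    if (dev->br_port)
        sim_del_if(dev->br_port);
}

/*
 * Returns how many teardowns one device saw.
 * explicit_delif:     userspace issues "brctl delif" first.
 * callback_ran_first: the deferred cleanup runs before the unregister.
 */
static int sim_scenario(int explicit_delif, int callback_ran_first)
{
    struct sim_dev dev;
    struct sim_port port = { &dev, 0 };

    dev.br_port = &port;
    pending = NULL;

    if (explicit_delif)
        sim_del_if(&port);
    if (callback_ran_first)
        sim_run_pending();
    sim_unregister(&dev);
    sim_run_pending();          /* drain before the locals go out of scope */
    return port.del_calls;
}
```

In this model, sim_scenario(1, 0) is the bad ordering: the explicit delif
has started teardown, the deferred cleanup has not run, and the unregister
path starts teardown again. Passing 0 for explicit_delif corresponds to
the fix of leaving removal to the unregister path alone.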