This can get pretty hairy.
Suppose the linkwatch code backs off because rtnl_sem is held
legitimately by thread A. Meanwhile, thread B is doing a
flush_scheduled_work in order to wait for pending linkwatch events to
complete.
With the proposed workaround this results in incorrect behaviour:
flush_scheduled_work returns with the linkwatch work not really done.
(Admittedly I'm not sure if such a scenario really is feasible.)
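To make the interleaving concrete, here is a sketch of how a
trylock-based backoff in linkwatch_event() might look and where it
goes wrong. The requeue-on-failure detail is my assumption about how
such a backoff would be written, not actual kernel code; I'm also
assuming the existing handler body is linkwatch_run_queue().

    /* Hypothetical backoff variant of linkwatch_event(). */
    static void linkwatch_event(void *dummy)
    {
            /* down_trylock semantics: nonzero means we failed */
            if (rtnl_shlock_nowait()) {
                    /* Thread A holds rtnl_sem: give up and requeue.
                     * This invocation returns immediately, so a
                     * concurrent flush_scheduled_work() in thread B
                     * can see the queue as drained and return before
                     * the event has actually been processed. */
                    schedule_work(&linkwatch_work);
                    return;
            }
            linkwatch_run_queue();
            rtnl_shunlock();
    }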
My initial thought was to use a separate workqueue, disentangled from
the global queue used by flush_scheduled_work. This would allow
linkwatch events to be synchronized against explicitly. For this
solution, though, I think it would be nice not to need a thread per
CPU for the linkwatch workqueue.
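As a rough sketch of that idea (all names below are hypothetical, and
the single-thread property depends on create_singlethread_workqueue(),
which only exists in later kernels, so that part is an assumption):

    static struct workqueue_struct *linkwatch_wq;

    static void linkwatch_event(void *dummy);
    static DECLARE_WORK(linkwatch_work, linkwatch_event, NULL);

    static int __init linkwatch_init(void)
    {
            /* one dedicated thread instead of one per CPU */
            linkwatch_wq = create_singlethread_workqueue("lwatch");
            return linkwatch_wq ? 0 : -ENOMEM;
    }

    /* callers queue events here instead of via schedule_work() */
    void linkwatch_schedule_event(void)
    {
            queue_work(linkwatch_wq, &linkwatch_work);
    }

    /* explicit synchronization point: waits only for linkwatch
     * work, never for unrelated items on the global queue */
    void linkwatch_flush(void)
    {
            flush_workqueue(linkwatch_wq);
    }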
On the other hand, ic_open_devs appears to be the only place where
rtnl_sem is held while going into a driver's open() function, so
perhaps the right rule is that rtnl_sem must not be held when calling
dev->open().
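If that rule were adopted, ic_open_devs would have to drop the
semaphore around the open call. A minimal sketch of the shape, with
device-list walking and error handling elided (the device would also
need re-validating after the lock is reacquired):

    rtnl_shlock();
    /* ... select the next candidate device ... */
    rtnl_shunlock();

    /* the driver's open routine now runs without rtnl_sem, so a
     * netif_carrier_*()-triggered linkwatch_event() can take the
     * semaphore without finding it held by us */
    err = dev_open(dev);

    rtnl_shlock();
    /* ... record the result and continue the scan ... */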
--
Michal Ostrowski <mostrows@xxxxxxxxxxxxxx>
On Mon, 2004-01-05 at 09:50, Stefan Rompf wrote:
> On Monday, 05 January 2004 14:07, Michal Ostrowski wrote:
>
> > ic_open_devs grabs rtnl_sem with an rtnl_shlock() call.
> >
> > The sungem driver at some point calls gem_init_one, which calls
> > netif_carrier_*, which in turn calls schedule_work (linkwatch_event).
> >
> > linkwatch_event in turn needs rtnl_sem.
>
> Good catch! The sungem driver shows clearly that we need some way to remove
> queued work without scheduling and waiting for other events.
>
> I will change the linkwatch code this week to use rtnl_shlock_nowait()
> and to back off and retry on failure. Call it a workaround, but it
> increases overall system stability.
>
> Btw, what is the planned difference between rtnl_shlock() and rtnl_exlock()?
> Even though the latter is a null operation right now, I don't want to hold
> more locks than needed in the linkwatch code.
>
> Stefan
>