netdev
[Top] [All Lists]

Re: tx_timeout and timer serialisation

To: Rusty Russell <rusty@xxxxxxxxxxxxxxxx>
Subject: Re: tx_timeout and timer serialisation
From: Andrew Morton <andrewm@xxxxxxxxxx>
Date: Tue, 30 May 2000 18:16:52 +1000
Cc: kuznet@xxxxxxxxxxxxx, netdev@xxxxxxxxxxx
References: Your message of "Thu, 11 May 2000 17:09:29 +0400." <200005111309.RAA32574@xxxxxxxxxxxxx> <20000530043810.48AAA8189@halfway>
Sender: owner-netdev@xxxxxxxxxxx
Rusty Russell wrote:
> 
> In message <200005111309.RAA32574@xxxxxxxxxxxxx> you write:
> > Actually, existing *_timer primitives are very inconvenient.
> > And I did not find any good way to improve them. Essentially,
> > del_timer_sync(), timer->running and mod_timer() returning
> > value are all that I was able to do.
> 
> There's still a tiny race with unloading modules between timer_exit()
> and return in the timer handler.

Yep.

> A better interface would be to make the timerfn return a pointer to
> the timer:
> 
>         struct timer_list *function(unsigned long data);
> 
> Get rid of the braindead timer_exit() macro, and inside
> run_timer_list, do:
> 
> #ifdef CONFIG_SMP
>         if (timer->function(timer->data)) {
>                 timer->running = 0;
>                 mb();
>         }
> #else
>         timer->function(timer->data);
> #endif

Yes.  My patch (which is going out in about two hours...) has the same
characteristics, but records the fact that a timer is running by writing
a pointer to it into a global var in kernel/timer.c.  In del_timer, spin
until that pointer is not equal to the timer-being-deleted.

So my patch does not require every timer handler to be visited.  But
this is not necessarily an advantage!  They _all_ need looking at.  It
doesn't take very long - I've audited about 100 del_timer calls over
three evenings  (500 to go) but some of the fixes are non-obvious.

> Then del_timer_sync() becomes the default, and del_timer_async() is
> used for self-deleting timers and special effects.

Yes.  I talked Alan into this idea, but now I don't think it's prudent. 
It seems that about 70% of calls to del_timer are racy and 20% are
deadlocky :(  Some are both racy with async and deadlocky with sync!  So
either way we lose.

What I will now propose is:

- Fix del_timer_sync
- Rename del_timer to del_timer_async
- #define del_timer del_timer_async

OK, a no-op so far.

- Spend a couple of weeks working on net, scsi, ide, drivers/net
migrating all del_timer calls to either del_timer_async or
del_timer_sync.

- Change the #define to #define del_timer del_timer_sync, see what
happens.  There will be some deadlocks, but the fancy deadlock detection
code will catch them quickly.

- Spend the next year banishing del_timer _completely_.  So the presence
of del_timer indicates "unaudited, possibly buggy" code.

> (BTW, returning a pointer not an int so bugs like `kfree(timer);
> return timer;' are more obvious).
> 
> So can we live with the current braindeath in 2.4?

Ship with known bugs? Please, no.  We need to at least fix them in areas
which are important to SMP.  core, net, disk.

<Prev in Thread] Current Thread [Next in Thread>