netdev
[Top] [All Lists]

Re: tx_timeout and timer serialisation

To: Jeff Garzik <jgarzik@xxxxxxxxxxxxxxxx>
Subject: Re: tx_timeout and timer serialisation
From: Andrew Morton <andrewm@xxxxxxxxxx>
Date: Sun, 21 May 2000 06:14:52 +1000
Cc: Andrey Savochkin <saw@xxxxxxxxxxxxx>, Donald Becker <becker@xxxxxxxxx>, netdev@xxxxxxxxxxx, Alan Cox <alan@xxxxxxxxxx>
References: <3925BB00.B1CDDFE7@xxxxxxxxxxxxxxxx> <Pine.LNX.4.10.10005192039250.825-100000@xxxxxxxxxxxxx>, <Pine.LNX.4.10.10005192039250.825-100000@xxxxxxxxxxxxx>; from "Donald Becker" on Fri, May 19, 2000 at 08:48:15PM <20000520122715.A7682@xxxxxxxxxxxxx> <39262113.19447850@xxxxxxxxxx> <3926D55D.766781B2@xxxxxxxxxxxxxxxx>
Sender: owner-netdev@xxxxxxxxxxx
Jeff Garzik wrote:
> 
> Andrew Morton wrote:
> > I have just written a little kernel module which has confirmed that the
> > handler-keeps-running-after-del_timer bug exists in both 2.2.14 and
> > 2.3.99-pre9.  Not good.  Very not good, IMO.
> 
> This ties neatly together a public thread and a private thread.
> 
> >From what I can gather,
> The timer semantics change which concerns Donald occurred when the new
> timers and SMP were written (2.1.?).  In old 2.0 kernels, SMP in the
> kernel context didn't really matter due to the BKL-related
> synchronization.  When the new timers and SMP came about in 2.1.x days,
> suddenly it was possible for a timer to be running on one CPU, after
> del_timer successfully returned.

Sounds right.   The issue now is that the accepted API for timers is *so
magical* that it's very hard to fix them.

- Timer handlers can readd timers
- timer handlers can delete timers
- timer handlers can kfree() the memory which contained the darn
timer_struct.

this means that neither del_timer() nor run_timer_list() is allowed to
touch the timer_struct after (or even during) the handler run.

I think the sanest thing to do is to separate out the timer storage
management: alloc_timer(), free_timer(), register_timer,
unregister_timer().  Botching sane handling into the current permissive
API isn't proving pretty.

I do have a mostly-fix coded up.  It maintains state arrays:

timer_struct *running_timers[NR_CPUS];

run_timer_list()
{
        running_timers[this_cpu] = timer;
        timer.function();
        running_timers[this_cpu] = 0;
}

del_timer()
{
        if (timer_is_in(running_timers))
                spin_until_it_isnt();
        do_funky_locking()
        maybe_spin_again()
}

This is fairly sane for 2 CPUs, but there may be problems with three:

CPU0 is running the handler
CPU1 is spinning in del_timer
CPU2 re-adds the timer

I think I'm not detecting CPU2's action when the del_timer spin
completes.

We really, really need to be able to touch the timer_struct after the
handler has run, so we can maintain state within it.  Then we can manage
the storage and refcount the timers.  (As an old C++ hack, this sort of
crap makes me want to scream.   It would be _so_ easy...)

> The 2.3.x timer->running change seems like not enough, because there is
> still a race between the time the function calls timer_exit(), and the
> time that the module can be unloaded.  In order to guarantee an accurate
> timer_is_running() value, should timer_set_running() and timer_exit()
> instead be called from the core kernel code, instead of the driver?

Yes it should.  But the problem is, the timer_exit() function can't be
performed by the core kernel because the handler could have kfreed()'d
(or recycled) the memory which contains the timer.  This is why
run_timer_list() is careful to not touch the timer after calling the
handler.  And this is why my proto-patch maintans state outside the
struct timer_list.

> Whenever the code is in the driver, there will be a small race between
> timer_exit() time and the time when the timer function is actually
> complete.
> 
> AFAICS from this, 2.2.x drivers might be exiting while their timer
> routine is still running.  And 2.3.x drivers will do this too, until
> every one is updated to call timer_set_running, timer_exit, and to check
> timer_is_running.
> 
> Is that a correct assessment?

Yes, but there are several other, more serious failure scenarios with
the current code:

mainline()                       handler()
==========                       =========
                                 enter xxx_timer()
del_timer()
kfree(some_resource)
                                 access(some_resource)

mainline()                       handler()
==========                       =========
                                 enter xxx_timer()
del_timer()
kfree(timer)
                                 add_timer(timer)


This one's the worst.  The timer is pending, but the timer_struct is in
kfree'ed memory.  As Alexey would say, "Your kernel has been destroyed
secretly and fatally".



The current del_timer_sync() is close to being OK.  But its problem is
that after timer_synchronise() returns, the timer_struct may have been
kfree'ed by the handler, and del_timer_sync then reads and writes it,
potentially corrupting something it doesn't own.

Anyway, I've put a very-proto-patch up at

        http://www.uow.edu.au/~andrewm/timer.patch

and I have updated the timertest module:

        http://www.uow.edu.au/~andrewm/timertest.tar.gz

(the kernel thread now shows up correctly in ps. I had to set

        current->mm->arg_start = current->mm->arg_end = 0;

Thanks, Andrey!)

timer.patch makes all del_timer()s synchronous.   del_timer_sync() maps
onto the synchronous del_timer().  exit_timer() is a no-op.  It's ugly,
not obviously correct, but the fastpath is still fast (could be
faster).  It does introduce the potential for deadlocks which Alexey
identified:

mainline()                 handler()
==========                 =========

spin_lock(some_lock)
                           spin_lock(some_lock)
del_timer()  [ spins forever ]

This is manageable, but has to be fixed on a case-by-case basis.




The alternative to all of the above is to simply throw up our hands in
horror and start again:

- kernel maintains a freelist of struct new_timer_structs
- new_timer_struct is refcounted
- New API:

        new_timer_struct *alloc_timer()
        free_timer(new_timer_struct *)  (refcounted)
        register_timer(new_timer_struct *)
        unregister_timer(new_timer_struct *)


mnm:/usr/src/linux-2.3.99-pre9-2> egrep -r "add_timer|mod_timer" . | wc
-l
    696

Any volunteers?

-- 
-akpm-

<Prev in Thread] Current Thread [Next in Thread>