Jeff Garzik wrote:
> Andrew Morton wrote:
> > I have just written a little kernel module which has confirmed that the
> > handler-keeps-running-after-del_timer bug exists in both 2.2.14 and
> > 2.3.99-pre9. Not good. Very not good, IMO.
> This ties neatly together a public thread and a private thread.
> >From what I can gather,
> The timer semantics change which concerns Donald occurred when the new
> timers and SMP were written (2.1.?). In old 2.0 kernels, SMP in the
> kernel context didn't really matter due to the BKL-related
> synchronization. When the new timers and SMP came about in 2.1.x days,
> suddenly it was possible for a timer to be running on one CPU, after
> del_timer successfully returned.
Sounds right. The issue now is that the accepted API for timers is *so
magical* that it's very hard to fix them.
- Timer handlers can readd timers
- timer handlers can delete timers
- timer handlers can kfree() the memory which contained the darn
this means that neither del_timer() nor run_timer_list() is allowed to
touch the timer_struct after (or even during) the handler run.
I think the sanest thing to do is to separate out the timer storage
management: alloc_timer(), free_timer(), register_timer,
unregister_timer(). Botching sane handling into the current permissive
API isn't proving pretty.
I do have a mostly-fix coded up. It maintains state arrays:
running_timers[this_cpu] = timer;
running_timers[this_cpu] = 0;
This is fairly sane for 2 CPUs, but there may be problems with three:
CPU0 is running the handler
CPU1 is spinning in del_timer
CPU2 re-adds the timer
I think I'm not detecting CPU2's action when the del_timer spin
We really, really need to be able to touch the timer_struct after the
handler has run, so we can maintain state within it. Then we can manage
the storage and refcount the timers. (As an old C++ hack, this sort of
crap makes me want to scream. It would be _so_ easy...)
> The 2.3.x timer->running change seems like not enough, because there is
> still a race between the time the function calls timer_exit(), and the
> time that the module can be unloaded. In order to guarantee an accurate
> timer_is_running() value, should timer_set_running() and timer_exit()
> instead be called from the core kernel code, instead of the driver?
Yes it should. But the problem is, the timer_exit() function can't be
performed by the core kernel because the handler could have kfreed()'d
(or recycled) the memory which contains the timer. This is why
run_timer_list() is careful to not touch the timer after calling the
handler. And this is why my proto-patch maintans state outside the
> Whenever the code is in the driver, there will be a small race between
> timer_exit() time and the time when the timer function is actually
> AFAICS from this, 2.2.x drivers might be exiting while their timer
> routine is still running. And 2.3.x drivers will do this too, until
> every one is updated to call timer_set_running, timer_exit, and to check
> Is that a correct assessment?
Yes, but there are several other, more serious failure scenarios with
the current code:
This one's the worst. The timer is pending, but the timer_struct is in
kfree'ed memory. As Alexey would say, "Your kernel has been destroyed
secretly and fatally".
The current del_timer_sync() is close to being OK. But its problem is
that after timer_synchronise() returns, the timer_struct may have been
kfree'ed by the handler, and del_timer_sync then reads and writes it,
potentially corrupting something it doesn't own.
Anyway, I've put a very-proto-patch up at
and I have updated the timertest module:
(the kernel thread now shows up correctly in ps. I had to set
current->mm->arg_start = current->mm->arg_end = 0;
timer.patch makes all del_timer()s synchronous. del_timer_sync() maps
onto the synchronous del_timer(). exit_timer() is a no-op. It's ugly,
not obviously correct, but the fastpath is still fast (could be
faster). It does introduce the potential for deadlocks which Alexey
del_timer() [ spins forever ]
This is manageable, but has to be fixed on a case-by-case basis.
The alternative to all of the above is to simply throw up our hands in
horror and start again:
- kernel maintains a freelist of struct new_timer_structs
- new_timer_struct is refcounted
- New API:
free_timer(new_timer_struct *) (refcounted)
mnm:/usr/src/linux-2.3.99-pre9-2> egrep -r "add_timer|mod_timer" . | wc