netdev

Re: tx_timeout and timer serialisation

To: Andrew Morton <andrewm@xxxxxxxxxx>
Subject: Re: tx_timeout and timer serialisation
From: Jeff Garzik <jgarzik@xxxxxxxxxxxxxxxx>
Date: Sat, 20 May 2000 14:11:41 -0400
Cc: Andrey Savochkin <saw@xxxxxxxxxxxxx>, Donald Becker <becker@xxxxxxxxx>, netdev@xxxxxxxxxxx, Alan Cox <alan@xxxxxxxxxx>
Organization: MandrakeSoft
References: <3925BB00.B1CDDFE7@xxxxxxxxxxxxxxxx> <Pine.LNX.4.10.10005192039250.825-100000@xxxxxxxxxxxxx>, <Pine.LNX.4.10.10005192039250.825-100000@xxxxxxxxxxxxx>; from "Donald Becker" on Fri, May 19, 2000 at 08:48:15PM <20000520122715.A7682@xxxxxxxxxxxxx> <39262113.19447850@xxxxxxxxxx>
Sender: owner-netdev@xxxxxxxxxxx
Andrew Morton wrote:
> I have just written a little kernel module which has confirmed that the
> handler-keeps-running-after-del_timer bug exists in both 2.2.14 and
> 2.3.99-pre9.  Not good.  Very not good, IMO.

This ties neatly together a public thread and a private thread.

From what I can gather, the timer semantics change that concerns Donald
occurred when the new timers and SMP support were written (2.1.?).  In
old 2.0 kernels, SMP in kernel context didn't really matter because of
the synchronization provided by the BKL.  When the new timers and SMP
came about in the 2.1.x days, it suddenly became possible for a timer
handler to still be running on one CPU after del_timer() had
successfully returned on another.
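To make the race concrete, here is a minimal userspace model in C,
assuming POSIX threads stand in for two CPUs.  Every name in it
(sim_timer, sim_del_timer(), demo_race()) is hypothetical and only
mimics the 2.1.x/2.2.x behaviour described above: del_timer() unlinks a
pending timer but does not wait for a handler that another CPU has
already started.

```c
/* Userspace model, not kernel code: all names are hypothetical. */
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

struct sim_timer {
    atomic_bool pending;     /* still on the (model) timer list */
    atomic_bool in_handler;  /* observed by the demo only */
    atomic_bool release;     /* lets the demo hold the handler open */
};

static void sim_handler(struct sim_timer *t)
{
    atomic_store(&t->in_handler, true);
    /* Stand-in for slow driver work, e.g. touching device registers. */
    while (!atomic_load(&t->release))
        sched_yield();
    atomic_store(&t->in_handler, false);
}

/* "CPU 1": the timer code detaches the timer, then runs its handler. */
static void *sim_run_timers(void *arg)
{
    struct sim_timer *t = arg;
    if (atomic_exchange(&t->pending, false))
        sim_handler(t);
    return NULL;
}

/* Old-style delete: returns 1 if it removed a pending timer, else 0.
 * It never waits for a handler that is already running. */
static int sim_del_timer(struct sim_timer *t)
{
    return atomic_exchange(&t->pending, false) ? 1 : 0;
}

/* Returns true if the handler was still running after sim_del_timer()
 * had already returned, i.e. the race was reproduced. */
static bool demo_race(void)
{
    struct sim_timer t;
    atomic_init(&t.pending, true);
    atomic_init(&t.in_handler, false);
    atomic_init(&t.release, false);

    pthread_t cpu1;
    pthread_create(&cpu1, NULL, sim_run_timers, &t);

    /* Wait until "CPU 1" is inside the handler. */
    while (!atomic_load(&t.in_handler))
        sched_yield();

    int removed = sim_del_timer(&t);  /* "CPU 0": e.g. driver close */
    bool raced = (removed == 0) && atomic_load(&t.in_handler);

    atomic_store(&t.release, true);
    pthread_join(cpu1, NULL);
    return raced;
}
```

demo_race() returns true when the handler is still executing after
sim_del_timer() has returned 0, which is exactly the window in which a
module could be unloaded underneath a running handler.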

The 2.3.x timer->running change does not seem to be enough, because
there is still a race between the time the timer function calls
timer_exit() and the time the module can be unloaded.  To guarantee an
accurate timer_is_running() value, shouldn't timer_set_running() and
timer_exit() be called from the core kernel code rather than from the
driver?  As long as that code lives in the driver, there will be a small
window between the timer_exit() call and the moment the timer function
actually completes.
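A rough sketch of that alternative, again as a userspace C model with
hypothetical names (sim_run_timer(), sim_del_timer_sync()) rather than
the kernel's real timer API: the core dispatch code, not the driver,
sets and clears the running flag around the handler call, and a
synchronous delete waits until that flag drops.

```c
/* Userspace model, not kernel code: all names are hypothetical. */
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

static pthread_mutex_t sim_timerlist_lock = PTHREAD_MUTEX_INITIALIZER;

struct sim_timer {
    bool pending;         /* protected by sim_timerlist_lock */
    atomic_bool running;  /* set and cleared by the core, not the driver */
    void (*fn)(struct sim_timer *);
};

/* Core dispatch: running is set under the list lock before fn starts
 * and cleared only after fn has returned to core code. */
static void sim_run_timer(struct sim_timer *t)
{
    pthread_mutex_lock(&sim_timerlist_lock);
    bool fire = t->pending;
    t->pending = false;
    if (fire)
        atomic_store(&t->running, true);
    pthread_mutex_unlock(&sim_timerlist_lock);

    if (fire) {
        t->fn(t);
        atomic_store(&t->running, false);
    }
}

/* Synchronous delete: unlink, then wait for any in-flight handler. */
static int sim_del_timer_sync(struct sim_timer *t)
{
    pthread_mutex_lock(&sim_timerlist_lock);
    int removed = t->pending ? 1 : 0;
    t->pending = false;
    pthread_mutex_unlock(&sim_timerlist_lock);

    while (atomic_load(&t->running))
        sched_yield();
    return removed;
}

/* ---- demo ---- */
static atomic_bool demo_entered, demo_finished;

static void demo_handler(struct sim_timer *t)
{
    (void)t;
    atomic_store(&demo_entered, true);
    for (int i = 0; i < 1000; i++)  /* stand-in for slow driver work */
        sched_yield();
    atomic_store(&demo_finished, true);
}

static void *demo_cpu1(void *arg)
{
    sim_run_timer(arg);
    return NULL;
}

/* Returns true if the handler had already finished by the time
 * sim_del_timer_sync() returned. */
static bool demo_sync_delete(void)
{
    struct sim_timer t = { .pending = true, .running = false,
                           .fn = demo_handler };
    atomic_store(&demo_entered, false);
    atomic_store(&demo_finished, false);

    pthread_t cpu1;
    pthread_create(&cpu1, NULL, demo_cpu1, &t);
    while (!atomic_load(&demo_entered))
        sched_yield();

    sim_del_timer_sync(&t);
    bool ok = atomic_load(&demo_finished);
    pthread_join(cpu1, NULL);
    return ok;
}
```

Because the flag is cleared only after fn has returned, a caller of
sim_del_timer_sync() knows that none of the (model) driver's handler
code is still executing; the remaining instructions belong to the core,
which is not being unloaded.  Spinning with sched_yield() is a
simplification for the model; kernel code would use its own locking and
wait primitives.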

AFAICS, then, 2.2.x drivers might be exiting while their timer routine
is still running.  2.3.x drivers will do the same until every one of
them is updated to call timer_set_running() and timer_exit(), and to
check timer_is_running().

Is that a correct assessment?

        Jeff




-- 
Jeff Garzik              | Liberty is always dangerous, but
Building 1024            | it is the safest thing we have.
MandrakeSoft, Inc.       |      -- Harry Emerson Fosdick
