netdev
[Top] [All Lists]

Re: tx_timeout and timer serialisation

To: Andrew Morton <andrewm@xxxxxxxxxx>
Subject: Re: tx_timeout and timer serialisation
From: Donald Becker <becker@xxxxxxxxx>
Date: Fri, 19 May 2000 15:37:46 -0400 (EDT)
Cc: netdev@xxxxxxxxxxx
In-reply-to: <39240EA9.389AE2FC@xxxxxxxxxx>
Sender: owner-netdev@xxxxxxxxxxx
On Fri, 19 May 2000, Andrew Morton wrote:
> Donald Becker wrote:
> > > kuznet@xxxxxxxxxxxxx wrote:
> > > > Timers are self-destructable as rule. See? Normal usage
> > > > for timer is to have it allocated inside an object and
> > > > timer event detroys the object together with timer.
> > I'm curious -- what code does this?
> me too.

I think this is the crux of this specific problem: someone has assumed that
  - timers deal only with objects allocated specifically for that instance,
  - the timer function always frees the object.

This assumption is flawed
  - it's not the way timers are used
  - the timer function cannot delete its own object, since del_timer()
    would lead to memory leaks

> > I don't see the semantic problem here.
> 
> Well, the timer handler could be executing on another CPU _while_ the
> timer is being deleted.  So when del_timer() returns, the handler is
> still executing!  So in this example, CPU0 could execute
> use(some_resources) AFTER CPU1 has done release(some_resources).

When I saw the claim that timer handling was flawed in 2.3, I had guessed
some similar scenerio.  But I couldn't imagine something thinking that such
flawed semantics would be reasonable.

> > This was the recommended way to use the timer routines.  If the semantics
> > have changed, there should be new names for the changed semantics.
> 
> I agree.  I assume what happened was that when the timer code went SMP,
> the synchronous semantics of del_timer() became asynchronous but the
> name was unchanged.
> 
> I believe what _should_ have been done when converting this to SMP was
> to
> 
> - manage the "handler is running" flag _outside_ the handler
> - preserve del_timer()'s synchronous semantics (via the spin)
> - introduce a new function del_timer_async() which has the
>   handler-can-still-be-running semantics.

That would be the only reasonable way to handle this.
People would quickly find out that del_timer_async() is almost useless,
because you the timer routine is presumably doing something, and that
something might happen at some unpredictable future time.

Keep in mind that with modversions, the following will not work
 #define del_timer(..) working_del_timer(..)
nor will
 static inline del_timer() {...}

> I'm not sure if 2.2 is clean in this regard.  It doesn't have
> del_timer_sync().  Spent ten minutes peering at 2.2, and I can't see
> why, if some random piece of kernel code does a del_timer() in IRQ or
> process context, concurrent execution of the post-del_timer() code and
> the handler will not occur.

The problem window for net drivers is presumably small, since the timer
routines run for minimal time every few seconds.  The only thing that might
take more than a handful of microseconds is a MII read, which can take tens
of microseconds, and the routines are run as little as every 60 seconds.
The risk is that the timer handler re-adds itself, when the driver thinks
that it has been shut down.

> > It would be useful to distinguish between "bugs" and "interfaces changes
> > that have made the following no longer correct since version X.Y.Z".

Donald Becker                           becker@xxxxxxxxx
Scyld Computing Corporation
410 Severn Ave. Suite 210
Annapolis MD 21403



<Prev in Thread] Current Thread [Next in Thread>