netdev
[Top] [All Lists]

Re: tx_timeout and timer serialisation

To: kuznet@xxxxxxxxxxxxx
Subject: Re: tx_timeout and timer serialisation
From: Andrew Morton <andrewm@xxxxxxxxxx>
Date: Wed, 03 May 2000 00:27:07 +1000
Cc: netdev@xxxxxxxxxxx
References: <390E3143.5CF7D4AD@xxxxxxxxxx> from "Andrew Morton" at May 2, 0 06:13:23 am <200005021334.RAA14118@xxxxxxxxxxxxx>
Sender: owner-netdev@xxxxxxxxxxx
kuznet@xxxxxxxxxxxxx wrote:
> 
> ...
>
> > speedo_timer does mdio_read()s.  speedo_tx_timeout() does mdio_read()s
> > and mdio_write()'s.  mdio functions are stateful.  Race.
> 
> Are they touched in normal rx/tx path and/or irq? If they are not,
> it is easy to repair with _separate_ mdio bh protected spinlock.

I believe you are correct.  This is a good approach, because the mdio
functions, although rarely called, are slow.

3Com's GPL'ed driver for the 3c90x series is interesting.  They use four
spinlocks.

- One for setting multicast mode
- One for the start_xmit path
- One for the interrupt
- One for misc "recv mode/close/timer".

I haven't studied it closely; it's a _very_ differently structured
driver from the norm.  I would guess that it has been ported from
another OS.  They ifdef all the spinlocks out of existence if !__SMP__. 
Interesting...


> The problem can be with control registers, which are reprogrammed
> at IRQ level.
> 
> > Another issue: del_timer_sync().  It deletes a timer, but if that timer
> > happens to be running, del_timer_sync() blocks until the timer's handler
> > returns.
> 
> Do some drivers really use this?

A few of them use it.  With the exception of eepro100, they use it for
killing the media timer in the close() routine, which sems sensible - we
don't want to release resources underneath the timer handler's feet.
a2065.c forgets to use timer_exit().

But eepro100 uses del_timer_sync() in tx_timeout() (not under any
spinlock) and does not use timer_exit().

I don't see why eepro100's tx_timeout() doesn't simply lock up in SMP,
actually.  timer->running will be set in run_timer_list() and appears to
never be cleared, and del_timer_sync() will wedge in
timer_synchronize().

Wanna take a look?  I must have missed something.

> Beware, this function is smart,
> but it is not so easy to use. First, timer handler must do timer_exit()
> on exit. Second, del_timer_sync() cannot be called under spinlock, which
> could be grabbed inside timer handler, it will be deadlock.

It would be rather nice to document these requirements in
kernel/timer.c!

Many drivers call del_timer() in their close() methods. It is
conceivable (but rather unlikely) that the timer routine could get
confused trying to manage the media interface of a device which has been
fully or partially closed.

> The best thing is to call it only in context _free_ of spinlocks:
> i.e. open(), close(), ioctl(). See?

Is this what you mean?


timer_func()
{
        spin_lock(some_lock);      <<<< Deadlock
        ...                               |
        spin_unlock(some_lock);           |
}                                         |
                                          |
some_func()                               |
{                                         |
        spin_lock(some_lock);             |
         <interrupt, timer run>    <<<<<<<+ 
        del_timer_sync(some_timer);
}


> That's why open()/close()/ioctl()
> are called by top level without any locks held, though there are
> not so much of drivers really sleeping there. Lots of operations
> like to be done in normal process context. [ Including cli(), which
> does not synchronize to timers, when called from under spinlock. ]

I don't understand your point about cli().  Do you meant that timer
handlers can run on other CPUs (presumably via ret_from_syscall()) while
the global IRQ lock is held???



-- 
-akpm-

<Prev in Thread] Current Thread [Next in Thread>