[Top] [All Lists]

Re: tx_timeout and timer serialisation

To: kuznet@xxxxxxxxxxxxx
Subject: Re: tx_timeout and timer serialisation
From: Donald Becker <becker@xxxxxxxxx>
Date: Fri, 5 May 2000 01:15:25 -0400 (EDT)
Cc: Andrew Morton <andrewm@xxxxxxxxxx>, netdev@xxxxxxxxxxx
In-reply-to: <200005021334.RAA14118@xxxxxxxxxxxxx>
Sender: owner-netdev@xxxxxxxxxxx
On Tue, 2 May 2000 kuznet@xxxxxxxxxxxxx wrote:

> Subject: Re: tx_timeout and timer serialisation

Here is the write-up I have in pci-skeleton.c v2.** from

IIId. SMP semantics

The following are serialized with respect to each other via the "xmit_lock".
  dev->hard_start_xmit()        Transmit a packet
  dev->tx_timeout()                     Transmit watchdog for stuck Tx
  dev->set_multicast_list()     Set the recieve filter.
Note: The Tx timeout watchdog code is implemented by the timer routine in
kernels up to 2.2.*.  In 2.4.* and later the timeout code is part of the
driver interface.

The following fall under the global kernel lock.  The module will not be
unloaded during the call, unless a call with a potential reschedule e.g.
kmalloc() is called.  No other synchronization assertion is made.
Caution: The lock for dev->open() is commonly broken with request_irq() or
kmalloc().  It is best to avoid any lock-breaking call in do_ioctl() and
get_stats(), or additional module locking code must be implemented.

The following is self-serialized (no simultaneous entry)
  An handler registered with request_irq().

> > 2: timer functions, ioctl() and get_stats() have no guarantees.
> Media timer is separate item, it is inivisible from outside at all.
> ioctl() is allowed to sleep, hence top level cannot do anything
> to serialize it wrt class 1. But it serializes it wrt itself
> and open(), close(), which is not so easy by the way. 8)

It's important for code simplicity to keep this assurance.  An ioctl()
usually examines or sets something to do with chip operation, just the sort
of action that open() is close() are trying to do.

> get_stats() is very special thing. The problem with this is
> that it is broken by design, returning pointer to some static structure
> to code which knows nothing about its protection.

To reword slightly: it returns a pointer to an internally maintained
structure.  That structure consists of unsigned, aligned, word-long values
which typically remained fixed in value, but may be monotonically
increasing.  When reporting dynamically increasing values, any consistent
value is permitted to be reported e.g. the user may see either N or N+1.

(The phrase "returning a pointer to some static structure" usually implies
a 'static struct foo' value.)

> It is pretty
> clear that if get_stats() does not modify counters, it needs
> no protection. But if it _does_ modify (== touch hardware registers),
> as most of ethernet drivers do, top level cannot do anything with this
> either in 2.2 or in 2.3. Protection wrt IRQs is out of area of its expertise.

Most hardware has good semantics: reading a value zeros the register.  Thus
the only bad thing that can happen is a increment-race.

> I would advise to avoid touching statistics from get_stats, when it
> is possible and to modify counters only in context, where they
> are naturally serialized. F.e. tulip get_stats() changes only
> rx_missed_errors. Does it really deserve mud with irq safe locks around?

Some of the drivers note that they have an potential SMP race, but that it's
mostly harmless.  A race is extremely unlikely, and almost never impacts the
count -- who cares in what order you add 0 to the error count. The
statistics are generally non-critical values used primarily for getting a
idea of the system behavior.  Few people care if they have 9876 or 9877 CRC
error, the problem is that it's not zero errors.

> > 3: The ISR can be called during
       Generally not a conflict, except on chips with register windows e.g.
        8390 and 3Coms
       This frequently doesn't need a lock, if the cur_tx and dirty_tx
       values are incremented atomically at the proper point.
       Generally not a problem
       The media timers used to not do very much.  They just checked link
       beat.  Now they check duplex, flow control, timeouts, etc.  and
       should be checked carefully at each change.
    tx_timeout(), etc.
       In some places the assumption is made that this is only called if the
       chip has stopped working some time ago.  So no IRQ lock is put in

> > speedo_timer does mdio_read()s.  speedo_tx_timeout() does mdio_read()s
> > and mdio_write()'s.  mdio functions are stateful.  Race.

The original eepro100 code didn't do mdio_read() in the timer routine.  The
i82557 handled duplex switching itself.  With the i82559 it must now check
for negotiated flow control to avoid a bug with the chip sending flow
control packets on half duplex links.

This change does result in a race with ioctl() calls.

> Are they touched in normal rx/tx path and/or irq? If they are not,
> it is easy to repair with _separate_ mdio bh protected spinlock.

Only the MDIO access register needs to be protected.

> > Another issue: del_timer_sync().  It deletes a timer, but if that timer
> > happens to be running, del_timer_sync() blocks until the timer's handler
> > returns.
> Do some drivers really use this?

Not that I know of.  They use the regular del_timer().

Donald Becker                           becker@xxxxxxxxx
Scyld Computing Corporation
410 Severn Ave. Suite 210
Annapolis MD 21403

<Prev in Thread] Current Thread [Next in Thread>