On Fri, 1 Jun 2001, Bogdan Costescu wrote:
> On Fri, 1 Jun 2001, jamal wrote:
> > One idea i have been toying with is to maintain hysteresis or a
> > threshold of some form in dev_watchdog;
> AFAIK, dev_watchdog is right now used only for Tx (if I'm wrong, please
> correct me!). So how do you sense link loss if you expect only high Rx
> traffic ?
Good question. Makes me think. Thoughts further below.
> > example: if watchdog timer expires threshold times, you declare the link
> > dead and send netif_carrier_off netlink message.
> > On recovery, you send netif_carrier_on
> I assume that you mean "on recovery" as in "first successful hard_start_xmit".
> > Assumption:
> > If the tx path is blocked, more than likely the link is down.
> Yes, but is this a good approximation ? I'm not saying that it's not, I'm
> merely asking for counter-arguments.
It is an indirect approximation. Note that if the traffic is very
asymmetric, as in the case you pointed out, notification will take a long,
long time. You need a plan B. Still, the tx watchdogs are a good source of
fault detection when MII detection is unavailable, and even when MII is
present.
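To make the thresholding idea above concrete, here is a minimal user-space
sketch of the counting logic. The names (wd_state, wd_timeout, wd_tx_ok,
WATCHDOG_DEAD_THRESHOLD) are illustrative, not real kernel symbols; the
comments mark where netif_carrier_off()/netif_carrier_on() and the netlink
notification would go in a real driver.

```c
#include <assert.h>
#include <stdbool.h>

/* Declare the link dead only after this many consecutive
 * tx-watchdog expirations (value chosen arbitrarily here). */
#define WATCHDOG_DEAD_THRESHOLD 3

struct wd_state {
	int timeouts;    /* consecutive watchdog expirations */
	bool link_dead;  /* have we already announced carrier loss? */
};

/* Called each time the tx watchdog timer expires.
 * Returns true once the link should be declared dead. */
bool wd_timeout(struct wd_state *s)
{
	if (!s->link_dead && ++s->timeouts >= WATCHDOG_DEAD_THRESHOLD)
		s->link_dead = true;  /* here: netif_carrier_off() + netlink msg */
	return s->link_dead;
}

/* Called on the first successful hard_start_xmit.
 * Returns true when recovery should be announced. */
bool wd_tx_ok(struct wd_state *s)
{
	bool recovered = s->link_dead;

	s->timeouts = 0;
	s->link_dead = false;         /* here: netif_carrier_on() */
	return recovered;
}
```

A single successful transmit resets the counter, so isolated timeouts under
load never trip the threshold; only a sustained stall does.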
I hate making this more complex than it should be:
Since we already have a messaging system within the kernel and between
user and kernel space, aka "netlink" -- one could easily add a protocol in
user space which "dynamically heartbeats" the devices. Control should come
from user space; it would be a great idea to avoid ioctls.
"Dynamic" in the above sense means trying to totaly avoid making it a
synchronous poll. The poll rate is a function of how many packets go out
that device per average measurement time. Basically, the period that the
user space app dumps "hello" netlink packets to the kernel is a variable.
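One possible reading of that variable period, sketched below under assumed
names (hb_period_ms, HB_MIN_MS, HB_MAX_MS are all made up): a busy tx path
is its own heartbeat, so the hello interval can grow with traffic, while an
idle device (the rx-heavy case above) gets probed at the minimum interval.

```c
#include <assert.h>

#define HB_MIN_MS   1000   /* probe at least this often when idle... */
#define HB_MAX_MS  30000   /* ...and at most this rarely when busy */

/* Pick the interval for the next user-space "hello" netlink packet
 * from the number of packets the device transmitted during the last
 * measurement window.  The scaling is arbitrary; the point is only
 * that the period is a function of observed tx traffic. */
unsigned int hb_period_ms(unsigned long tx_pkts_last_window)
{
	unsigned long period = HB_MIN_MS * (1 + tx_pkts_last_window);

	return period > HB_MAX_MS ? HB_MAX_MS : (unsigned int)period;
}
```

The clamp matters at both ends: even a totally idle device is never probed
more often than HB_MIN_MS, and a saturated one still gets an occasional
end-to-end check rather than none at all.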