Re: [patch 4/10] s390: network driver.

To: Thomas Spatzier <thomas.spatzier@xxxxxxxxxx>
Subject: Re: [patch 4/10] s390: network driver.
From: jamal <hadi@xxxxxxxxxx>
Date: 15 Dec 2004 08:50:27 -0500
Cc: Paul Jakma <paul@xxxxxxxx>, Hasso Tepper <hasso@xxxxxxxxx>, Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx>, jgarzik@xxxxxxxxx, netdev@xxxxxxxxxxx, "David S. Miller" <davem@xxxxxxxxxxxxx>
In-reply-to: <OF818F5ECF.239B7010-ONC1256F6A.0029BA10-C1256F6A.002A27C3@xxxxxxxxxx>
Organization: jamalopolous
References: <OF818F5ECF.239B7010-ONC1256F6A.0029BA10-C1256F6A.002A27C3@xxxxxxxxxx>
Reply-to: hadi@xxxxxxxxxx
Sender: netdev-bounce@xxxxxxxxxxx
On Tue, 2004-12-14 at 02:40, Thomas Spatzier wrote:
> Paul Jakma <paul@xxxxxxxx> wrote on 10.12.2004 16:37:15:
> > Thomas' original patch was to address this problem. I wonder could he
> > recap the kernel side of this problem?
> Here is why we submitted the original patch: We got reports from
> several customers that their dynamic routing daemons got hung when
> one network interface lost its physical connection. Some debugging
> showed that the write queues of sockets went full and got blocked.
> This was because we issued a netif_stop_queue when we detected a
> cable pull or something.

I did some more thinking in the background and I wish to change my
opinion. What you see is very odd. I think there may be a bug upstream
at the socket layer or even before that - but it doesn't sound like a
device level bug. Wasn't someone supposed to send a small program to
Herbert? Once you call netif_stop_queue, the stack should never hand you
packets to transmit at the device level. If you receive any, it's a bug
and you should drop them and bitch violently. In other words, I think
what you have at the moment is a band-aid, not the solution.
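
To make that rule concrete, here is a minimal userspace sketch. It is not
kernel code: struct model_dev, the model_* functions and the
stopped_xmit_bugs counter are all hypothetical stand-ins, invented here to
mimic a stopped queue and a driver transmit routine that drops and
complains when it is handed a packet it should never have seen.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Userspace model only; these types mimic, but are not, the kernel's. */
struct model_dev {
    bool queue_stopped;              /* set by model_stop_queue() */
    unsigned long tx_packets;        /* packets accepted for transmit */
    unsigned long tx_dropped;        /* packets dropped by the driver */
    unsigned long stopped_xmit_bugs; /* hypothetical: xmit calls seen while stopped */
};

/* Stand-in for netif_stop_queue(): tell the stack to stop sending. */
void model_stop_queue(struct model_dev *dev)
{
    dev->queue_stopped = true;
}

/* Stand-in for netif_wake_queue(): allow the stack to send again. */
void model_wake_queue(struct model_dev *dev)
{
    dev->queue_stopped = false;
}

/*
 * Stand-in for the driver's transmit routine.
 * Returns 0 if the packet was accepted, -1 if it was dropped.
 */
int model_hard_start_xmit(struct model_dev *dev, int len)
{
    assert(dev != NULL);

    if (dev->queue_stopped) {
        /* Should never happen: the caller ignored the stopped queue. */
        fprintf(stderr, "BUG: xmit on stopped queue (len=%d)\n", len);
        dev->stopped_xmit_bugs++;
        dev->tx_dropped++;
        return -1;
    }

    dev->tx_packets++;
    return 0;
}
```

In this model a non-zero stopped_xmit_bugs count would point at the caller
(the layers above the driver), not at the device.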

> As a solution, we removed the netif_stop_queue calls and just dropped
> the packets; we also increment the respective error counts in the
> net_device_stats and call netif_carrier_off.
> This solved the customer problems and seems to be the right thing for
> zebra etc.

We need to fix this issue. Either your driver is doing something wrong
or something is broken further up the stack.
Can you describe how your driver uses the netif_start/stop/wake queue calls?
Whoever promised to send that program to Herbert - please do.

