Re: [eepro100] Re: True on TRANSMIT ERROR TIMEOUT (Tulip too)

To: netdev <netdev@xxxxxxxxxxx>
Subject: Re: [eepro100] Re: True on TRANSMIT ERROR TIMEOUT (Tulip too)
From: Ben Greear <greearb@xxxxxxxxxxxxxxx>
Date: Wed, 14 Jun 2000 21:20:19 -0700
Organization: Candela Technologies
References: <39461BB3.299DD807@xxxxxxxxxx> <Pine.LNX.4.10.10006141420380.370-100000@xxxxxxxxxxxxxxxxxxxxxxxxxxx> <39483426.B07E4BA@xxxxxxxxxx>
Sender: owner-netdev@xxxxxxxxxxx

Andrew Morton wrote:

> Well, I didn't say "let's put in lots of bugs" :)
> 
> My point is very simple:
> 
> - Drivers and/or NICs are hanging
> - The hangs are fixed by down+up or rmmod+insmod
> 
> Hence, the hangs _could_ be unhung by appropriate action
> in the tx timeout!
> 
> This would be a great step forward.  A sub-second hiccup and
> a few dropped packets versus a complete system outage.
> 
> It's not pretty, it shouldn't be necessary, but geeze, it's
> better than what we have now.  Without access to the ASIC
> designers and everything else, we may never resolve these
> problems.  And can we be sure that the closed-source
> vendor-developed drivers don't _currently_ reset the
> crap out of the NIC when it goes unresponsive?

I'll be happy to test anything you can think of to fix my problem
with ZNYX four-ports / Tulip hangs.  Right now I'm thinking about
basically tailing /var/log/messages and using ifconfig when I see
the errors...it should be obvious that there are much better ways
to do this!
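
A minimal sketch of the log-watching workaround described above, for
illustration only. The log line format matched here ("ethN: transmit timed
out") is an assumption about what the driver prints, and the helper names
hung_interfaces and bounce are hypothetical. The down/up cycle is the same
manual recovery mentioned earlier in the thread.

```python
import re
import subprocess

# Assumed log format: "... eth1: transmit timed out ..."; adjust to the
# message your driver actually prints.
TIMEOUT_RE = re.compile(r"\b(eth\d+): transmit timed out", re.IGNORECASE)


def hung_interfaces(lines):
    """Return interfaces named in tx-timeout log lines, in first-seen order."""
    seen = []
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


def bounce(iface, run=subprocess.call):
    """Take the interface down and bring it back up with ifconfig."""
    run(["ifconfig", iface, "down"])
    run(["ifconfig", iface, "up"])
```

Feeding hung_interfaces the newly appended log lines and calling bounce on
each result gives the polling hack; a reset inside the driver's tx timeout
path would avoid the delay and the log parsing entirely.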

I believe it's Tulip driver related too, because I can also hang
another Tulip-based card I have (the chip is different, but Tulip-compatible)...

> These failures appear to correlate with high traffic which,
> I suggest, points the finger at driver/hardware bugs rather
> than media selection probs.

I think that's true. I'm running the cards back to back across
a cross-over cable.  I can't imagine why they would be having
probe problems only after I crank up the volume of traffic.

> 
> > I don't think that flushing the queue (as somebody else from this thread
> > suggested) is a good idea as you lose a lot of packets (usually 16).

I can easily lose millions with the current state of affairs...I'd love to
only lose 16, especially if they are recorded (tx_fifo or something).

> Have you played with the tx timeout interval?  Even with
> 14 packet tx interrupt mitigation, it's very, very hard
> to force a tx timeout on a 10baseT LAN with the timeout
> set to just 20 milliseconds.  We're using 400!
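
One way to read the 20 millisecond figure above is as simple wire-time
arithmetic, sketched below. The frame size and per-frame overhead are
assumptions (full-size Ethernet frames, an otherwise idle link, no
collisions), and drain_time_ms is a hypothetical helper name.

```python
LINK_BPS = 10_000_000        # 10baseT line rate in bits per second
WIRE_BYTES = 8 + 1518 + 12   # preamble/SFD + maximum frame + inter-frame gap
FRAMES = 14                  # packets per tx interrupt, from the quoted text


def drain_time_ms(frames=FRAMES, wire_bytes=WIRE_BYTES, link_bps=LINK_BPS):
    """Milliseconds needed to put `frames` frames on the wire back to back."""
    return frames * wire_bytes * 8 * 1000 / link_bps


print(drain_time_ms())
```

Under those assumptions 14 full-size frames drain in about 17.2 ms, so a
healthy transmitter should signal completion inside a 20 ms window.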

I always thought they were rock solid, until I actually started
beating on them unmercifully!  The press has generally been kind to
Linux as a server machine...it'd suck to blow that now!

Ben

-- 
Ben Greear (greearb@xxxxxxxxxxxxxxx)  http://www.candelatech.com
Author of ScryMUD:  scry.wanfear.com 4444        (Released under GPL)
http://scry.wanfear.com               http://scry.wanfear.com/~greear
