Andrew Morton wrote:
> Well, I didn't say "let's put in lots of bugs" :)
>
> My point is very simple:
>
> - Drivers and/or NICs are hanging
> - The hangs are fixed by down+up or rmmod+insmod
>
> Hence, the hangs _could_ be unhung by appropriate action
> in the tx timeout!
>
> This would be a great step forward. A sub-second hiccup and
> a few dropped packets versus a complete system outage.
>
> It's not pretty, it shouldn't be necessary, but geeze, it's
> better than what we have now. Without access to the ASIC
> designers and everything else, we may never resolve these
> problems. And can we be sure that the closed-source
> vendor-developed drivers don't _currently_ reset the
> crap out of the NIC when it goes unresponsive?
I'll be happy to test anything you can think of to fix my problem
with ZNYX four-ports / Tulip hangs. Right now I'm thinking about
basically tailing the /var/messages and using ifconfig when I see
the errors...it should be obvious that there are much better ways
to do this!
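
For what it's worth, the stopgap I have in mind would look roughly like the sketch below. It is only an illustration, not a tested fix: the "transmit timed out" message pattern is an assumption about what the driver logs, and the helper names (hung_interfaces, bounce_commands, follow, watch) are made up for this example. Point it at whatever log file your system actually writes these messages to.

```python
import re
import subprocess
import time

# ASSUMPTION: the kernel/driver logs something like
# "eth1: transmit timed out". Adjust the pattern to match your logs.
TIMEOUT_RE = re.compile(r"(\w+): transmit timed out", re.IGNORECASE)


def hung_interfaces(lines):
    """Return interface names seen in tx-timeout lines, in first-seen order."""
    seen = []
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


def bounce_commands(iface):
    """Build the down+up command pair for one interface (hypothetical helper)."""
    return [["ifconfig", iface, "down"], ["ifconfig", iface, "up"]]


def follow(path, poll=1.0):
    """Yield lines appended to the file at path, similar to `tail -f`."""
    with open(path) as log:
        log.seek(0, 2)  # start at the end of the file
        while True:
            line = log.readline()
            if line:
                yield line
            else:
                time.sleep(poll)


def watch(path):
    """Bounce any interface that reports a tx timeout. Needs root to work."""
    for line in follow(path):
        for iface in hung_interfaces([line]):
            for cmd in bounce_commands(iface):
                subprocess.run(cmd, check=False)
```

Calling watch() with the log file path runs until interrupted. Polling a log file like this reacts seconds after the hang at best, which is exactly why a reset from inside the tx timeout would be the better place for it.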
I believe it's Tulip driver related too, because I can also hang
another Tulip-based card I have (a different chip, though Tulip-compatible)...
> These failures appear to correlate with high traffic which,
> I suggest, points the finger at driver/hardware bugs rather
> than media selection probs.
I think that's true. I'm running the cards back to back across
a cross-over cable. I can't imagine why they would be having
probe problems only after I crank up the volume of traffic.
>
> > I don't think that flushing the queue (as somebody else from this thread
> > suggested) is a good idea as you lose a lot of packets (usually 16).
I can easily lose millions with the current state of affairs...I'd love to
only lose 16, especially if they are recorded (tx_fifo or something).
> Have you played with the tx timeout interval? Even with
> 14 packet tx interrupt mitigation, it's very, very hard
> to force a tx timeout on a 10baseT LAN with the timeout
> set to just 20 milliseconds. We're using 400!
I always thought they were rock solid, until I actually started
beating on them unmercifully! The press has generally been kind to
Linux as a server machine...it'd suck to blow that now!
Ben
--
Ben Greear (greearb@xxxxxxxxxxxxxxx) http://www.candelatech.com
Author of ScryMUD: scry.wanfear.com 4444 (Released under GPL)
http://scry.wanfear.com http://scry.wanfear.com/~greear