
Re: [PATCH] fix BUG in tg3_tx

To: "David S. Miller" <davem@xxxxxxxxxx>
Subject: Re: [PATCH] fix BUG in tg3_tx
From: Greg Banks <gnb@xxxxxxx>
Date: Wed, 26 May 2004 10:12:17 +1000
Cc: netdev@xxxxxxxxxxx, mchan@xxxxxxxxxxxx
In-reply-to: <20040525105101.2da85469.davem@xxxxxxxxxx>
References: <20040524072657.GC27177@xxxxxxx> <20040524004045.58b3eb44.davem@xxxxxxxxxx> <20040524080431.GD27177@xxxxxxx> <20040524100634.1349295d.davem@xxxxxxxxxx> <20040525010434.GA31134@xxxxxxx> <20040525105101.2da85469.davem@xxxxxxxxxx>
Sender: netdev-bounce@xxxxxxxxxxx
User-agent: Mutt/1.3.27i

On Tue, May 25, 2004 at 10:51:01AM -0700, David S. Miller wrote:
> On Tue, 25 May 2004 11:04:34 +1000
> Greg Banks <gnb@xxxxxxx> wrote:
> 
> > I agree that this code appears to implicitly rely on always getting
> > complete send ring updates.
> 
> Greg, did you see Michael Chan's response?  A Broadcom engineer
> is telling us "the hardware does not ACK partial TX packets."

Yes I did.  I've been working towards gathering data for a reply.

> I can't think of a more reliable source for this kind of information,
> can you?

I can think of one: actual observation of the card in action in the
field.  Experiment trumps theory.

To this end, I instrumented the driver (with my patch applied) to
BUG() out if tx_ring_info.index is not the predicted value, i.e. if
tg3_tx() ever starts partway through a packet.  It's been running
overnight under >200 MB/s of NFS read load and has not triggered yet.
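
For readers following along, here is a minimal user-space sketch of the
kind of check described above.  It is not the actual instrumentation
patch and does not use the real tg3 data structures: struct tx_slot,
RING_SIZE and reclaim_tx() are hypothetical names invented for
illustration, and the sketch returns -1 where the instrumented driver
would BUG().

```c
#include <stdbool.h>

#define RING_SIZE 8                 /* hypothetical; must be a power of two */
#define RING_MASK (RING_SIZE - 1)

/* Hypothetical per-descriptor bookkeeping, not the real tg3 layout. */
struct tx_slot {
	bool pkt_start;             /* true on the first descriptor of a packet */
	int  nr_desc;               /* descriptors in the packet, valid if pkt_start */
};

/*
 * Reclaim completed packets from *sw_idx up to (not including) hw_idx.
 * Returns the number of whole packets reclaimed, or -1 if the walk
 * would start or stop partway through a packet.
 */
int reclaim_tx(struct tx_slot *ring, unsigned *sw_idx, unsigned hw_idx)
{
	unsigned idx = *sw_idx;
	int done = 0;

	while (idx != hw_idx) {
		int n = ring[idx].nr_desc;
		int i;

		/* We must be sitting on the first descriptor of a packet. */
		if (!ring[idx].pkt_start || n < 1)
			return -1;

		/* The hardware index must not land inside this packet. */
		for (i = 1; i < n; i++)
			if (((idx + i) & RING_MASK) == hw_idx)
				return -1;

		/* Whole packet completed: release it and advance. */
		ring[idx].pkt_start = false;
		idx = (idx + n) & RING_MASK;
		*sw_idx = idx;
		done++;
	}
	return done;
}
```

With a three-descriptor packet queued at slot 0, a hardware index of 2
makes reclaim_tx() return -1, while a hardware index of 3 reclaims one
packet.  The first case is the "partial ACK" that the hardware is said
never to produce.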

> I don't argue that you aren't seeing something strange, but perhaps
> that is due to corruption occurring elsewhere, or perhaps something
> peculiar about your system hardware (perhaps the PCI controller
> mis-orders PCI transactions or something silly like that)?

There are many things peculiar about our hardware.  Otherwise we'd
be "the world stops at 4 processors" Dell.

> Have you reproduced this on some system other than these huge SGI
> ones?

I haven't tried; my job is first and foremost to make SGI hardware
work.  However I did point you to a report on lkml where someone on
non-SGI hardware has seen what appears to be the same problem.  I'm not
yet willing to consign this to the "wacky SGI PCI hardware" bucket.

Greg.
-- 
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
I don't speak for SGI.
