On Fri, Jun 03, 2005 at 01:35:22PM -0700, Michael Chan wrote:
> > > Yes, in tg3, rx buffers are replenished and put back into the ring
> > > as completed packets are taken off the ring. But we don't tell the
> > > chip about these new buffers until we get to the end of the loop,
> > > potentially after a full quota of packets.
> > Which makes a lot more sense, since you'd rather do one MMIO write
> > at the end of the loop than one per iteration, especially if your
> > MMIO read (flush) latency is high. (Any subsequent MMIO read will
> > have to flush out all pending writes, which'll be slow if there's
> > a lot of writes still in the queue.)
> I agree on the merit of issuing only one IO at the end. What I'm saying
> is that doing so will make it similar to e1000 where all the buffers are
> replenished at the end. Isn't that so or am I missing something?

I think you're right: for e1000 as well as tg3, the NIC cannot use
the new RX buffers until the CPU breaks out of the poll loop.
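
To make the pattern concrete, here is a minimal sketch of a poll loop
that refills buffers as it goes but only tells the chip about them once,
at the end. It is a toy model, not code from tg3 or e1000: struct
rx_ring, write_rx_producer() and rx_poll() are all made-up names, and
the "MMIO write" is just a counter.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a driver's RX state; not a real driver struct. */
struct rx_ring {
    uint32_t size;        /* number of descriptors in the ring */
    uint32_t pending;     /* completed packets waiting to be processed */
    uint32_t sw_prod;     /* producer index as tracked by the driver */
    uint32_t hw_prod;     /* producer index last published to the NIC */
    uint32_t mmio_writes; /* counts simulated doorbell writes */
};

/* Simulated MMIO write that publishes the refilled buffers to the NIC. */
static void write_rx_producer(struct rx_ring *r)
{
    r->hw_prod = r->sw_prod;
    r->mmio_writes++;
}

/* Process up to `budget` packets, refilling each slot as it is consumed,
 * and issue a single doorbell write after the loop. */
static uint32_t rx_poll(struct rx_ring *r, uint32_t budget)
{
    uint32_t done = 0;

    assert(r->size > 0);
    while (done < budget && r->pending > 0) {
        r->pending--;                            /* take one completed packet */
        r->sw_prod = (r->sw_prod + 1) % r->size; /* refill its slot */
        done++;
    }
    if (done > 0)
        write_rx_producer(r); /* the NIC sees the new buffers only now */
    return done;
}
```

In this model, a poll with a budget of 64 against 100 pending packets
refills 64 slots but costs exactly one doorbell write, and the chip
cannot use any of those 64 buffers until the loop exits.
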

I don't understand why reducing the weight apparently makes the e1000
go faster. Perhaps, as Robert said, the RX ring is not big enough, and
that's why feeding RX buffers back to the chip more aggressively might
help prevent overruns?

I would say that running with an (N+64)-entry RX ring and a weight of 64
should not show any worse behavior than running with an (N+16)-entry RX
ring and a weight of 16. If anything, weight=64 should show _better_
performance than weight=16. Something else must be going on.
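
The arithmetic behind that comparison, as a rough sketch: assume the
only buffers the chip cannot use are the ones refilled during the
current poll and not yet published, which is at most `weight` of them.
That is a simplification of mine, not measured behavior, and
worst_case_nic_buffers() is a made-up helper.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified model (an assumption): during one poll, at most `weight`
 * refilled buffers are held back from the NIC, so the NIC always has at
 * least ring_entries - weight buffers it can still receive into. */
static uint32_t worst_case_nic_buffers(uint32_t ring_entries, uint32_t weight)
{
    assert(weight > 0);
    return ring_entries > weight ? ring_entries - weight : 0;
}
```

Under that model, an (N+64)-entry ring with weight 64 and an
(N+16)-entry ring with weight 16 both leave the chip with at least N
usable buffers in the worst case, so ring headroom alone does not
explain the difference.
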