
Re: zerocopy results on GigE

To: davem@xxxxxxxxxx (David S. Miller)
Subject: Re: zerocopy results on GigE
From: kuznet@xxxxxxxxxxxxx
Date: Wed, 7 Feb 2001 22:39:58 +0300 (MSK)
Cc: netdev@xxxxxxxxxxx
In-reply-to: <14976.28256.593782.781889@pizda.ninka.net> from "David S. Miller" at Feb 7, 1 00:45:00 am
Sender: owner-netdev@xxxxxxxxxxx

Hello!

Dave writes:
> Strange, cpu usage is close to nothing for sendfile cases yet full
> bandwidth is not obtained.

He-he-he... It was the first puzzle I observed after zerocopy
started to work. Throughput on Intel increases insignificantly,
and increases on alpha, but a lot of room for further increase remains
(about 20% of cpu). Actually, even without zerocopy the cpu at the sender
is a bit underloaded. Until now I have no idea why this happens
or how to fight it. Actually, this smells like a bug in TCP,
but tcpdump does not show anything pathological.



Andrew writes:
> Is it possible that the receiver is going into discard,
> and TCP is backing off?

In that case we would see not numbers of ~90MB/sec, but something
more spectacular. Even Jamal's 68MB/sec is not a bad enough number
for losses. 8)

No, this is impossible. TCP has no such pathologies.
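
To put rough numbers on why losses would look more spectacular
(a back-of-the-envelope sketch; the ~100usec LAN RTT and the ~200msec
minimum retransmit timeout are ballpark assumptions, not measurements
from these tests):

#include <stdio.h>

int main(void)
{
        double rtt     = 100e-6;  /* assumed LAN round trip, seconds    */
        double rto_min = 0.2;     /* assumed minimum retransmit timeout */
        double rate    = 90e6;    /* bytes/sec of the healthy stream    */

        printf("one timeout stalls the pipe for ~%.0f RTTs,\n",
               rto_min / rtt);
        printf("i.e. ~%.0f MB of capacity lost per timeout\n",
               rto_min * rate / 1e6);
        return 0;
}

Losses bad enough to trigger backoff would not shave 20MB/sec off
the rate, they would make it collapse.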


Dave writes:
> Or even worse, the Gige cards are emitting flow control frames.

This does not happen. It can be diagnosed with acestat, though.


> What would trigger that behaviour?  Presumably, the lack
> of any Rx DMA descriptors.

A shortage of MAC descriptors inside the NIC, which can happen
not only when there are not enough RX descriptors.

In practice flow control is triggered only if dma is slower than the
nic (I have never seen this on intel, but had to fight with it on alpha),
or if one side emits a stream of small packets at a rate >100Kpps.

This does not happen with TCP, especially with jumbo mtu (see the
packet rate arithmetic below). Disabling flow control does not change
TCP behaviour, by the way.
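
A quick packet rate check (simple arithmetic from the ~90MB/sec figure
above; the one-ACK-per-two-segments ratio is the usual delayed ACK
assumption):

#include <stdio.h>

int main(void)
{
        double rate  = 90e6;            /* bytes/sec of TCP payload */
        double mtu[] = { 1500, 9000 };  /* standard and jumbo mtu   */
        int i;

        for (i = 0; i < 2; i++) {
                double data_pps = rate / mtu[i];
                double ack_pps  = data_pps / 2;  /* ~one ACK per two segments */
                printf("mtu %5.0f: ~%6.0f data pps, ~%6.0f ack pps\n",
                       mtu[i], data_pps, ack_pps);
        }
        return 0;
}

Even with 1500 byte frames the stream stays well under 100Kpps; with
jumbo mtu it is an order of magnitude under.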


Jes wrote:
> One thing that might be worth investigating is that the AceNIC has
> a high latency for reading buffer descriptors. One of the plans I have
> is to linearize small skb's before handing them to the NIC.

The small skbs in these tests are ACKs, and they are linear.

Also, even with the host ring, all the fragment descriptors are read
in one DMA transaction. Or do you mean reading the data chunks,
not the descriptors?

In any case, the maximal latency is 5-7usec, which is not a big number
for TCP with jumbo mtu, where latency is dominated by bulk dma.
But, if my arithmetic is correct, it really does put a theoretical limit
on transmission of 1500 byte frames: ~90MB/sec (rough sketch below).
(BTW, Jes, you enabled the tx host ring in the latest driver.
 Did you notice that it increases latency by ~1 usec?)
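
The arithmetic, roughly (a sketch which assumes the descriptor read is
not overlapped with transmission, and takes the 5usec end of the
5-7usec range):

#include <stdio.h>

int main(void)
{
        double wire_bytes = 1500 + 38;            /* mtu + eth hdr, FCS, preamble, IFG */
        double wire_time  = wire_bytes * 8 / 1e9; /* seconds on the 1Gbit/sec wire     */
        double desc_lat   = 5e-6;                 /* per-frame descriptor read latency */
        double per_frame  = wire_time + desc_lat;

        printf("%.1f usec wire + %.1f usec latency = %.1f usec per frame\n",
               wire_time * 1e6, desc_lat * 1e6, per_frame * 1e6);
        printf("=> at most ~%.0f MB/sec of 1500 byte frames\n",
               1500.0 / per_frame / 1e6);
        return 0;
}

That lands around 87MB/sec, the same ballpark as the ~90MB/sec above.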

Alexey
