Re: zerocopy results on GigE

To: jes@xxxxxxxxxxxxx (Jes Sorensen)
Subject: Re: zerocopy results on GigE
From: kuznet@xxxxxxxxxxxxx
Date: Thu, 8 Feb 2001 21:59:09 +0300 (MSK)
Cc: davem@xxxxxxxxxx, netdev@xxxxxxxxxxx
In-reply-to: <d3ofwep0yz.fsf@xxxxxxxxxxxxxxxxx> from "Jes Sorensen" at Feb 7, 2001 11:44:36 pm
Sender: owner-netdev@xxxxxxxxxxx

Hello!

> I don't remember all the details, I just remember Ted Schroeder (one of
> the Alteon founders) recommending that I linearize small transfers, as
> loading buffer descriptors could cost up to 5 us.

It is even more than 5 usec.

~5 usec is the plain, featureless mode.
Each feature adds ~1 usec: tx host ring? +1 usec. tx checksumming? +1 usec.

Well, the driver should not be bothered about this: it is the protocol's
job not to generate silly packets shredded into small pieces.
E.g. current TCP _does_ generate telnet or lat_tcp packets
with a 1 byte fragment. So what? Latency does not change; these
5 usecs are ridiculous compared to the latency caused
by the broken mitigation timer in acenic. Throughput is bogus in any case.
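
Purely for illustration, a minimal sketch of the linearization Jes
mentions, written against today's kernel API (skb_linearize() and
ndo_start_xmit() did not exist in this form back then); the threshold
and the foo_* names are hypothetical, not any real driver's:

        #include <linux/netdevice.h>
        #include <linux/skbuff.h>

        #define FOO_LIN_THRESH 256  /* hypothetical cutoff for "small" */

        /* Copy small fragmented packets into one linear buffer, so the
         * NIC loads a single descriptor instead of paying ~1 usec per
         * fragment descriptor over PCI. */
        static netdev_tx_t foo_start_xmit(struct sk_buff *skb,
                                          struct net_device *dev)
        {
                if (skb_shinfo(skb)->nr_frags && skb->len < FOO_LIN_THRESH) {
                        if (skb_linearize(skb)) {
                                dev_kfree_skb_any(skb); /* alloc failed */
                                return NETDEV_TX_OK;
                        }
                }
                /* ... map skb->data, queue one descriptor, ring doorbell ... */
                return NETDEV_TX_OK;
        }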


> ANK> dma.  But, if my arithmetic is correct, this really puts a
> ANK> theoretical limit on transmission of 1500 byte frames: ~90MB/sec.
...
> The numbers for jumbo MTUs are not all that exciting; what really
> matters is how we perform on 1.5K packets. 95% of the switches on the
> market don't do 9K packets, hence very, very few people use them ;-(

I was talking exactly about 1500 MTU. To feel these 5 usecs there, we
would first have to reach rates where they start to matter.
We have not. We are still bounded by software latencies.
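
For the record, a back-of-the-envelope reconstruction of that ~90MB/sec
figure; my assumption is that the ~5 usec of descriptor loads is fully
serialized with the wire time of each frame:

    wire time of one 1500-byte frame at 1 Gbit/sec:
        (1500 + 18 hdr/FCS + 8 preamble + 12 IFG) bytes * 8 / 1e9 ~= 12.3 usec
    per-packet cost with ~5 usec of descriptor loads on top:
        12.3 + 5 ~= 17.3 usec, i.e. 1500 bytes / 17.3 usec ~= 87 MB/sec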


> No, I didn't notice the 1 us extra latency; I made the change to reduce
> the slow writes to PCI shared memory, which are becoming even more
> significant now with the increase in host memory speed and no increase
> in PCI speed. If it becomes a real issue, we can stick the non-host-ring
> support back in.

Well, I brought this up because it was you who was bothered about latency. 8)8)

Actually, this feature was added for tux some time ago for exactly the
same reason. 8) Plus, it allows loading the whole set of fragment
descriptors in one DMA transaction. Not a big win, if Ingo's results
are to be believed, but something at least.
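
To make the idea concrete, a sketch of such a host-resident tx ring;
every name here is illustrative, not the actual acenic (or tux) code.
Descriptors are built in host RAM and the NIC fetches the whole batch
by DMA after one doorbell write, instead of the host pushing each
descriptor over PCI at ~1 usec apiece:

        #include <linux/types.h>
        #include <linux/io.h>
        #include <linux/dma-mapping.h>

        /* Hypothetical host-resident tx descriptor ring. */
        struct tx_desc {
                __le64 addr;    /* DMA address of one fragment */
                __le16 len;     /* fragment length */
                __le16 flags;   /* bit 0: end of packet */
        };

        struct tx_ring {
                struct tx_desc *desc;   /* ring in host memory, NIC-visible */
                u32 prod;               /* producer index, host-owned */
                u32 mask;               /* ring size - 1, power of two */
                void __iomem *doorbell; /* single NIC register */
        };

        /* Queue all fragments of one packet, then do ONE slow PCI write;
         * the NIC DMAs the new descriptors in a single burst. */
        static void tx_queue_frags(struct tx_ring *r, const dma_addr_t *addr,
                                   const u16 *len, int nfrags)
        {
                int i;

                for (i = 0; i < nfrags; i++) {
                        struct tx_desc *d = &r->desc[r->prod++ & r->mask];

                        d->addr  = cpu_to_le64(addr[i]);
                        d->len   = cpu_to_le16(len[i]);
                        d->flags = cpu_to_le16(i == nfrags - 1); /* EOP */
                }
                wmb();                          /* descriptors before doorbell */
                writel(r->prod, r->doorbell);   /* the only uncached PCI access */
        }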

The tx host ring reduces the maximal pps reached with acenic by 20%. That's all.
So what? No problem; we are not going to compete with XXMegapps switches.

Alexey
