
Re: [E1000-devel] Transmission limit

To: hadi@xxxxxxxxxx
Subject: Re: [E1000-devel] Transmission limit
From: P@xxxxxxxxxxxxxx
Date: Mon, 29 Nov 2004 10:19:31 +0000
Cc: mellia@xxxxxxxxxxxxxxxxxxxx, Robert Olsson <Robert.Olsson@xxxxxxxxxxx>, e1000-devel@xxxxxxxxxxxxxxxxxxxxx, Jorge Manuel Finochietto <jorge.finochietto@xxxxxxxxx>, Giulio Galante <galante@xxxxxxxxx>, netdev@xxxxxxxxxxx
In-reply-to: <1101499285.1079.45.camel@xxxxxxxxxxxxxxxx>
References: <1101467291.24742.70.camel@xxxxxxxxxxxxxxxxxxxxxx> <41A73826.3000109@xxxxxxxxxxxxxx> <16807.20052.569125.686158@xxxxxxxxxxxx> <1101484740.24742.213.camel@xxxxxxxxxxxxxxxxxxxxxx> <41A76085.7000105@xxxxxxxxxxxxxx> <1101499285.1079.45.camel@xxxxxxxxxxxxxxxx>
Sender: netdev-bounce@xxxxxxxxxxx
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6) Gecko/20040124
jamal wrote:
On Fri, 2004-11-26 at 11:57, P@xxxxxxxxxxxxxx wrote:

skbs are de/allocated using the standard kernel memory management. Still,
without touching the packet, we can receive 100% of them.

I was doing some playing in this area this week.
I changed the alloc per packet to a "realloc" per packet,
i.e. the e1000 driver owns the packets. I noticed a
very nice speedup from this. In summary, a userspace
app was able to receive 2x250Kpps without this patch,
and 2x490Kpps with it. The patch is here:

A very angry gorilla on that url ;->

feck. Add a .gz

Note 99% of that patch is just upgrading from
e1000 V4.4.12-k1 to V5.2.52 (which doesn't affect
the performance).

Wow, I just read your excellent paper, and noticed
you used this approach also :-)

I have to read the paper. When Robert was last visiting here, we did
some tests, and packet recycling is not very valuable as far as SMP is
concerned (given that packets can be allocated on one CPU and freed on
another). There's a clear win on single-CPU machines.

Well for my app, I am just monitoring, so I use
IRQ and process affinity. You could split the
skb heads across CPUs also I guess.

Small packet performance depends on low latency. A higher bus speed
gives shorter latency, but higher-speed buses also tend to have bridges
that add latency.

That's true. We suspect that the limit is due to bus latency. But we
are still surprised, since the bus allows us to receive 100% of the
packets, but to transmit only up to ~50%. Moreover, the raw aggregate
bandwidth of the bus is _far_ larger (133MHz * 64bit ~ 8.5Gbit/s).

Well there definitely could be an asymmetry wrt bus latency.
Saying that though, in my tests with much the same hardware
as you, I could only get 800Kpps into the driver.

Yep, that's about the number I was seeing as well on both pieces of
hardware I used in the tests in my SUCON presentation.

I'll check this again when I have time. Note also that, as I
understand it, the PCI control bus runs at a much lower rate,
and it is used to arbitrate the bus for each packet.
I.e. the 8Gb/s number above is not the bottleneck.

An lspci -vvv for your ethernet devices would be useful.
Also, to view the burst size: setpci -d 8086:1010 e6.b
(where 8086:1010 is the ethernet device's PCI id).

Can you talk a little about this PCI control bus? I have heard you
mention it before ... I am trying to visualize where it fits in PCI.

Basically, the bus is arbitrated per packet. See section 3.5 in:
This also has lots of nice PCI info:

Pádraig Brady -
