I forgot a smiley on my previous post
about not believing you. So here's two: :-) :-)
Marco Mellia wrote:
It's a pleasure to hear from you.
Touching the packet data has a major impact. See eth_type_trans
in all profiles.
Notice the e1000 sets up the alignment for IP by default.
skbs are de/allocated using the standard kernel memory management. Still,
without touching the packet, we can receive 100% of them.
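On the alignment note above: NIC drivers commonly start the received frame
two bytes into the buffer so that the IP header, which follows the 14-byte
Ethernet header, lands on a 4-byte boundary. A minimal userspace
illustration of that arithmetic (the constant names here are illustrative,
not the actual e1000 code):

```c
#include <assert.h>
#include <stddef.h>

#define ETH_HLEN     14  /* Ethernet header length in bytes */
#define RX_IP_ALIGN   2  /* extra headroom the driver reserves */

/* Offset of the IP header within the receive buffer when the frame
 * starts RX_IP_ALIGN bytes into it: 2 + 14 = 16, a multiple of 4,
 * so word-sized loads on the IP header are aligned. */
static size_t ip_header_offset(void)
{
    return RX_IP_ALIGN + ETH_HLEN;
}
```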
I was doing some playing in this area this week.
I changed the alloc per packet to a "realloc" per packet.
I.E. the e1000 driver owns the packets. I noticed a
very nice speedup from this. In summary a userspace
app was able to receive 2x250Kpps without this patch,
and 2x490Kpps with it. The patch is here:
Note 99% of that patch is just upgrading from
e1000 V4.4.12-k1 to V5.2.52 (which doesn't affect
Wow I just read your excellent paper, and noticed
you used this approach also :-)
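The "driver owns the packets" idea above can be sketched in userspace
terms: instead of an allocation and a free per received packet, keep a
pool of fixed-size buffers and hand each one straight back after the
consumer is done with it. This is only a sketch of the recycling
technique, not the actual patch:

```c
#include <assert.h>
#include <stdlib.h>

#define POOL_SIZE 4
#define BUF_LEN   2048

struct pkt_buf {
    struct pkt_buf *next;
    unsigned char data[BUF_LEN];
};

/* Free list of buffers permanently owned by the "driver". */
static struct pkt_buf *free_list;

static void pool_init(void)
{
    for (int i = 0; i < POOL_SIZE; i++) {
        struct pkt_buf *b = malloc(sizeof(*b));
        b->next = free_list;
        free_list = b;
    }
}

/* RX path: grab a recycled buffer instead of allocating a fresh one. */
static struct pkt_buf *rx_get_buf(void)
{
    struct pkt_buf *b = free_list;
    if (b)
        free_list = b->next;
    return b;
}

/* Consumer returns the buffer to the pool instead of freeing it. */
static void rx_put_buf(struct pkt_buf *b)
{
    b->next = free_list;
    free_list = b;
}
```

The win is simply that the per-packet malloc/free pair on the hot path
becomes a pointer push and pop.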
Small packet performance depends on low latency. A higher bus speed
gives shorter latency, but higher-speed buses also tend to have
bridges that add latency.
That's true. We suspect that the limit is due to bus latency. But still,
we are surprised, since the bus allows us to receive 100% of the packets,
but to transmit only up to ~50%. Moreover, the raw aggregate bandwidth of
the bus is _far_ larger (133MHz * 64bit ~ 8Gbit/s).
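As a back-of-the-envelope check of the figure above: a 133 MHz, 64-bit
bus moves clock * width bits per second, ignoring arbitration, bridge
latency and descriptor traffic, which is exactly the overhead suspected
of limiting the real rate:

```c
#include <assert.h>

/* Raw bus bandwidth in bits per second: clock rate times data width. */
static double raw_bus_bps(double clock_hz, double width_bits)
{
    return clock_hz * width_bits;
}

/* Theoretical packet rate for a given packet size on that bandwidth,
 * again ignoring all per-transaction bus overhead. */
static double theoretical_pps(double bus_bps, int pkt_bytes)
{
    return bus_bps / (pkt_bytes * 8);
}
```

For 133e6 * 64 that gives ~8.5 Gbit/s, or a theoretical ~16.6 Mpps of
64-byte packets, so the raw data bandwidth is clearly not the bottleneck.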
Well there definitely could be an asymmetry wrt bus latency.
Saying that though, in my tests with much the same hardware
as you, I could only get 800Kpps into the driver. I'll
check this again when I have time. Note also that as I understand
it the PCI control bus is running at a much lower rate,
and that is used to arbitrate the bus for each packet.
I.E. the 8Gb/s number above is not the bottleneck.
An lspci -vvv for your ethernet devices would be useful.
Also to view the burst size: setpci -d 8086:1010 e6.b
(where 8086:1010 is the ethernet device PCI id).