Robert Olsson wrote:
> Harald Welte wrote:
> > I'm currently trying to help Robert Olsson improving the performance of
> > the Linux in-kernel packet generator (pktgen.c). At the moment, we seem
> > to be unable to get more than 760kpps from a single port of a 82546,
> > (or any other PCI-X MAC supported by e1000) - that's a bit more than 51%
> > wirespeed at 64byte packet sizes.
Yes, it seems Intel adapters work better in BSD, as they claim to route
1 Mpps and we cannot send more than ~750 kpps even when only feeding the
adapter. :-)
> In my experience anything around 750Kpps is a PCI limitation,
> specifically PCI bus arbitration latency. Note the clock speed of
> the control signal used for bus arbitration has not increased
> in proportion to the PCI data clock speed.
Yes, data from an Opteron @ 1.6 GHz w. e1000 82546EB, 64-byte pkts:
133 MHz  830 kpps
100 MHz  721 kpps
 66 MHz  561 kpps
Interesting info thanks!
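To make the arbitration-latency argument a bit more concrete, the measurements above can be turned into "bus cycles spent per packet". This is only a back-of-the-envelope sketch: it assumes nominal bus clocks of 133/100/66 MHz and reads the measured rates as kpps, in line with the ~750 kpps figures elsewhere in the thread.

```python
# Bus clock cycles available per packet at each measured rate.
# Assumptions: nominal bus clocks in MHz, measured rates taken as kpps.
measurements = [(133, 830), (100, 721), (66, 561)]

def cycles_per_packet(bus_mhz, kpps):
    """Bus clock cycles that elapse per transmitted packet."""
    return (bus_mhz * 1_000_000) / (kpps * 1_000)

for mhz, kpps in measurements:
    cycles = cycles_per_packet(mhz, kpps)
    print(f"{mhz:>3} MHz: {kpps} kpps -> {cycles:.0f} bus cycles/packet")
```

With these numbers the per-packet cost in bus cycles goes up as the clock goes up, and doubling the clock from 66 to 133 MHz yields only about 1.5x the packet rate. That is what you would expect if part of the per-packet cost does not scale with the data clock.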
It would be very interesting to see the performance of PCI Express,
which should not have the bus arbitration issues.
So higher bus bandwidth could increase the small packet rate.
So is there a difference in PCI tuning between BSD and Linux?
And, more generally, can we measure the maximum number
of transactions on a PCI bus?
The chip should be able to transfer 64 packets in a single burst; I don't know
how to set/verify this.
Well, the Intel docs say: "The devices include a PCI interface
that maximizes the use of bursts for efficient bus usage.
The controllers are able to cache up to 64 packet descriptors in
a single burst for efficient PCI bandwidth usage."
So I'm guessing that increasing the PCI-X burst size setting
(MMRBC) will automatically get more packets sent per transfer?
I said previously in this thread to google for setpci and MMRBC,
but what I know about it is...
To return the current setting(s):
setpci -d 8086:1010 e6.b
The MMRBC is the upper two bits of the lower nibble, where:
0 = 512 byte bursts
1 = 1024 byte bursts
2 = 2048 byte bursts
3 = 4096 byte bursts
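For anyone who wants to decode the byte that setpci prints, here is a small Python sketch of the mapping above. It assumes the field sits in bits 3:2 of the byte at offset e6, as described; the function name is made up for illustration.

```python
def decode_mmrbc(reg_byte):
    """Return the burst size in bytes encoded in bits 3:2 of the byte."""
    code = (reg_byte >> 2) & 0x3   # upper two bits of the lower nibble
    return 512 << code             # 0 -> 512, 1 -> 1024, 2 -> 2048, 3 -> 4096

# Example values chosen only to show each of the four codes.
for value in (0x02, 0x06, 0x0a, 0x0e):
    print(f"0x{value:02x} -> {decode_mmrbc(value)} byte bursts")
```

setpci prints the byte in hex, so something like `decode_mmrbc(int("0e", 16))` can be used on its output.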
To set 4 KiB bursts, I do:
setpci -d 8086:1010 e6.b=0e
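Note that the command above writes the whole byte. If you would rather change only the MMRBC bits and leave the rest of the byte as it was, a read-modify-write can be sketched roughly like this in Python. The helper name and the starting value 0x02 are hypothetical; substitute whatever your own setpci read returns.

```python
MMRBC_MASK = 0x0c  # bits 3:2 of the byte at offset e6
BURST_CODES = {512: 0, 1024: 1, 2048: 2, 4096: 3}

def with_mmrbc(current_byte, burst_bytes):
    """Return current_byte with only the MMRBC field replaced."""
    code = BURST_CODES[burst_bytes]
    return (current_byte & ~MMRBC_MASK & 0xff) | (code << 2)

current = int("02", 16)          # hypothetical value read back via setpci
new = with_mmrbc(current, 4096)
print(f"setpci -d 8086:1010 e6.b={new:02x}")
```

The script only prints the command so it can be inspected before being run as root.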
The following measured a 30% throughput improvement (on 10G)
from setting the burst size to 4KiB: