
Re: TX performance of Intel 82546

To: Robert Olsson <Robert.Olsson@xxxxxxxxxxx>
Subject: Re: TX performance of Intel 82546
From: P@xxxxxxxxxxxxxx
Date: Wed, 15 Sep 2004 14:59:30 +0100
Cc: Harald Welte <laforge@xxxxxxxxxxxxx>, netdev@xxxxxxxxxxx
In-reply-to: <16712.14153.683690.710955@xxxxxxxxxxxx>
References: <20040915081439.GA27038@xxxxxxxxxxxxxxxxxxxxxxx> <414808F3.70104@xxxxxxxxxxxxxx> <16712.14153.683690.710955@xxxxxxxxxxxx>
Sender: netdev-bounce@xxxxxxxxxxx
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6) Gecko/20040124
Robert Olsson wrote:
P@xxxxxxxxxxxxxx writes:
 > Harald Welte wrote:

 > > I'm currently trying to help Robert Olsson improving the performance of
 > > the Linux in-kernel packet generator (pktgen.c).  At the moment, we seem
 > > to be unable to get more than 760kpps from a single port of a 82546,
 > > (or any other PCI-X MAC supported by e1000) - that's a bit more than 51%
 > > wirespeed at 64byte packet sizes.

 Yes, it seems Intel adapters work better in BSD, as they claim to route
 1 Mpps while we cannot send more than ~750 kpps even when only feeding
 the adapter. :-)

 > In my experience anything around 750Kpps is a PCI limitation,
 > specifically PCI bus arbitration latency. Note the clock speed of
 > the control signal used for bus arbitration has not increased
 > in proportion to the PCI data clock speed.

 Yes, data from an Opteron @ 1.6 GHz w. e1000 82546EB, 64-byte pkts:

 133 MHz 830 kpps
 100 MHz 721 kpps
  66 MHz 561 kpps

Interesting info, thanks!
It would be very interesting to see the performance of PCI Express,
which should not have these bus arbitration issues.
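For context, the "51% wirespeed" figure earlier in the thread can be sanity-checked (my own back-of-envelope calculation, not from the thread): a 64-byte frame occupies 64 + 8 (preamble) + 12 (inter-frame gap) = 84 bytes on the wire, so gigabit Ethernet tops out at about 1.488 Mpps:

```shell
# Max 64-byte frame rate on gigabit Ethernet, and what fraction of
# that the measured rates above represent.
awk 'BEGIN {
    wire = 1e9 / ((64 + 8 + 12) * 8)      # frames/s at wirespeed
    printf "wirespeed: %.0f pps\n", wire
    printf "830 kpps = %.0f%% of wirespeed\n", 830e3 * 100 / wire
    printf "760 kpps = %.0f%% of wirespeed\n", 760e3 * 100 / wire
}'
```

760 kpps comes out at roughly 51%, matching the figure quoted at the top of the thread.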

 So higher bus bandwidth could increase the small packet rate.

So is there a difference in PCI tuning between BSD and Linux? And more
generally, can we measure the maximum number of transactions on a
PCI bus?

 The chip should be able to transfer 64 packets in a single burst;
 I don't know how to set/verify this.

Well from the intel docs they say "The devices include a PCI interface
that maximizes the use of bursts for efficient bus usage.
The controllers are able to cache up to 64 packet descriptors in
a single burst for efficient PCI bandwidth usage."

So I'm guessing that increasing the PCI-X burst size setting
(MMRBC) will automatically get more packets sent per transfer?
I suggested earlier in this thread googling for setpci and MMRBC,
but what I know about it is...

To return the current setting(s):

setpci -d 8086:1010 e6.b

The MMRBC is the upper two bits of the lower nibble, where:

0 = 512 byte bursts
1 = 1024 byte bursts
2 = 2048 byte bursts
3 = 4096 byte bursts
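
A small helper (my own sketch, not from the e1000 docs) shows how to pull that field out of the byte the setpci command returns; MMRBC sits in bits 3:2, so each increment doubles the burst size starting from 512 bytes. Assumes bash arithmetic:

```shell
# Decode the MMRBC field from the byte returned by e.g.
#   setpci -d 8086:1010 e6.b
decode_mmrbc() {
    local val=$((16#$1))                 # parse the hex byte, e.g. "0e" -> 14
    local mmrbc=$(( (val >> 2) & 0x3 ))  # upper two bits of the lower nibble
    echo "$(( 512 << mmrbc )) byte bursts"
}

decode_mmrbc 0e   # -> 4096 byte bursts
```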

For me to set 4KiB bursts I do:

setpci -d 8086:1010 e6.b=0e
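
One caveat worth noting: `e6.b=0e` writes the whole byte, so it would also clobber any other bits set in that register. A more careful (untested, assumes bash) sketch computes a new value that forces only the MMRBC bits to 3 (4 KiB) and leaves the rest alone:

```shell
# Force MMRBC (bits 3:2) to 3 = 4096-byte bursts, preserving the
# other bits of the byte.  Takes the current hex byte, prints the new one.
mmrbc_4k() {
    printf '%02x' $(( (16#$1 & ~0x0c) | (3 << 2) ))
}

# Usage sketch (needs root and real hardware, so untested here):
#   cur=$(setpci -d 8086:1010 e6.b)
#   setpci -d 8086:1010 e6.b=$(mmrbc_4k "$cur")
mmrbc_4k 02   # -> 0e
```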

The following measured a 30% throughput improvement (on 10G)
from setting the burst size to 4KiB:

