Let's compare 3Coms and eepro100s, along with the effects of MMIO
versus PIO, and other stuff.

                                                  3c905C    3c905C    3c905C    3c905C    3c905C  eepro100  eepro100  eepro100
                                                     CPU    affine      ints    rx pps    tx pps      MMIO   I/O ops      ints

2.4.1-pre10+zerocopy, sendfile():                   9.6%                4395      4106      8146     15.3%
2.4.1-pre10+zerocopy, send():                      24.1%                4449      4163      8196     20.2%
2.4.1-pre10+zerocopy, receiving:                   18.7%               12332      8156      4189     17.6%
2.4.1-pre10+zerocopy, sendfile(), no xsum/SG:      16.2%                                           (15.3%)
2.4.1-pre10+zerocopy, send(), no xsum/SG:          21.5%                                           (20.2%)
2.4.1-pre10-vanilla, using sendfile():             17.1%     17.9%      5729      5296      8214     16.1%     16.8%
2.4.1-pre10-vanilla, using send():                 21.1%                4629      4152      8191     20.3%     20.6%      6310
2.4.1-pre10-vanilla, receiving:                    18.3%               12333      8156      4188     17.1%     18.2%     12335

Lots of interesting things here.
- eepro100 generates more interrupts doing TCP Tx, but not
TCP Rx. I assume it doesn't do Tx mitigation?
- Changing eepro100 to use I/O operations instead of MMIO slows
down this dual 500MHz machine by less than one percent at
100 mbps, at 12,000 interrupts per second. Why all the fuss
about MMIO? (A sketch of what the two access styles look like
at the driver level is at the end of this note.)
- Bonding the 905's interrupt to CPU0 slows things down slightly.
(This is contrary to other measurements I've previously taken.
Don't pay any attention to this).
- Without the zc patch, there is a significant increase (25%) in
the number of Rx packets (acks, presumably) when data is sent
using sendfile() as opposed to when the same data is sent
with send().
Workload: 62 files, average size 350k.
sendfile() tries to send the entire file in one hit;
send() breaks it up into 64 kbyte chunks.
(A sketch of the two send paths is at the end of this note.)
When the zerocopy patch is applied, the Rx packet rate during
sendfile() is the same as the rate during send().
Why is this?
If this *alone* were fixed in 2.4.1, I'd expect the performance
gain to be ~10% of system capacity on this NIC.
- I see a consistent 12-13% slowdown on send() with the zerocopy
patch. Can this be fixed?
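
For reference, here is a minimal sketch of the two transmit paths the
workload exercises. This is not the actual test harness: the function
names, the fixed 64k chunk size and the missing error/partial-write
handling are illustrative only.

	#include <sys/types.h>
	#include <sys/sendfile.h>
	#include <sys/socket.h>
	#include <fcntl.h>
	#include <unistd.h>

	#define CHUNK (64 * 1024)

	/* send() path: copy the file through a user buffer and push it
	 * out in 64 kbyte chunks, one send() syscall per chunk. */
	static ssize_t send_in_chunks(int sock, int fd, size_t size)
	{
		static char buf[CHUNK];
		size_t done = 0;

		while (done < size) {
			ssize_t n = read(fd, buf, CHUNK);
			if (n <= 0)
				return -1;
			if (send(sock, buf, n, 0) != n)
				return -1;
			done += n;
		}
		return done;
	}

	/* sendfile() path: hand the whole file to the kernel in one hit. */
	static ssize_t send_whole_file(int sock, int fd, size_t size)
	{
		off_t off = 0;

		return sendfile(sock, fd, &off, size);
	}

And, very roughly, what the MMIO-versus-I/O-ops comparison changes
inside the driver: the same register read done through a memory-mapped
window (readw() on an ioremap()ed address) versus x86 port I/O (inw()
on the card's I/O base). The register offset and names here are made
up to show the two access styles; this is not the real eepro100
register layout.

	#include <linux/types.h>
	#include <asm/io.h>

	#define REG_STATUS	0	/* hypothetical register offset */

	/* MMIO: a load from an uncached, ioremap()ed mapping. */
	static inline u16 read_status_mmio(unsigned char *ioaddr)
	{
		return readw(ioaddr + REG_STATUS);
	}

	/* Port I/O: an IN instruction on the card's I/O port base. */
	static inline u16 read_status_pio(long iobase)
	{
		return inw(iobase + REG_STATUS);
	}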