"David S. Miller" wrote:
> Finally, please do some tests on loopback. It is usually a great
> way to get "pure software overhead" measurements of our TCP stack.
Here we are. TCP and NFS/UDP over lo.
The machine is a dual 500MHz PII (again); memory read bandwidth is
320 Mbyte/sec and write bandwidth is 130 Mbyte/sec.  I didn't bother
running CPU utilisation testing while benchmarking loopback, although
this may be of some interest for SMP; I just looked at the throughput.

The working set is 60 files of ~300 kbytes each, everything cached.
We run the following tests:
1: sendfile() to localhost, sender and receiver pinned to
   separate CPUs
2: sendfile() to localhost, sender and receiver pinned to
   the same CPU
3: sendfile() to localhost, no explicit pinning
4, 5, 6: same as above, except we use send() in 8kbyte
   chunks rather than sendfile()
Repeat with and without zerocopy patch 2.4.1-2.
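For the pinned cases (1 and 2 above), sender and receiver were each
tied to one of the two CPUs.  Stock 2.4.1 has no sched_setaffinity()
syscall, so the exact mechanism used for these runs isn't shown here;
on a current kernel the equivalent pinning step looks roughly like
this (pin_to_cpu() is just an illustrative helper name):

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>

/* Pin the calling process to a single CPU (0 or 1 on this dual-PII). */
static void pin_to_cpu(int cpu)
{
	cpu_set_t set;

	CPU_ZERO(&set);
	CPU_SET(cpu, &set);
	if (sched_setaffinity(0, sizeof(set), &set) < 0) {
		perror("sched_setaffinity");
		exit(1);
	}
}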
The receiver reads 64k hunks and throws them away. sendfile()
sends the entire file.
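The core of the two transfer loops is roughly the following - a sketch
only, assuming an already-connected TCP socket over 127.0.0.1 and
omitting most error handling; the real test programs are not
reproduced here:

#include <sys/sendfile.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

/* Sender: push one whole (cached) file down the socket with sendfile().
 * The send(8k) variants instead read() the file and send() it in
 * 8192-byte chunks. */
static void send_one_file(int sock, const char *path)
{
	struct stat st;
	off_t off = 0;
	int fd = open(path, O_RDONLY);

	fstat(fd, &st);
	while (off < st.st_size)
		if (sendfile(sock, fd, &off, st.st_size - off) <= 0)
			break;
	close(fd);
}

/* Receiver: read 64k hunks and throw them away. */
static void drain(int sock)
{
	static char buf[64 * 1024];

	while (read(sock, buf, sizeof(buf)) > 0)
		;
}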
Also, do an NFS mount of localhost with rsize=wsize=8192 and see how
long it takes to `cp' a 100 meg file from the "server" to /dev/null.
The file is cached on the "server".  Do this for the three pinning
cases as well: all the NFS kernel processes were pinned as a group,
and `cp' was the other group.
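Timing the `cp' amounts to reading the file off the loopback mount and
throwing the data away; something like the sketch below gives the same
number (the mount point and file name are made up):

#include <sys/time.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	char buf[8192];		/* matches rsize=8192 */
	struct timeval t0, t1;
	long long bytes = 0;
	double secs;
	ssize_t n;
	int fd = open("/mnt/lo-nfs/100meg", O_RDONLY);

	if (fd < 0) {
		perror("open");
		return 1;
	}
	gettimeofday(&t0, NULL);
	while ((n = read(fd, buf, sizeof(buf))) > 0)
		bytes += n;
	gettimeofday(&t1, NULL);
	close(fd);

	secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
	printf("%.0f kbyte/sec\n", bytes / 1024.0 / secs);
	return 0;
}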
                     sendfile()    send(8k)       NFS
                     kbyte/sec    kbyte/sec    kbyte/sec

No explicit pinning:
2.4.1:                  66600        70000        25600
2.4.1-zc:              208000        69000        25000

Client and server pinned to separate CPUs:
2.4.1:                  66700        68000        27800
2.4.1-zc:              213047        66000        25700

Client and server pinned to the same CPU:
2.4.1:                  56000        57000        23300
2.4.1-zc:              176000        55000        22100
Much the same story: a big increase in sendfile() efficiency, a small
drop in send(), and NFS unchanged.
The relative increase in sendfile() efficiency is much higher
than with a real NIC, presumably because we've factored out
the constant (and large) cost of the device driver.
All the bits and pieces to reproduce this are at