In message <email@example.com>, > : "Martin J. Bligh" writes:
> > I would think shoving the data down the NIC
> > and avoid the fragmentation shouldnt give you that much significant
> > CPU savings.
> > It's the DMA bandwidth saved, most of the specweb runs on x86 hardware
> > is limited by the DMA throughput of the PCI host controller. In
> > particular some controllers are limited to smaller DMA bursts to
> > work around hardware bugs.
> I think we're CPU limited (there's no idle time on this machine),
> which is odd for an 8 CPU 900MHz P3 Xeon, but still, this is Apache,
> not Tux. You mentioned CPU load as another advantage of TSO ...
> anything we've done to reduce CPU load enables us to run more and
> more connections (I think we started at about 260 or something, so
> 2900 ain't too bad ;-)).
Troy, is there any chance you could post an oprofile from any sort
of reasonably conformant run? I think that might help enlighten
people a bit as to what we are fighting with. The last numbers I
remember seemed to indicate that we were spending about 1.25 CPUs
in network/e1000 code with 100% CPU utilization and crappy SpecWeb
One of our goals is to actually take the next generation of the most
common "large system" web server and get it to scale along the lines
of Tux or some of the other servers which are more common on the
small machines. For some reasons, big corporate customers want lots
of features that are in a web server like apache and would also like
the performance on their 8-CPU or 16-CPU machine to not suck at the
same time. High ideals, I know, wanting all features *and* performance
from the same tool... Next thing you know they'll want reliability
or some such thing.