> > The box has 130 mbyte/sec memory write bandwidth, so saving
> > a copy should save 10% of this. (Wanders away, scratching
> > head...)
> Did you hope to get negative load? It is unlikely. 8)
> You had nic->skb mem->page mem->user mem and saved only one copy,
> moreover that copy which happens back-to-back through cache.
Well, it's an interesting problem. How do we
define "system load"? It's a combination of
CPU cycles, memory bandwidth and I/O bandwidth.
Given that, how do we measure it?
My approach is to generate a mix of CPU load
and memory traffic, and see how much it is
slowed down by networking. Run the dummy
load "in the background" so it doesn't affect the
thing being tested. (Even this is questionable,
because it's not real-world. Perhaps the
dummy load should run with equal priority.)
That's why I'm scratching my head. Interesting problem.
Perhaps it would make more sense not to look for a
single percentage figure, but to measure the percentage
of CPU cycles and the percentage of memory bandwidth separately.
Saving a single copy of a 100 mbps stream should save
11 mbytes/sec of memory write bandwidth. I assume
the saving in read bandwidth is insignificant because
it's already in CPU cache, or will be soon.
I tried changing the dummy loop so it reads 1,000,000
cachelines/sec/CPU and dirties 250,000 lines/sec/CPU.
Dual CPU. It made a negligible difference to all measurements.
> BTW no need to scratch head, profiler exists to help to answer
> such questions.
I didn't try profiling NFS reads. Profiling sendfile()
and send() activity didn't turn up anything very interesting,
but we're looking for a pretty small delta.
May I ask: have you tried to do any quantitative performance