Hi, Ken -
> >Sorry, if this was premature. I had read Ken's responses to Frank's
> >performance data as positive and I also read that Ken had pulled these
> >changes and reviewed them favourably.
>
> Bit of a miscommunication here I think [...]
Yeah, thanks all for handling it gracefully.
> ... Frank did provide performance data for his pmwebd test cases
> (which looked encouraging early on in the work), we don't have any
> performance analysis for the pmcd-side cases at this stage AFAIK.
Yeah. In the pmcd-side live-use case, the pdubuf machinery is not
exercised heavily (few operations per unit time), so the small
differences in bulk throughput indicated by the pmwebd / pmlogextract
tests wouldn't show up. Indeed, running perf stat on pmlogger & pmie
under high-rate operations doesn't indicate big differences. CPU
consumption (task-clock, cycles) is a bit lower with the new code,
despite slightly more instructions and branches:
--- old code:

% rm /tmp/foo.*; perf stat /usr/bin/pmlogger -t0.01 -s5seconds -h localhost -c /etc/pcp/pmlogger/config.default /tmp/foo

 Performance counter stats for '/usr/bin/pmlogger -t0.01 -s5seconds -h localhost -c /etc/pcp/pmlogger/config.default /tmp/foo':

        46.183299 task-clock (msec)         #    0.009 CPUs utilized
            1,560 context-switches          #    0.034 M/sec
               21 cpu-migrations            #    0.455 K/sec
              328 page-faults               #    0.007 M/sec
      164,300,252 cycles                    #    3.558 GHz
  <not supported> stalled-cycles-frontend
  <not supported> stalled-cycles-backend
      156,719,367 instructions              #    0.95  insns per cycle
       36,039,298 branches                  #  780.353 M/sec
          983,947 branch-misses             #    2.73% of all branches

      5.005151483 seconds time elapsed

--- new code:

% rm /tmp/foo.*; LD_LIBRARY_PATH=`pwd` perf stat /usr/bin/pmlogger -t0.01 -s5seconds -h localhost -c /etc/pcp/pmlogger/config.default /tmp/foo

 Performance counter stats for '/usr/bin/pmlogger -t0.01 -s5seconds -h localhost -c /etc/pcp/pmlogger/config.default /tmp/foo':

        45.142242 task-clock (msec)         #    0.009 CPUs utilized
            1,562 context-switches          #    0.035 M/sec
               15 cpu-migrations            #    0.332 K/sec
              324 page-faults               #    0.007 M/sec
      161,460,453 cycles                    #    3.577 GHz
  <not supported> stalled-cycles-frontend
  <not supported> stalled-cycles-backend
      159,689,168 instructions              #    0.99  insns per cycle
       36,660,521 branches                  #  812.111 M/sec
          991,129 branch-misses             #    2.70% of all branches

      5.005701424 seconds time elapsed

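
For a rough sense of scale, the relative change in each counter between
the two runs above can be worked out with a few lines of C. This is only
an illustrative sketch: the function names are made up for this message,
the figures are copied from the perf stat output, and one 5-second run
per build says nothing about run-to-run noise.

```c
#include <assert.h>
#include <stdio.h>

/* Relative change from old to new, in percent; negative means new is lower. */
double pct_change(double old_val, double new_val)
{
    assert(old_val != 0.0);  /* a zero baseline has no relative change */
    return (new_val - old_val) / old_val * 100.0;
}

/* Print one comparison row: counter name, old value, new value, change. */
void report(const char *name, double old_val, double new_val)
{
    printf("%-14s %16.3f %16.3f %+7.2f%%\n",
           name, old_val, new_val, pct_change(old_val, new_val));
}

/* Figures copied from the two perf stat runs quoted in this message. */
void report_all(void)
{
    report("task-clock", 46.183299, 45.142242);
    report("cycles", 164300252.0, 161460453.0);
    report("instructions", 156719367.0, 159689168.0);
    report("branches", 36039298.0, 36660521.0);
    report("branch-misses", 983947.0, 991129.0);
}
```

Calling report_all() from a main() prints one row per counter:
task-clock and cycles come out roughly 2% lower with the new code,
instructions and branches roughly 2% higher.
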
What might show up are -latency- differences, but I haven't been able
to measure any. We appear to lack performance benchmarking facilities
for pcp itself, which is not too surprising considering the toolset is
supposed to be very lightweight.
> [...] I have not reviewed the main pdubuf changes at all ... I was
> waiting for a "done" flag from Frank. [...] Maybe a case for a
> post-commit review here.
The pdubuf.c code is "done" in the sense that I don't have any pending
work. I'd appreciate a closer review.
Re. your earlier question:
> So you're freeing buffers when the pin count goes to zero?
Yes, trusting libc's memory manager to do a good job with free lists.
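
To illustrate the idea only (this is not the pdubuf.c code; the struct
and function names below are hypothetical, and the sketch assumes
single-threaded use with no locking), freeing on the last unpin looks
roughly like:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical pin-counted buffer; NOT the pdubuf.c data structure. */
struct pinbuf {
    int    pincnt;   /* number of outstanding pins */
    size_t size;     /* usable bytes in data[] */
    char   data[];   /* payload, allocated together with the header */
};

/* Allocate a buffer holding one pin for the caller; NULL on failure. */
struct pinbuf *pinbuf_new(size_t size)
{
    struct pinbuf *pb = malloc(sizeof(*pb) + size);
    if (pb == NULL)
        return NULL;
    pb->pincnt = 1;
    pb->size = size;
    return pb;
}

/* Take an additional pin on a live buffer. */
void pinbuf_pin(struct pinbuf *pb)
{
    assert(pb->pincnt > 0);
    pb->pincnt++;
}

/*
 * Drop one pin.  When the count reaches zero the memory goes straight
 * back to libc via free(); no private free list is kept.
 * Returns 1 if the buffer was freed, 0 if pins remain.
 */
int pinbuf_unpin(struct pinbuf *pb)
{
    assert(pb->pincnt > 0);
    if (--pb->pincnt > 0)
        return 0;
    free(pb);
    return 1;
}
```

Any recycling of recently freed blocks is then left to the allocator
rather than to a buffer pool of our own.
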
I'll take a peek in a few hours at the two qa .bad's nathans forwarded
earlier.
- FChE