> [...]
> I noticed pmdapapi timed out for me on fetch[...]
> $ time sudo pminfo -v papi > /dev/null
> real 0m5.316s
> [...]
> and then subsequently, too ...
> $ time sudo pminfo -v papi > /dev/null
> real 0m1.765s
> [...]
The two commits on pcpfans.git fche/papi improve on this situation by
moving code from the papi-pmda "inner loop" (the papi_fetchCallBack
function) into the higher-level papi_fetch function. With default pminfo
and papi-pmda batching values, both those pminfo calls should finish
in much less than a second. The knobs exposed by the code let you
benchmark the improvements on your own hardware too.
Before:

    # pmstore papi.control.batch 9999
    # pmstore papi.control.reset ""
    # /bin/time pminfo -f papi >/dev/null
    0.00user 0.00system 0:02.68elapsed
    # /bin/time pminfo -b1 -f papi >/dev/null
    0.00user 0.01system 0:03.30elapsed

After:

    # pmstore papi.control.batch 10
    # pmstore papi.control.reset ""
    # /bin/time pminfo -f papi >/dev/null
    0.00user 0.00system 0:00.19elapsed
    # /bin/time pminfo -f papi >/dev/null
    0.00user 0.00system 0:00.11elapsed

commit 31cf3bc00d58b5f9e28b33d46614e993050b66b9 (HEAD, origin/fche/papi, fche/papi)
Author: Frank Ch. Eigler <fche@xxxxxxxxxx>
Date: Mon Jan 26 21:20:13 2015 -0500
papi pmda: add papi-read batching
On an ordinary pcp fetch of an already-running set of papi counters,
previous code was doing a PAPI_read(3) operation for every individual
pmid. With the prior code & default pminfo-batch size of ~20, each
pcp fetch took well below 1 second, but still...
We now batch by doing a single PAPI_read at the pcp-fetch level. One
can compare the before & after results by running this new code with
% pminfo -f papi
to get many counters auto-enabled, then
% pminfo -b1 -f papi >/dev/null
to fetch with no batching (as before), and
% pminfo -b999 -f papi >/dev/null
to fetch in approximately one big batch (the ideal case).
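
For readers who want to see the shape of the read-batching change without
opening papi.c, here is a minimal sketch. It is not the pmda code: the
read_all_counters function is a hypothetical stand-in for PAPI_read(3), the
counter values are made up, and every function name below is invented for
illustration. It only shows why one read per fetch beats one read per pmid.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_COUNTERS 4

/* Hypothetical stand-in for PAPI_read(3): copies every running counter
   into values[] and counts how many times it has been called. */
static int expensive_reads = 0;
static long long hw_counters[NUM_COUNTERS] = { 100, 200, 300, 400 };

static void read_all_counters(long long *values)
{
    expensive_reads++;
    for (size_t i = 0; i < NUM_COUNTERS; i++)
        values[i] = hw_counters[i];
}

/* Old shape: the per-metric callback performs its own full read. */
static long long callback_unbatched(size_t idx)
{
    long long values[NUM_COUNTERS];

    assert(idx < NUM_COUNTERS);
    read_all_counters(values);
    return values[idx];
}

/* New shape: the fetch level reads once into a cache ... */
static long long cached_values[NUM_COUNTERS];

static void fetch_prepare(void)
{
    read_all_counters(cached_values);
}

/* ... and the per-metric callback only indexes into that cache. */
static long long callback_batched(size_t idx)
{
    assert(idx < NUM_COUNTERS);
    return cached_values[idx];
}

/* Simulate one fetch of n metrics; return how many expensive reads it cost. */
static int fetch_unbatched(const size_t *idx, size_t n, long long *out)
{
    int before = expensive_reads;

    for (size_t i = 0; i < n; i++)
        out[i] = callback_unbatched(idx[i]);
    return expensive_reads - before;
}

static int fetch_batched(const size_t *idx, size_t n, long long *out)
{
    int before = expensive_reads;

    fetch_prepare();
    for (size_t i = 0; i < n; i++)
        out[i] = callback_batched(idx[i]);
    return expensive_reads - before;
}
```

In this toy model a fetch of n metrics costs n reads in the old shape and
exactly one in the new shape, which is the same effect the -b1 versus -b999
comparison above is meant to expose.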
commit 07350e06314c39bec16adc502a1d2ba37e0dfd40 (origin/fche/papi)
Author: Frank Ch. Eigler <fche@xxxxxxxxxx>
Date: Fri Jan 23 21:19:59 2015 -0500
papi pmda: add papi-refresh batching
It has been reported that in some cases, a "pminfo -f papi" operation
(causing auto-enable of all available counters) can take a noticeable
amount of time. This was due to incremental PAPI-level regeneration
for each new added metric/counter. That permitted precise errors, but
costs time.
This patch introduces a "papi.control.batch" parameter, which sets a
limit for the maximum number of pmids per fetch that permits the old
behavior. Above that limit, incremental PAPI regenerations are
temporarily suppressed until the end of the fetch operation, so as to
have one big PAPI bang.
The default value (10) makes "pminfo -f papi" take <1 second. Setting
it to a large number (100s) can let it take up to multiple seconds,
depending on machine load, perhaps even triggering pmcd's "timeout,
you're gone!" treatment of the pmda. Testing this aspect (stressing
papi to timeout-failure) has not been included, but otherwise basic
qa smoke-testing is present.
In passing, the papi.c file is retabified, and an earlier pmUnits
error introduced in commit ecfacf3ff for papi.control.auto_enable is
corrected.
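
The batch-limit rule described above can be sketched roughly as follows.
This is a simplified model under stated assumptions, not the code in papi.c:
regenerate_event_set is a hypothetical placeholder for the costly PAPI-level
rebuild, the enabled[] table and function names are invented, and only the
default value of 10 comes from the commit text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_COUNTERS 64

static bool enabled[MAX_COUNTERS];  /* which counters are auto-enabled   */
static int regenerations = 0;       /* how often the event set was rebuilt */
static unsigned batch_limit = 10;   /* models papi.control.batch default */

/* Hypothetical placeholder for the expensive PAPI-level regeneration. */
static void regenerate_event_set(void)
{
    regenerations++;
}

/* Simulate one fetch of n pmids, auto-enabling any counter not yet
   enabled.  Returns the number of regenerations the fetch caused. */
static int fetch_with_auto_enable(const size_t *idx, size_t n)
{
    int before = regenerations;
    bool defer = n > batch_limit;   /* above the limit: suppress for now */
    bool dirty = false;

    for (size_t i = 0; i < n; i++) {
        assert(idx[i] < MAX_COUNTERS);
        if (enabled[idx[i]])
            continue;               /* already running, nothing to do */
        enabled[idx[i]] = true;
        if (defer)
            dirty = true;           /* remember that a rebuild is owed */
        else
            regenerate_event_set(); /* old behavior: rebuild per counter */
    }
    if (dirty)
        regenerate_event_set();     /* one rebuild at the end of the fetch */
    return regenerations - before;
}
```

In this model a small fetch still rebuilds once per newly added counter, so
a failure can be attributed to a single counter, while a large fetch pays
for one rebuild only. Raising batch_limit to a large value brings back the
per-counter cost for big fetches, which matches the "Before" timings taken
with papi.control.batch set to 9999.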
- FChE