
Re: Slow pmdapapi fetching

To: Nathan Scott <nathans@xxxxxxxxxx>
Subject: Re: Slow pmdapapi fetching
From: fche@xxxxxxxxxx (Frank Ch. Eigler)
Date: Mon, 26 Jan 2015 21:41:59 -0500
Cc: PCP <pcp@xxxxxxxxxxx>
Delivered-to: pcp@xxxxxxxxxxx
In-reply-to: <765513326.15612778.1421976592499.JavaMail.zimbra@xxxxxxxxxx> (Nathan Scott's message of "Thu, 22 Jan 2015 20:29:52 -0500 (EST)")
References: <258293690.15607144.1421974332709.JavaMail.zimbra@xxxxxxxxxx> <765513326.15612778.1421976592499.JavaMail.zimbra@xxxxxxxxxx>
User-agent: Gnus/5.1008 (Gnus v5.10.8) Emacs/21.4 (gnu/linux)
> [...]
> I noticed pmdapapi timed out for me on fetch[...]
> $ time sudo pminfo -v papi > /dev/null
> real 0m5.316s
> [...]
> and then subsequently, too ...
> $ time sudo pminfo -v papi > /dev/null
> real 0m1.765s
> [...]

The two commits on pcpfans.git fche/papi improve this situation by
moving work out of the papi-pmda "inner loop" (the papi_fetchCallBack
function) into the higher-level papi_fetch function.  With the default
pminfo and papi-pmda batching values, both of those pminfo runs should
now finish in well under a second.  The knobs exposed by the code let
you benchmark the improvements on your own hardware too.

Before:
  # pmstore papi.control.batch 9999
  # pmstore papi.control.reset ""
  # /bin/time pminfo -f papi >/dev/null
0.00user 0.00system 0:02.68elapsed
  # /bin/time pminfo -b1 -f papi >/dev/null
0.00user 0.01system 0:03.30elapsed

After:
  # pmstore papi.control.batch 10
  # pmstore papi.control.reset ""
  # /bin/time pminfo -f papi >/dev/null
0.00user 0.00system 0:00.19elapsed
  # /bin/time pminfo -f papi >/dev/null
0.00user 0.00system 0:00.11elapsed


commit 31cf3bc00d58b5f9e28b33d46614e993050b66b9 (HEAD, origin/fche/papi, fche/papi)
Author: Frank Ch. Eigler <fche@xxxxxxxxxx>
Date:   Mon Jan 26 21:20:13 2015 -0500

    papi pmda: add papi-read batching
    
    On an ordinary pcp fetch of an already-running set of papi counters,
    previous code was doing a PAPI_read(3) operation for every individual
    pmid.  With the prior code & default pminfo-batch size of ~20, each
    pcp fetch took well below 1 second, but still...
    
    We now batch by doing a single PAPI_read at the pcp-fetch level.  One
    can compare the before & after results by running this new code with
       % pminfo -f papi
    to get many counters auto-enabled, then
       % pminfo -b1 -f papi >/dev/null
    to fetch with no batching (as before), and
       % pminfo -b999 -f papi >/dev/null
    to fetch in approximately one big batch (the ideal case).

commit 07350e06314c39bec16adc502a1d2ba37e0dfd40 (origin/fche/papi)
Author: Frank Ch. Eigler <fche@xxxxxxxxxx>
Date:   Fri Jan 23 21:19:59 2015 -0500

    papi pmda: add papi-refresh batching
    
    It has been reported that in some cases, a "pminfo -f papi" operation
    (causing auto-enable of all available counters) can take a noticeable
    amount of time.  This was due to incremental PAPI-level regeneration
    for each newly added metric/counter.  That permitted precise errors,
    but cost time.
    
    This patch introduces a "papi.control.batch" parameter, which sets a
    limit for the maximum number of pmids per fetch that permits the old
    behavior.  Above that limit, incremental PAPI regenerations are
    temporarily suppressed until the end of the fetch operation, so as to
    have one big PAPI bang.
    
    The default value (10) makes "pminfo -f papi" take <1 second.  Setting
    it to a large number (100s) can let it take up to multiple seconds,
    depending on machine load, perhaps even triggering pmcd's "timeout,
    you're gone!" treatment of the pmda.  Testing this aspect (stressing
    papi to timeout-failure) has not been included, but otherwise basic
    qa smoke-testing is present.
    
    In passing, the papi.c file is retabified, and an earlier pmUnits
    error introduced in commit ecfacf3ff for papi.control.auto_enable is
    corrected.


- FChE
