Re: [pcp] PAPI pmda Note

To: Lukas Berk <lberk@xxxxxxxxxx>
Subject: Re: [pcp] PAPI pmda Note
From: Nathan Scott <nathans@xxxxxxxxxx>
Date: Tue, 16 Sep 2014 04:55:25 -0400 (EDT)
Cc: pcp@xxxxxxxxxxx
Delivered-to: pcp@xxxxxxxxxxx
In-reply-to: <20140905200756.GA31071@xxxxxxxxxx>
References: <20140905200756.GA31071@xxxxxxxxxx>
Reply-to: Nathan Scott <nathans@xxxxxxxxxx>
Thread-index: HiMhfhjBWoH+FkfnZjrfW4/J2Jk37w==
Thread-topic: PAPI pmda Note

Hi Lukas,

----- Original Message -----
> Hey folks,
> 
> With the release going out earlier today I just wanted to demo a bit of
> the PAPI pmda functionality.
> 
> The pmda features a host of papi.control metrics used for
> enabling/disabling counters and administration of the pmda.  The metrics
> themselves are papi.METRIC, where METRIC is the suffix of the PAPI event
> code (i.e., PAPI_TOT_INS becomes TOT_INS).  Currently the metrics are
> system wide, with process/thread specific metrics (hopefully) coming
> soon.

*nod* - all good, this is looking neat - thanks!  Reading your note just
triggered a synapse-firing related to this:

> If I wanted to compare the number of system wide Level 1 Total Cache
> Misses to Level 1 Total Cache Hits I would:
> 
> sudo pmstore papi.control.enable "L1_TCM L1_TCH" (enable/disable being a
> space or ',' separated list)
> sudo pmval papi.L1_TCH (view subsequent output)
> sudo pmval papi.L1_TCM (view more output)
> sudo pmstore papi.control.disable "L1_TCM,L1_TCH"
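
The value format and naming rule quoted above can be sketched roughly as
follows.  This is only an illustration in Python, not the pmda's actual
code; the function names parse_counter_list and metric_name are
hypothetical, and it assumes every PAPI event code carries the PAPI_ prefix.

```python
import re


def parse_counter_list(value):
    """Split an enable/disable value into individual counter names.

    Assumption: names are separated by spaces, commas, or a mix of both,
    as described for papi.control.enable and papi.control.disable.
    """
    return [name for name in re.split(r"[,\s]+", value.strip()) if name]


def metric_name(event_code):
    """Map a PAPI event code to its metric name (PAPI_TOT_INS -> papi.TOT_INS)."""
    prefix = "PAPI_"
    if event_code.startswith(prefix):
        suffix = event_code[len(prefix):]
    else:
        suffix = event_code  # hypothetical fallback: already a bare suffix
    return "papi." + suffix
```

With this sketch, "L1_TCM L1_TCH" and "L1_TCM,L1_TCH" both yield the same
two counter names.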

Often this'd be documented in the man pages (see pmdagluster(1), or
pmdagfs2(1) for an example of their control metrics use - possibly a
bit of boilerplate doc could be lifted from those?) or in the README
file for a PMDA if it has one (see the pmdashping(1) control file
metric discussion in its README) - nothing major, just a quick demo
doc like you've done here that can live in the git tree & installed
files.

> A few other TODOs I have lined up for the pmda: I'd like to eventually
> make the enable/disable control metrics a bit more flexible (i.e., pmstore
> papi.control.disable "*" or "all" would be nice).  We're also working

I like it, I think those would be neat extensions.  Frank's suggestion
(I think it was Frank's?  sorry, seems like I've been on a different
planet for a week) of a more general regex model (e.g. "L1*") is also
a good one IMO - that would then cover matching on "*" and is unlikely
to conflict with the PAPI metric names I'd guess.
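
One way the pattern idea could look, as a minimal sketch only: it treats
"L1*" as a shell-style glob rather than a strict regular expression, and
it is written in Python for brevity rather than in the pmda's own code.
The function name expand_patterns, the "all" alias handling, and the
sample counter list are assumptions made for illustration.

```python
from fnmatch import fnmatchcase


def expand_patterns(patterns, available):
    """Return the counters in `available` matched by any of `patterns`.

    Each pattern is a shell-style glob ("L1*", "*") or an exact name;
    the word "all" is treated as an alias for "*" (an assumption).
    Order follows the patterns, and duplicates are dropped.
    """
    selected = []
    for pattern in patterns:
        if pattern == "all":
            pattern = "*"
        for name in available:
            if fnmatchcase(name, pattern) and name not in selected:
                selected.append(name)
    return selected


# Hypothetical set of counters for the example.
counters = ["L1_TCM", "L1_TCH", "L2_TCM", "TOT_INS"]
l1_only = expand_patterns(["L1*"], counters)
everything = expand_patterns(["*"], counters)
```

Since PAPI names here use only letters, digits and underscores, the glob
characters should not collide with a literal counter name.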

> papi.control.{enable,disable} metrics. (and of course, more QA as
> needed)

Yes please.  :)  So, we ran out of time for valgrind checking, that
would be a delight to see (as discussed at the end of the last release)
and we had chatted about improving that initial coverage of pmstore error
handling - I'd love to see more in that area if you have time (the
little I did there didn't really attack the harder problems - like
enabling hardware counters which are incompatible with each other,
such that PAPI errors out when they're enabled together - that kind
of thing).  Tricky cases like that are an area of on-going, wider
interest, so a reproducible test case of known-whacky PAPI/hardware
scenarios would be most excellent to have up our sleeves.


Thanks Lukas!  Also, you mentioned earlier today those per-process
metrics were proceeding nicely, and that the next update was around
the corner - please include in that one the spec file fix to switch
this on by default for Fedora builds, and a back-port of the s390(x)
fixes - these'll live in build/rpm/fedora.spec for the next release.

(that is, if you don't mind - else, I'll find some time to backport/
enable those bits-and-pieces before the next release rolls around)

cheers.

--
Nathan