Re: [pcp] pcp updates - pmdapapi update

To: Nathan Scott <nathans@xxxxxxxxxx>
Subject: Re: [pcp] pcp updates - pmdapapi update
From: Lukas Berk <lberk@xxxxxxxxxx>
Date: Thu, 13 Nov 2014 18:02:08 -0500
Cc: pcp@xxxxxxxxxxx
Delivered-to: pcp@xxxxxxxxxxx
In-reply-to: <462023254.13251879.1415861264683.JavaMail.zimbra@xxxxxxxxxx> (Nathan Scott's message of "Thu, 13 Nov 2014 01:47:44 -0500 (EST)")
References: <87oascow3f.fsf@xxxxxxxxxx> <462023254.13251879.1415861264683.JavaMail.zimbra@xxxxxxxxxx>
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3 (gnu/linux)

Hey,

Nathan Scott <nathans@xxxxxxxxxx> writes:
[...]

> Fabulous - awesome effort esp. on the QA front.  I ran out of time to
> review it all today (maybe someone else will?), but did sneak a quick
> background QA run in on RHEL 6 today.  I'm seeing a few new failures
> there - see attached .bad files - any ideas on possible root causes?

Thanks for looking things over.  I've pushed fixes for 3/4 of the
failures (diffstat and change updates below).  For 967 and 813 the issue
was a difference in PMIDs for the TOT_INS metric, caused by the
dynamic pmns.  Functionally, the tests were identical (and running
properly).  To fix this I added an additional regex that matches the
papi.system.* PMIDs and outputs them as 126.0.NUMBER.
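
For anyone unfamiliar with this kind of QA filter, here is a minimal
sketch of the idea in Python.  It is not the filter committed to qa/813
or qa/967.  The function name is hypothetical, and it assumes input
lines of the form "metric PMID: domain.cluster.item", which is an
assumption about the test output rather than something taken from the
tests themselves.

```python
import re

# Assumed line shape: "papi.system.TOT_INS PMID: 126.0.12".
# Only papi.system.* metrics in cluster 126.0 are rewritten; the item
# number is the part that varies between machines.
_PAPI_SYSTEM_PMID = re.compile(
    r"^(papi\.system\.\S+\s+PMID:\s+126\.0\.)\d+", re.MULTILINE
)


def normalize_papi_pmids(text: str) -> str:
    """Replace the machine-dependent item number with the word NUMBER."""
    return _PAPI_SYSTEM_PMID.sub(r"\g<1>NUMBER", text)


if __name__ == "__main__":
    sample = "papi.system.TOT_INS PMID: 126.0.12\n"
    print(normalize_papi_pmids(sample), end="")
```

With a filter like this, the expected .out file stays the same no matter
which item number the dynamic pmns hands out for TOT_INS.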

Testcase 903 was failing partly due to the dynamic pmns as well.
Apparently the number of metrics on that box was much lower, so it
didn't trigger the regex (which would have swapped the number for an
'X').  After testing it on a vm with no papi metrics available, I
lowered the regex to match 7 or greater.  This provides matches for the
5 papi.control metrics, 1 papi.available metric, and at least one actual
papi.system.* metric.
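
The threshold logic can be sketched roughly as below.  This is Python
rather than the awk used in qa/903, and both the function name and the
assumption that the count appears as a bare integer on the line are
mine, for illustration only.

```python
import re

# 5 papi.control metrics + 1 papi.available metric + at least one
# papi.system.* metric, per the reasoning above.
MIN_METRICS = 7


def mask_metric_count(line: str, minimum: int = MIN_METRICS) -> str:
    """Swap any integer >= minimum for 'X'; leave smaller numbers alone."""

    def _mask(match: re.Match) -> str:
        return "X" if int(match.group()) >= minimum else match.group()

    return re.sub(r"\b\d+\b", _mask, line)


if __name__ == "__main__":
    for count in (6, 7, 107):
        print(mask_metric_count(f"metrics: {count}"))
```

A count below 7 is left visible on purpose: it would show up as a diff
against the expected output instead of being hidden behind the 'X'.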

Testcase 799 failed for the same reason I mentioned in my original
email, and I'd be open to advice on how to fix it.  The metrics I used
to force an ECNFLCT (if multiplexing is disabled) on my machine may not
exist on other machines.  Finding a combination of metrics that would
cause such an error, programmatically, on the host QA machine is
something I'm not sure how to do yet.
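
One possible direction, purely as a sketch: walk the metrics available
on the host and keep adding them until the set no longer fits.  The
code below is Python and makes no PAPI calls itself.  The `fits`
callback is hypothetical and stands in for whatever check would report
that a set of events can be counted together without multiplexing.
Whether such a check can be made cheaply from a QA script is an open
assumption.

```python
from typing import Callable, List, Optional, Sequence


def find_conflicting_set(
    events: Sequence[str],
    fits: Callable[[List[str]], bool],
) -> Optional[List[str]]:
    """Greedily grow a set of events until adding one more does not fit.

    Returns the first set that fails `fits`, or None if every usable
    event can be counted together.
    """
    chosen: List[str] = []
    for event in events:
        if not fits([event]):
            # Not usable on its own; that is "unavailable", not a conflict.
            continue
        if fits(chosen + [event]):
            chosen.append(event)
        else:
            return chosen + [event]
    return None


if __name__ == "__main__":
    # Toy stand-in for real hardware: pretend only two counters exist.
    def two_counters(selection: List[str]) -> bool:
        return len(selection) <= 2

    print(find_conflicting_set(["TOT_INS", "TOT_CYC", "L1_DCM"], two_counters))
```

If this returns None on a given box, the test would presumably have to
be skipped there, since no conflict can be provoked.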

> Only other general piece of advice I can offer would be "release early,
> release often" - the first commit here is >1 month old, and it probably
> coulda been merged right away?  *shrug* ... either way is fine, but I'd
> go for quicker, smaller merges every day.

Understood, I'll try to do so more often.  Diffstat and commit updates
relevant to above posted below.

Cheers,

Lukas

--------------------------------------------------------------------------

 qa/813     |    1 +
 qa/813.out |    6 +++---
 qa/903     |    2 +-
 qa/967     |    1 +
 qa/967.out |   26 +++++++++++++-------------
 5 files changed, 19 insertions(+), 17 deletions(-)

Author: Lukas Berk <lberk@xxxxxxxxxx>
Date:   Thu Nov 13 16:12:37 2014 -0500

    Alter qa/903 awk statement to account for lower possible metric counts
    
    The number of available papi metrics varies based on the system being
    run on.  Previously there would be a pmid for each possible metric, so
    we could set the awk regex much higher.  At this point, limit it to 7
    or greater, (one for each papi.control and one papi.available).

commit b44c3c0decfcfbff9b4ca315bea2f9cf354fdfae
Author: Lukas Berk <lberk@xxxxxxxxxx>
Date:   Thu Nov 13 16:10:42 2014 -0500

    Update qa testcases to account for dynamic papi pmns
    
    The papi.system.TOT_INS metric is used in both qa/813 and qa/967
    testcases.  With the dynamic pmid's used, this metric may change
    based on the hardware it's run on.  Due to this, add a new
    regex that matches 126.0.NUMBER, instead of a specific pmid

commit 629bc4ccaf3328c50d3d8b87cb176a60e3dcccb6
Author: Lukas Berk <lberk@xxxxxxxxxx>
Date:   Wed Nov 12 18:55:21 2014 -0500

    Add additional qa test that papi.control overrides auto_enable timeout
    
    qa/967 tests that we can disable the auto_enable metric and use pmdapapi
    as previously expected.  We now add that despite having a timeout (for
    the testcase's purposes we use a small one), papi.control.{enable,disable}
    takes higher priority and will allow counters to remain active even after
    the auto_enable timeout has been hit.
