
pcp updates: pmlogger one is important

To: pcp@xxxxxxxxxxx
Subject: pcp updates: pmlogger one is important
From: Ken McDonell <kenj@xxxxxxxxxxxxxxxx>
Date: Fri, 22 Jul 2016 06:59:26 +1000
Delivered-to: pcp@xxxxxxxxxxx
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.8.0

Changes committed to git://git.pcp.io/kenj/pcp master

Ken McDonell (10):
      qa/709: notrun for any PCP_PLATFORM other than Linux (pmcollectl)
      qa/666 & qa/common.check: handle broken Debian valgrind
      qa/admin/pcp-daily: re-enable valgrind group on Debian stretch hosts
      qa/578: increase tolerance for expected openfd values
      qa/914: notrun if there are no real hardware counters here
      qa/870: (new) test integrity of pmlogger control files
      qa/381: additional diagnostics for debugging
      qa/956: additional diagnostics for debugging
      src/include/pcp.env: Mac OS X change
      src/pmlogger/src/ports.c: fix broken logic for primary control file

 qa/381                   |   14 ++-
 qa/578                   |   21 +++--
 qa/578.out               |   12 +--
 qa/666                   |    3 
 qa/709                   |   10 ++
 qa/870                   |  173 +++++++++++++++++++++++++++++++++++++++++++++++
 qa/870.out               |    7 +
 qa/914                   |    8 +-------
 qa/956                   |    4 -
 qa/admin/pcp-daily       |    5 -
 qa/common.check          |   12 ++-
 qa/group                 |    1 
 src/include/pcp.env      |    9 +-
 src/pmlogger/src/ports.c |  119 ++++++++++++++++++++++++++------
 14 files changed, 350 insertions(+), 48 deletions(-)

Details ...

commit e607bbc64a18e7ad8c50503341dd3119231804e7
Author: Ken McDonell <kenj@xxxxxxxxxxxxxxxx>
Date:   Fri Jul 22 06:48:38 2016 +1000

    src/pmlogger/src/ports.c: fix broken logic for primary control file
    
    This was the root cause of the qa/1108 failures.
    
    The logic that checked for and stopped more than one primary pmlogger
    from running was broken.  Specifically using stat() instead of
    lstat() to check for a symbolic link will always fail, which drove
    us down the "old-style hardlink" path and unconditionally removed
    $PCP_TMP_DIR/pmlogger/primary before the existence check that was
    intended to stop multiple primary loggers from running.
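
The stat()/lstat() difference described above can be illustrated with a minimal sketch. It is written in Python rather than the C of ports.c, and it is not the pmlogger code: the file names and helper functions are made up for the demonstration. It only shows that a symlink test based on stat() can never succeed, because stat() reports on the link's target.

```python
import os
import stat
import tempfile

def is_symlink_via_stat(path):
    # stat() follows the link, so st_mode describes the target file
    # and S_ISLNK() is never true here.
    return stat.S_ISLNK(os.stat(path).st_mode)

def is_symlink_via_lstat(path):
    # lstat() describes the link itself, so S_ISLNK() works as intended.
    return stat.S_ISLNK(os.lstat(path).st_mode)

def demo():
    # Hypothetical layout: a control file named after a pid, plus a
    # "primary" symlink pointing at it.
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "12345")
        link = os.path.join(tmp, "primary")
        with open(target, "w") as f:
            f.write("control file\n")
        os.symlink(target, link)
        return is_symlink_via_stat(link), is_symlink_via_lstat(link)

if __name__ == "__main__":
    print(demo())
```

With the stat() variant the "is it a symlink?" branch is never taken, which matches the description of the code always falling through to the old-style hardlink path.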
    
    This error seems to have been introduced in commit 7148bf11 (almost
    12 months ago) ... sigh.
    
    And to compound the problem, a primary pmlogger was conditionally
    removing $PCP_TMP_DIR/pmlogger/primary at exit, meaning that if we
    ever got 2 (or more!) primary pmloggers running and either of them
    exited the control files would be removed and pmlogger_check would
    stumble along later and start another primary pmlogger running.
    
    So now we are checking the pid from the symlink and only removing
    the primary control file if this instance of pmlogger created it.
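
A rough sketch of the "only remove it if we created it" rule follows. It is in Python, not the C of ports.c, and it assumes that the primary symlink points at a control file named after the pid of the pmlogger that created it. The function names are hypothetical.

```python
import os
import tempfile

def primary_owner_pid(primary_link):
    """Return the pid encoded in the symlink target, or None."""
    try:
        target = os.readlink(primary_link)
    except OSError:
        return None  # link is missing, or the path is not a symlink
    name = os.path.basename(target)
    return int(name) if name.isdecimal() else None

def remove_primary_if_mine(primary_link, my_pid):
    """Remove the primary link only when this process created it."""
    if primary_owner_pid(primary_link) == my_pid:
        os.unlink(primary_link)
        return True
    return False

def demo():
    with tempfile.TemporaryDirectory() as tmp:
        link = os.path.join(tmp, "primary")
        # Assumed layout: the link target is named after the creator's pid.
        os.symlink(os.path.join(tmp, "4242"), link)
        other = remove_primary_if_mine(link, 1111)  # some other logger exits
        still_there = os.path.islink(link)
        mine = remove_primary_if_mine(link, 4242)   # the creator exits
        return other, still_there, mine, os.path.lexists(link)

if __name__ == "__main__":
    print(demo())
```

Under this rule a second primary pmlogger that exits leaves the link alone, so the control files of the original primary logger survive.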
    
    Also cleaned up some misleading diagnostics.

commit 7ca4c81e25425aa592a0b853e1bebb55843031e2
Author: Ken McDonell <kenj@xxxxxxxxxxxxxxxx>
Date:   Fri Jul 22 06:46:15 2016 +1000

    src/include/pcp.env: Mac OS X change
    
    In _get_pids_by_name() we need to also accommodate ps(1) output that
    has the executable name enclosed in () ... this was causing QA failures
    for qa/956 on Mac OS X.
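
To illustrate the kind of matching involved, here is a small sketch in Python. The real _get_pids_by_name() is a shell function in pcp.env and is not reproduced here. The sketch assumes ps(1) output with the pid in the first column and the command in the second, and the sample lines are made up.

```python
def pids_by_name(ps_lines, name):
    """Collect pids whose command matches name, with or without ()."""
    pids = []
    for line in ps_lines:
        fields = line.split()
        if len(fields) < 2 or not fields[0].isdecimal():
            continue  # header line or something unexpected
        command = fields[1]
        # Some ps(1) output shows the executable name as "(name)".
        if command.startswith("(") and command.endswith(")"):
            command = command[1:-1]
        # Compare on the basename so a full path also matches.
        if command.rsplit("/", 1)[-1] == name:
            pids.append(int(fields[0]))
    return pids

# Made-up sample output, for illustration only.
SAMPLE = [
    "  PID COMMAND",
    "  101 /usr/libexec/pcp/bin/pmlogger -P",
    "  202 (pmlogger)",
    "  303 pmcd",
]

if __name__ == "__main__":
    print(pids_by_name(SAMPLE, "pmlogger"))
```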

commit d4858c9de1ff9dc86601cbc42f5633e94ed17f58
Author: Ken McDonell <kenj@xxxxxxxxxxxxxxxx>
Date:   Fri Jul 22 06:45:03 2016 +1000

    qa/956: additional diagnostics for debugging

commit dc6dfd1ff23b5102f147e8a87f09502ffe4f6150
Author: Ken McDonell <kenj@xxxxxxxxxxxxxxxx>
Date:   Fri Jul 22 06:44:30 2016 +1000

    qa/381: additional diagnostics for debugging

commit 4a9298eab7b86504f3287c2386483efde17fa663
Author: Ken McDonell <kenj@xxxxxxxxxxxxxxxx>
Date:   Fri Jul 22 06:32:40 2016 +1000

    qa/870: (new) test integrity of pmlogger control files
    
    These are the ones in $PCP_TMP_DIR/pmlogger.  And getting this test
    to pass will address the root cause of the non-deterministic qa/1108
    failures.
    
    This test can be run with a --check argument which silently (if all
    is well) runs the integrity check without any of the test cases.
    In this form, could be used with check.callback to run the check
    after every test to help identify any test that leaves the control
    files in a bad state.

commit 00ae066eedfaa1ef971a15266ffb00733e997b9b
Author: Ken McDonell <kenj@xxxxxxxxxxxxxxxx>
Date:   Wed Jul 20 11:12:33 2016 +1000

    qa/914: notrun if there are no real hardware counters here
    
    The PAPI PMDA may have been built, but the platform may be lame
    hardware or a crippled VM with no support for hardware counters.

commit 6c58b9e89dbf04d67d991831a1f61e4ed24281fd
Author: Ken McDonell <kenj@xxxxxxxxxxxxxxxx>
Date:   Wed Jul 20 09:58:28 2016 +1000

    qa/578: increase tolerance for expected openfd values
    
    Based on a suggestion from Nathan that the failures in this test
    may be related to non-determinism coming from the recently added
    parallelism in the socket connection code, change the filtering to
    accept +/-1 from the (previously) expected value.
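
The idea of a tolerant filter can be sketched as below. This is Python rather than the shell used by the QA scripts, and the "openfd=N" line format and the "openfd=OK" token are hypothetical, chosen only to show the technique: values within +/-1 of the expected one are rewritten to a fixed token so the test output stays deterministic.

```python
import re

def filter_openfd(line, expected, tolerance=1):
    """Rewrite openfd values close enough to expected to a fixed token."""
    def replace(match):
        value = int(match.group(1))
        if abs(value - expected) <= tolerance:
            return "openfd=OK"
        return match.group(0)  # out of range: leave it visible in the diff
    return re.sub(r"openfd=(\d+)", replace, line)

if __name__ == "__main__":
    for n in (3, 4, 5, 6):
        print(filter_openfd("openfd=%d" % n, expected=4))
```

Values outside the tolerance are left untouched, so a genuine file descriptor leak would still show up as a difference from the expected output.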

commit fe6f79f6af659b63e105413ed8d8e472b5c54ebe
Author: Ken McDonell <kenj@xxxxxxxxxxxxxxxx>
Date:   Wed Jul 20 09:39:10 2016 +1000

    qa/admin/pcp-daily: re-enable valgrind group on Debian stretch hosts

commit 3156256a4b85eeefde4b515f3ed1b38c85c4b098
Author: Ken McDonell <kenj@xxxxxxxxxxxxxxxx>
Date:   Wed Jul 20 09:37:49 2016 +1000

    qa/666 & qa/common.check: handle broken Debian valgrind
    
    Filter out bogus lines from the current Debian stretch version
    of valgrind.

commit f22f7a9d60a381ce8e647f798d2ed139b5437a97
Author: Ken McDonell <kenj@xxxxxxxxxxxxxxxx>
Date:   Tue Jul 19 20:12:27 2016 +1000

    qa/709: notrun for any PCP_PLATFORM other than Linux (pmcollectl)
