
pcp updates

To: pcp@xxxxxxxxxxx
Subject: pcp updates
From: Ken McDonell <kenj@xxxxxxxxxxxxxxxx>
Date: Fri, 6 May 2016 17:47:45 +1000
git has got a bit confused here ... I pushed most of Frank's commits earlier,
but these are still WIP as qa/449 hangs and there are more changes to come in
this area.

My commits are trying to get the Debian builds back on the rails.


Changes committed to git://git.pcp.io/kenj/pcp master

Frank Ch. Eigler (11):
      multithreading qa/4751
      PR1055: handle some multithreaded deadlocks & race conditions
      multithreaded testing: ipc debugging messages
      qa/4751 multithread: create new PCP_DEBUG subtest
      libpcp multithreading: un-nest tz_lock
      pmmgr pcpqa/666: robustify, run unprivileged
      pmwebd speedup: libmicrohttpd TURBO mode
      qa/4751 reactivate
      libpcp multithread cont'd: lock ordering in __pmMultiThreaded and pmgetconfig
      qa/4751 reactivate
      libpcp multithread cont'd: lock ordering in __pmMultiThreaded and pmgetconfig

Ken McDonell (6):
      debian/rules: disable dbgsym packages
      qa/admin/check-vm: more Debian changes
      debian/control: changes for optional prereqs
      debian/control: deleted
      debian/libpcp3-dev.install: add missing /usr/include/pcp targets
      qa/admin/check-vm: tweak some Debian rules

 debian/control               |  420 -------------------------------------------
 debian/control.master        |    2 
 debian/fixcontrol.master     |   28 +-
 debian/libpcp3-dev.install   |    3 
 debian/rules                 |    6 
 qa/4751                      |   38 +++
 qa/4751.out                  |  258 ++++++++++++++++++++++----
 qa/666                       |   58 ++---
 qa/admin/check-vm            |    9 
 qa/group                     |    4 
 src/include/pcp/impl.h       |    2 
 src/libpcp/src/check-statics |   59 +++---
 src/libpcp/src/config.c      |   32 ++-
 src/libpcp/src/context.c     |   54 +++--
 src/libpcp/src/ipc.c         |   75 ++++---
 src/libpcp/src/lock.c        |   24 +-
 src/libpcp/src/logutil.c     |   43 ++--
 src/libpcp/src/pdu.c         |   28 ++
 src/libpcp/src/pdubuf.c      |   47 ++--
 src/libpcp/src/tz.c          |   61 +++---
 src/pmwebapi/main.cxx        |   11 -
 21 files changed, 573 insertions(+), 689 deletions(-)

Details ...

commit 0dd558c82d0cfbce021eb403277f13fa758055ef
Author: Ken McDonell <kenj@xxxxxxxxxxxxxxxx>
Date:   Fri May 6 17:29:02 2016 +1000

    qa/admin/check-vm: tweak some Debian rules

commit 78b8c9aa671bad6176fd5c33c0f61a6165f098e7
Author: Ken McDonell <kenj@xxxxxxxxxxxxxxxx>
Date:   Fri May 6 17:27:18 2016 +1000

    debian/libpcp3-dev.install: add missing /usr/include/pcp targets
    
    platformsz.h, platform32.h and platform64.h appear to have all been
    added recently, but changes needed here to avoid build breakage on
    every single Debian-based system.

commit 1cbf380f688a4f6e40b3996cdfa1e6393731f695
Author: Ken McDonell <kenj@xxxxxxxxxxxxxxxx>
Date:   Fri May 6 17:26:46 2016 +1000

    debian/control: deleted

commit cfcbf73cc14ae59a5ff0c6ce6f4d1beb1b58fb87
Author: Ken McDonell <kenj@xxxxxxxxxxxxxxxx>
Date:   Fri May 6 17:23:18 2016 +1000

    debian/control: changes for optional prereqs
    
    1. rm control ... should be rebuilt from control.master
    2. change dh-python to ${dh-python} to match the logic in fixcontrol.master
    3. change libpapi-dev and libpfm4-dev to ?{libpapi-dev} and ?{libpfm4-dev}
    4. update logic in fixcontrol.master ... this is a WIP and expect more
       changes as the build breakage on other platforms is fixed

commit 283b8598c1e3d3b43fbacb40e7611e493cf90784
Author: Frank Ch. Eigler <fche@xxxxxxxxxx>
Date:   Thu May 5 11:24:44 2016 -0400

    libpcp multithread cont'd: lock ordering in __pmMultiThreaded and pmgetconfig
    
    Replacing libpcp lock in __pmMultiThreaded with a small non-nested
    lock, and similarly pmgetconfig().  Corrects occasional deadlock seen
    in qa/449 (src/multithread1).  Code still not helgrind-clean: the
    pmns.c code may need to go next.

commit 687c36881628dcb95f51d8dd6a1c3e0afead4bd2
Author: Frank Ch. Eigler <fche@xxxxxxxxxx>
Date:   Mon May 2 13:20:00 2016 -0400

    qa/4751 reactivate
    
    After the recent libpcp fixes, this test seems repeatable and
    a good stressor for libpcp multithreading.

commit d25599dd7bbc13bcd2bf7b18d632c113cbb4d1d6
Author: Ken McDonell <kenj@xxxxxxxxxxxxxxxx>
Date:   Fri May 6 07:26:37 2016 +1000

    qa/admin/check-vm: more Debian changes
    
    Need python and python3.
    Need time package (which is not default installed in latest Debian version).

commit ba916746ae5b979a18797410627710a0a6b631b1
Author: Ken McDonell <kenj@xxxxxxxxxxxxxxxx>
Date:   Fri May 6 07:25:09 2016 +1000

    debian/rules: disable dbgsym packages
    
    Turned on by default in post-jessie Debian versions, but does not
    play well with existing PCP packaging.

commit d664a9d82aad11d64f8e3948fe1d51a7359ec3da
Author: Frank Ch. Eigler <fche@xxxxxxxxxx>
Date:   Thu May 5 11:24:44 2016 -0400

    libpcp multithread cont'd: lock ordering in __pmMultiThreaded and pmgetconfig
    
    Replacing libpcp lock in __pmMultiThreaded with a small non-nested
    lock, and similarly pmgetconfig().  Corrects occasional deadlock seen
    in qa/449 (src/multithread1).  Code still not helgrind-clean: the
    pmns.c code may need to go next.

commit a9764809b468d02f6e00763ced6b42f9abd75380
Author: Frank Ch. Eigler <fche@xxxxxxxxxx>
Date:   Mon May 2 13:20:00 2016 -0400

    qa/4751 reactivate
    
    After the recent libpcp fixes, this test seems repeatable and
    a good stressor for libpcp multithreading.

commit 2bf81a70ec5dff787fb4077c448b305fffd1c4c0
Author: Frank Ch. Eigler <fche@xxxxxxxxxx>
Date:   Mon May 2 11:34:52 2016 -0400

    pmwebd speedup: libmicrohttpd TURBO mode
    
    An implementation artifact in libmicrohttpd prior to svn commit r37105
    meant that concurrent requests into pmwebd are batched in the sense
    that the response to one is not sent until the response to all are
    finished.  This means more perceived waiting for e.g. pmwebd grafana
    dashboards with multiple charts, because the empty screen lasts
    longer.
    
    The MHD_USE_EPOLL_TURBO flag for MHD_start_daemon activates
    performance tweaks, including an improvement in the above behavior.
    It's harmless in older libmicrohttpd, and is transparent to qa.

commit 81935689d077b8d30f666dec46907c38c23af336
Author: Frank Ch. Eigler <fche@xxxxxxxxxx>
Date:   Mon May 2 11:22:25 2016 -0400

    pmmgr pcpqa/666: robustify, run unprivileged
    
    The 666 test case is sometimes reported flaky.  Some experiments
    suggest one factor is the sloth of pmlogconf, especially on virtual
    machines.  It can take some 90 seconds (!) for a simple kvm guest, for
    reasons not yet understood.  This can lead the 666 script's
    pmlogconf-awaiting logic to time out, since history waits for no man -
    longer than 60 seconds.  This timeout is bumped up to 300 seconds.
    
    Synchronization via pmcd.* metrics is also a bit flaky, so we switch
    to running pmmgr and its subordinate daemons unprivileged, and monitor
    the output files [-s $FILE] directly.  Not using $sudo all over also
    simplifies the valgrind supervision logic.
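
The file-monitoring approach in the commit above lives in a shell script, where it is just a loop around [ -s $FILE ]. For readers who want the same idea in C, here is a minimal sketch. The function name wait_nonempty and its millisecond parameters are hypothetical, chosen for illustration, and none of this is taken from qa/666.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/stat.h>
#include <time.h>

/*
 * Hypothetical helper, not part of PCP.
 * Returns 1 as soon as path exists with a non-zero size (the C analogue
 * of the shell test [ -s path ]), or 0 if timeout_ms elapses first.
 * poll_ms is the sleep between checks and must be greater than zero.
 */
static int wait_nonempty(const char *path, long timeout_ms, long poll_ms)
{
    struct stat st;
    struct timespec nap;
    long waited = 0;

    nap.tv_sec = poll_ms / 1000;
    nap.tv_nsec = (poll_ms % 1000) * 1000000L;

    for (;;) {
        if (stat(path, &st) == 0 && st.st_size > 0)
            return 1;               /* file has content */
        if (waited >= timeout_ms)
            return 0;               /* gave up waiting */
        nanosleep(&nap, NULL);
        waited += poll_ms;          /* approximate elapsed time */
    }
}
```

A call such as wait_nonempty(path, 300000, 1000) would correspond to the 300 second limit mentioned above, checking once a second. The elapsed time is only approximate because it counts sleeps rather than reading a clock.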

commit 836fd5ea1b3939f9f60d55f9b30a4e6efc8c5698
Author: Frank Ch. Eigler <fche@xxxxxxxxxx>
Date:   Mon Apr 25 09:44:45 2016 -0400

    libpcp multithreading: un-nest tz_lock
    
    libpcp's historical use of recursive libpcp lock has allowed patterns
    of carefree intercalling of lock-taking functions.  With normal
    non-recursive locks, that's instant deadlock.  Remove nested locking
    in purely unnecessary cases.

commit 0a5caba663cbbf7420b189e20387bf36f39c30e7
Author: Frank Ch. Eigler <fche@xxxxxxxxxx>
Date:   Sun Apr 24 19:23:26 2016 -0400

    qa/4751 multithread: create new PCP_DEBUG subtest
    
    Running the big final test with PCP_DEBUG=-1 can slow it down
    enough to occasionally fail.  Add an intermediate length test
    that runs quicker but still covers a swath of context types.
    
    Some higher values of PCP_DEBUG invoke taking locks in a nested,
    order-violating fashion.  This patch brings local lock goodness to
    libpcp/src/tz.c, moves dumping outside locking in pdubuf.c, and
    extends qa/4751 to test two sets of PCP_DEBUG runs.  DBG_TRACE_PDU
    is particularly vulnerable because it does (locky) PMNS ops.

commit 169b018477648e0b25bd7ccfa7b1f47f03b93e9f
Author: Frank Ch. Eigler <fche@xxxxxxxxxx>
Date:   Sun Apr 24 18:35:31 2016 -0400

    multithreaded testing: ipc debugging messages
    
    Similar to commit c7e9299f6a03, the ipc.c tracing operations also need
    to be moved outside the new non-recursive locks.  qa/4751 runs the
    last test with PCP_DEBUG=-1 to try to stress this aspect.
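
The idea of moving tracing outside a non-recursive lock can be sketched as follows. This is an illustration under assumptions, not the ipc.c change itself: the lock, the state variables and the function names are all hypothetical. The point is only that the values needed for the message are copied while the lock is held, and the formatting (which in real code might call other lock-taking functions) happens after the unlock.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical shared state guarded by a small non-recursive lock. */
static pthread_mutex_t ipc_demo_lock = PTHREAD_MUTEX_INITIALIZER;
static int ipc_demo_fd = -1;
static int ipc_demo_version;

/* Copy of the fields a trace message needs. */
struct ipc_snapshot {
    int fd;
    int version;
};

/* Update the shared state under the lock and return a snapshot,
 * so the caller can trace after the lock has been released. */
static struct ipc_snapshot set_version(int fd, int version)
{
    struct ipc_snapshot snap;

    pthread_mutex_lock(&ipc_demo_lock);
    ipc_demo_fd = fd;
    ipc_demo_version = version;
    snap.fd = ipc_demo_fd;
    snap.version = ipc_demo_version;
    pthread_mutex_unlock(&ipc_demo_lock);
    return snap;
}

/* Format the trace line from the snapshot. Takes no locks itself. */
static int format_trace(char *buf, size_t len, struct ipc_snapshot snap)
{
    return snprintf(buf, len, "set_version: fd=%d version=%d",
                    snap.fd, snap.version);
}
```

The trace line may be slightly stale by the time it is printed, which is usually acceptable for debugging output and is the price of not holding the lock while formatting.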

commit 4da610ef287e6841046eb0822766f9bd3c658198
Author: Frank Ch. Eigler <fche@xxxxxxxxxx>
Date:   Sun Apr 24 15:17:00 2016 -0400

    PR1055: handle some multithreaded deadlocks & race conditions
    
    While running the qa/4751 test case at full scale, deadlocks reliably
    occur.  (In fact, the 4751.out file was initially checked in truncated
    due to an alarm() catching the deadlocked run, producing no output.)
    The same type of deadlock is also easily demonstrated on stock
    previous-version libpcp, so it exculpates the recent pmNewContext
    multithreading changes.
    
    The valgrind "helgrind" tool is good at identifying problems of this
    nature, and should be routinely used for verifying code that deals
    with PM_*LOCK.
    
    The gist of one problem is inconsistent lock ordering.  The libpcp
    lock is sometimes taken nested within a context c_lock; and sometimes
    vice versa.  Two threads can easily lock each other out.  helgrind
    showed multiple different scenarios where the libpcp lock was taken
    unnecessarily by lower level code - where a smaller lock was
    sufficient.  This patchset adds a handful of small, non-recursive
    locks for these.
    
    This patch also includes a fix to a nastier race condition in
    __pmHandleToPtr(), whereby a context-destruction could race against
    context-structure lookup.  Some work remains in the multi-archive code
    and elsewhere to avoid two mildly racy functions (__pmPtrToHandle and
    the new __pmHandleToPtr_unlocked).
    
    qa/4751 and all other preexisting thread-group test cases look good
    now, no more deadlocks or lock-ordering-error reports there at least.
    (There are likely more hiding in the code: the libpcp lock is way
    overused.)
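
The lock-ordering problem described in the commit above can be shown in isolation. The following is a minimal sketch, not libpcp code: big_lock and ctx_lock are hypothetical stand-ins for a library-wide lock and a per-context lock, and the two worker functions stand in for two code paths that both need both locks. Because both paths take the locks in the same order, the threads cannot lock each other out.

```c
#include <pthread.h>

/* Hypothetical stand-ins, not the real libpcp locks. */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER; /* library-wide */
static pthread_mutex_t ctx_lock = PTHREAD_MUTEX_INITIALIZER; /* per-context */
static long shared_counter;

/* Ordering rule for this sketch: big_lock is always taken before ctx_lock. */

/* First code path, e.g. a lookup. */
static void *lookup_worker(void *arg)
{
    long i, n = *(const long *)arg;

    for (i = 0; i < n; i++) {
        pthread_mutex_lock(&big_lock);
        pthread_mutex_lock(&ctx_lock);
        shared_counter++;
        pthread_mutex_unlock(&ctx_lock);
        pthread_mutex_unlock(&big_lock);
    }
    return NULL;
}

/* Second code path, e.g. an update. A deadlock-prone version would take
 * ctx_lock first here; this one follows the same order as the lookup. */
static void *update_worker(void *arg)
{
    long i, n = *(const long *)arg;

    for (i = 0; i < n; i++) {
        pthread_mutex_lock(&big_lock);
        pthread_mutex_lock(&ctx_lock);
        shared_counter++;
        pthread_mutex_unlock(&ctx_lock);
        pthread_mutex_unlock(&big_lock);
    }
    return NULL;
}

/* Run both paths concurrently. Returns the final count,
 * or -1 if a thread could not be created. */
static long run_workers(long iterations)
{
    pthread_t a, b;

    shared_counter = 0;
    if (pthread_create(&a, NULL, lookup_worker, &iterations) != 0)
        return -1;
    if (pthread_create(&b, NULL, update_worker, &iterations) != 0) {
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return shared_counter;
}
```

If update_worker took ctx_lock before big_lock, each thread could end up holding one lock while waiting for the other, which is the pattern helgrind reports as a lock order violation. The other remedy mentioned in the commit, replacing the big lock with a small lock that is never held while taking another, avoids the ordering question altogether.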

commit 2a7e146b5400736801a8daaff8bf0f3213d962dd
Author: Frank Ch. Eigler <fche@xxxxxxxxxx>
Date:   Sun Apr 24 14:55:25 2016 -0400

    multithreading qa/4751
    
    Tweak the qa/4751 test case so that different unreachable-host type
    error codes are mapped to a uniform one.  Generate an actual proper
    output for the last test (the one with some 156 contexts/threads).
