pcp updates: more multithreaded fixes and then some

To: pcp developers <pcp@xxxxxxxxxxx>
Subject: pcp updates: more multithreaded fixes and then some
From: "Frank Ch. Eigler" <fche@xxxxxxxxxx>
Date: Sun, 8 May 2016 16:54:32 -0400
Delivered-to: pcp@xxxxxxxxxxx
User-agent: Mutt/1.4.2.2i
Hi -

A mixture of core libpcp multithreading fixes and independent
scaling/robustness patches for other components is on the pcpfans.git
fche/multithread branch [freshly rebased]:


commit 17a67d2fcc9e39fb94ce536e3664dc1ce450d873 (HEAD -> fche/multithread)
Author: Frank Ch. Eigler <fche@xxxxxxxxxx>
Date:   Sun May 8 16:30:57 2016 -0400

    pmmgr: tune logging batching
    
    When pmmgr runs pmlogcheck on an archive, this can produce voluminous
    warning traffic (e.g. for SGI PR1142) that's not helpful for a pmmgr
    admin.  We now redirect that output also to /dev/null.  Since there is
    now less output, tweak the obatched(stream) code to issue an explicit
    ostream::flush(), so that whether the stream is default-buffered or
    not, the log file will be current.
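
The batching-plus-flush idea in this commit can be sketched roughly as follows. This is a minimal illustration only: the class name batched_writer is hypothetical and is not pmmgr's actual obatched code. It collects output in a private buffer, emits it in one write on destruction, and then calls ostream::flush() explicitly so the sink is current regardless of its buffering mode.

```cpp
#include <ostream>
#include <sstream>

// Hypothetical stand-in for a batching log helper; NOT pmmgr's real class.
class batched_writer {
public:
    explicit batched_writer(std::ostream& sink) : sink_(sink) {}

    // On destruction, emit the whole batch at once, then flush explicitly
    // so the log is up to date whether or not the sink is buffered.
    ~batched_writer() {
        sink_ << buffer_.str();
        sink_.flush();
    }

    // Accumulate values into the private buffer; nothing reaches the sink yet.
    template <typename T>
    batched_writer& operator<<(const T& value) {
        buffer_ << value;
        return *this;
    }

private:
    std::ostream& sink_;
    std::ostringstream buffer_;
};
```

A caller would create a batched_writer in a local scope, stream a message into it, and let the destructor deliver and flush the message when the scope ends.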

commit f8af410a6aa6a5185c54e959fa900f7147a8824a
Author: Frank Ch. Eigler <fche@xxxxxxxxxx>
Date:   Sun May 8 16:16:30 2016 -0400

    libpcp multithreading: context.c, derive.c, pmns.c lock order corrections
    
    More instances of inconsistent lock orderings are corrected.
    
    context.c: pmDupContext() removes unnecessary nesting entirely.
    
    derive.c: trades the possibility of data races for the elimination of
              deadlocks, by briefly releasing the registered.mutex around
              reentrant PMAPI calls like pmLookup*
    
    pmns.c: Introduces pmns_lock.
            Removes recursive locking from __pmFixPMNSHashTab() and
            TraversePMNS.
    
    The results are that all the thread-group test cases run reliably
    here, with no remaining helgrind lock-ordering warnings in any of the
    449-invoked multithread* tests, nor 4751.
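
The derive.c trade-off described above (drop the lock briefly around a reentrant call, accepting that the protected data may change meanwhile) can be sketched like this. It is an illustration under assumptions, not the libpcp code: the registry struct, resolve_all(), and the lookup callback are all hypothetical names, and std::mutex stands in for the pthread mutex that libpcp uses.

```cpp
#include <cstddef>
#include <functional>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical registry guarded by a single mutex.
struct registry {
    std::mutex mutex;
    std::vector<std::string> names;
};

// Resolve each registered name through a callback that may take other locks
// (or this one). The registry mutex is released around each call, so no lock
// is held while calling out; the price is that the registry may be modified
// by another thread in the unlocked window.
std::vector<int> resolve_all(registry& reg,
                             const std::function<int(const std::string&)>& lookup) {
    std::vector<int> ids;
    std::unique_lock<std::mutex> guard(reg.mutex);
    // Re-read the size on every pass because it can change while unlocked.
    for (std::size_t i = 0; i < reg.names.size(); ++i) {
        std::string name = reg.names[i];  // copy while still holding the lock
        guard.unlock();                   // release before the reentrant call
        int id = lookup(name);
        guard.lock();                     // re-acquire before touching shared state
        ids.push_back(id);
    }
    return ids;
}
```

Copying the name before unlocking keeps the callback from reading shared storage without the lock; the loop index is re-validated against the current size after each re-acquisition.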

commit 2a3815f65cf173070c840ce5798611eb7054ceb8
Author: Frank Ch. Eigler <fche@xxxxxxxxxx>
Date:   Sun May 8 11:29:11 2016 -0400

    unresponsive-pmda pmie message: identify host
    
    For remotely monitored hosts that have suffered PMDA failure, the pmie
    message should identify the host.  Adding @%h to the message, as per
    many other pmieconf examples.  (No QA impact, as this message does not
    appear in QA at all.)

commit 547da9b379d6cbccd6233134005fb30fc8a90456
Author: Frank Ch. Eigler <fche@xxxxxxxxxx>
Date:   Sun May 8 10:50:06 2016 -0400

    crash-resilience for systemd pmmgr/pmwebd
    
    Switch to using Type=forking Restart=always for these services.
    They now get auto-restarted by systemd if they crash or are kill-9'd.
    The same treatment is probably appropriate for pmcd.

commit 399bbaec4d8dd2b89892f383da2095599f59ec52
Author: Frank Ch. Eigler <fche@xxxxxxxxxx>
Date:   Sun May 8 09:05:06 2016 -0400

    pmmgr scaling: don't cry on a SIGPIPE
    
    It has been reported that on some heavily loaded systems, pmmgr
    can intermittently die with a "too many interrupts" message.  Analysis
    with systemtap indicates that these events come from SIGPIPE's being
    sent by the kernel from within a
     __pmSend
     __pmXmitPDU
     __pmSendNameList
     pmLookupName
     ....
     __dmopencontext
     pmNewContext
    call chain.  Presumably, a remote pmcd died mid-conversation, and
    pdu.c's SIGPIPE ignoring logic didn't help enough.
    
    pmmgr should not look for SIGPIPE anyway as a termination signal - we
    don't produce output on stdout like a pipeable UNIX tool.  We now
    SIG_IGN it.
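
For readers unfamiliar with the effect of ignoring SIGPIPE, here is a minimal POSIX sketch. The helper names ignore_sigpipe() and write_to_closed_pipe() are hypothetical, and the use of sigaction() is an assumption; the commit may well use a different call to install SIG_IGN. With the signal ignored, a write to a peer that has gone away fails with EPIPE instead of delivering a signal to the process.

```cpp
#include <cerrno>
#include <signal.h>
#include <unistd.h>

// Ignore SIGPIPE process-wide. Returns true on success.
bool ignore_sigpipe() {
    struct sigaction sa{};
    sa.sa_handler = SIG_IGN;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGPIPE, &sa, nullptr) == 0;
}

// Demonstration helper: write to a pipe whose read end is already closed.
// Returns the errno from write(), 0 if the write succeeded, or -1 if the
// pipe could not be created.
int write_to_closed_pipe() {
    int fds[2];
    if (pipe(fds) != 0)
        return -1;
    close(fds[0]);                       // simulate the peer going away
    ssize_t n = write(fds[1], "x", 1);
    int err = (n < 0) ? errno : 0;
    close(fds[1]);
    return err;
}
```

Calling ignore_sigpipe() once at startup is enough; after that, the caller handles the EPIPE error return from the failed write like any other I/O error.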

commit 00a20c48964b2cbb74696ef77ad09d24b60ec3e2
Author: Frank Ch. Eigler <fche@xxxxxxxxxx>
Date:   Sun May 8 08:10:57 2016 -0400

    pmmgr target-threads: tolerate OSs that return <0 for sysconf(_SC_NPROCESSORS_ONLN)
    
    It's theoretically possible for the online-cpu-count to come back
    negative.  Map that to zero instead of propagating to a negative
    number of target threads.
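
The mapping described here is small enough to show in full. The following is a sketch, not the pmmgr source; the function names clamp_cpu_count() and online_cpus() are hypothetical. sysconf() returns -1 on error, so any negative result is mapped to zero before it can feed into a thread count.

```cpp
#include <unistd.h>

// Map a negative CPU count (e.g. sysconf's -1 error return) to zero.
long clamp_cpu_count(long raw) {
    return raw < 0 ? 0 : raw;
}

// Query the number of online CPUs, never returning a negative value.
long online_cpus() {
    return clamp_cpu_count(sysconf(_SC_NPROCESSORS_ONLN));
}
```

Keeping the clamp in its own function makes the negative case testable without depending on what the host's sysconf() happens to return.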


Older commits f96eecd etc. were already reported back on May 5 under
different commit hashes.


- FChE
