Hi -
A mixture of core libpcp multithreading fixes and independent
scaling/robustness patches for other components is on the pcpfans.git
fche/multithread branch [freshly rebased]:
commit 17a67d2fcc9e39fb94ce536e3664dc1ce450d873 (HEAD -> fche/multithread)
Author: Frank Ch. Eigler <fche@xxxxxxxxxx>
Date: Sun May 8 16:30:57 2016 -0400
pmmgr: tune logging batching
When pmmgr runs pmlogcheck on an archive, this can produce voluminous
warning traffic (e.g. for SGI PR1142) that's not helpful for a pmmgr
admin. We now redirect that output also to /dev/null. Since there is
now less output, tweak the obatched(stream) code to issue an explicit
ostream::flush(), so that whether the stream is default-buffered or
not, the log file will be current.
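To illustrate the batching-plus-flush idea, here is a minimal sketch. It is not the actual pmmgr code: the class name batched_line and the helper log_message are made up for this example. Text accumulates in a string stream and is handed to the target stream in one write, followed by an explicit flush, when the helper object is destroyed.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for a batching stream helper: output accumulates in
// the ostringstream base and is written to the target in one piece when the
// object goes out of scope.
class batched_line : public std::ostringstream {
    std::ostream& target;
public:
    explicit batched_line(std::ostream& t) : target(t) {}
    ~batched_line() {
        target << str();
        target.flush();   // explicit flush, whatever the target's buffering mode
    }
};

// Hypothetical helper: emit one complete log line through the batcher.
void log_message(std::ostream& log, const std::string& text) {
    batched_line line(log);
    line << text << '\n';
}   // destructor writes and flushes here
```

With a file-backed stream as the target, each call to log_message() should leave the log file up to date once it returns.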
commit f8af410a6aa6a5185c54e959fa900f7147a8824a
Author: Frank Ch. Eigler <fche@xxxxxxxxxx>
Date: Sun May 8 16:16:30 2016 -0400
libpcp multithreading: context.c, derive.c, pmns.c lock order corrections
More instances of inconsistent lock orderings are corrected.
context.c: pmDupContext() removes unnecessary nesting entirely.
derive.c: trades the possibility of data races for the elimination of
deadlocks, by briefly releasing the registered.mutex around
reentrant PMAPI calls like pmLookup*
pmns.c: Introduces pmns_lock.
Removes recursive locking from __pmFixPMNSHashTab() and
TraversePMNS.
The results are that all the thread-group test cases run reliably
here, with no remaining helgrind lock-ordering warnings in any of the
449-invoked multithread* tests, nor 4751.
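The derive.c approach of briefly dropping a lock around a reentrant call can be sketched roughly as follows. This is an illustration only, not libpcp code: registry_mutex stands in for something like registered.mutex, lookup_id() stands in for a reentrant PMAPI call that takes its own lock, and all names are invented for the example.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <string>

// Hypothetical stand-ins for the real locks and data.
static std::mutex registry_mutex;
static std::map<std::string, int> registry;
static std::mutex api_mutex;

// Plays the role of a reentrant API call that locks internally.
int lookup_id(const std::string& name) {
    std::lock_guard<std::mutex> g(api_mutex);
    return static_cast<int>(name.size());   // placeholder result
}

int register_name(const std::string& name) {
    std::unique_lock<std::mutex> lk(registry_mutex);
    auto found = registry.find(name);
    if (found != registry.end())
        return found->second;

    // Drop our lock before calling out, so registry_mutex is never held
    // while waiting for api_mutex. This removes the lock-order dependency.
    lk.unlock();
    int id = lookup_id(name);
    lk.lock();

    // Another thread may have changed the registry while it was unlocked,
    // so re-check before inserting.
    auto it = registry.find(name);
    if (it == registry.end())
        it = registry.insert({name, id}).first;
    return it->second;
}
```

The unlocked window is where the "possibility of data races" trade-off shows up: anything read before the unlock has to be treated as stale afterwards.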
commit 2a3815f65cf173070c840ce5798611eb7054ceb8
Author: Frank Ch. Eigler <fche@xxxxxxxxxx>
Date: Sun May 8 11:29:11 2016 -0400
unresponsive-pmda pmie message: identify host
For remotely monitored hosts that have suffered PMDA failure, the pmie
message should identify the host. Adding @%h to the message, as per
many other pmieconf examples. (No QA impact, as this message does not
appear in QA at all.)
commit 547da9b379d6cbccd6233134005fb30fc8a90456
Author: Frank Ch. Eigler <fche@xxxxxxxxxx>
Date: Sun May 8 10:50:06 2016 -0400
crash-resilience for systemd pmmgr/pmwebd
Switch to using Type=forking Restart=always for these services.
They now get auto-restarted by systemd if they crash or are kill-9'd.
The same treatment is probably appropriate for pmcd.
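For reference, the relevant part of such a unit file would look something like the fragment below. This is a sketch, not the shipped pmmgr/pmwebd unit files. Note that systemd spells the service-type directive Type=, and the rest of the [Service] section (ExecStart= and so on) is omitted.

```ini
[Service]
# The daemon forks into the background on startup.
Type=forking
# Restart on any exit, including crashes and SIGKILL.
Restart=always
```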
commit 399bbaec4d8dd2b89892f383da2095599f59ec52
Author: Frank Ch. Eigler <fche@xxxxxxxxxx>
Date: Sun May 8 09:05:06 2016 -0400
pmmgr scaling: don't cry on a SIGPIPE
It has been reported that on some heavily loaded systems, pmmgr
can intermittently die with a "too many interrupts" message. Analysis
with systemtap indicates that these events come from SIGPIPE's being
sent by the kernel from within a
    __pmSend
    __pmXmitPDU
    __pmSendNameList
    pmLookupName
    ....
    __dmopencontext
    pmNewContext
call chain. Presumably, a remote pmcd died mid-conversation, and
pdu.c's SIGPIPE ignoring logic didn't help enough.
pmmgr should not look for SIGPIPE anyway as a termination signal - we
don't produce output on stdout like a pipeable UNIX tool. We now
SIG_IGN it.
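The effect of ignoring SIGPIPE can be shown with a small POSIX-only sketch. The helper names below are invented for the example and are not from pmmgr. Once the signal is ignored, a write to a peer that has gone away fails with EPIPE instead of delivering a signal to the process.

```cpp
#include <cassert>
#include <cerrno>
#include <csignal>
#include <unistd.h>

// Ignore SIGPIPE process-wide. Returns true on success.
bool ignore_sigpipe() {
    return std::signal(SIGPIPE, SIG_IGN) != SIG_ERR;
}

// Hypothetical demonstration helper: write to a pipe whose read end has been
// closed, and report the errno from the failed write (0 if it succeeded,
// -1 if the pipe could not be created).
int errno_from_broken_pipe_write() {
    int fds[2];
    if (pipe(fds) != 0)
        return -1;
    close(fds[0]);                       // no reader left
    errno = 0;
    ssize_t n = write(fds[1], "x", 1);   // would raise SIGPIPE by default
    int saved = errno;
    close(fds[1]);
    return n < 0 ? saved : 0;
}
```

Calling ignore_sigpipe() once at startup means the caller only has to handle the error return from the write path.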
commit 00a20c48964b2cbb74696ef77ad09d24b60ec3e2
Author: Frank Ch. Eigler <fche@xxxxxxxxxx>
Date: Sun May 8 08:10:57 2016 -0400
pmmgr target-threads: tolerate OSs that return <0 for
sysconf(_SC_NPROCESSORS_ONLN)
It's theoretically possible for the online-cpu-count to come back
negative. Map that to zero instead of propagating to a negative
number of target threads.
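The clamping amounts to something like the sketch below. The function names are made up for illustration and the real pmmgr code may be structured differently. sysconf() returns -1 on error, and _SC_NPROCESSORS_ONLN is a common extension rather than part of POSIX.

```cpp
#include <cassert>
#include <unistd.h>

// Map a raw sysconf() result to a usable count: negative (error) becomes 0.
long clamp_cpu_count(long raw) {
    return raw < 0 ? 0 : raw;
}

// Hypothetical wrapper: number of online CPUs, never negative.
long online_cpus() {
    return clamp_cpu_count(sysconf(_SC_NPROCESSORS_ONLN));
}
```

Any thread count derived from online_cpus() then starts from a non-negative value.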
Older commits f96eecd etc. were already reported back on May 5 under
different commit hashes.
- FChE