Changes committed to git://git.pcp.io/pcp master
Frank Ch. Eigler (8):
libpcp fetchgroups: match docs & implementation
RHBZ1325363: multithreaded pmNewContext
pmmgr: parallelize potential target pmcd analysis & daemon shutdown
pmmgr: make foreground mode less magic
pmNewContext multithreading: defer derived-metric initialization
pmmgr: tweak threading and verbosity
multithreaded pmNewContext: tweak locks and errors
multithreaded pmNewContext: delay -Dcontext printing till after commit
man/man1/pmmgr.1 | 22 +
man/man3/pmfetchgroup.3 | 4
qa/4751 | 97 +++++++
qa/4751.out | 53 ++++
qa/666 | 8
qa/666.out | 1
qa/group | 1
qa/src/.gitignore | 1
qa/src/GNUlocaldefs | 10
qa/src/multithread10.c | 105 ++++++++
src/include/pcp/impl.h | 5
src/libpcp/src/check-statics | 1
src/libpcp/src/context.c | 186 ++++++++++-----
src/libpcp/src/fetchgroup.c | 27 --
src/pmmgr/pmmgr.cxx | 533 ++++++++++++++++++++++++++++++++-----------
src/pmmgr/pmmgr.h | 19 -
src/pmmgr/pmmgr.options | 2
17 files changed, 846 insertions(+), 229 deletions(-)
Details ...
commit c7e9299f6a03b019908d0d4502987501720dd381
Author: Frank Ch. Eigler <fche@xxxxxxxxxx>
Date: Fri Apr 15 14:57:19 2016 -0400
multithreaded pmNewContext: delay -Dcontext printing till after commit
In order to dump the new context data properly for -Dcontext, this
needs to be attempted after the contexts[] slot replaces the
being_initialized stub with the real context. Move -Dcontext
treatment after this point.
commit 412a4e4120468afd176475826258251f3c158ecc
Author: Frank Ch. Eigler <fche@xxxxxxxxxx>
Date: Thu Apr 14 15:30:31 2016 -0400
multithreaded pmNewContext: tweak locks and errors
Eagle-eyed brolley found a few places where PM_UNLOCKs mismatched
PM_LOCKs in the new code. Fixed those. In addition, tweaked
pmReconnectContext, pmDupContext, pmDestroyContext, pmUseContext
to more vigorously detect & reject FREE/INIT state contexts.
commit ae0d529079a791db58390099b9debe3c15808de1
Author: Frank Ch. Eigler <fche@xxxxxxxxxx>
Date: Tue Apr 12 08:26:34 2016 -0400
pmmgr: tweak threading and verbosity
The recent pmcd-search multithreading work spun off threads up to a
calculated or configured limit, where that limit was independent of
the amount of work available for the threads. This could waste time &
momentary memory. We now limit multithreading to the actual number of
input work items.
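A minimal sketch of the new thread-count rule, written in C++ since pmmgr is C++. The helper name effective_thread_count is hypothetical and not taken from pmmgr.cxx: the configured or calculated limit becomes only an upper bound, capped by the number of queued work items.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper: how many worker threads to start for a batch of
// pmcd-search work. The configured (or calculated) limit is only an upper
// bound; no more threads are started than there are work items, so no
// thread is created just to find an empty queue.
std::size_t effective_thread_count(std::size_t configured_limit,
                                   std::size_t work_items) {
    assert(configured_limit > 0);  // a zero limit would be a config error
    return std::min(configured_limit, work_items);
}
```

For example, a limit of 64 with only 3 candidate pmcds yields 3 threads.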
While in the vicinity, tweak message-verbosity so that pmmgr -v
prints a good bare-essential level of information (remote pmcds
found, daemons started), which is a good default. -v -v prints
much more detail.
commit ad40e392f0556efe1221ac101734b2269a6508f3
Author: Frank Ch. Eigler <fche@xxxxxxxxxx>
Date: Mon Apr 11 10:27:14 2016 -0400
pmNewContext multithreading: defer derived-metric initialization
lberk reported that a $PCP_DERIVED_CONFIG-laden pcp app segfaults because
__dmopencontext() runs general pmapi functions on the being_initialized
context structure. We defer this until after the context[] slot is set,
marking the beginning of its pmapi usability.
commit 997cba78b43bac679de06b0afb3ebea7aa76e21b
Author: Frank Ch. Eigler <fche@xxxxxxxxxx>
Date: Sat Apr 9 19:20:28 2016 -0400
pmmgr: make foreground mode less magic
Just as for pmwebd back in commit 9c82cf68a, don't mandate -U `whoami`
if one simply wants to run pmmgr under one's own unprivileged userid.
Only attempt __pmSetProcessIdentity() if we're root to start with.
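A minimal sketch of the root-only identity switch. The stub stands in for __pmSetProcessIdentity(), which is deliberately not called here; should_switch_identity, apply_identity, and the switch counter are hypothetical names used only to make the behaviour observable.

```cpp
#include <cassert>
#include <string>
#include <sys/types.h>

// Counts identity switches so the behaviour is observable in this sketch.
static int identity_switches = 0;

// Hypothetical stand-in for libpcp's __pmSetProcessIdentity(); the real
// call is deliberately not made here.
static int set_process_identity_stub(const std::string& username) {
    assert(!username.empty());
    ++identity_switches;
    return 0;
}

// Only a process that starts as root (effective uid 0) can switch to
// another user, so an unprivileged foreground run needs no -U option.
bool should_switch_identity(uid_t euid) {
    return euid == 0;
}

// Callers would pass geteuid() as euid.
int apply_identity(const std::string& username, uid_t euid) {
    if (!should_switch_identity(euid))
        return 0;  // already unprivileged: keep the current identity
    return set_process_identity_stub(username);
}
```

Run as an ordinary user, apply_identity() leaves the process identity alone; run as root, it performs the switch to the requested user.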
commit 2084af3416dfb47adc7286e69310457fa61d6008
Author: Frank Ch. Eigler <fche@xxxxxxxxxx>
Date: Sat Apr 9 18:49:40 2016 -0400
pmmgr: parallelize potential target pmcd analysis & daemon shutdown
It was reported that if pmmgr was given a target pmcd list containing
numerous hosts that are at times unreachable, then a delay of up to
$PMCD_CONNECT_TIMEOUT (10s!) may be absorbed - per unreachable host -
during the hostid-calculation phase.
So now we parallelize a few more things, to let pmmgr scale out
to a much larger number of target daemons:
- pcp contexts to the hosts on the potential pmcd list (already
gathered from target-host and target-discovery) are opened in parallel
- container subtargets are searched in parallel for surviving
live pmcds
- eventually, pmmgr daemons are shut down in parallel, in separate
threads that issue the SIGTERM / SIGKILL
qa/666 updated. Other scale testing with hundreds of
always-unreachable hosts (e.g., the RFC5737 TEST-NET 192.0.2.0/24
range) indicates proper parallelization and tolerance of timeouts.
Amongst some tasty coding treats:
- a "locker" class to embody automatic {}-block-lifespan mutex
holding, instead of explicit pthread_mutex_[un]lock ops
- an "obatched" ostream-like class to let output-streaming <<
operations accumulate in a stringstream, so concurrent cerr
output is not interleaved
- a "parallel_do" function that launches N threads against a shared
(usually embedded-lock-carrying) work-queue structure
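A minimal sketch of how the three helpers might fit together. It uses std::mutex and std::thread rather than the raw pthread_mutex_[un]lock calls mentioned above. The class and function names follow this commit message, but the bodies are assumptions, not the pmmgr.cxx code. In particular, this parallel_do keeps its queue lock local instead of embedding it in the work structure.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>
#include <ostream>
#include <sstream>
#include <thread>
#include <utility>
#include <vector>

// locker: holds a mutex for exactly the lifetime of the enclosing {} block
// (equivalent in spirit to std::lock_guard).
class locker {
    std::mutex& m;
public:
    explicit locker(std::mutex& mm) : m(mm) { m.lock(); }
    ~locker() { m.unlock(); }
    locker(const locker&) = delete;
    locker& operator=(const locker&) = delete;
};

// obatched: collects << output in a private buffer and writes it to the
// target stream in one piece, under a shared lock, when it is destroyed,
// so output from concurrent threads is not interleaved.
class obatched {
    std::ostream& o;
    std::ostringstream buf;
    static std::mutex& stream_lock() {
        static std::mutex m;
        return m;
    }
public:
    explicit obatched(std::ostream& out) : o(out) {}
    ~obatched() {
        locker l(stream_lock());
        o << buf.str();
        o.flush();
    }
    // Returning the buffer lets later << operations (including manipulators
    // such as std::endl) go straight into it.
    template <typename T>
    std::ostream& operator<<(const T& t) {
        buf << t;
        return buf;
    }
};

// parallel_do: starts up to nthreads workers that pop items from a shared,
// lock-protected queue until it is empty. The thread count is capped at the
// number of work items. fn runs without the queue lock held, so it may block
// (e.g. on a slow connection) and must itself be thread-safe.
template <typename Item, typename Fn>
void parallel_do(std::deque<Item> work, std::size_t nthreads, Fn fn) {
    assert(nthreads > 0);  // with zero threads no item would be processed
    std::mutex qlock;
    auto pop = [&](Item& out) {
        locker l(qlock);
        if (work.empty())
            return false;
        out = std::move(work.front());
        work.pop_front();
        return true;
    };
    auto worker = [&]() {
        Item item{};
        while (pop(item))
            fn(item);
    };
    if (nthreads > work.size())
        nthreads = work.size();
    std::vector<std::thread> threads;
    for (std::size_t i = 0; i < nthreads; ++i)
        threads.emplace_back(worker);
    for (auto& t : threads)
        t.join();
}
```

A typical use would be `obatched(std::cerr) << "pmcd found: " << host << std::endl;` from inside a parallel_do worker. Each such statement reaches cerr as one unbroken chunk when the temporary is destroyed at the end of the full expression.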
commit efc0173ad84b555e0819a8fa2219ac06acd70326
Author: Frank Ch. Eigler <fche@xxxxxxxxxx>
Date: Fri Apr 8 20:07:57 2016 -0400
RHBZ1325363: multithreaded pmNewContext
While parallelizing pmmgr, it was discovered that the core
pmNewContext function is a bottleneck when trying to connect to a
large number of servers. Prior to this patch, it held the big libpcp
lock throughout the entire context-creation process, which can last
10+ seconds (e.g., if a remote pmcd host is unreachable). That locks
out many other pmapi operations, and serializes connections to
multiple hosts.
Detailed analysis of pmNewContext and its callees showed that it is
possible to relax holding the big libpcp lock to much shorter time
periods, and specifically to exclude indefinite-length operations like
the socket connection to a remote pmcd, and even the analysis of
archives. This is partly done by introducing a special
PM_CONTEXT_INIT c_type placeholder object into the context[] array
during initialization, and tweaking timing & locking sequences.
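A minimal sketch of the placeholder idea. The lock is held only to reserve a slot marked INIT and later to commit it, while the slow connection step runs with the lock released. ContextTable, CtxState, and new_context are hypothetical names. The real context.c uses PM_CONTEXT_INIT in the c_type field and libpcp's own locking macros rather than std::mutex.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical, much-simplified context table; the real libpcp code differs.
enum class CtxState { FREE, INIT, READY };

struct Context {
    CtxState state = CtxState::FREE;
    std::string target;
};

class ContextTable {
    std::mutex big_lock;  // stands in for the big libpcp lock
    std::vector<Context> slots;
public:
    // Short locked step: claim a slot and mark it INIT, so other threads
    // neither reuse it nor treat it as a usable context.
    int reserve() {
        std::lock_guard<std::mutex> g(big_lock);
        for (std::size_t i = 0; i < slots.size(); ++i) {
            if (slots[i].state == CtxState::FREE) {
                slots[i].state = CtxState::INIT;
                return static_cast<int>(i);
            }
        }
        Context c;
        c.state = CtxState::INIT;
        slots.push_back(c);
        return static_cast<int>(slots.size() - 1);
    }
    // Short locked step: publish the fully built context.
    void commit(int slot, const std::string& target) {
        std::lock_guard<std::mutex> g(big_lock);
        assert(slots[slot].state == CtxState::INIT);
        slots[slot].target = target;
        slots[slot].state = CtxState::READY;
    }
    // Give the slot back if the slow phase failed.
    void release(int slot) {
        std::lock_guard<std::mutex> g(big_lock);
        slots[slot] = Context{};
    }
    // Only READY slots are usable; FREE and INIT ones are rejected.
    bool usable(int slot) {
        std::lock_guard<std::mutex> g(big_lock);
        return slot >= 0 && static_cast<std::size_t>(slot) < slots.size() &&
               slots[slot].state == CtxState::READY;
    }
};

// The slow step (connecting to a pmcd, opening an archive) runs with the
// lock released, so concurrent calls can overlap.
int new_context(ContextTable& table, const std::string& target,
                bool (*slow_connect)(const std::string&)) {
    int slot = table.reserve();
    if (!slow_connect(target)) {
        table.release(slot);
        return -1;
    }
    table.commit(slot, target);
    // Only after commit is the context usable, so follow-up work that calls
    // back into the API (e.g. derived-metric setup, debug dumps) goes here.
    return slot;
}
```

Because an INIT slot is never reported as usable, another thread that stumbles on a half-built context gets a clean rejection instead of a partially initialized structure.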
The result is that pmNewContext calls can almost completely overlap
each other safely. A new test case (4751, a descendant of 475)
stress-tests by opening hundreds of various types of contexts at the
same time, including repeated, unreachable, and
theoretically-shareable ones. The new code precludes sharing of
concurrent connections/archive-control data to the same destinations,
but leaves non-concurrent sharing behaviour unmodified.
commit 5a53f9ec0ae3ff5bc253ebe60248d435826319ce
Author: Frank Ch. Eigler <fche@xxxxxxxxxx>
Date: Fri Apr 8 18:12:27 2016 -0400
libpcp fetchgroups: match docs & implementation
The pmFetchGroup() function return value was misdocumented (>0 ok).
The pmFetchGroupSetMode() function was removed from the exported /
documented API, so it can safely be removed from the implementation.