
PCP Updates: fche: multithreaded libpcp pmNewContext, pmmgr

To: PCP Mailing List <pcp@xxxxxxxxxxx>
Subject: PCP Updates: fche: multithreaded libpcp pmNewContext, pmmgr
From: Dave Brolley <dave.brolley@xxxxxxxxxx>
Date: Fri, 22 Apr 2016 16:01:04 -0400
Changes committed to git://git.pcp.io/pcp master

Frank Ch. Eigler (8):
      libpcp fetchgroups: match docs & implementation
      RHBZ1325363: multithreaded pmNewContext
      pmmgr: parallelize potential target pmcd analysis & daemon shutdown
      pmmgr: make foreground mode less magic
      pmNewContext multithreading: defer derived-metric initialization
      pmmgr: tweak threading and verbosity
      multithreaded pmNewContext: tweak locks and errors
      multithreaded pmNewContext: delay -Dcontext printing till after commit

 man/man1/pmmgr.1             |   22 +
 man/man3/pmfetchgroup.3      |    4
 qa/4751                      |   97 +++++++
 qa/4751.out                  |   53 ++++
 qa/666                       |    8
 qa/666.out                   |    1
 qa/group                     |    1
 qa/src/.gitignore            |    1
 qa/src/GNUlocaldefs          |   10
 qa/src/multithread10.c       |  105 ++++++++
 src/include/pcp/impl.h       |    5
 src/libpcp/src/check-statics |    1
 src/libpcp/src/context.c     |  186 ++++++++++-----
 src/libpcp/src/fetchgroup.c  |   27 --
 src/pmmgr/pmmgr.cxx          |  533 ++++++++++++++++++++++++++++++++-----------
 src/pmmgr/pmmgr.h            |   19 -
 src/pmmgr/pmmgr.options      |    2
 17 files changed, 846 insertions(+), 229 deletions(-)

Details ...

commit c7e9299f6a03b019908d0d4502987501720dd381
Author: Frank Ch. Eigler <fche@xxxxxxxxxx>
Date:   Fri Apr 15 14:57:19 2016 -0400

    multithreaded pmNewContext: delay -Dcontext printing till after commit

    In order to dump the new context data properly for -Dcontext, this
    needs to be attempted after the contexts[] slot replaces the
    being_initialized stub with the real context.  Move -Dcontext
    treatment after this point.

commit 412a4e4120468afd176475826258251f3c158ecc
Author: Frank Ch. Eigler <fche@xxxxxxxxxx>
Date:   Thu Apr 14 15:30:31 2016 -0400

    multithreaded pmNewContext: tweak locks and errors

    Eagle-eyed brolley found a few places where PM_UNLOCKs mismatched
    PM_LOCKs in the new code.  Fixed those.  In addition, tweaked
    pmReconnectContext, pmDupContext, pmDestroyContext, pmUseContext
    to more vigorously detect & reject FREE/INIT state contexts.

commit ae0d529079a791db58390099b9debe3c15808de1
Author: Frank Ch. Eigler <fche@xxxxxxxxxx>
Date:   Tue Apr 12 08:26:34 2016 -0400

    pmmgr: tweak threading and verbosity

    The recent pmcd-search multithreading work spun off threads up to a
    calculated or configured limit, where that limit was independent of
    the amount of work available for the threads.  This could waste time &
    momentary memory.  We now limit multithreading to the actual number of
    input work items.
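
The clamp described above amounts to taking the smaller of the thread limit and the number of queued work items. A minimal sketch, assuming a hypothetical helper name `worker_count` that does not come from pmmgr.cxx:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper (not taken from pmmgr.cxx): decide how many worker
// threads to start for one batch of queued work items.
std::size_t worker_count(std::size_t thread_limit, std::size_t work_items)
{
    // Never start more threads than there are items to process; extra
    // threads would only cost start-up time and stack memory.
    std::size_t n = std::min(thread_limit, work_items);
    assert(n <= thread_limit && n <= work_items);
    return n;
}
```

With a limit of 64 threads and 3 candidate pmcds, this yields 3 workers rather than 64.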

    While in the vicinity, tweak message-verbosity so that pmmgr -v
    prints a good bare-essential level of information (remote pmcds
    found, daemons started), which is a good default.  -v -v prints
    much more detail.

commit ad40e392f0556efe1221ac101734b2269a6508f3
Author: Frank Ch. Eigler <fche@xxxxxxxxxx>
Date:   Mon Apr 11 10:27:14 2016 -0400

    pmNewContext multithreading: defer derived-metric initialization

    lberk reported that a $PCP_DERIVED_CONFIG-laden pcp app segv's due to
    __dmopencontext() running general pmapi functions on the
    being_initialized context structure.  We defer this until after the
    context[] slot is set, marking the beginning of its pmapi usability.

commit 997cba78b43bac679de06b0afb3ebea7aa76e21b
Author: Frank Ch. Eigler <fche@xxxxxxxxxx>
Date:   Sat Apr 9 19:20:28 2016 -0400

    pmmgr: make foreground mode less magic

    Just as for pmwebd back in commit 9c82cf68a, don't mandate -U `whoami`
    if one simply wants to run pmmgr under one's own unprivileged userid.
    Only attempt __pmSetProcessIdentity() if we're root to start with.
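
The rule above can be sketched as a guard around the identity switch. This is an illustration only: both function names are hypothetical, `switch_fn` stands in for whatever call actually changes the process identity, and testing the effective uid (as opposed to the real uid) is an assumption.

```cpp
#include <cassert>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: switching identity only makes sense (and can only
// succeed) when the process starts out as root.
bool should_switch_identity(uid_t effective_uid)
{
    return effective_uid == 0;
}

// Hypothetical wrapper: 'switch_fn' is a stand-in for the real
// identity-switching call; it is invoked only when we started as root.
int maybe_switch_identity(uid_t effective_uid, const char* username,
                          int (*switch_fn)(const char*))
{
    if (!should_switch_identity(effective_uid))
        return 0;   // unprivileged start: keep running as the invoking user
    return switch_fn(username);
}
```

A caller would pass `geteuid()` as the first argument, so an unprivileged foreground run needs no -U option at all.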

commit 2084af3416dfb47adc7286e69310457fa61d6008
Author: Frank Ch. Eigler <fche@xxxxxxxxxx>
Date:   Sat Apr 9 18:49:40 2016 -0400

    pmmgr: parallelize potential target pmcd analysis & daemon shutdown

    It was reported that if pmmgr was given a target pmcd list containing
    numerous hosts that are at times unreachable, then a delay of up to
    $PMCD_CONNECT_TIMEOUT (10s!) may be absorbed - per unreachable host -
    during the hostid-calculation phase.

    So now we parallelize a couple more things, to let pmmgr scale out
    to a much larger number of target daemons:
    - pcp contexts are opened in parallel to the potential pmcd list
      already gathered from target-host and target-discovery
    - container subtargets are searched in parallel for surviving
      live pmcds
    - eventually, pmmgr daemons are shut down in parallel, in separate
      threads that issue the SIGTERM / SIGKILL

    qa/666 updated.  Other scale testing with hundreds of
    always-unreachable hosts (e.g., the RFC5737 TEST-NET 192.0.2.0/24
    range) indicates proper parallelization and tolerance of timeouts.

    Amongst some tasty coding treats:
    - a "locker" class to embody automatic {}-block-lifespan mutex
      holding, instead of explicit pthread_mutex_[un]lock ops
    - an "obatched" ostream-like class to let output-streaming <<
      operations accumulate in a stringstream, so concurrent cerr
      output is not interleaved
    - a "parallel_do" function that launches N threads against a shared
      (usually embedded-lock-carrying) work-queue structure
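
The three helpers listed above might look roughly like the following. This is a simplified sketch, not the pmmgr.cxx code: the class and function names follow the commit message, but every signature, the `work_queue` structure, `drain_queue` and `sum_of_squares` are assumptions made for illustration.

```cpp
#include <cassert>
#include <iostream>
#include <pthread.h>
#include <queue>
#include <sstream>
#include <vector>

// Holds a mutex for the lifetime of the enclosing {} block.
class locker {
public:
    explicit locker(pthread_mutex_t* m) : m_(m) { pthread_mutex_lock(m_); }
    ~locker() { pthread_mutex_unlock(m_); }
    locker(const locker&) = delete;
    locker& operator=(const locker&) = delete;
private:
    pthread_mutex_t* m_;
};

// Accumulates << output privately, then writes it in one locked step when
// destroyed, so messages from different threads do not interleave.
// (This sketch does not accept manipulators such as std::endl; use '\n'.)
class obatched {
public:
    explicit obatched(std::ostream& out) : out_(out) {}
    ~obatched() { locker hold(&lock_); out_ << buf_.str(); out_.flush(); }
    template <typename T>
    obatched& operator<<(const T& v) { buf_ << v; return *this; }
private:
    static pthread_mutex_t lock_;
    std::ostream& out_;
    std::ostringstream buf_;
};
pthread_mutex_t obatched::lock_ = PTHREAD_MUTEX_INITIALIZER;

// Hypothetical shared work queue that carries its own lock.
struct work_queue {
    pthread_mutex_t lock;
    std::queue<int> items;
    long total;
    work_queue() : total(0) { pthread_mutex_init(&lock, nullptr); }
    ~work_queue() { pthread_mutex_destroy(&lock); }
};

// Thread body: pop items until the queue is empty.
void* drain_queue(void* arg)
{
    work_queue* q = static_cast<work_queue*>(arg);
    for (;;) {
        int item;
        {
            locker hold(&q->lock);
            if (q->items.empty()) return nullptr;
            item = q->items.front();
            q->items.pop();
        }
        long result = static_cast<long>(item) * item; // "slow" work, unlocked
        locker hold(&q->lock);
        q->total += result;
    }
}

// Launch num_threads threads against the shared argument and wait for them.
void parallel_do(int num_threads, void* (*fn)(void*), void* arg)
{
    std::vector<pthread_t> threads;
    for (int i = 0; i < num_threads; i++) {
        pthread_t t;
        if (pthread_create(&t, nullptr, fn, arg) == 0) threads.push_back(t);
    }
    if (threads.empty()) fn(arg);   // assumed fallback: run in the caller
    for (pthread_t t : threads) pthread_join(t, nullptr);
}

// Usage example: sum the squares 1..n with num_threads workers.
long sum_of_squares(int n, int num_threads)
{
    work_queue q;
    for (int i = 1; i <= n; i++) q.items.push(i);
    parallel_do(num_threads, drain_queue, &q);
    assert(q.items.empty());
    obatched(std::cerr) << "processed " << n << " items, total " << q.total << '\n';
    return q.total;
}
```

The queue lock is held only while popping an item or recording a result, so the per-item work (here a trivial multiplication, in pmmgr a connection attempt that may time out) overlaps across threads.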

commit efc0173ad84b555e0819a8fa2219ac06acd70326
Author: Frank Ch. Eigler <fche@xxxxxxxxxx>
Date:   Fri Apr 8 20:07:57 2016 -0400

    RHBZ1325363: multithreaded pmNewContext

    While parallelizing pmmgr, it was discovered that the core
    pmNewContext function is a bottleneck when trying to connect to a
    large number of servers.  Prior to this patch, it held the big libpcp
    lock throughout the entire context-creation process, which can last
    10+ seconds (e.g., if a remote pmcd host is unreachable).  That locks
    out many other pmapi operations, and serializes connections to
    multiple hosts.

    Detailed analysis of pmNewContext and its callees showed that it is
    possible to relax holding the big libpcp lock to much shorter time
    periods, and specifically to exclude indefinite-length operations like
    the socket connection to a remote pmcd, and even the analysis of
    archives.  This is partly done by introducing a special
    PM_CONTEXT_INIT c_type placeholder object into the context[] array
    during initialization, and tweaking timing & locking sequences.

    The result is that pmNewContext calls can almost completely overlap
    each other safely.  A new test case (4751, a descendant of 475)
    stress-tests by opening hundreds of various types of contexts at the
    same time, including repeated, unreachable, and
    theoretically-shareable ones.  The new code precludes sharing of
    concurrent connections/archive-control data to the same destinations,
    but leaves non-concurrent sharing behaviour unmodified.
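
The placeholder technique described above can be illustrated generically: reserve a slot under the lock, do the slow work without it, then commit or release the slot under the lock again. The sketch below is not libpcp's context.c. All names (`slot_state`, `table`, `big_lock`, `slow_connect`, `new_context`, `use_context`, `open_many`) are hypothetical, and the 50 ms sleep merely stands in for a connection attempt.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

enum class slot_state { FREE, INIT, READY };

struct slot {
    slot_state state = slot_state::FREE;
    std::string target;
};

static std::mutex big_lock;       // stands in for the big library lock
static std::vector<slot> table;   // stands in for the context table

// Stand-in for an indefinite-length step such as a socket connection.
static bool slow_connect(const std::string& target)
{
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
    return target != "unreachable";
}

// Returns the slot index, or -1 if the target could not be reached.
int new_context(const std::string& target)
{
    std::size_t idx = 0;
    {
        // Phase 1 (short, locked): reserve a slot with an INIT placeholder
        // so that no other thread hands out the same index.
        std::lock_guard<std::mutex> hold(big_lock);
        while (idx < table.size() && table[idx].state != slot_state::FREE)
            idx++;
        if (idx == table.size()) table.push_back(slot());
        table[idx].state = slot_state::INIT;
    }

    // Phase 2 (slow, unlocked): other threads may run their own phases now.
    bool ok = slow_connect(target);

    // Phase 3 (short, locked): commit the real entry or release the slot.
    std::lock_guard<std::mutex> hold(big_lock);
    assert(table[idx].state == slot_state::INIT);
    if (!ok) {
        table[idx].state = slot_state::FREE;
        return -1;
    }
    table[idx].target = target;
    table[idx].state = slot_state::READY;
    return static_cast<int>(idx);
}

// Users of a slot reject entries that are free or still being initialized.
bool use_context(int idx)
{
    std::lock_guard<std::mutex> hold(big_lock);
    return idx >= 0 && static_cast<std::size_t>(idx) < table.size()
        && table[idx].state == slot_state::READY;
}

// Usage example: open all targets concurrently, count the usable results.
int open_many(const std::vector<std::string>& targets)
{
    std::vector<int> ids(targets.size(), -1);
    std::vector<std::thread> threads;
    for (std::size_t i = 0; i < targets.size(); i++)
        threads.emplace_back([&ids, &targets, i] { ids[i] = new_context(targets[i]); });
    for (std::thread& t : threads) t.join();
    int ready = 0;
    for (int id : ids)
        if (use_context(id)) ready++;
    return ready;
}
```

Because the lock is dropped during the slow phase, several calls spend their waiting time concurrently rather than one after another, which is the overlap the commit message describes.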

commit 5a53f9ec0ae3ff5bc253ebe60248d435826319ce
Author: Frank Ch. Eigler <fche@xxxxxxxxxx>
Date:   Fri Apr 8 18:12:27 2016 -0400

    libpcp fetchgroups: match docs & implementation

    The pmFetchGroup() function return value was misdocumented (>0 ok).
    The pmFetchGroupSetMode() function was removed from the exported /
    documented API, so it can safely be removed from the implementation.
