Re: [pcp] pmclusterd versus other solutions

To: Nathan Scott <nathans@xxxxxxxxxx>
Subject: Re: [pcp] pmclusterd versus other solutions
From: Mark Goodwin <mgoodwin@xxxxxxxxxx>
Date: Tue, 6 Sep 2016 19:24:53 +1000
Cc: Jeff Hanson <jhanson@xxxxxxx>, PCP <pcp@xxxxxxxxxxx>
It was a while ago, but IIRC there is no serial polling; the cluster
nodes register themselves with the PMDA on the head node, and then
periodically send a pmResult. The aggregating PMDA running on the head
node has a modified main loop with a select mask for the pmcd request
file descriptor as well as for every registered cluster node. The pmcd
connection is given priority if it's ready, and the cluster nodes are
processed based on who's ready to send data in ascending fd order. I
guess that might explain missing metrics for some cluster nodes (if
they stop sending for whatever reason).
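
To make the loop described above concrete, here is a minimal sketch in
Python rather than the actual pmclusterd C code, which is not shown in this
thread. The names poll_once and demo, the buffer size and the use of local
socket pairs in place of the pmcd connection and the cluster nodes are all
assumptions made for illustration.

```python
import select
import socket


def poll_once(pmcd_sock, node_socks, timeout=1.0):
    """One pass of the loop: return (label, data) pairs in handling order."""
    readable, _, _ = select.select([pmcd_sock] + list(node_socks), [], [], timeout)
    ready = set(readable)
    handled = []
    # The pmcd request channel is serviced first whenever it is ready.
    if pmcd_sock in ready:
        handled.append(("pmcd", pmcd_sock.recv(4096)))
    # Registered nodes are then serviced in ascending file descriptor order.
    for sock in sorted(node_socks, key=lambda s: s.fileno()):
        if sock in ready:
            handled.append((sock.fileno(), sock.recv(4096)))
    return handled


def demo():
    """Hypothetical setup: socket pairs stand in for pmcd and three nodes."""
    pmcd_local, pmcd_remote = socket.socketpair()
    pairs = [socket.socketpair() for _ in range(3)]
    nodes = [local for local, _ in pairs]
    node_fds = [s.fileno() for s in nodes]
    try:
        pairs[2][1].sendall(b"node2")   # node 2 reports
        pairs[0][1].sendall(b"node0")   # node 0 reports
        pmcd_remote.sendall(b"fetch")   # pmcd asks for values
        # Node 1 stays silent, so it contributes nothing on this pass.
        return poll_once(pmcd_local, nodes), node_fds
    finally:
        for s in [pmcd_local, pmcd_remote] + [s for p in pairs for s in p]:
            s.close()
```

In this sketch a node that stops sending simply never shows up as readable,
so its values would go stale without blocking any other node.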

Jeff, since the code is already GPL, perhaps post the source somewhere
and we can check it out.

Cheers
-- Mark


On Tue, Sep 6, 2016 at 9:57 AM, Nathan Scott <nathans@xxxxxxxxxx> wrote:
> Hi Jeff,
>
> ----- Original Message -----
>> > This is the daemon that aggregates indoms for per-cluster-node CPU
>> > data on the head node, so
>> [...]
>> See the emails from 11 August on Debugging sigpipe in pmda.
>>
>> But the real problem is that although pmclusterd exposes some 100 metrics
>> or so, only 20 of them can actually be fetched.
>>
>
> I expect the problem will be due to latency in the polling of remote cluster
> nodes, which IIRC is done in a serial fashion (one node after the other IOW)
> so one slow-responding node will affect the timeliness of all values?
>
> A design which did the remote fetching in parallel would be better suited,
> if so.  You could go with a model where multiple processes fetch then write
> metrics using MMV(5) format - see also mmv_stats_init(3) - so a new PMDA may
> not be needed at all.
>
> cheers.
>
> --
> Nathan
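
The parallel-fetch design Nathan suggests could look roughly like the
following Python sketch. It uses threads rather than the multiple processes
he mentions, and it leaves out writing the results in MMV format. The names
fetch_all and fake_fetch, the timeout value and the node labels are
hypothetical; fake_fetch only simulates a remote fetch.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def fetch_all(nodes, fetch_one, timeout=2.0):
    """Fetch from every node concurrently; keep only answers that arrive in time."""
    pool = ThreadPoolExecutor(max_workers=max(1, len(nodes)))
    futures = {pool.submit(fetch_one, node): node for node in nodes}
    done, _not_done = wait(futures, timeout=timeout)
    results = {}
    for fut in done:
        if fut.exception() is None:
            results[futures[fut]] = fut.result()
    # Do not block on stragglers; their late answers are discarded here.
    pool.shutdown(wait=False)
    return results


def fake_fetch(node):
    """Hypothetical stand-in for a remote fetch; the node named 'slow' lags."""
    if node == "slow":
        time.sleep(0.5)
    return {"node": node}
```

Calling fetch_all(["a", "slow", "b"], fake_fetch, timeout=0.2) should return
values for "a" and "b" only, without waiting for the slow node to respond.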
