Re: [pcp] pmclusterd versus other solutions

To: Mark Goodwin <mgoodwin@xxxxxxxxxx>, Nathan Scott <nathans@xxxxxxxxxx>
Subject: Re: [pcp] pmclusterd versus other solutions
From: Jeff Hanson <jhanson@xxxxxxx>
Date: Thu, 8 Sep 2016 13:00:17 -0400
Cc: PCP <pcp@xxxxxxxxxxx>
Delivered-to: pcp@xxxxxxxxxxx
In-reply-to: <CAFmffyUKP0=JdkUZR_RO1yXGO8MAnW_iBhLG6Qa0E4dH2eywzA@xxxxxxxxxxxxxx>
References: <3b551b84-ff74-5b9c-5854-3bdcba1c1212@xxxxxxx> <CAFmffyUkbMi1g3XScEE-XjEHBmdbd5WvHZ6UpGKN_eZtG6pm=g@xxxxxxxxxxxxxx> <49c5d203-5378-5cbb-7092-7ed23035af56@xxxxxxx> <154139732.5735882.1473119872822.JavaMail.zimbra@xxxxxxxxxx> <CAFmffyUKP0=JdkUZR_RO1yXGO8MAnW_iBhLG6Qa0E4dH2eywzA@xxxxxxxxxxxxxx>
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.3.0
On 09/06/2016 05:24 AM, Mark Goodwin wrote:
It was a while ago, but IIRC there is no serial polling; the cluster
nodes register themselves with the PMDA on the head node, and then
periodically send a pmResult. The aggregating PMDA running on the head
node has a modified main loop with a select mask for the pmcd request
file descriptor as well as for every registered cluster node. The pmcd
connection is given priority if it's ready, and the cluster nodes are
processed based on who's ready to send data in ascending fd order. I
guess that might explain missing metrics for some cluster nodes (if
they stop sending for whatever reason).
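
For readers following along, the loop described above can be sketched roughly as follows. This is an illustration only, not the pmclusterd source: the function name service_ready and its arguments are hypothetical, plain sockets stand in for the pmcd request descriptor and the registered cluster nodes, and the real PMDA is written in C rather than Python.

```python
import select
import socket


def service_ready(pmcd_sock, node_socks, timeout=1.0):
    """Run one pass of a select-based loop and return the ready sockets
    in the order they would be serviced.

    Hypothetical sketch: pmcd_sock stands in for the pmcd request
    descriptor, node_socks for the registered cluster nodes.
    """
    watched = [pmcd_sock] + list(node_socks)
    readable, _, _ = select.select(watched, [], [], timeout)

    order = []
    # The pmcd connection is given priority if it is ready.
    if pmcd_sock in readable:
        order.append(pmcd_sock)
    # Remaining ready nodes are handled in ascending fd order.
    nodes_ready = [s for s in readable if s is not pmcd_sock]
    order.extend(sorted(nodes_ready, key=lambda s: s.fileno()))
    return order
```

A node that has stopped sending never shows up as readable, so nothing in a loop of this shape refreshes its values. That is consistent with the guess about missing metrics, but only the real source can confirm it.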

Jeff, since the code is already GPL, perhaps post the source somewhere
and we can check it out.


Thanks - 
ftp://shell.sgi.com/collect/jhanson/pcp-pmda-cluster-1.0.4-2sgi714r4.rhel6.src.rpm

I checked that I could build the source rpm (because the ex-SGI people on this
list KNOW what a minefield SGI's build system is) and it's fine on RHEL 7.2
with the standard PCP-provided (not RHEL-provided) pcp rpms.


Cheers
-- Mark


On Tue, Sep 6, 2016 at 9:57 AM, Nathan Scott <nathans@xxxxxxxxxx> wrote:
Hi Jeff,

----- Original Message -----
This is the daemon that aggregates indoms for per-cluster-node CPU
data on the head node, so
[...]
See the emails from 11 August on Debugging sigpipe in pmda.

But the real problem is that although pmclusterd exposes some 100 metrics or
so, only 20 of them can actually be fetched.


I expect the problem will be due to latency in the polling of remote cluster
nodes, which IIRC is done in a serial fashion (one node after the other IOW)
so one slow-responding node will affect the timeliness of all values?

A design which did the remote fetching in parallel would be better suited,
if so.  You could go with a model where multiple processes fetch then write
metrics using MMV(5) format - see also mmv_stats_init(3) - so a new PMDA may
not be needed at all.
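
As a rough illustration of the parallel idea, here is a minimal sketch in which one slow node does not delay the values from the others. It makes several assumptions: fetch_all and fetch_one are hypothetical names, threads are used instead of the separate processes suggested above to keep the example short, and the MMV writing step is left out entirely (no MMV API is shown or implied here).

```python
from concurrent.futures import ThreadPoolExecutor, wait


def fetch_all(nodes, fetch_one, timeout):
    """Fetch from every node concurrently and return {node: value} for
    the nodes that answered within `timeout` seconds.

    Hypothetical sketch: fetch_one(node) is whatever call retrieves
    metrics from a single cluster node.
    """
    pool = ThreadPoolExecutor(max_workers=max(1, len(nodes)))
    futures = {pool.submit(fetch_one, node): node for node in nodes}

    # Wait for at most `timeout` seconds in total, not per node.
    done, _not_done = wait(futures, timeout=timeout)
    # Do not block on stragglers; their results are simply dropped.
    pool.shutdown(wait=False)

    results = {}
    for future in done:
        if future.exception() is None:
            results[futures[future]] = future.result()
    return results
```

With a serial loop the total time is the sum of every node's latency; here it is bounded by the timeout, and a late node only costs its own values.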

cheers.

--
Nathan


--
-----------------------------------------------------------------------
Jeff Hanson - jhanson@xxxxxxx - Senior Technical Support Engineer

You can choose a ready guide in some celestial voice.
If you choose not to decide, you still have made a choice.
You can choose from phantom fears and kindness that can kill;
I will choose a path that's clear
I will choose freewill. - Peart
