Re: [pcp] pmclusterd versus other solutions

To: Mark Goodwin <mgoodwin@xxxxxxxxxx>
Subject: Re: [pcp] pmclusterd versus other solutions
From: Jeff Hanson <jhanson@xxxxxxx>
Date: Thu, 1 Sep 2016 07:33:15 -0400
Cc: PCP <pcp@xxxxxxxxxxx>
On 09/01/2016 01:59 AM, Mark Goodwin wrote:
Hi Jeff, I don't think we ever open-sourced pmclusterd since it was
(at the time) SGI ICE specific, so it's unlikely anyone outside SGI
will know much about it.


It is open source by license (GPL). If anyone would like to see the source
code, I can make it available.

This is the daemon that aggregates indoms for per-cluster-node CPU
data on the head node, so the client tools just monitor the head node,
right? If that's the tool framework you're referring to, I always
thought it was a bit of an abomination of the indom concept (even
though I wrote it!), but designed it that way to be more scalable than
monitoring every cluster node individually.
What issues are you running into?
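
The aggregation idea described above can be sketched roughly as follows.
This is only an illustration, not pmclusterd's actual code: the function
name, the data layout, and the sample values are all hypothetical. It
assumes one instance per cluster node, with the head node collecting one
value from each node and reporting nodes that did not answer as missing.

```python
def aggregate_indom(per_node_values):
    """Merge one value per cluster node into a single head-node table.

    per_node_values maps a node name to its value, or to None when the
    node did not respond in time (hypothetical convention).
    """
    # One instance per node, in a stable order.
    instances = sorted(per_node_values)
    # Keep only the nodes that actually returned a value.
    values = {n: v for n, v in per_node_values.items() if v is not None}
    # Nodes without a value would show up as missing in a client tool.
    missing = [n for n in instances if n not in values]
    return instances, values, missing


# Made-up input: two nodes answered, one did not.
instances, values, missing = aggregate_indom(
    {"r1i0n0": 24, "r1i0n8": 24, "r1i1n0": None}
)
print(instances, values, missing)
```

A client then only has to talk to the head node, at the cost of every
node's data being squeezed into one instance domain.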


See the emails from 11 August on Debugging sigpipe in pmda.

But the real problem is that although pmclusterd exposes some 100 metrics
or so, only 20 of them can actually be fetched.

Example (there is usually a single-sample gap while the pmclusterd on the
cluster node wakes up to respond):

cluster.hinv.ncpu

metric:    cluster.hinv.ncpu
host:      r1lead
semantics: discrete instantaneous value
units:     none
samples:   4
interval:  1.00 sec

pmval: pmFetch: Missing metric value(s)

     r1i1n0      r1i0n8      r1i0n0      r1i1n8
         24          24          24          24
         24          24          24          24
         24          24          24          24

Versus others, which produce only an error:
cluster.hinv.cpu.model

pmval: pmGetInDom(65.0): Unknown or illegal instance domain identifier
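
To sort the 100 or so metrics into "fetchable" and "broken", captured
pmval output like the two examples above could be triaged with a small
script. This is a minimal sketch that only parses saved text. The
function names and category labels are hypothetical, and it assumes that
error lines start with "pmval:" and that sample rows contain only
numbers, as in the output shown here.

```python
import re

NUMBER = re.compile(r"-?\d+(\.\d+)?")


def summarize_pmval(output):
    """Split captured pmval text into error messages and numeric rows."""
    errors, rows = [], []
    for line in output.splitlines():
        stripped = line.strip()
        if stripped.startswith("pmval:"):
            # Keep the message without the "pmval:" prefix.
            errors.append(stripped[len("pmval:"):].strip())
        elif stripped and all(NUMBER.fullmatch(t) for t in stripped.split()):
            # A row made only of numbers is treated as one sample.
            rows.append(stripped.split())
    return {"errors": errors, "rows": rows}


def classify(output):
    """Label one metric's pmval output (labels are illustrative)."""
    summary = summarize_pmval(output)
    if summary["rows"]:
        return "fetchable (with gaps)" if summary["errors"] else "fetchable"
    return "failed" if summary["errors"] else "no data"


# The two outputs quoted in this message, abbreviated.
SAMPLE_OK = """\
metric:    cluster.hinv.ncpu
samples:   4
interval:  1.00 sec

pmval: pmFetch: Missing metric value(s)

     r1i1n0      r1i0n8      r1i0n0      r1i1n8
         24          24          24          24
         24          24          24          24
         24          24          24          24
"""
SAMPLE_FAIL = (
    "pmval: pmGetInDom(65.0): Unknown or illegal instance domain identifier\n"
)

print(classify(SAMPLE_OK))
print(classify(SAMPLE_FAIL))
```

Running something like this over every metric under the cluster subtree
would give a quick list of which ones fail and with which error.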

And regardless of the issues here, I am interested in what other people do
with PCP to monitor cluster nodes.

Regards
-- Mark


On Thu, Sep 1, 2016 at 2:43 AM, Jeff Hanson <jhanson@xxxxxxx> wrote:
As we (SGI) explore what to do about the scaling issues with pmclusterd
as it is currently written, I am exploring other options.  For cluster
configurations, are people generally running pmcd locally on the cluster
nodes and logging to the node?  Running pmcd locally on the cluster node
with another system as the logger?  Other thoughts?

Thanks.
--
-----------------------------------------------------------------------
Jeff Hanson - jhanson@xxxxxxx - Senior Technical Support Engineer

You can choose a ready guide in some celestial voice.
If you choose not to decide, you still have made a choice.
You can choose from phantom fears and kindness that can kill;
I will choose a path that's clear
I will choose freewill. - Peart



