Re: [pcp] Dynamic PMDA cluster - proposal

To: Corneliu Boac <cboac@xxxxxxx>
Subject: Re: [pcp] Dynamic PMDA cluster - proposal
From: Ken McDonell <kenj@xxxxxxxxxxxxxxxx>
Date: Mon, 07 Jun 2010 09:27:42 +1000
Cc: pcp@xxxxxxxxxxx
In-reply-to: <4C07DC70.2030700@xxxxxxx>
References: <4C07DC70.2030700@xxxxxxx>
Reply-to: kenj@xxxxxxxxxxxxxxxx
Corneliu,

Let me start by saying the current cluster PMDA was written after I left
SGI, so my comments are from a position of relative ignorance ...

Comments are in situ below.

On Thu, 2010-06-03 at 11:46 -0500, Corneliu Boac wrote:
> Hello PCP community,
> 
> Customers have requested that we make the pmda cluster more dynamic:
> 
> 1) Right now we use a static configuration file and the compute nodes
> always push all metrics that are in that file.
> 
> 2) The compute nodes push the metrics to the cluster head node every two
>  seconds even if no client is requesting information.
> 
> After looking at the code and some brainstorming here at SGI,  I wrote
> the following proposal that I need to discuss with you. I value a lot
> your feed-back so I need to ask you the favor to analyze it and provide
> me your thoughts.
> 
> Thank you,
> Cornel
> 
> -----------------------------------------------------
> 
> Overview
> 
> The Cluster PMDA has two components: pmdacluster that runs on the
> cluster head node (the PMDA) and pmclusterd that runs on compute nodes
> (provides the data). The Cluster PMDA configuration file contains the
> list of all metrics that are required for pmclusterd to retrieve every
> two seconds.
> 
> PROPOSAL
> 
> * The PMDA will load the list of supported metrics including the update
> interval in ms for each one of the metrics. 2 seconds can be used as a
> default value when a specific refresh interval is not defined. Some
> metrics will not change their values (e.g. memory size and other metrics
> that are related to hardware). Some others may change very often (e.g.
> network traffic counters).

This seems sensible and is consistent with the way other tools like pmie
and pmlogger handle metrics with different underlying rates of change.

Is the intention that the configuration file could be updated
dynamically once the PMDA is running (like pmlogger), or the
configuration remains unchanged for the life of the PMDA (like pmie)?
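
To make the per-metric interval idea concrete, here is a minimal sketch of
loading such a list with the 2 second default. The file format (metric name,
optional interval in ms, "#" comments) and the names parse_intervals and
DEFAULT_INTERVAL_MS are my assumptions for illustration only; the proposal
does not define a format, and the metric names are just examples.

```python
DEFAULT_INTERVAL_MS = 2000  # the proposal's 2 second default


def parse_intervals(text):
    """Map metric name -> refresh interval in ms (hypothetical file format)."""
    intervals = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding blanks
        if not line:
            continue
        fields = line.split()
        name = fields[0]
        # A second field, if present, overrides the default interval.
        intervals[name] = int(fields[1]) if len(fields) > 1 else DEFAULT_INTERVAL_MS
    return intervals


# Illustrative input only: a slow-changing hardware metric, a fast counter,
# and a metric that falls back to the default.
example = """
# metric name                refresh interval (ms)
hinv.physmem                 3600000
network.interface.in.bytes   500
kernel.all.load
"""
config = parse_intervals(example)
```

Whether this table is read once at startup or re-read while the PMDA runs is
the question above; the sketch takes no position on it.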

> * In general the protocol between pmdacluster and pmclusterd will remain
> the same with the exception that metrics will be pushed only when
> requested based on their refresh interval (not every 2 seconds as right
> now).
> 
> * First time a pm_fetch() call is made (from pmcd), the pmdacluster will
> retrieve the values from pmclusterd and cache them. The time stamp of
> the request will be saved in the cache together with the values for each
> metric.

How is the latency of this initial fetch handled? pmdacluster presumably
cannot afford to wait for all the pmclusterds to push data?  Does this
use the PM_ERR_PMDANOTREADY/PM_ERR_PMDAREADY handshake protocol between
the pmda and pmcd?

How do the pmclusterds get the configuration file information?

> * When the next pm_fetch() comes, the pmdacluster will check the cache
> and provide the data from there if the data has not expired (saved time
> stamp + refresh interval < current time stamp). If the data has expired
> for one or more metrics, it will retrieve new data for those metrics and
> save the new time stamp.

OK I'm missing something here ... I thought the model was that the
pmclusterds _pushed_ new data to pmdacluster ... if this is the case,
how can pmdacluster retrieve (i.e. _pull_) new data?
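
For reference, the expiry test in the quoted paragraph (saved time stamp +
refresh interval < current time stamp) amounts to something like the sketch
below. MetricCache and its methods are hypothetical names, not the existing
pmdacluster code, and the step that actually obtains fresh values is left out
because that is exactly the push/pull question.

```python
class MetricCache:
    """Per-metric value cache with the expiry rule from the proposal (sketch)."""

    def __init__(self, intervals_ms, default_ms=2000):
        self.intervals_ms = intervals_ms  # metric -> refresh interval in ms
        self.default_ms = default_ms      # used when no interval is configured
        self.entries = {}                 # metric -> (value, saved timestamp in ms)

    def store(self, metric, value, now_ms):
        self.entries[metric] = (value, now_ms)

    def expired(self, metric, now_ms):
        if metric not in self.entries:
            return True  # never fetched counts as expired
        _, saved_ms = self.entries[metric]
        interval = self.intervals_ms.get(metric, self.default_ms)
        # Proposal's rule: saved time stamp + refresh interval < current time stamp.
        return saved_ms + interval < now_ms

    def stale_metrics(self, metrics, now_ms):
        """Metrics in this fetch that would need new data from the nodes."""
        return [m for m in metrics if self.expired(m, now_ms)]


cache = MetricCache({"cluster.foo": 500})
cache.store("cluster.foo", 11, now_ms=0)
```

With a strict "<", a value saved at 0 ms with a 500 ms interval is still
served at exactly 500 ms and is first treated as expired at 501 ms.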

> * The pmdacluster will start requesting periodically new data for the
> metrics for which the time between the two pm_fetch() calls is less than or
> equal to twice the value of the refresh interval, so it always has data
> available when the next pm_fetch() call will be made.

Again, I don't understand the push vs pull semantics here.

But more critically, is pmdacluster maintaining state for every metric
(or metric-instance pair?) to measure the incoming request intervals?
And if so, is this state per pmcd client or state across all pmcd
clients (I think it has to be the latter, because by the time
pmdacluster gets the request from pmcd the identity of the pmcd client
is not visible).

If 2 clients of pmcd are requesting the cluster.foo metric with an
interval of 1 sec, and the refresh interval for cluster.foo is 0.5 sec
then requests (from different clients) could well arrive less than 0.25
sec apart which would seem to trigger the "periodic request" you mention
above, but for no apparent advantage.

Perhaps if you could complete this table I'd understand better ...

A and B are 2 clients of pmcd requesting cluster.foo at 1 sec intervals.
cluster.foo is the sum of the metric X from each node in the cluster.
I'm using a 2 node cluster to keep it simple.

time    A       B       pmdacluster     X @ node 0      X @ node 1
 0
 0.1    ?               ?               1               10
 0.3            ?       ?               3               30
 1                      ?               10              100
 1.1    ?               ?               11              110
 1.3            ?       ?               13              130
 2                      ?               20              200
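
To illustrate the inter-arrival point, here is a small sketch of what
pmdacluster would see if it measures the gap between successive requests for
cluster.foo without knowing which client sent them. The offsets (0.1 and 0.3
sec) and the 1 sec period come from the table; merged_gaps and REFRESH_MS are
names I made up, and times are in integer ms to avoid floating point noise.

```python
def merged_gaps(offsets_ms, period_ms, count):
    """Gaps between successive requests when all clients' requests are merged."""
    arrivals = sorted(
        offset + i * period_ms for offset in offsets_ms for i in range(count)
    )
    return [later - earlier for earlier, later in zip(arrivals, arrivals[1:])]


REFRESH_MS = 500  # refresh interval assumed for cluster.foo in the example

# Client A starts at 100 ms, client B at 300 ms, both sample every 1000 ms.
gaps = merged_gaps([100, 300], 1000, 3)

# The proposal's rule: start periodic requests when the gap between two
# fetches is less than or equal to twice the refresh interval.
triggers = [gap <= 2 * REFRESH_MS for gap in gaps]
```

Each client on its own has a 1000 ms gap, but the merged stream alternates
between 200 ms and 800 ms, so the measured interval says little about what
any one client actually needs.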
                
> * If no pm_fetch() comes for a metric after three refresh intervals
> expire, the metric will not be requested anymore.

So "not requested" suggests some feedback from the pmda to the daemon
which may help explain my confusion over push/pull.
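
As I read it, the cut-off rule is roughly the sketch below: the PMDA keeps the
time of the last fetch per metric and drops any metric that has been idle for
more than three of its refresh intervals. prune_idle is a hypothetical helper
and cluster.bar is an invented metric name; how the resulting set would be
communicated to the pmclusterds is not covered.

```python
def prune_idle(last_fetch_ms, intervals_ms, now_ms, idle_intervals=3):
    """Return the metrics still worth requesting from the daemons (sketch)."""
    return {
        metric
        for metric, fetched_ms in last_fetch_ms.items()
        # Keep a metric while it has been fetched within idle_intervals refreshes.
        if now_ms - fetched_ms <= idle_intervals * intervals_ms[metric]
    }


last_fetch = {"cluster.foo": 0, "cluster.bar": 4000}  # ms of the most recent fetch
intervals = {"cluster.foo": 500, "cluster.bar": 2000}  # refresh intervals in ms
wanted = prune_idle(last_fetch, intervals, now_ms=5000)
```

Here cluster.foo has been idle for 5000 ms against a limit of 1500 ms and is
dropped, while cluster.bar (idle 1000 ms, limit 6000 ms) is kept.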

> 
> The following are some assumptions, ideas, questions/answers that I
> copied from some emails that I exchanged with some of my colleagues at
> SGI.
> 
> 1) The 2 second default will be tunable.
> 
> 2) Pushes for different metrics will be grouped.

Based on refresh interval?

> 3) We will define the refresh interval for each metric meaning that we
> will specify how long a value can be returned as it is (how long we
> think the value stays fresh). If we get the second request after the
> value has spoiled we will start asking for values so we have a fresh
> value when the third request comes. We will stop collecting data if more
> than twice the time between first request and second request has passed
> and a third request has not come yet. We can make this configurable.

I don't see how the "start asking for values" is really going to
help ... if the push for metric X is defined to be every 2 seconds,
then it will be every 2 seconds and the number of requests that
might arrive from clients between refreshes is almost immaterial.

There is the pathological case of the client's sample interval and the
daemon's refresh interval being the same and the client request arriving
right on the boundary of the value being marked stale ... but short of
refreshing more frequently (which this proposal does not seem to
suggest), this boundary condition is unavoidable in the worst (and
unlikely) case.

The refresh interval and pushing applies to all the instances of a
particular metric?

> 4) I will monitor all requests (possibly from multiple clients) and
> refresh the data to accommodate the slowest pulling client.

Sorry, I don't understand this at all ... "you" (being the pmda) cannot
see the pmcd clients (as per my comment above), so who/what are these
clients?

In what sense is a client "slowest"?

> 5) To not disrupt PMCD, pmdacluster will return "no value available"
> error the first time we are queried if it takes too much time to get
> the metrics from pmclusterd (e.g. longer than 5 sec).

Really this should be PM_ERR_PMDANOTREADY/PM_ERR_PMDAREADY.

> 6) The API, and some tools, allow querying individual instances. We
> still need to get instance domain info from each client when they
> connect.  This volume of initial setup traffic could be an argument for
> limiting intentional client disconnects, which is currently used as a
> method to change the list of metrics that should be pushed.

Which clients are these?

Hope this helps, rather than confuses, the issue.
