On Fri, 2012-12-14 at 03:03 -0800, Jun Wang wrote:
> Nathan,
>
> Thanks for the great and detailed explanation on the multi-host
> support.
>
> I assume that pmchart ... isn't designed to graph the average/sum/etc
> of those metrics, collected on multiple hosts, over a time series.
Correct.
> Is there any other way, or workaround, that can be used for PCP to
> graph the average/sum/etc of a metric across multiple hosts, in the
> 100s for example?
Derived metrics (see the pmRegisterDerived(3) man page) provide a way of
doing customized aggregation for metrics ... unfortunately this is
limited to metrics from a single source (host or archive).
> Can summary PMDA be used for that?
It can indeed. A fragment of pmie suitable for use with the summary
PMDA and a 4 node cluster is as follows:
    hosts = ":node1 :node2 :node3 :node4";
    summary.ncpu.sum = sum_host hinv.ncpu $hosts;
    summary.ncpu.avg = avg_host hinv.ncpu $hosts;
    summary.ncpu.max = max_host hinv.ncpu $hosts;
Now, this is untested (to my knowledge) for hundreds of nodes, so it
would be interesting to see if it works. One issue here may be that
pmie cannot evaluate sum_host() (or any of the other *_host()
aggregates) if pmcd cannot be contacted on _any_ of the listed hosts.
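For hundreds of nodes the host list would be tedious to type by hand.
A minimal sketch of generating the same fragment from a list of host
names follows. The function name, its parameters and the node naming
scheme are hypothetical; only the shape of the emitted lines is taken
from the 4 node fragment above, and the output is equally untested
against a large cluster.

```python
def summary_fragment(hosts, metric="hinv.ncpu", prefix="summary.ncpu"):
    """Build pmie text in the same shape as the 4 node fragment.

    hosts  -- iterable of host names (hypothetical naming scheme)
    metric -- metric to aggregate across the hosts
    prefix -- summary metric name prefix (assumed, mirrors the example)
    """
    # Each host is written as ":name", separated by spaces, in one macro.
    host_list = " ".join(":" + h for h in hosts)
    lines = ['hosts = "%s";' % host_list]
    # One summary expression per aggregate used in the original fragment.
    for op in ("sum", "avg", "max"):
        lines.append("%s.%s = %s_host %s $hosts;" % (prefix, op, op, metric))
    return "\n".join(lines)


nodes = ["node%d" % i for i in range(1, 5)]
fragment = summary_fragment(nodes)
print(fragment)
```

Changing range(1, 5) to cover a few hundred nodes produces the
correspondingly longer hosts macro; whether pmie copes with that is
the open question.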
We don't have anything that I can think of that will handle the
"average" of a metric across N available hosts from a pool of M possible
hosts (for N <= M).
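To make the distinction concrete, here is a small sketch of the two
behaviours: an "all hosts or nothing" average, and an average over
whichever N of the M hosts answered. This is not a PCP feature and
does not use any PCP API; the function names and the sample values
are made up, and how the per-host values get collected is left out
entirely.

```python
def strict_avg(samples):
    """All-or-nothing: no result if any host failed to report."""
    if any(v is None for v in samples.values()):
        return None
    return sum(samples.values()) / len(samples)


def available_avg(samples):
    """Average over the hosts that reported; returns (average, N)."""
    values = [v for v in samples.values() if v is not None]
    if not values:
        return None, 0
    return sum(values) / len(values), len(values)


# Hypothetical per-host values; None marks a host that could not be reached.
samples = {"node1": 8, "node2": 16, "node3": None, "node4": 8}

print(strict_avg(samples))
print(available_avg(samples))
```

Returning N alongside the average matters, since an average over 3
hosts and one over 300 should not be read the same way.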
> I noticed a summary PMDA process, together with a pmie daemon, that
> can do expression of multiple metrics.
Yep.
> At first, for a single collector host, this means the following steps,
> all on the collector host except for the last step, based on my
> understanding.
> * metrics foo.X is collected by PMDA foo;
> * metrics bar.Y is collected by PMDA bar;
> * the pmie daemon talks to PMCD and gets metrics foo.X and metrics
> bar.Y via the corresponding PMDAs, then uses both of them to calculate
> metrics summary.Z.
> * the summary PMDA talks to the pmie daemon periodically and fetches
> summary.Z;
> * the monitoring tools retrieve summary.Z from PMCD, and summary
> PMDA, via PMAPI and graph it.
All sounds pretty correct to me. The summary PMDA does not have to be
on the same host as the one providing the foo.X and bar.Y metrics ...
although this is commonly the case for the single host deployment, it
won't/can't be the case for the hundreds of nodes case.
> Does PMCD cache the metrics foo.X and bar.Y? or does PMCD run
> completely stateless without caching any metrics? I think that PMCD
> shouldn't cache any metrics. That, however, means that time-wise the
> summary metrics is always running a step, or a sampling period, behind
> the foo.X and bar.Y as it uses the values of X and Y at the last time
> when the pmie talks to PMCD.
There is no data caching in pmcd ... that is a deliberate and important
design decision. And indeed there is very little client state
maintained by pmcd.
The summary PMDA is operating on a (usually) constant timing loop,
evaluating _all_ the expressions it has been asked to support every N
seconds.
When pmcd asks the summary PMDA for metric values, the summary PMDA
returns the most recently evaluated expression results.
So there is indeed a lag ... but if you're talking about hundreds of
nodes then any lag is unlikely to be an issue in practice ... this
system would have the response time of a Titanic-sized ship, statistical
dampening across a large data set will mask individual changes and
Heisenberg is always confusing the issue.
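The "evaluate on a timer, answer with the latest result" behaviour
and the lag it produces can be sketched as follows. This is a toy
model only, not the summary PMDA's code: the class, its method names
and the 10 second interval are all assumptions made for illustration.

```python
class CachedSummary:
    """Evaluates on its own schedule; readers get the last result."""

    def __init__(self, evaluate, interval):
        self.evaluate = evaluate    # callable taking the current time
        self.interval = interval    # seconds between evaluations
        self.last_eval = None       # time of the most recent evaluation
        self.value = None           # most recently evaluated result

    def tick(self, now):
        # Driven by the timing loop: re-evaluate once the interval is up.
        if self.last_eval is None or now - self.last_eval >= self.interval:
            self.value = self.evaluate(now)
            self.last_eval = now

    def fetch(self, now):
        # A fetch never triggers evaluation; it returns (value, age).
        age = None if self.last_eval is None else now - self.last_eval
        return self.value, age


# Hypothetical underlying value that grows steadily with time.
summary = CachedSummary(evaluate=lambda t: t * 2, interval=10)

summary.tick(0)              # first evaluation at t=0
summary.tick(7)              # too early, nothing is re-evaluated
early = summary.fetch(7)     # still the t=0 result, 7 seconds old
summary.tick(10)             # interval elapsed, re-evaluate
late = summary.fetch(12)     # the t=10 result, 2 seconds old
print(early, late)
```

The age of a fetched value is bounded by the evaluation interval,
which is the lag being discussed here.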
> Secondly, can summary PMDA be used to do sum/avg across multiple
> hosts? Say we run a pmie process on a server and have it talk to PMCDs
> on multiple hosts to collect h1:foo.X, h2:foo.X, h3:foo.X and ... and
> calculate the value of the expression. Then we also run summary PMDA
> and a PMCD on this server and have the summary PMDA talk to the pmie
> process to collect the summary.FOO-X metrics. Now we can have
> any PCP monitoring tool, either remotely or locally, graph or analyze
> the sum/avg/etc of foo.X across multiple hosts.
Yes, this will all work, subject to my earlier comments on moving in
uncharted waters and the "one host down" Achilles heel.
One thing to consider is the timing loop used by the summary PMDA ...
making this short increases the load on all nodes, especially where the
summary PMDA is running. Making it longer decreases the load, but
increases the lag between changes happening and being exported in the
summary metrics. Consider your likely sampling interval from pmchart
and pmlogger and friends, and tune the summary PMDA sampling interval to
be about the same.
> Does this make sense?
Yep and good luck ... please let us know how it turns out.