Hi Alec,
----- Original Message -----
> Hi,
>
> We're using the elasticsearch PMDA to grab metrics on a small cluster.
> By default, the PMDA grabs metrics for every node, even when PCP is only
> running on one node. This is very slow; the PMDA has an internal timeout
> of 1 second, but a request to get metrics from all nodes usually takes
> 2-3 seconds on our cluster.
>
> My question is: is it default for PMDAs to grab metrics for as many
> nodes as possible, or only on the node that PCP is running on?
The latter is definitely preferable, not only because of the additional
latency but also because of things like the hostname in PCP archives
being incorrect. Ideally we would have separate PCP collectors running on
each system exporting just local metrics, and use PCP protocols (via the
client tools) to address the need to extract metrics from multiple systems.
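As a rough sketch of that layout: assuming each node runs its own pmcd, a
client can query every node in turn with pminfo's -h (host) and -f (fetch
values) options. The host names and the helper function below are made up
for illustration, and the script only builds and prints the command lines
rather than running them.

```python
import shlex

# Hypothetical host names; each is assumed to run its own pmcd
# with the elasticsearch PMDA installed.
HOSTS = ["es-node1", "es-node2", "es-node3"]


def pminfo_commands(hosts, metric="elasticsearch.nodes"):
    """Build one 'pminfo -h HOST -f METRIC' command line per host."""
    return [["pminfo", "-h", host, "-f", metric] for host in hosts]


if __name__ == "__main__":
    # Print the commands so they can be reviewed or pasted into a shell.
    for cmd in pminfo_commands(HOSTS):
        print(shlex.join(cmd))
```

With this arrangement each collector only reports on its own node, and
the fan-out across the cluster happens on the client side.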
> There is
> an easy way to patch the PMDA to fix our problem, but it would break the
> behavior the PMDA has historically had. What is the best way to go about
> fixing this?
Can you describe this change a bit further? There are definitely ways we
can tackle this, such as using differently named metrics for the local-node
case, renaming the "elasticsearch.nodes" metrics and indom (this has good
error-handling characteristics for anyone still using the old metrics).
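For the local-node case, one possible approach (an assumption on my part,
not a description of what the PMDA does today) is the "_local" node filter
that reasonably recent Elasticsearch versions accept in the nodes stats
URL. A minimal sketch follows; the function names and the default base URL
are hypothetical, and the 1 second timeout simply mirrors the one you
mentioned.

```python
import json
from urllib.request import urlopen


def node_stats_url(base="http://localhost:9200", local_only=True):
    """Return the nodes stats URL, limited to the local node if requested."""
    base = base.rstrip("/")
    if local_only:
        # "_local" asks Elasticsearch for the node handling the request only.
        return base + "/_nodes/_local/stats"
    # Without a node filter, stats for every node in the cluster are returned.
    return base + "/_nodes/stats"


def fetch_node_stats(base="http://localhost:9200", local_only=True, timeout=1.0):
    """Fetch and decode the stats document (needs a reachable Elasticsearch)."""
    with urlopen(node_stats_url(base, local_only), timeout=timeout) as resp:
        return json.load(resp)
```

Keeping the all-nodes URL available behind a flag like this would let the
old behaviour remain selectable while the local-only request becomes the
default.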
cheers.
--
Nathan