
Re: [question] PCP UI FrontEnd

To: Aurelien Gonnay <aurelien.gonnay@xxxxxxxxx>
Subject: Re: [question] PCP UI FrontEnd
From: fche@xxxxxxxxxx (Frank Ch. Eigler)
Date: Mon, 27 Jul 2015 17:30:34 -0400
Cc: "pcp@xxxxxxxxxxx" <pcp@xxxxxxxxxxx>, TED-DEV-CSP <TED-DEV-CSP@xxxxxxxxx>
Delivered-to: pcp@xxxxxxxxxxx
In-reply-to: <f8fde1169e264460bc2e250d2c9d7df0@xxxxxxxxxxxxxxxxxxxxxxxxx> (Aurelien Gonnay's message of "Mon, 27 Jul 2015 12:49:26 +0000")
References: <f8fde1169e264460bc2e250d2c9d7df0@xxxxxxxxxxxxxxxxxxxxxxxxx>
User-agent: Gnus/5.1008 (Gnus v5.10.8) Emacs/21.4 (gnu/linux)
aurelien.gonnay wrote:

> [...]
> 1. Metric name encoding: our servers are using "-" in their name, which
> do not play well with the metric name encoding.

Hyphens are used as an escape code for generic punctuation that can be
present in pcp file names / metrics / instance names, but not in
graphite name components.  So host names like "foo-bar" will be
represented with something like "foo-2D-bar".  It's a necessary evil,
considering the need to have a bijective mapping between the two
namespaces.
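
To illustrate why the escaping has to be reversible, here is a minimal
Python sketch of such a mapping.  It is not pmwebd's actual code: the
function names are hypothetical, and the scheme (any byte outside
[A-Za-z0-9_] becomes a hyphen, two uppercase hex digits, and a closing
hyphen) is an assumption made for the example.  Under that assumption a
hyphen (ASCII 0x2D) is escaped as "-2D-".

```python
import re

# Assumed escape scheme (hypothetical, for illustration only):
# bytes outside [A-Za-z0-9_] are written as -XX-, XX = uppercase hex.
_SAFE = re.compile(r"[A-Za-z0-9_]")


def encode_component(name: str) -> str:
    """Escape a PCP-side name so it is a legal graphite name component."""
    out = []
    for byte in name.encode("utf-8"):
        ch = chr(byte)
        if byte < 128 and _SAFE.fullmatch(ch):
            out.append(ch)              # safe character, pass through
        else:
            out.append(f"-{byte:02X}-")  # escape everything else
    return "".join(out)


def decode_component(encoded: str) -> str:
    """Invert encode_component; a hyphen always starts a -XX- escape."""
    raw = bytearray()
    i = 0
    while i < len(encoded):
        if encoded[i] == "-":
            raw.append(int(encoded[i + 1:i + 3], 16))
            i += 4                      # skip '-', two hex digits, '-'
        else:
            raw.append(ord(encoded[i]))
            i += 1
    return raw.decode("utf-8")


if __name__ == "__main__":
    host = "foo-bar"
    escaped = encode_component(host)
    print(escaped)
    print(decode_component(escaped) == host)
```

Because the hyphen itself is always escaped, every hyphen in the encoded
form marks an escape sequence, which is what lets the decoder recover
the original name unambiguously.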

> [...]
> Moreover pcp metric definition are more comprehensive than graphite can cope
> with, and I'm feeling like we are not making the most out of the collected
> metrics.

Sorry, I'm not sure what you mean.  Maybe just that the graphite
information is lossy, like no events / strings / metadata being
propagated from PCP?  That's true, but somewhat implicit in the use of
graphite web interfaces.  (We could do more with graphite "events"
though.)

> What solutions are used by seasoned PCP users to visualize realtime/historical
> metrics, for
>
> 1. dashboards used on a daily basis,

Within the pcp web-ui space, one way is to assemble dashboards
interactively in grafana, save them to .json files, then arrange to
serve those from pmwebd (see grafana/app/dashboards/FOO.json).  Heck,
we'd be happy to include yours in the pcp-webjs packages if they are
applicable generally.


> 2. deep-dive solution to investigate / correlate events when a given
> production issue arises

That's such a big area.  I think the clinching technical complication
there is PCP's limitations in logging being governed by a static
configuration: a set of metrics and a fixed polling interval.  When a
production issue arises, someone would have to notice, and reconfigure
a pmlogger instance to do more logging, and/or eyeball extra live data
interactively.  It would be better if pmlogger could react dynamically.


- FChE
