aurelien.gonnay wrote:
> [...]
> 1. Metric name encoding: our servers are using "-" in their name, which
> do not play well with the metric name encoding.
Hyphens are used as an escape code for generic punctuation that can be
present in pcp file names / metrics / instance names, but not in
graphite name components. So host names like "foo-bar" will be
represented with something like "foo-2D-bar". It's a necessary evil,
considering the need to have a bijective mapping between the two
namespaces.
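
To make the escaping idea concrete, here is a minimal Python sketch of a
reversible encoding of this kind. It is an illustration only, not pmwebd's
actual code: the function names are made up, and the rule that every byte
outside [A-Za-z0-9_] becomes "-HH-" (two uppercase hex digits) is an
assumption. Under that rule a hyphen (0x2D) comes out as "-2D-" and a dot
(0x2E) as "-2E-".

```python
import re

# ASSUMPTION: bytes outside [A-Za-z0-9_] are escaped as "-HH-" (uppercase hex).
# This is an illustrative rule, not necessarily the exact one pmwebd applies.
_SAFE = re.compile(r"[A-Za-z0-9_]")
_ESCAPE = re.compile(r"-([0-9A-F]{2})-")


def encode_component(name: str) -> str:
    """Escape one name component so it contains only safe characters and escapes."""
    out = []
    for byte in name.encode("utf-8"):
        ch = chr(byte)
        if _SAFE.fullmatch(ch):
            out.append(ch)
        else:
            out.append(f"-{byte:02X}-")
    return "".join(out)


def decode_component(encoded: str) -> str:
    """Invert encode_component: turn each "-HH-" back into the original byte."""
    raw = bytearray()
    i = 0
    while i < len(encoded):
        match = _ESCAPE.match(encoded, i)
        if match:
            raw.append(int(match.group(1), 16))
            i = match.end()
        else:
            raw.append(ord(encoded[i]))
            i += 1
    return raw.decode("utf-8")
```

Because the hyphen itself is always escaped, any "-" in the encoded form
can only be the start or end of an escape, which is what lets the decoder
recover the original name without ambiguity.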
> [...]
> Moreover pcp metric definitions are more comprehensive than graphite can cope
> with, and I'm feeling like we are not making the most out of the collected
> metrics.
Sorry, I'm not sure what you mean. Maybe just that the graphite
information is lossy, like no events / strings / metadata being
propagated from PCP? That's true, but somewhat implicit in the use of
graphite web interfaces. (We could do more with graphite "events"
though.)
> What solutions are used by seasoned PCP users to visualize realtime/historical
> metrics, for
>
> 1. dashboards used on a daily basis,
Within the pcp web-ui space, one way is to assemble dashboards
interactively in grafana, save them to .json files, then arrange to
serve those from pmwebd (see grafana/app/dashboards/FOO.json). Heck,
we'd be happy to include yours in the pcp-webjs packages if they are
applicable generally.
> 2. deep-dive solution to investigate / correlate events when a given
> production issue arises
That's such a big area. I think the clinching technical complication
there is that PCP logging is governed by a static configuration: a
fixed set of metrics and a fixed polling interval. When a
production issue arises, someone would have to notice, and reconfigure
a pmlogger instance to do more logging, and/or eyeball extra live data
interactively. It would be better if pmlogger could react dynamically.
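
To illustrate what "static" means here, a small Python sketch that
renders pmlogger-style logging stanzas. The helper name and the metric
choices are hypothetical, and the stanza syntax ("log mandatory on every
N seconds { ... }") is written from memory of the pmlogger configuration
format, so please check it against pmlogger(1) before relying on it.

```python
def render_pmlogger_stanza(metrics, interval_seconds, mode="mandatory"):
    """Render one logging stanza: a fixed metric set at a fixed interval.

    Hypothetical helper for illustration; it only builds text and does not
    talk to pmlogger.
    """
    if mode not in ("mandatory", "advisory"):
        raise ValueError("mode must be 'mandatory' or 'advisory'")
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    lines = [f"log {mode} on every {interval_seconds} seconds {{"]
    lines.extend(f"    {metric}" for metric in metrics)
    lines.append("}")
    return "\n".join(lines)


# Baseline: a few metrics, sampled once a minute.
baseline = render_pmlogger_stanza(["kernel.all.load"], 60)

# What someone would have to write by hand once an incident is noticed:
# more metrics, sampled more often.
incident = render_pmlogger_stanza(["kernel.all.load", "disk.dev.read"], 5)
```

The gap between the two stanzas is the point: nothing in the static
configuration moves from the first to the second on its own.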
- FChE