Hi,
I’m currently collecting a bunch of metrics from several servers using PCP.
So far it has been a pretty convincing experience from a monitoring and metrics-collection point of view.
However, on the UI front, I’m having a rather frustrating experience.
When using Graphite (either plain Graphite or Grafana), I’m facing two issues:
1. Metric name encoding: our server names contain ‘-’, which does not play well with the metric name encoding.
2. Performance of the pmwebd Graphite API: the API is fairly slow, especially when browsing the instance domains in archives. (I’ll send a separate email to discuss the performance issue more specifically.)
Moreover, PCP metric definitions are richer than what Graphite can cope with, and I feel we are not making the most of the collected metrics.
What solutions do seasoned PCP users rely on to visualize real-time and historical metrics, for:
1. dashboards used on a daily basis, and
2. deep-dive investigation and correlation of events when a production issue arises?
Thanks in advance for your feedback,
Aurelien Gonnay