Re: [pcp] Use pmie to monitor a large number of hosts

To: Rares Vernica <rvernica@xxxxxxxxx>
Subject: Re: [pcp] Use pmie to monitor a large number of hosts
From: Nathan Scott <nathans@xxxxxxxxxx>
Date: Wed, 4 Nov 2015 20:51:49 -0500 (EST)
Cc: pcp@xxxxxxxxxxx
Hi there,

----- Original Message -----
> Hello,
> 
> I have a PCP setup where pmcd runs on multiple hosts and there is a server
> where pmlogger collects metrics from these hosts.
> 
> I understand how I can use pmie to analyze logs or live data from one or a
> few hosts. What are my options if I have a large number of hosts, say 1,000
> or 10,000? What kind of setup would I need in order to detect whether any of
> my 1,000 or 10,000 hosts has, for example, "high disk i/o"?

The pmie deployments I have been involved in at this kind of scale tend to
have a federated architecture - usually with a central pmie server within
each data centre (or perhaps each rack), feeding alarms/notifications to a
global Nagios server (or several Nagios servers). Substitute the monitoring
system of your preference for Nagios.
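
As a rough illustration of that layout (a sketch only, not PCP code), the
Python below models each per-site pmie server as a function that checks its
own hosts and passes on only alarms, so the global server never handles raw
samples. The names, the site/host labels, the sample values and the 1000
ops/sec threshold are all made up for the example.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical threshold for "high disk i/o", in operations per second.
HIGH_DISK_IOPS = 1000.0


@dataclass
class Alarm:
    site: str
    host: str
    value: float


def evaluate_site(site: str, samples: Dict[str, float],
                  threshold: float = HIGH_DISK_IOPS) -> List[Alarm]:
    """Stand-in for one per-site pmie server: checks its local hosts only."""
    return [Alarm(site, host, value)
            for host, value in sorted(samples.items())
            if value > threshold]


def collect_alarms(sites: Dict[str, Dict[str, float]]) -> List[Alarm]:
    """Stand-in for the global monitoring server: receives alarms only."""
    alarms: List[Alarm] = []
    for site in sorted(sites):
        alarms.extend(evaluate_site(site, sites[site]))
    return alarms


if __name__ == "__main__":
    # Made-up samples: site -> host -> disk operations per second.
    sites = {
        "dc-east": {"east-001": 120.0, "east-002": 4300.0},
        "dc-west": {"west-001": 80.0, "west-002": 95.0},
    }
    for alarm in collect_alarms(sites):
        print(f"{alarm.site}: {alarm.host} high disk i/o ({alarm.value:.0f}/s)")
```

The point of the split is that the work per site stays bounded by the number
of hosts in that site, and the traffic to the global server is proportional
to the number of alarms rather than the number of hosts.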

Analyzing logs from that many machines is a difficult problem - I've seen a
fairly good data warehousing solution implemented, where logs from multiple
data centres are collated in a batch fashion, and reporting/analysis is done
from a pre-populated, pre-computed warehouse cube. But this is not something
we have historically attempted to tackle directly in PCP - it would make for
an interesting project though!
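
To make the "pre-computed cube" idea a little more concrete, here is a
minimal Python sketch of a batch roll-up. It assumes the logs have already
been extracted into (site, hour, value) records by some other step; that
record shape, the function name and the choice of count/mean/max as the
stored aggregates are assumptions for illustration, not a description of the
warehouse I saw or of anything in PCP.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical record shape: (site, hour of day, metric value).
Record = Tuple[str, int, float]


def build_cube(records: Iterable[Record]) -> Dict[Tuple[str, int], Dict[str, float]]:
    """Roll raw records up into one pre-computed cell per (site, hour)."""
    # Each cell accumulates [count, sum, max] while the batch is scanned.
    cells = defaultdict(lambda: [0, 0.0, float("-inf")])
    for site, hour, value in records:
        cell = cells[(site, hour)]
        cell[0] += 1
        cell[1] += value
        cell[2] = max(cell[2], value)
    return {
        key: {"count": count, "mean": total / count, "max": peak}
        for key, (count, total, peak) in cells.items()
    }


if __name__ == "__main__":
    # Made-up batch of records collated from two sites.
    batch = [
        ("dc-east", 0, 100.0),
        ("dc-east", 0, 300.0),
        ("dc-west", 1, 50.0),
    ]
    cube = build_cube(batch)
    print(cube[("dc-east", 0)])
```

Reports then read from the cube cells rather than rescanning the logs, which
is what keeps the analysis side cheap.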

cheers.

--
Nathan
