Hi there,
----- Original Message -----
> Hello,
>
> I have a PCP setup where pmcd runs on multiple hosts and there is a server
> where pmlogger collects metrics from these hosts.
>
> I understand how I can use pmie to analyze logs or live data from one or a
> few hosts. What are my options if I have a large number of hosts, say 1,000
> or 10,000 hosts? What kind of setup would I need so that I can detect whether
> any of my 1,000 or 10,000 hosts has, for example, "high disk i/o"?
The pmie deployments I have been involved in at this kind of scale tend to
have a federated architecture - usually with a central pmie server within
each data centre (or perhaps each rack), feeding alarms/notifications to a
global Nagios server (or several Nagios servers). Substitute the monitoring
system of your preference for Nagios.
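
To make the layering concrete, here is a rough sketch in plain Python. It is
not pmie or Nagios configuration and calls no PCP API: evaluate_site() stands
in for one per-site pmie server, collect_alarms() stands in for the global
monitoring server, and the names, the sample values and the 500 ops/sec
threshold are all made up for illustration.

```python
from dataclasses import dataclass

# Hypothetical threshold: disk operations per second treated as "high".
HIGH_DISK_IOPS = 500.0


@dataclass(frozen=True)
class Alarm:
    site: str
    host: str
    message: str


def evaluate_site(site, samples, threshold=HIGH_DISK_IOPS):
    """Stand-in for one per-site rule evaluator.

    `samples` maps host name -> observed disk ops/sec for this site only,
    so each evaluator handles a bounded number of hosts.
    """
    return [
        Alarm(site, host, f"high disk i/o: {iops:.0f} ops/sec")
        for host, iops in sorted(samples.items())
        if iops > threshold
    ]


def collect_alarms(sites):
    """Stand-in for the global monitoring server.

    It receives alarms only, never the raw per-host metrics, which is what
    keeps the central load small as the host count grows.
    """
    alarms = []
    for site, samples in sorted(sites.items()):
        alarms.extend(evaluate_site(site, samples))
    return alarms


if __name__ == "__main__":
    # Made-up sample data: two sites, three hosts.
    sites = {
        "dc1": {"web01": 120.0, "db01": 910.0},
        "dc2": {"web02": 80.0},
    }
    for alarm in collect_alarms(sites):
        print(f"{alarm.site}/{alarm.host}: {alarm.message}")
```

The point of the split is that the rule evaluation scales with the number of
hosts per site, while the global server only scales with the number of alarms.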
Analyzing logs from that many machines is a difficult problem - I've seen a
fairly good data warehousing solution implemented, where logs from multiple
data centres are collated in a batch fashion, and reporting/analysis is done
from a pre-populated, pre-computed warehouse cube. But this is not something
we have historically attempted to tackle directly in PCP - would make for an
interesting project though!
cheers.
--
Nathan