
RE: [pcp] pmmgr pmlogger default behaviour

To: "'Nathan Scott'" <nathans@xxxxxxxxxx>, "'Frank Ch. Eigler'" <fche@xxxxxxxxxx>
Subject: RE: [pcp] pmmgr pmlogger default behaviour
From: "Ken McDonell" <kenj@xxxxxxxxxxxxxxxx>
Date: Fri, 31 Jan 2014 06:52:32 +1100
Cc: "'pcp developers'" <pcp@xxxxxxxxxxx>
Delivered-to: pcp@xxxxxxxxxxx
In-reply-to: <1583484726.15908616.1391037005341.JavaMail.root@xxxxxxxxxx>
References: <2108905700.15892281.1391033827018.JavaMail.root@xxxxxxxxxx> <1583484726.15908616.1391037005341.JavaMail.root@xxxxxxxxxx>
Thread-index: AQKxzoa+dzclHQ0B5M2jZlF3/e143pjYRkqQ
G'day Nathan and Frank.

I've read Nathan's original mail and Frank's response, and wish to add my 2
cents' worth ...

I don't like the one big log model.  Ignoring the corruption issue, I think
the archiving considerations (especially with rsync) favour only
updating/creating files for today's data each day.

The pmlogrewrite issue is real and needs to be elevated from the TODO to the
DONE list.

Also, the long-term semantic issues associated with mark records, counter
wraps, system reboots, and PMDA restarts are all made worse in the one big
log model.

I think we need to look at the use cases for PCP archives.

1. What the hell happened today? - the classical approach works fine
(although we could do a better job of helping Mary the Analyst find the
archives for "today").

2. What the hell happened on 17 Jan 2014 (and was it the same as today)? -
again the classical approach works because the dates are known.

3. When did the load average start going over 100?  Or when in the last
month did the load average go over 100? - here we want to ITERATE over a set
of archives (not process a CONCATENATED archive) and we could do a much
better job of providing a tool that can find and list all of the archives of
interest, but the classical approach provides all the data needed in
appropriate bundles.

4. Show me trends over time, capacity planning graphs, etc. - this is the
place where we've historically failed and this has been on the PCP TODO list
for a decade (which suggests it may not be a simple problem to solve).
There are 2 key parts to addressing this need: (a) concatenating the data
into a single set, and (b) temporal data reduction (the daily sampling
rates, typically of the order of tens of seconds, need to be extended to
sampling rates of the order of tens of minutes or hours).  pmlogreduce was
the tool that was intended to solve (a) and (b) and the associated data
semantic issues (most of which involve turning counters into rates in the
archive to avoid reboots, resets, mark records, counter wraps, etc.).
Unfortunately the current incarnation of pmlogreduce is neither robust nor
semantically correct all the time.
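
To make the counter-to-rate idea in 4. concrete, here is a minimal Python
sketch. It is NOT how pmlogreduce works; the function name reduce_counter,
the (timestamp, value) sample layout, and the use of None to stand in for a
mark record are all assumptions made for illustration. Each pair of adjacent
samples becomes a delta, deltas that cross a mark record or go backwards
(reboot, reset, wrap) are dropped, and what is left is averaged over a much
coarser bucket.

```python
from typing import Dict, List, Optional, Tuple

# (timestamp in seconds, counter value); None stands in for a mark record.
Sample = Tuple[float, Optional[int]]


def reduce_counter(samples: List[Sample], bucket: float) -> List[Tuple[float, float]]:
    """Return (bucket_start, mean rate per second) for buckets with usable data."""
    totals: Dict[float, List[float]] = {}  # bucket_start -> [delta sum, seconds sum]
    prev = None
    for t, v in samples:
        if v is None:
            # Mark record: continuity is lost, so forget the previous sample.
            prev = None
            continue
        if prev is not None:
            pt, pv = prev
            if v >= pv and t > pt:
                # Attribute the interval to the bucket containing its start time.
                start = (pt // bucket) * bucket
                acc = totals.setdefault(start, [0.0, 0.0])
                acc[0] += v - pv
                acc[1] += t - pt
            # Otherwise the counter went backwards (reset or wrap): drop the interval.
        prev = (t, v)
    return [(start, delta / secs) for start, (delta, secs) in sorted(totals.items())]


if __name__ == "__main__":
    demo = [(0, 0), (10, 100), (20, 200), (30, None),
            (40, 5), (50, 55), (60, 10), (70, 110)]
    for start, rate in reduce_counter(demo, 60):
        print(start, rate)
```

Storing rates rather than raw counters means a later reader of the reduced
data never has to reason about resets or wraps, at the cost of losing the
intervals that were dropped.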

If there are other generic use cases, I'd like to hear about them.

But with this set, I suggest that the status quo is close to the mark, and
we should focus on the iteration tool for 3. and mount a serious attack on
fixing pmlogreduce for 4.
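
As a rough idea of what the iteration tool for 3. might do, here is a small
Python sketch that lists the archives covering a date range so they can be
processed one at a time. The function name archives_between is hypothetical,
and the sketch assumes archives sit in one directory with base names of the
form YYYYMMDD or YYYYMMDD.HH.MM, each with a matching .meta file; a real
tool would need to cope with whatever naming the site actually uses.

```python
import datetime as dt
import os
import re
from typing import List

# Assumed naming: YYYYMMDD.meta or YYYYMMDD.HH.MM.meta (one .meta per archive).
_META = re.compile(r"^(\d{8})(\.\d{2}\.\d{2})?\.meta$")


def archives_between(logdir: str, first: dt.date, last: dt.date) -> List[str]:
    """Return sorted archive base names whose date lies in [first, last]."""
    found = []
    for name in os.listdir(logdir):
        m = _META.match(name)
        if not m:
            continue
        try:
            day = dt.datetime.strptime(m.group(1), "%Y%m%d").date()
        except ValueError:
            continue  # eight digits that do not form a real date
        if first <= day <= last:
            # Strip the ".meta" suffix to get the archive base name.
            found.append(os.path.join(logdir, name[: -len(".meta")]))
    return sorted(found)


if __name__ == "__main__":
    import sys
    today = dt.date.today()
    for base in archives_between(sys.argv[1], today - dt.timedelta(days=30), today):
        print(base)
```

Each base name printed could then be handed to an archive-reading client in
turn, which keeps the per-day archives intact and avoids building one
concatenated log just to answer a "when did it start?" question.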
