
Re: [pcp] pmmgr pmlogger default behaviour

To: Ken McDonell <kenj@xxxxxxxxxxxxxxxx>, "Frank Ch. Eigler" <fche@xxxxxxxxxx>
Subject: Re: [pcp] pmmgr pmlogger default behaviour
From: Nathan Scott <nathans@xxxxxxxxxx>
Date: Thu, 30 Jan 2014 17:33:50 -0500 (EST)
Cc: pcp developers <pcp@xxxxxxxxxxx>
In-reply-to: <06a301cf1df4$d18984b0$749c8e10$@internode.on.net>
References: <2108905700.15892281.1391033827018.JavaMail.root@xxxxxxxxxx> <1583484726.15908616.1391037005341.JavaMail.root@xxxxxxxxxx> <06a301cf1df4$d18984b0$749c8e10$@internode.on.net>
Reply-to: Nathan Scott <nathans@xxxxxxxxxx>

----- Original Message -----
> G'day Nathan and Frank.
> 
> I've read Nathan's original mail and Frank's response, and wish to add my 2
> cents worth ...
> 
> I don't like the one big log model.  Ignoring the corruption issue, I think

(noo, don't ignore it ... it's super important! ;)

> [...]
> 3. When did the load average start going over 100?  Or when in the last
> month did the load average go over 100? - here we want to ITERATE over a set
> of archives (not process a CONCATENATED archive) and we could do a much
> better job of providing a tool that can find and list all of the archives of
> interest, but the classical approach provides all the data needed in
> appropriate bundles.

I question this iterate-over-several vs serial-smash-through-one angle,
like Frank - I think which is optimal depends on the size of the data
set.  For a small data set, serial processing of a single (possibly
concatenated) archive works best for me - but as one moves to data sizes
on the order of many gigabytes, iteration (possibly involving archives
on multiple hosts) caters better for parallelism.  In turn that makes
some types of analysis feasible that aren't otherwise.
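To make the iteration-enables-parallelism point concrete, here is a
minimal sketch - plain Python with synthetic per-day sample lists
standing in for real PCP archives (no pmapi/pmval involved), and the
load-average-over-100 question from Ken's point 3 as the query.  All
names here are illustrative, not PCP APIs:

```python
# Hedged sketch: each "archive" is one day's worth of (timestamp, value)
# samples.  Because the archives are independent, scanning them can be
# farmed out to a worker pool (or, in the real case, to multiple hosts).
from concurrent.futures import ThreadPoolExecutor

def first_over(archive, threshold=100.0):
    """Return the timestamp of the first sample over threshold, else None."""
    for ts, val in archive:
        if val > threshold:
            return ts
    return None

# Synthetic month: 30 daily archives, three samples per day, all quiet...
archives = [[(day * 86400 + i * 60, 5.0) for i in range(3)]
            for day in range(30)]
# ...except for a load spike on day 17.
archives[17][2] = (17 * 86400 + 120, 150.0)

def scan_all(archives):
    """Scan every daily archive in parallel; report (day, timestamp) hits."""
    with ThreadPoolExecutor() as pool:
        hits = pool.map(first_over, archives)
    return [(day, ts) for day, ts in enumerate(hits) if ts is not None]

hits = scan_all(archives)
```

Each worker only ever touches one day's data, which is exactly what a
single concatenated archive would prevent.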

As one looks to scale PCP archives up to huge data sets, to me the best
approach is small, equal-sized "tiles" of data (of, say, one day's
worth) that lend themselves to distribution across hosts (of differing
storage, memory, and CPU capacities), each working on a subset of the
data.
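The tiling idea can be sketched the same way - again a hedged stand-in
in plain Python rather than anything from the PCP toolset; `tile_by_day`
and `assign_round_robin` are hypothetical names for illustration:

```python
# Hedged sketch: carve a long run of (timestamp, value) samples into
# day-sized tiles, then hand tiles out to hosts.
def tile_by_day(samples, day_seconds=86400):
    """Group samples into per-day tiles, returned in time order."""
    tiles = {}
    for ts, val in samples:
        tiles.setdefault(ts // day_seconds, []).append((ts, val))
    return [tiles[day] for day in sorted(tiles)]

def assign_round_robin(tiles, hosts):
    """Naive round-robin placement of tile indices onto hosts; a real
    scheme would weigh each host's storage/memory/cpu, as noted above."""
    plan = {host: [] for host in hosts}
    for i in range(len(tiles)):
        plan[hosts[i % len(hosts)]].append(i)
    return plan

# Three days of samples (day boundaries at multiples of 86400 seconds).
samples = [(10, 1.0), (86410, 1.0), (90000, 1.0), (172810, 1.0)]
tiles = tile_by_day(samples)
plan = assign_round_robin(tiles, ["hostA", "hostB"])
```

Equal-sized tiles keep the placement decision trivial; the interesting
part is only ever which host gets which day.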

cheers.

--
Nathan
