Re: [pcp] braindump on unified-context / live-logging

To: Max Matveev <makc@xxxxxxxxx>
Subject: Re: [pcp] braindump on unified-context / live-logging
From: Greg Banks <gbanks@xxxxxxx>
Date: Mon, 13 Jan 2014 14:19:26 -0800
Cc: "Frank Ch. Eigler" <fche@xxxxxxxxxx>, pcp developers <pcp@xxxxxxxxxxx>
Delivered-to: pcp@xxxxxxxxxxx
In-reply-to: <0a923e$520gar@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx>
References: <20140108013956.GG15448@xxxxxxxxxx> <21198.38090.179929.552608@xxxxxxxxxxxx> <20140110190525.GA28062@xxxxxxxxxx> <0a923e$520gar@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx>
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130510 Thunderbird/17.0.6
On 13/01/14 13:19, Max Matveev wrote:
On Fri, 10 Jan 2014 14:05:25 -0500, Frank Ch Eigler wrote:

  fche> 1.1) pmlogger needs to learn to write its output with what IIRC kenj
  fche> has referred to as "semantic units", ie., proper use sequencing of
  fche> write(2), fdatasync(), to put interdependent data on disk correctly.
  >>
  >> This was the biggest bugbear of NAS Manager (for those who remember)
  >> which tried to provide historical and live data using archives as main
  >> source of information. [...]

  fche> (A little more of the history would be great.)

NAS Manager had to provide "historical" data for IO performance
aggregated over time intervals, e.g. the number of IO requests for each
hour during the last 24 hours. And it had to be "true averages", not
decaying averages. And it had to be displayed on a web page, which
meant single-shot requests (AJAX wasn't an option for various
non-technical reasons).

You're conflating two different data paths, see below.


We wrote a daemon which generated graphs (static images)
based on information scraped from the PCP
archives. It worked OK for historical data, but doing the
equivalent of "tail -f" with libpcp and growing archives didn't work.
After a few attempts the idea was canned and we switched to a PMDA
which provided historical data based on the time intervals encoded
in instance names. The PMDA would pre-load the data from historical
archives and then switch to polling pmcd for the data it needed (it
was known as HUTA mode - Head Up The ...).

The whole thing may even still exist, assuming NAS Manager exists (hi, gnb).

Yes, this thing still exists and is shipping today (although NAS Manager itself is in the dustbin of history). While I designed and wrote the thing, I was never happy with any of the iterations of the architecture, and I wouldn't recommend that anyone copy it. Some of the problems were:

* it was both a client of pmcd and a PMDA, which led to interesting deadlocks with the single-threaded pmcd

* configuration (time periods, and which metrics were available in each) was entirely static, in a config file, which was OK for the specific use case but not very general

* it was never good at handling instance domains which changed frequently

* because of the silly requirement to have rolling averages rather than decaying ones, memory usage was often extreme

* the only way to query this thing was via a normal pmFetch(), so we had to encode all the parameters into the namespace and instances, which exploded the visible namespace

* another effect was that the namespace was more dynamic than pmcd expected, which resulted in the code having to do a bulk replace of a PMNS subtree, including sending a signal to pmcd
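
To illustrate the kind of encoding the last two points describe, here is a minimal sketch (with made-up metric and period names, not the actual PMDA's scheme): when pmFetch() is the only query interface, every query parameter has to become part of a metric name or instance name, and the PMDA has to reverse the encoding on each fetch.

```python
# Sketch only: packing query parameters (metric, period) into instance
# names, the way a pmFetch-only interface forces you to. The names and
# the "::" separator are hypothetical, not what the real PMDA used.

PERIODS = ["1hour", "24hour", "7day"]   # predefined in static config

def instances_for(metric):
    """Every (metric, period) pair becomes one externally visible
    instance -- this is why the namespace 'exploded'."""
    return ["%s::%s" % (metric, p) for p in PERIODS]

def decode(instance):
    """The PMDA side: recover what was actually being asked for."""
    metric, period = instance.split("::")
    return metric, period
```

Adding one metric or one period multiplies the visible instances, and any change to the set requires republishing the namespace, which is where the PMNS-subtree replacement and the signal to pmcd came in.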

If I were designing something like this again, I would look seriously at adding a "time machine" feature to pmcd, with a configurable timed fetch loop and new PDUs and new APIs to explore the time dimension for the existing namespace.

However, this thing only ever calculated averages over one of a small number of predefined periods, each of which ends at "now". It was used to provide textual numbers and indicators on graphs. The "number of IO operations for each hour over the last 24 hours" Max refers to was actually calculated separately, by trawling the archives directly, even if that information was already in memory.
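
The memory cost behind the "true averages" requirement is easy to see in miniature: a true rolling average must retain every sample in its window, while a decaying (exponentially weighted) average needs only one running value. A sketch of the two, not the daemon's actual code:

```python
from collections import deque

class RollingAverage:
    """True mean over the last `window` samples: O(window) memory.
    Multiply this by every metric, instance, and period, and memory
    use gets extreme quickly."""
    def __init__(self, window):
        self.samples = deque(maxlen=window)

    def add(self, x):
        self.samples.append(x)

    def value(self):
        return sum(self.samples) / len(self.samples)

class DecayingAverage:
    """Exponentially weighted average: O(1) memory, but it is not a
    true mean over any fixed interval."""
    def __init__(self, alpha):
        self.alpha = alpha
        self.avg = None

    def add(self, x):
        if self.avg is None:
            self.avg = x
        else:
            self.avg = self.alpha * x + (1 - self.alpha) * self.avg

    def value(self):
        return self.avg
```

With per-second samples, a 24-hour true average keeps 86400 values per instance; the decaying version keeps one, which is why the rolling-average requirement dominated the daemon's footprint.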

--
Greg.
