[pcp] Multi-Volume Archive + Live Data Playback for PCP Client Tools

Dave Brolley brolley at redhat.com
Wed Oct 1 10:45:50 CDT 2014


Hi,

I've been learning about pmlogger archives, their file formats, the 
related archive management tools and the PCP clients which support the 
'archive' context, with the goal of coming up with a design that allows 
the extraction of PCP metrics across archive boundaries. I first want to 
write down what I think I've learned, followed by a couple of ideas for 
how this could be done, both from the user's point of view and from a 
technical one.

Of course, I know that others among you have been thinking about this 
and have much more expertise, especially Ken, so please correct me where 
I have it wrong and add your own thoughts, ideas and comments!

The current situation as I understand it:

  * PCP archives are created by pmlogger in distinct volumes due to
    various constraints, such as a maximum file size of 2GB, the desire
    to allow organization of the collected data, the desire to be able
    to manage data retention (i.e. log rotation) and, undoubtedly, for
    other reasons as well.

  * Some multi-volume support exists in the form of archive folios.
    These can be created by mkaf(1) but are also created by some other
    tools, such as pmchart(1) and pmview(1). Archive volumes in a folio
    may be examined with pmafm(1) via its 'replay', 'repeat' or 'run'
    commands. The latter two commands allow for repeated application of
    PCP client tools against one or more archives in the folio.

  * The archive management tool pmlogextract(1) (and, indirectly,
    pmlogger_daily and pmlogger_merge) provides the ability to extract
    data from multiple archives and combine that data into a single
    archive volume.

  * Otherwise, PCP client tools are currently restricted to extracting
    metrics from a single archive volume via PM_CONTEXT_ARCHIVE (the -a
    option). A single archive volume and an optional time window are
    specified, and the window is applied against that single volume.

What we would like to have is for PCP client tools to be able to easily 
extract metrics from multiple archive volumes. Ultimately, we would 
also like tail-like following of an active archive volume, with 
seamless transition from archived data to live data.

Here are a few ideas for realizing these goals:

*Client/tool interface:*
Currently only a single archive volume may be specified by its base name 
(via PM_CONTEXT_ARCHIVE or -a). We could allow the specification of 
multiple archive specs, each of which could be:

  * an archive volume file base name -- same as now
  * the name of a directory containing a collection of PCP archive volumes
  * wildcarded names which resolve to a list of the above items

For example,

    pminfo -a 20140930.0 -a 201408*.* -a /some/path/archives -a /another/path/archive*

PM_CONTEXT_ARCHIVE could be extended to support more than one archive 
volume.
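
To make the spec resolution concrete, here's a rough Python sketch of how 
the three kinds of archive spec could be expanded into a flat list of base 
names. The helper name resolve_archive_specs and the "detect an archive by 
its .meta file" convention are just assumptions for illustration, not 
anything libpcp does today:

```python
import glob
import os

def resolve_archive_specs(specs):
    """Expand a list of archive specs (base names, directories,
    wildcards) into a flat, de-duplicated list of archive base names.
    An archive is assumed to be identified by its <base>.meta file,
    since pmlogger writes one per archive (sketch-only convention)."""
    bases = []

    def add_base(base):
        if base not in bases:
            bases.append(base)

    def bases_in_dir(d):
        # every <base>.meta in the directory names one archive
        return sorted(m[:-len(".meta")]
                      for m in glob.glob(os.path.join(d, "*.meta")))

    for spec in specs:
        # a wildcard may expand to base names and/or directories
        matches = glob.glob(spec) if any(c in spec for c in "*?[") else [spec]
        for m in matches:
            if os.path.isdir(m):
                for b in bases_in_dir(m):
                    add_base(b)
            else:
                # strip a trailing volume or metadata suffix if present
                root, ext = os.path.splitext(m)
                stripped = ext in (".meta", ".index") or ext[1:].isdigit()
                add_base(root if stripped else m)
    return bases
```

The same walk could live inside libpcp so that every client gets it for 
free when PM_CONTEXT_ARCHIVE learns to accept multiple specs.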

*Extracting multi-volume data:*
For PCP tools, one very simple idea for extracting data from multiple 
existing archive volumes would be to use pmlogextract(1) to consolidate 
the specified volumes into a single temporary one and then to operate 
against the temporary archive. Because pmlogextract(1) supports the -S 
and -T options, a time window which spans archive volumes would 
automatically be supported. I imagine that this is already done manually 
in order to consolidate metrics from multiple sources. This would just 
be a way to automate this process.

If we were to implement this within libpcp, then no changes to the 
client tools would be necessary.  PM_CONTEXT_ARCHIVE could do it under 
the covers, or could use internal logic similar to that used by 
pmlogextract(1) in order to consolidate the specified archive volumes.
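
The core of such a consolidation is just a time-ordered merge of the 
per-volume record streams, with the -S/-T window applied on top. A toy 
Python sketch, with records reduced to (timestamp, metric, value) tuples 
(real pmlogger records carry far more state, so this only illustrates the 
shape of the problem):

```python
import heapq

def merge_volumes(*volumes):
    """Merge records from several archive volumes into one
    time-ordered stream. Each volume is a list of
    (timestamp, metric, value) tuples already sorted by timestamp,
    as pmlogger writes them."""
    return list(heapq.merge(*volumes, key=lambda rec: rec[0]))

def clip_window(records, start=None, finish=None):
    """Apply a pmlogextract-style -S/-T time window to the
    merged stream; None means the bound was omitted."""
    return [r for r in records
            if (start is None or r[0] >= start)
            and (finish is None or r[0] <= finish)]
```

Because each input volume is already sorted, the merge is linear in the 
total number of records, which is why a library-internal implementation 
should scale about as well as pmlogextract(1) itself.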

*Streaming live data:*
While the above could get us very quickly to multi-volume support 
against existing archive volumes, it may not be helpful in reaching the 
subsequent goals of live-archive tailing and transitioning from archived 
to live data. For these, we need some way of streaming new data as it is 
generated. In order to make the transition from archived to live data, 
we must be able to identify the following:

  * When the archive volume we're reading from is live
  * When it becomes no longer live
  * What the next live archive volume is (if any), otherwise, which pmcd
    is the source of the live data.

We could try to implement conventions within the archive file system for 
providing this information via new metadata or some similar mechanism 
and then write client-side code to handle polling for new data or 
transitioning to a pmcd once the end of a live archive has been reached. 
However, there is already a PCP component which knows about and manages 
all of this information. It is pmlogger(1). pmlogger(1) already knows 
the location of the archives it is creating, which one is the live one, 
whether there will be a new live archive once the current one is ended 
and which pmcd is the source of live data.
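
To see why the client-side route is clumsy, here is a sketch of the 
polling loop each client would have to carry. The volume object's read(), 
live() and next_volume() methods stand in for whatever new metadata 
convention would answer the three questions above; the interface is 
entirely hypothetical:

```python
import time

def follow(volume, poll=0.01):
    """Tail one archive volume until it is closed, then report its
    successor. `volume` is a stand-in object with read() -> bytes,
    live() -> bool and next_volume() -> successor-or-None."""
    out = []
    while True:
        chunk = volume.read()
        if chunk:
            out.append(chunk)    # new archived data arrived
        elif volume.live():
            time.sleep(poll)     # live but quiet: poll for growth
        else:
            break                # volume ended; move on
    return b"".join(out), volume.next_volume()

class FakeVol:
    """Tiny in-memory stand-in for a live archive volume (test aid)."""
    def __init__(self, chunks, successor=None):
        self.chunks = list(chunks)
        self.successor = successor
    def read(self):
        return self.chunks.pop(0) if self.chunks else b""
    def live(self):
        return bool(self.chunks)
    def next_volume(self):
        return self.successor
```

Every client tool would need this loop plus the metadata convention 
behind live() and next_volume() — which is exactly the state pmlogger(1) 
already holds, hence the proposal below.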

One way to make all of this available to a client tool in a seamless way 
would be to allow pmlogger(1) to be a source of metrics in the same way 
that pmcd is, in addition to its logging function. That is, given a 
pmlogger(1) instance, a client/tool could connect in the same way that 
it would to a pmcd instance (call it PM_CONTEXT_LOGGER?). The 
client/tool could then specify a time window. If the time window reaches 
into the past, pmlogger(1) would access the appropriate archive 
volume(s) as needed to extract the requested metrics up until the 
specified end time. If no end time is given by the client, or if the end 
time is in the future, then pmlogger(1) would automatically transition 
to relaying live data to the client/tool, in addition to logging it, 
once the end of the current live archive has been reached. If logging 
were to be terminated, then pmlogger(1) could continue to relay metrics 
to the client/tool without logging them.
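
The dispatch described above boils down to a small decision: replay from 
the archives for the part of the window that is in the past, then relay 
live for the part that isn't. A sketch, with times as plain numbers and 
None meaning "bound omitted" (plan_playback and the phase tuples are 
made-up names, not proposed API):

```python
def plan_playback(start, end, now):
    """Decide what a hypothetical PM_CONTEXT_LOGGER connection does
    for a client-supplied time window: replay archived data, relay
    live data, or both, in order."""
    phases = []
    if start is None or start < now:
        # window reaches into the past: replay from the archives,
        # up to the end time or to "now", whichever comes first
        replay_end = now if end is None else min(end, now)
        phases.append(("replay", start, replay_end))
    if end is None or end > now:
        # no end time, or end in the future: transition to live relay
        phases.append(("live", max(start or now, now), end))
    return phases
```

Note how the "live data only" case falls out naturally: a start time of 
"now" yields no replay phase at all, which ties in with the special time 
values discussed under *Implementation* below.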

PM_CONTEXT_ARCHIVE would not become obsolete, since there may not be a 
pmlogger(1) instance running, and the client may not want archive 
tailing or live data. Archive tailing and live relay, on the other hand, 
would require an active pmlogger(1) instance in any case.

*Implementation:*
The retrieval of archived metrics could be done on separate threads 
within pmlogger(1), one for each connecting client. Relaying of tailed 
or live data could be done on the main thread. There would be a list of 
fd's to write the data to, one of which could be the one for the 
currently active archive log file (if any).
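
The main-thread relay is then just a fan-out write over that fd list, 
dropping destinations that have gone away. A minimal sketch (the relay 
function and its drop-on-error policy are assumptions, not existing 
pmlogger code):

```python
import os

def relay(record, fds):
    """Fan one logged record out to every registered destination:
    client connections plus, optionally, the fd of the currently
    active archive log file. Returns the surviving fd list."""
    alive = []
    for fd in fds:
        try:
            os.write(fd, record)
            alive.append(fd)
        except OSError:
            pass  # disconnected client: stop relaying to it
    return alive
```

With logging itself represented as just another fd in the list, the 
"keep relaying after logging terminates" behaviour comes for free: the 
archive fd is simply removed while the client fds remain.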

One choice to be made would be how to handle the case of a client 
connecting with no start time. This could either mean "extract metrics 
from the beginning of the known logs" or it could mean "live data only". 
I propose having it mean the former and having some special time value 
which means "now" (perhaps there is already one) which could be used as 
a start time to indicate "live data only". Similarly there could be a 
special end time value which means "forever" which could be explicitly 
used instead of omitting an end time.

The above is only a rough sketch of how we could implement 
multi-volume+live metric playback with little impact on the existing tools.

Thoughts, comments, ideas please!
Dave