
Multi-Volume Archive + Live Data Playback for PCP Client Tools

To: PCP Mailing List <pcp@xxxxxxxxxxx>
Subject: Multi-Volume Archive + Live Data Playback for PCP Client Tools
From: Dave Brolley <brolley@xxxxxxxxxx>
Date: Wed, 01 Oct 2014 11:45:50 -0400
Hi,

I've been learning about pmlogger archives, their file formats, the related archive management tools, and the PCP clients that support the 'archive' context, with the goal of designing a way to extract PCP metrics across archive boundaries. I first want to write down what I think I've learned, followed by a couple of ideas for how this could be done, both from the user's point of view and from a technical one.

Of course, I know that others among you have been thinking about this and have much more expertise, especially Ken, so please correct me where I have it wrong and add your own thoughts, ideas and comments!

The current situation as I understand it:
  • PCP archives are created by pmlogger in distinct volumes due to various constraints, such as a maximum file size of 2GB, the desire to allow organization of the collected data, the desire to be able to manage data retention (i.e. log rotation) and, undoubtedly, for other reasons as well.

  • Some multi-volume support exists in the form of archive folios. These can be created by mkaf(1) but are also created by some other tools, such as pmchart(1) and pmview(1). Archive volumes in a folio may be examined using pmafm(1) using its 'replay', 'repeat' or 'run' commands. The latter two commands allow for repeated application of PCP client tools against one or more archives in the folio.

  • The archive management tool pmlogextract(1) (and, indirectly, pmlogger_daily and pmlogger_merge) provides the ability to extract data from multiple archives and combine that data into a single archive volume.

  • Otherwise, PCP client tools are currently restricted to extracting metrics from a single archive volume via PM_CONTEXT_ARCHIVE (the -a option). A single archive volume and an optional time window are specified, and the window is applied against that single volume.

What we would like to have is for PCP client tools to be able to easily extract metrics from multiple archive volumes. Ultimately, we would also like tail-like following of an active archive volume, with a seamless transition from archived data to live data.

Here are a few ideas for realizing these goals:

Client/tool interface:
Currently only a single archive volume may be specified by its base name (via PM_CONTEXT_ARCHIVE or -a). We could allow the specification of multiple archive specs, each of which could be:
  • an archive volume file base name -- same as now
  • the name of a directory containing a collection of PCP archive volumes
  • wildcarded names which resolve to a list of the above items
For example,

   pminfo -a 20140930.0 -a 201408*.* -a /some/path/archives -a /another/path/archive*

PM_CONTEXT_ARCHIVE could be extended to support more than one archive volume.
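To make the spec resolution above concrete, here is a rough sketch (in Python, just to illustrate the logic; the real implementation would live in libpcp, in C). The helper name and the ".0" volume-suffix convention used to find archives in a directory are assumptions for the example, not existing PCP behaviour:

```python
# Sketch only: expand the proposed -a archive specs -- a plain archive
# base name, a directory of archives, or a wildcard pattern -- into a
# flat list of concrete archive names.
import glob
import os

def resolve_archive_specs(specs):
    """Expand each -a spec into concrete archive names."""
    resolved = []
    for spec in specs:
        if os.path.isdir(spec):
            # A directory: take every data volume inside it
            # (assuming the ".0" first-volume naming convention).
            matches = sorted(glob.glob(os.path.join(spec, "*.0")))
        elif glob.has_magic(spec):
            # A wildcard pattern, resolved as the shell would.
            matches = sorted(glob.glob(spec))
        else:
            # A plain archive base name, passed through unchanged.
            matches = [spec]
        resolved.extend(matches)
    return resolved
```

Each resolved name would then be opened exactly as a single -a argument is today.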

Extracting multi-volume data:
For PCP tools, one very simple idea for extracting data from multiple existing archive volumes would be to use pmlogextract(1) to consolidate the specified volumes into a single temporary archive and then to operate against that temporary archive. Because pmlogextract(1) supports the -S and -T options, a time window which spans archive volumes would automatically be supported. I imagine this is already done manually in order to consolidate metrics from multiple sources; this approach would simply automate the process.

If we were to implement this within libpcp, then no changes to the client tools would be necessary.  PM_CONTEXT_ARCHIVE could either invoke pmlogextract(1) under the covers or use internal logic similar to it in order to consolidate the specified archive volumes.
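As a sketch of the "under the covers" consolidation step, here is how the pmlogextract(1) command line might be assembled for a temporary merged archive (Python for illustration; only the command is constructed here, since running it requires a PCP installation, and the temporary naming is an assumption):

```python
# Sketch: build a pmlogextract(1) invocation that merges several input
# archives into one temporary archive, which the client tool would then
# open via PM_CONTEXT_ARCHIVE exactly as it does today.
import os
import tempfile

def consolidation_command(inputs, start=None, finish=None):
    """Return (argv, tmp_archive_base) merging 'inputs' into one archive."""
    tmp_base = os.path.join(tempfile.mkdtemp(), "merged")
    argv = ["pmlogextract"]
    if start is not None:
        argv += ["-S", start]      # window start; may span volume boundaries
    if finish is not None:
        argv += ["-T", finish]     # window end
    argv += list(inputs) + [tmp_base]
    return argv, tmp_base
```

The client tool (or libpcp) would run this, open tmp_base as its archive context, and remove the temporary archive on exit.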

Streaming live data:
While the above could get us very quickly to multi-volume support against existing archive volumes, it may not be helpful in reaching the subsequent goals of live-archive tailing and transitioning from archived to live data. For these, we need some way of streaming new data as it is generated. In order to make the transition from archived to live data, we must be able to identify the following:
  • When the archive volume we're reading from is live
  • When it becomes no longer live
  • What the next live archive volume is (if any), otherwise, which pmcd is the source of the live data.
We could try to implement conventions within the archive file system for providing this information via new metadata or some similar mechanism, and then write client-side code to handle polling for new data or transitioning to a pmcd once the end of a live archive has been reached. However, there is already a PCP component which knows about and manages all of this information: pmlogger(1). It already knows the location of the archives it is creating, which one is live, whether there will be a new live archive once the current one ends, and which pmcd is the source of live data.

One way to make all of this available to a client tool in a seamless way would be to allow pmlogger(1) to be a source of metrics in the same way that pmcd is, in addition to its logging function. That is, given a pmlogger(1) instance, a client/tool could connect to it just as it would to a pmcd instance (call it PM_CONTEXT_LOGGER?) and then specify a time window. If the time window reaches into the past, pmlogger(1) would access the appropriate archive volume(s) to extract the requested metrics up until the specified end time. If no end time is given, or if the end time is in the future, pmlogger(1) would automatically transition to relaying live data to the client/tool, in addition to logging it, once the end of the current live archive has been reached. If logging were terminated, pmlogger(1) could continue to relay metrics to the client/tool without logging them.
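The replay-then-relay behaviour described above amounts to a small state decision per sample. A minimal sketch, with PM_CONTEXT_LOGGER and the state names being proposals from this note rather than existing PCP API, and times reduced to plain numbers for clarity:

```python
# Sketch: the per-sample decision pmlogger(1) would make for a connected
# client. Replay from archives while behind real time; switch to relaying
# live data once caught up, if the client's window is open-ended or
# extends into the future; otherwise stop.
def playback_state(now, position, end_time):
    """Return 'archived', 'live', or 'done' for the next sample.

    now      -- current wall-clock time
    position -- timestamp of the next sample to deliver
    end_time -- client's window end, or None for no end time
    """
    if position < now:
        # Still behind real time: serve from the archive volumes.
        if end_time is not None and position >= end_time:
            return "done"          # window closed entirely in the past
        return "archived"
    # Caught up to real time: relay live data if the window allows it.
    if end_time is None or end_time > now:
        return "live"
    return "done"
```

The transition from "archived" to "live" is exactly the seamless hand-off the client would otherwise have to implement itself.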

PM_CONTEXT_ARCHIVE would not become obsolete: a pmlogger(1) instance may not be running, and the client may not want archive tailing or live data. The new context, on the other hand, would require an active pmlogger(1) instance in any case.

Implementation:
The retrieval of archived metrics could be done on separate threads within pmlogger(1), one for each connecting client. Relaying of tailed or live data could be done on the main thread. There would be a list of fd's to write the data to, one of which could be the one for the currently active archive log file (if any).
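The main-thread fan-out could be as simple as the following sketch (Python file objects standing in for the fd's; the helper is hypothetical):

```python
# Sketch: relay one new log record to every registered output, one of
# which may be the currently active archive log file. Outputs whose
# write fails (e.g. a client that went away) are dropped from the list.
def relay_record(record, outputs):
    """Write 'record' to all outputs; return the outputs still alive."""
    alive = []
    for out in outputs:
        try:
            out.write(record)
            out.flush()
            alive.append(out)
        except OSError:
            pass   # client disconnected; stop relaying to it
    return alive
```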

One choice to be made would be how to handle the case of a client connecting with no start time. This could either mean "extract metrics from the beginning of the known logs" or it could mean "live data only". I propose having it mean the former and having some special time value which means "now" (perhaps there is already one) which could be used as a start time to indicate "live data only". Similarly there could be a special end time value which means "forever" which could be explicitly used instead of omitting an end time.
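Under that proposal, mapping the client's (possibly omitted, possibly sentinel) start and end times onto concrete bounds would look something like this sketch, where the sentinel spellings "now" and "forever" are placeholders for whatever special values we settle on:

```python
# Sketch: interpret the proposed special time values. An omitted start
# means "from the beginning of the known logs"; a start of NOW means
# "live data only"; an end of FOREVER (or no end) means an open window.
NOW = "now"
FOREVER = "forever"

def effective_window(start, end, now, epoch):
    """Map optional/sentinel start and end times onto concrete bounds."""
    if start is None:
        start_t = epoch        # default: beginning of the known logs
    elif start == NOW:
        start_t = now          # live data only
    else:
        start_t = start
    # None here means "keep relaying forever".
    end_t = None if (end is None or end == FOREVER) else end
    return start_t, end_t
```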

The above is only a rough sketch of how we could implement multi-volume+live metric playback with little impact on the existing tools.

Thoughts, comments, ideas please!
Dave