[pcp] Multi-Volume Archive + Live Data Playback for PCP Client Tools
Dave Brolley
brolley at redhat.com
Wed Oct 1 10:45:50 CDT 2014
Hi,
I've been learning about pmlogger archives, their file formats, the
related archive management tools and the pcp clients which support the
'archive' context with the goal of coming up with a design for allowing
the extraction of PCP metrics across archive boundaries. I first want to
write down what I think I've learned, followed by a couple of ideas for
how this could be done, both from the user's point of view and from a
technical point of view.
Of course, I know that others among you have been thinking about this
and have much more expertise, especially Ken, so please correct me where
I have it wrong and add your own thoughts, ideas and comments!
The current situation as I understand it:
* PCP archives are created by pmlogger in distinct volumes due to
various constraints, such as a maximum file size of 2GB, the desire
to allow organization of the collected data, the desire to be able
to manage data retention (i.e. log rotation) and, undoubtedly, for
other reasons as well.
* Some multi-volume support exists in the form of archive folios.
These can be created by mkaf(1) but are also created by some other
tools, such as pmchart(1) and pmview(1). Archive volumes in a folio
may be examined with pmafm(1) via its 'replay', 'repeat' or 'run'
commands. The latter two allow repeated application of
PCP client tools against one or more archives in the folio.
* The archive management tool pmlogextract (and, indirectly,
pmlogger_daily and pmlogger_merge) provides the ability to extract
data from multiple archives and combine that data into a single
archive volume.
* Otherwise, PCP client tools are currently restricted to extracting
metrics from a single archive volume via PM_CONTEXT_ARCHIVE (the -a
option). A single archive volume and an optional time window are
specified, and the window is applied against that single volume.
What we would like to have is for PCP client tools to be able to
easily extract metrics from multiple archive volumes. Ultimately, we
would also like tail-like following of an active archive volume with
seamless transition from archived data to live data.
Here are a few ideas for realizing these goals:
*Client/tool interface:*
Currently only a single archive volume may be specified by its base name
(via PM_CONTEXT_ARCHIVE or -a). We could allow the specification of
multiple archive specs, each of which could be:
* an archive volume file base name -- same as now
* the name of a directory containing a collection of PCP archive volumes
* wildcarded names which resolve to a list of the above items
For example,
pminfo -a 20140930.0 -a 201408*.* -a /some/path/archives -a /another/path/archive*
PM_CONTEXT_ARCHIVE could be extended to support more than one archive
volume.
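To make the spec resolution concrete, here is a rough sketch in Python of how the three kinds of archive spec (plain base name, directory, wildcard pattern) might be flattened into a list of archive base names. The function name and the fallback behaviour for a plain base name are my own invention, not anything in libpcp; it only assumes the existing on-disk convention that an archive "20140930.0" consists of the files 20140930.0.meta, 20140930.0.index, 20140930.0.0, and so on.

```python
import glob
import os

# Hypothetical sketch: resolve a list of archive specs (base names,
# directories, or wildcard patterns) into a flat, de-duplicated list of
# archive volume base names.  An archive named "20140930.0" is stored
# on disk as 20140930.0.meta, 20140930.0.index, 20140930.0.0, ...
def resolve_archive_specs(specs):
    bases = []
    for spec in specs:
        if os.path.isdir(spec):
            # a directory: every .meta file inside names one archive
            candidates = sorted(glob.glob(os.path.join(spec, "*.meta")))
        else:
            # a wildcard pattern, or a plain base name as used today
            candidates = sorted(glob.glob(spec + ".meta")) or [spec + ".meta"]
        for meta in candidates:
            base = meta[: -len(".meta")]
            if base not in bases:
                bases.append(base)
    return bases
```

The same resolution could live either in each tool's option handling or, better, once inside libpcp so every -a consumer gets it for free.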
*Extracting multi-volume data:*
For PCP tools, one very simple idea for extracting data from multiple
existing archive volumes would be to use pmlogextract(1) to consolidate
the specified volumes into a single temporary one and then to operate
against the temporary archive. Because pmlogextract(1) supports the -S
and -T options, a time window which spans archive volumes would
automatically be supported. I imagine that this is already done manually
in order to consolidate metrics from multiple sources. This would just
be a way to automate this process.
If we were to implement this within libpcp, then no changes to the
client tools would be necessary. PM_CONTEXT_ARCHIVE could do it under
the covers, or could use internal logic similar to that used by
pmlogextract(1) in order to consolidate the specified archive volumes.
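As a sketch of what "under the covers" could look like, the following Python fragment builds the pmlogextract(1) command line that would do the consolidation; the helper name is hypothetical, but the -S/-T window options are pmlogextract's own, so a window spanning volume boundaries falls out for free.

```python
# Hypothetical sketch of the consolidation step: build the
# pmlogextract(1) command line that merges several archive volumes into
# one temporary archive, which the client tool (or libpcp, under the
# covers) then opens as an ordinary single archive.
def pmlogextract_command(inputs, output, start=None, finish=None):
    argv = ["pmlogextract"]
    if start is not None:
        argv += ["-S", start]    # window start; may span volume boundaries
    if finish is not None:
        argv += ["-T", finish]   # window end
    return argv + list(inputs) + [output]
```

The temporary output archive would need to be cleaned up when the context is closed, which is one argument for doing this inside libpcp rather than in each tool.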
*Streaming live data:*
While the above could get us very quickly to multi-volume support
against existing archive volumes, it may not be helpful in reaching the
subsequent goals of live-archive tailing and transitioning from archived
to live data. For these, we need some way of streaming new data as it is
generated. In order to make the transition from archived to live data,
we must be able to identify the following:
* When the archive volume we're reading from is live
* When it is no longer live
* What the next live archive volume is (if any), otherwise, which pmcd
is the source of the live data.
We could try to implement conventions within the archive file system for
providing this information via new metadata or some similar mechanism
and then write client-side code to handle polling for new data or
transitioning to a pmcd once the end of a live archive has been reached.
However, there is already a PCP component which knows about and manages
all of this information. It is pmlogger(1). pmlogger(1) already knows
the location of the archives it is creating, which one is the live one,
whether there will be a new live archive once the current one is ended
and which pmcd is the source of live data.
One way to make all of this available to a client tool in a seamless way
would be to allow pmlogger(1) to be a source of metrics in the same way
that pmcd is, in addition to its logging function. That is, given a
pmlogger(1) instance, a client/tool could connect in the same way that
it would to a pmcd instance (call it PM_CONTEXT_LOGGER?). The
client/tool could then specify a time window. If the time window reaches
into the past, pmlogger(1) would access the appropriate archive
volume(s) as needed to extract the requested metrics up until the
specified end time. If no end time is given by the client, or if the end
time is in the future, then pmlogger(1) would automatically transition
to relaying live data to the client/tool, in addition to logging it,
once the end of the current live archive has been reached. If logging
were to be terminated, then pmlogger(1) could continue to relay metrics
to the client/tool without logging them.
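The phase decision described above can be sketched very simply. Everything here is hypothetical (the function name, the string phase labels, plain UNIX timestamps standing in for PCP timevals); it only encodes the rule just stated: a window reaching into the past triggers archive replay, and an absent or future end time triggers live relay.

```python
# Hypothetical sketch of the decision a PM_CONTEXT_LOGGER service
# inside pmlogger(1) might make for a connecting client: replay
# archived data when the window reaches into the past, then relay live
# data when the end time is absent or lies in the future.
def playback_phases(start, end, now):
    """start/end are UNIX timestamps or None (None start = beginning of
    the known logs, None end = follow live data indefinitely)."""
    phases = []
    if start is None or start < now:
        phases.append("replay-archives")
    if end is None or end > now:
        phases.append("relay-live")
    return phases
```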
PM_CONTEXT_ARCHIVE would not become obsolete, since there may not be a
pmlogger(1) instance running, and the client may not want archive tailing
or live data. PM_CONTEXT_LOGGER, by contrast, would require an active
pmlogger(1) instance in any case.
*Implementation:*
The retrieval of archived metrics could be done on separate threads
within pmlogger(1), one for each connecting client. Relaying of tailed
or live data could be done on the main thread. There would be a list of
fd's to write the data to, one of which could be the one for the
currently active archive log file (if any).
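The fan-out on the main thread could be as simple as the following sketch (the function name is made up, and real pmlogger would be writing PDUs to raw fds rather than Python file objects): each new record goes to every registered sink, and sinks whose write fails are dropped from the list.

```python
# Hypothetical sketch of the relay step on pmlogger's main thread: each
# new log record is written to every registered sink -- the currently
# active archive log file (if any) plus one per connected client.
# Clients whose connection has broken are dropped from the list.
def relay_record(record, sinks):
    dead = []
    for sink in sinks:
        try:
            sink.write(record)
        except OSError:
            dead.append(sink)
    for sink in dead:
        sinks.remove(sink)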
One choice to be made would be how to handle the case of a client
connecting with no start time. This could either mean "extract metrics
from the beginning of the known logs" or it could mean "live data only".
I propose having it mean the former and having some special time value
which means "now" (perhaps there is already one) which could be used as
a start time to indicate "live data only". Similarly there could be a
special end time value which means "forever" which could be explicitly
used instead of omitting an end time.
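The proposed convention might look like this sketch; NOW and FOREVER are placeholder sentinels of my own (perhaps an existing special timeval could serve instead, as noted above):

```python
# Hypothetical sketch of the proposed special time values.  NOW as a
# start time means "live data only"; FOREVER as an end time makes the
# follow-forever request explicit rather than implied by omission.
NOW = object()
FOREVER = object()

def interpret_window(start, end):
    live_only = start is NOW
    follow_forever = end is None or end is FOREVER
    return live_only, follow_forever
```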
The above is only a rough sketch of how we could implement
multi-volume+live metric playback with little impact on the existing tools.
Thoughts, comments, ideas please!
Dave