Hi, Dave -
> I've been learning about pmlogger archives, their file formats, the
> related archive management tools and the pcp clients [...]
Thanks!
> * PCP archives are created by pmlogger in distinct volumes due to various
> constraints, such as a maximum file size of 2GB, the desire to allow
> organization of the collected data, the desire to be able to manage data
> retention (i.e. log rotation) and, undoubtedly, for other reasons as well.
Correction: archive *volumes* are just the .0 / .1 / .2 / .3 ... files
that logically constitute a single *archive*. They are split only
because of the 2GB file size limit (a consequence of the 32-bit
offsets in the meta/index files).
Archives consisting of multiple volumes (.0-.N files) are already
handled transparently in libpcp, for both reading and writing
purposes. So where you use "multi-volume", you probably (should) mean
"multi-archive" in the following.
> * Some multi-volume support exists in the form of archive folios. [...]
> [...]
Folios are for grouping multiple archives together (each of which may have
one or more .0-.N volumes).
> What we would like have is for PCP client tools to have the ability
> to easily extract metrics from multiple archive volumes. Ultimately,
> we would also like tail-like following of an active archive volume
> with seamless transition from archived data to live data.
These are distinct steps in the 'grand unification' process; we can do
one at a time.
> Here are a few ideas for realizing these goals:
>
> Client/tool interface:
> Currently only a single archive volume may be specified by its base name (via
> PM_CONTEXT_ARCHIVE or -a). We could allow the specification of multiple
> archive specs, each of which could be:
>
> * an archive volume file base name -- same as now
> * the name of a directory containing a collection of PCP archive volumes
> * wildcarded names which resolve to a list of the above items
>
> For example,
>
> pminfo -a 20140930.0 -a 201408*.* -a /some/path/archives -a /another/path/archive*
This is possible, but we may get a long way without extending the
user interface of the pmapi tools (it's undesirable to require
multiple -a flags, i.e. multiple explicit contexts within the PMAPI
client code). Even just
    pminfo -a 'GLOB*'              # note quoting
    pminfo -a /path/to/directory
would be a big step forward, and can be done entirely within libpcp
(no modification to pminfo etc.).
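To make the expansion step concrete, here is a rough sketch (in Python for brevity; the real thing would live in libpcp's C code). It turns a directory or glob spec into a list of archive base names. The function name expand_archive_spec is made up for illustration, and the sketch assumes an archive is identified by the presence of its .meta file and ignores compressed volumes.

```python
import glob
import os
import re

# Suffixes of the files that make up one archive: metadata (.meta),
# temporal index (.index), and numbered data volumes (.0, .1, ...).
# Assumption: compressed volumes (e.g. .0.xz) are not handled here.
_SUFFIX = re.compile(r"\.(meta|index|\d+)$")


def expand_archive_spec(spec):
    """Return the sorted archive base names matched by SPEC.

    SPEC may be a directory, a glob pattern, or a plain archive base name.
    (Hypothetical helper, not an existing libpcp or PMAPI function.)
    """
    if os.path.isdir(spec):
        candidates = glob.glob(os.path.join(spec, "*"))
    else:
        # Also try SPEC + ".meta" so that a bare base name resolves.
        candidates = glob.glob(spec) + glob.glob(spec + ".meta")

    bases = set()
    for path in candidates:
        base = _SUFFIX.sub("", path)
        # Keep only names that looked like an archive file and whose
        # metadata file actually exists.
        if base != path and os.path.exists(base + ".meta"):
            bases.add(base)
    return sorted(bases)
```

All the volumes of one archive collapse to a single base name, so the caller sees a list of archives, not of files.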
> PM_CONTEXT_ARCHIVE could be extended to support more than one archive volume.
> [...]
> For PCP tools, one very simple idea for extracting data from multiple existing
> archive volumes would be to use pmlogextract(1) to consolidate the specified
> volumes into a single temporary one and then to operate against the temporary
> archive.
This is possible, but causes a potentially tragic amount of I/O. What
would be desirable is to extend the libpcp code for PM_CONTEXT_ARCHIVE
handling to transparently jump from archive to archive, in much the
same way it already jumps from volume to volume, as the current "time"
changes.
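As a rough illustration of the archive-to-archive jump (again Python for brevity, not the libpcp implementation): given each archive's start and end time, pick the archive that covers the current time, or the next later one if the time falls in a gap between archives. ArchiveSpan and archive_for_time are hypothetical names, and the sketch assumes the spans are sorted by start time and do not overlap.

```python
import bisect
from typing import NamedTuple, Optional, Sequence


class ArchiveSpan(NamedTuple):
    """Time range covered by one archive (hypothetical structure)."""
    start: float  # timestamp of the first record, in seconds
    end: float    # timestamp of the last record, in seconds
    name: str     # archive base name


def archive_for_time(spans: Sequence[ArchiveSpan], t: float) -> Optional[ArchiveSpan]:
    """Return the archive to read from at time T when moving forward.

    Assumes SPANS is sorted by start time with no overlaps. If T falls
    in a gap, the next later archive is returned; if T is past the last
    archive, None is returned.
    """
    starts = [s.start for s in spans]
    # Index of the last archive that starts at or before T (-1 if none).
    i = bisect.bisect_right(starts, t) - 1
    if i >= 0 and t <= spans[i].end:
        return spans[i]
    nxt = i + 1
    return spans[nxt] if nxt < len(spans) else None
```

The per-archive start and end times would presumably come from each archive's label and last record; reading only those avoids copying any data, unlike the pmlogextract approach.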
> Streaming live data:
> [...]
Lots of good ideas in there, but how about we leave this part until
later?
- FChE