Re: Multi-Volume Archive + Live Data Playback for PCP Client Tools

To: Dave Brolley <brolley@xxxxxxxxxx>
Subject: Re: Multi-Volume Archive + Live Data Playback for PCP Client Tools
From: fche@xxxxxxxxxx (Frank Ch. Eigler)
Date: Wed, 01 Oct 2014 14:32:04 -0400
Cc: PCP Mailing List <pcp@xxxxxxxxxxx>
Delivered-to: pcp@xxxxxxxxxxx
In-reply-to: <542C21AE.1010504@xxxxxxxxxx> (Dave Brolley's message of "Wed, 01 Oct 2014 11:45:50 -0400")
References: <542C21AE.1010504@xxxxxxxxxx>
User-agent: Gnus/5.1008 (Gnus v5.10.8) Emacs/21.4 (gnu/linux)
Hi, Dave -

> I've been learning about pmlogger archives, their file formats, the
> related archive management tools and the pcp clients [...]

Thanks!

>   * PCP archives are created by pmlogger in distinct volumes due to various
>     constraints, such as a maximum file size of 2GB, the desire to allow
>     organization of the collected data, the desire to be able to manage data
>     retention (i.e. log rotation) and, undoubtedly, for other reasons as well.

Correction: archive *volumes* are just the .0 / .1 / .2 / .3 ... files
that logically constitute a single *archive*.  These are split only
for the 2GB file limit reason (due to the 32-bit size of offsets in
the meta/index files).

Archives consisting of multiple volumes (.0-.N files) are already
handled transparently in libpcp, for both reading and writing
purposes.  So where you use "multi-volume", you probably (should) mean
"multi-archive" in the following.
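
To make the volume/archive distinction concrete, here is a minimal sketch in Python (for brevity; this is not libpcp code) that groups a flat list of file names into archives and their volumes. It assumes the usual naming convention of BASE.meta, BASE.index and BASE.0 ... BASE.N, and ignores compressed files; the function name group_into_archives is hypothetical.

```python
import re
from collections import defaultdict

# Suffixes that belong to one archive: .meta, .index, or a numeric volume (.0, .1, ...).
_SUFFIX = re.compile(r"\.(meta|index|\d+)$")

def group_into_archives(filenames):
    """Map each archive base name to the sorted list of its volume numbers."""
    archives = defaultdict(list)
    for name in filenames:
        m = _SUFFIX.search(name)
        if m is None:
            continue  # not an archive file under this naming assumption
        base = name[:m.start()]
        archives[base]  # touch the key so the archive is listed even without volumes
        if m.group(1).isdigit():
            archives[base].append(int(m.group(1)))
    return {base: sorted(vols) for base, vols in archives.items()}

files = ["20140930.0", "20140930.1", "20140930.meta", "20140930.index",
         "20141001.0", "20141001.meta", "20141001.index", "README"]
result = group_into_archives(files)
```

With the sample list above, the result has two archives (20140930 with volumes 0 and 1, 20141001 with volume 0), which is the unit that -a names today.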


>   * Some multi-volume support exists in the form of archive folios. [...]
> [...]

Folios are for grouping multiple archives together (each of which may have
one or more .0-.N volumes).


> What we would like is for PCP client tools to have the ability
> to easily extract metrics from multiple archive volumes. Ultimately,
> we would also like tail-like following of an active archive volume
> with seamless transition from archived data to live data.

These are distinct steps in the 'grand unification' process; we can do
one at a time.


> Here are a few ideas for realizing these goals:
>
> Client/tool interface:
> Currently only a single archive volume may be specified by its base name (via
> PM_CONTEXT_ARCHIVE or -a). We could allow the specification of multiple archive
> specs, each of which could be:
>
>   * an archive volume file base name -- same as now
>   * the name of a directory containing a collection of PCP archive volumes
>   * wildcarded names which resolve to a list of the above items
>
> For example,
>
>    pminfo -a 20140930.0 -a 201408*.* -a /some/path/archives -a /another/path/archive*

This is possible, but we may get a long way without requiring
extension of the user interface of the pmapi tools (namely, it's
undesirable to have to use multiple -a flags, i.e. multiple explicit
contexts within the PMAPI client code).  Even just

    pminfo -a 'GLOB*'              # note quoting
    pminfo -a /path/to/directory

would be a big step forward, and can be done entirely within libpcp
(no modification to pminfo etc.).
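
As a rough illustration of what that expansion might look like, here is a sketch in Python (again not libpcp code; the real thing would be C inside the PM_CONTEXT_ARCHIVE handling). The function name expand_archive_spec is hypothetical, and the suffix convention (.meta, .index, .0 ... .N) is assumed.

```python
import glob
import os
import re

# Assumed archive file suffixes: .meta, .index, or a numeric volume.
_SUFFIX = re.compile(r"\.(meta|index|\d+)$")

def expand_archive_spec(spec):
    """Resolve one -a argument to a sorted list of archive base names."""
    if os.path.isdir(spec):
        # Directory: consider every file directly inside it.
        candidates = [os.path.join(spec, f) for f in os.listdir(spec)]
    else:
        # Otherwise treat the argument as a glob pattern.
        candidates = glob.glob(spec)
        if not candidates:
            # Nothing matched: fall back to a plain base name, same as now.
            return [spec]
    bases = set()
    for path in candidates:
        if _SUFFIX.search(path):
            bases.add(_SUFFIX.sub("", path))  # strip the suffix to get the base
    return sorted(bases)
```

The client still passes a single string and gets a single context; only the library learns that the string may stand for several archives.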



> PM_CONTEXT_ARCHIVE could be extended to support more than one archive volume.
> [...]
> For PCP tools, one very simple idea for extracting data from multiple existing
> archive volumes would be to use pmlogextract(1) to consolidate the specified
> volumes into a single temporary one and then to operate against the temporary
> archive. 

This is possible, but causes a potentially tragic amount of I/O.  What
would be desirable is to extend the libpcp code for PM_CONTEXT_ARCHIVE
handling to transparently jump from archive to archive, in much the
same way it already jumps from volume to volume, as the current "time"
changes.
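
The selection step could be sketched like this, in Python for brevity. It is only an illustration under simplifying assumptions: each archive is reduced to a (start, end, name) tuple, the list is sorted by start time, and the archives do not overlap. Real archives may overlap or leave gaps, and the name archive_for_time is hypothetical.

```python
import bisect

def archive_for_time(archives, t):
    """Return the name of the archive whose time span covers t, else None.

    archives: list of (start, end, name) tuples, sorted by start,
    assumed non-overlapping.
    """
    starts = [a[0] for a in archives]
    # Index of the last archive that starts at or before t.
    i = bisect.bisect_right(starts, t) - 1
    if i >= 0 and t <= archives[i][1]:
        return archives[i][2]
    return None  # t falls before the first archive or into a gap

spans = [(0, 99, "20140929"), (100, 199, "20140930"), (300, 399, "20141002")]
current = archive_for_time(spans, 150)
```

As the current time moves past the end of one span, the same lookup yields the next archive, and no data is copied into a temporary archive.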


> Streaming live data:
> [...]

Lots of good ideas in there, but how about we leave this part until
later?


- FChE