Re: Multi-Volume Archive + Live Data Playback for PCP Client Tools

To: "Frank Ch. Eigler" <fche@xxxxxxxxxx>
Subject: Re: Multi-Volume Archive + Live Data Playback for PCP Client Tools
From: Dave Brolley <brolley@xxxxxxxxxx>
Date: Wed, 01 Oct 2014 14:58:09 -0400
Cc: PCP Mailing List <pcp@xxxxxxxxxxx>
Delivered-to: pcp@xxxxxxxxxxx
In-reply-to: <y0megurfxp7.fsf@xxxxxxxx>
References: <542C21AE.1010504@xxxxxxxxxx> <y0megurfxp7.fsf@xxxxxxxx>
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.7.0
On 10/01/2014 02:32 PM, Frank Ch. Eigler wrote:
Thanks!
Thanks for the quick feedback!
   * PCP archives are created by pmlogger in distinct volumes due to various
     constraints, such as a maximum file size of 2GB, the desire to allow
     organization of the collected data, the desire to be able to manage data
     retention (i.e. log rotation) and, undoubtedly, for other reasons as well.
Correction: archive *volumes* are just the .0 / .1 / .2 / .3 ... files
that logically constitute a single *archive*.  These are split only
for the 2GB file limit reason (due to the 32-bit size of offsets in
the meta/index files).

Archives consisting of multiple volumes (.0-.N files) are already
handled transparently in libpcp, for both reading and writing
purposes.  So where you use "multi-volume", you probably (should) mean
"multi-archive" in the following.
Oh, ok. I didn't realize that there was a distinction and I didn't know that libpcp already supported treating a collection of .0 - .N volumes as a single archive (didn't get that deep into the code). So part of what I thought we didn't support is already supported. Great!
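To make the volume/archive distinction concrete, here is a minimal sketch in Python (not libpcp code) that groups a directory listing into archives and their data volumes. It assumes the naming described above: each archive has a base name, with data volumes <base>.0, <base>.1, ... next to <base>.meta and <base>.index. The function name group_volumes and the file names are made up for illustration.

```python
import re
from collections import defaultdict

# Assumed naming: <base>.meta, <base>.index, and data volumes <base>.0, <base>.1, ...
_VOLUME_RE = re.compile(r"^(?P<base>.+)\.(?P<vol>\d+)$")

def group_volumes(filenames):
    """Map each archive base name to the sorted list of its volume numbers."""
    archives = defaultdict(list)
    for name in filenames:
        match = _VOLUME_RE.match(name)
        if match:  # .meta and .index files do not match and are skipped
            archives[match.group("base")].append(int(match.group("vol")))
    return {base: sorted(vols) for base, vols in archives.items()}

# Hypothetical listing: two archives, the first split into two volumes.
files = [
    "20140930.0", "20140930.1", "20140930.meta", "20140930.index",
    "20141001.0", "20141001.meta", "20141001.index",
]
grouped = group_volumes(files)
print(grouped)
```

Here 20140930 is one archive with two volumes and 20141001 is a second archive with one volume, so "multi-archive" support means spanning the two base names.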
   * Some multi-volume support exists in the form of archive folios. [...]
[...]
Folios are for grouping multiple archives together (each of which may have
one or more .0-.N volumes).
Got it.
What we would like to have is for PCP client tools to have the ability
to easily extract metrics from multiple archive volumes. Ultimately,
we would also like tail-like following of an active archive volume
with seamless transition from archived data to live data.
These are distinct steps in the 'grand unification' process; we can do
one at a time.
Yes, I see them not only as separate steps but as necessarily separate functionality altogether, as can be seen from the separate schemes I proposed for each.
Here are a few ideas for realizing these goals:

Client/tool interface:
Currently only a single archive volume may be specified by its base name (via
PM_CONTEXT_ARCHIVE or -a). We could allow the specification of multiple archive
specs, each of which could be:

   * an archive volume file base name -- same as now
   * the name of a directory containing a collection of PCP archive volumes
   * wildcarded names which resolve to a list of the above items

For example,

    pminfo -a 20140930.0 -a 201408*.* -a /some/path/archives -a /another/path/archive*
This is possible, but we may get a long way without requiring
extension of the user interface of the pmapi tools (namely, it's
undesirable to have to use multiple -a flags, ie. multiple explicit
contexts within the PMAPI client code.)  Even just

     pminfo -a 'GLOB*'              # note quoting
     pminfo -a /path/to/directory

would be a big step forward, and can be done entirely within libpcp
(no modification to pminfo etc.).

PM_CONTEXT_ARCHIVE could be extended to support more than one archive volume.
I wasn't actually proposing multiple explicit contexts within pmapi code, but rather extending a single PM_CONTEXT_ARCHIVE to be able to handle more than one archive. It would be done in a way that would allow existing clients to continue to work without changes. Given that clarification, do you still see multiple -a flags as undesirable?
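As a rough illustration of what resolving such archive specs might involve, the Python sketch below expands base names, directories and glob patterns into a sorted list of archive base names. It is not libpcp code and does not reflect any existing PCP API. It assumes an archive is identified by its <base>.meta file; the helper names and the scratch files in the demo are hypothetical.

```python
import glob
import os
import re
import tempfile

# Assumption: an archive <base> is identified by its <base>.meta file, with
# <base>.index and data volumes <base>.0, <base>.1, ... alongside it.
_SUFFIX_RE = re.compile(r"\.(meta|index|\d+)$")

def _base_of(path):
    """Return the archive base name for a base name or one of its files."""
    if os.path.exists(path + ".meta"):
        return path  # already a base name
    return _SUFFIX_RE.sub("", path)

def expand_archive_specs(specs):
    """Resolve base names, directories and glob patterns to archive base names."""
    bases = set()
    for spec in specs:
        # A bare base name matches no file, so fall back to the spec itself.
        for path in glob.glob(spec) or [spec]:
            if os.path.isdir(path):
                metas = glob.glob(os.path.join(path, "*.meta"))
                bases.update(m[: -len(".meta")] for m in metas)
            else:
                base = _base_of(path)
                if os.path.exists(base + ".meta"):
                    bases.add(base)
    return sorted(bases)

# Demo in a scratch directory holding two made-up, empty archives.
with tempfile.TemporaryDirectory() as top:
    for name in ("20140801", "20140930"):
        for suffix in (".meta", ".index", ".0"):
            open(os.path.join(top, name + suffix), "w").close()
    by_dir = [os.path.basename(b) for b in expand_archive_specs([top])]
    by_glob = [os.path.basename(b) for b in
               expand_archive_specs([os.path.join(top, "201408*.*")])]
    by_base = [os.path.basename(b) for b in
               expand_archive_specs([os.path.join(top, "20140930")])]
    print(by_dir, by_glob, by_base)
```

Because every spec collapses to the same kind of list, the same expansion would serve one -a option with a glob or directory as well as several -a options.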
[...]
For PCP tools, one very simple idea for extracting data from multiple existing
archive volumes would be to use pmlogextract(1) to consolidate the specified
volumes into a single temporary one and then to operate against the temporary
archive.
This is possible, but causes a potentially tragic amount of I/O.  What
would be desirable is to extend the libpcp code for PM_CONTEXT_ARCHIVE
handling to transparently jump from archive to archive, in much the
same way it already jumps from volume to volume, as the current "time"
changes.
Seems doable.
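One way to picture the archive-to-archive jump is a lookup from the current time to the archive covering it. The Python sketch below is only an illustration under stated assumptions, not how libpcp does or would implement it: each archive is summarized by a hypothetical ArchiveSpan record holding its first and last timestamps, and the spans are sorted by start time and do not overlap.

```python
import bisect
from dataclasses import dataclass

@dataclass
class ArchiveSpan:
    base: str     # archive base name (hypothetical)
    start: float  # first timestamp in the archive, in seconds
    end: float    # last timestamp in the archive, in seconds

def archive_for_time(spans, when):
    """Return the span covering `when`, or None if it falls in a gap.

    Assumes `spans` is sorted by start time and the spans do not overlap.
    """
    starts = [s.start for s in spans]
    # Index of the last span that starts at or before `when`.
    i = bisect.bisect_right(starts, when) - 1
    if i >= 0 and when <= spans[i].end:
        return spans[i]
    return None

# Two made-up consecutive daily archives.
spans = [
    ArchiveSpan("20140929", 0.0, 86399.0),
    ArchiveSpan("20140930", 86400.0, 172799.0),
]
current = archive_for_time(spans, 90000.0)
print(current.base if current else None)
```

A real implementation would also have to decide what to do when archives overlap in time or leave gaps between them, which this sketch handles only by returning None for gaps.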
Streaming live data:
[...]
Lots of good ideas in there, but how about we leave this part until
later?
Why wait to start developing ideas?

Thanks,
Dave
