
Multi-Archive Contexts: Scaling and Consistency

To: PCP Mailing List <pcp@xxxxxxxxxxx>
Subject: Multi-Archive Contexts: Scaling and Consistency
From: Dave Brolley <brolley@xxxxxxxxxx>
Date: Tue, 10 Nov 2015 15:52:05 -0500
Delivered-to: pcp@xxxxxxxxxxx
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.3.0
Hello All,

Most of you are probably aware that I have been (slowly) working on multi-archive contexts for some time now. If so, then you are also probably aware of my branch at git://git.pcp.io/brolley/multi-archive where the current prototype resides.

For those not aware, on the branch, most tools which formerly allowed only one -a option to specify an archive context now accept:
  • more than one -a option
  • comma-separated lists of archives on individual -a options
  • specification of a directory as an archive (all archives within the directory are identified)
  • any combination of the above
The set of archives defined using these methods is then treated as a single archive within the context, with no additional effort required on the part of the tool.

From a PMAPI point of view, __pmOptAddArchive() now accumulates the names provided on each call into a comma-separated list and pmNewContext(3) now accepts a comma-separated list of archives.

Tools which use (opts->flags & PM_OPTFLAG_MULTI) continue to work as before.

At this point, the prototype treats the list of archives as a single archive within the context, ordering them and transitioning from one to the next with no additional intervention required by the PMAPI client.

Feedback on the existing prototype has been received from a few (thanks!) and is still welcome. The comments fall mainly into these categories:
  1. Scaling for large PCP installations, which may retain large numbers of related archives
  2. The need to handle sets of archives which may be dynamically changing, i.e. new archives may be appearing (via an active pmlogger) and perhaps disappearing within a specified directory while the context is open
  3. The requirement of some PMAPI functions to examine the entire PMNS of a given context.
In addition to addressing the feedback received, there is additional work to be done, primarily implementing consistency checking among the PMNSs of the individual archives, to ensure that they can represent a single context in a manageable way.

The following is an outline of a high-level design intended to address all of the above. It describes the high-level approach to representing the multi-archive context as well as some implementation details. There are also some alternatives, which will be chosen based on feedback concerning the required level of performance for various use cases.

Single PMNS for the entire context
This is needed for those APIs which have a need to examine the entire PMNS of the context. Examples include pmTraversePMNS(3) and pmLookup*(3).

I propose that this PMNS be built up as each individual archive is accessed. The main reason is that consistency checking can then also be performed as each archive is accessed. In the case of a consistency issue, it is then possible (even probable) that useful data will have been provided to the client before the problem is encountered. The label of each archive still needs to be examined when the context is opened, in order to determine its position in the overall time line, but it is not necessary to examine the PMNS (.meta) of each archive until the metrics within it are to be examined.

In the case of an API call, like pmTraversePMNS(3), we can bite the bullet and complete the PMNS of the entire archive set as needed.

From an implementation point of view, the __pmLogCtl->l_pmns of each individual archive will reference the global PMNS instead of each maintaining its own PMNS, as is done today.

Scaling
The primary issue is resource management when scaling to PCP installations in which individual directories may contain extremely large numbers of related archives. In particular, we don't want to keep large numbers of file descriptors open simultaneously. Of secondary concern is the build-up of in-memory data for each archive which has been accessed but which, in some usage scenarios, is unlikely to be referenced again. These concerns apply mainly to directories of archives. Lists of individual archives are unlikely to present a problem.

The design of data structures and policies for retention could potentially depend on which usage scenarios we envision. We must also keep in mind that, in the case of directories of archives, new archives could be appearing dynamically, via an active pmlogger or by some other means. They could also be disappearing dynamically; however, this is just as easy to detect and should probably be treated as an error situation.

In all usage scenarios, we need to maintain the entire active set of archives for the purpose of maintaining their order within the time line. In particular, no two archives can overlap in time. We must also be able to insert new archives into the correct position in the time line. All of this requires that we, at a minimum, store the start and end time of each archive in the active set.

Scaling Possibilities
  1. Keep one archive open at a time with no caching of any data from previously accessed archives
    • must re-read .index and .meta each time we return to the same archive
      • can still avoid redoing consistency checks
    • no danger of potentially unused resource build up
    • optimized for single direction traversal
    • potentially slow for tools which transition back and forth between archives
      • but not slower than the initial transition or than each transition in a uni-directional traversal

  2. Keep one archive open at a time but retain all .index, .meta, caches and all other __pmLogCtl data for previously accessed archives
    • need only re-open .index and .meta files, no need to re-read
    • optimized for traversal back and forth between archives from beginning to end
    • prone to build up of large amounts of potentially never-to-be-used-again data

  3. Keep one archive open at a time but retain a limited amount of .index, .meta, cache and other __pmLogCtl data for previously accessed archives
    • keep a cache of this data for the most recently accessed archives
      • 2 or 3 previous archives might be sufficient
    • could leave the cached archives open, including fds, OR need only re-open the .index and .meta files, with no need to re-read them
    • optimized for traversal back and forth between recent archives but would not slow down a uni-directional traversal
    • not prone to build up of large amounts of potentially never-to-be-used-again data
My feeling is that 1) is the simplest and is optimized for what I believe is the most common use case, which is to read the archives in one direction from beginning to end or vice-versa. Changing direction across archive boundaries would be slower than if we were to cache some data, but no slower than a single direction traversal for any of the 3 suggestions. 1) could also be easily extended to become 3) should we discover that the performance of re-crossing archive boundaries is inadequate.

Dynamic Archive Management
Here is an outline of how the individual archives would be managed, regardless of which scaling option is chosen.

I feel that we should cater to the possibility of new archives being created within directories, but only at the end of the time line for each directory (if any). pmlogger(1) would create new archives in this way. I believe that handling the creation of new archives at random points in the time line, whenever an arbitrary archive boundary is crossed, would be a waste of time. The algorithms below will check for new archives within a directory only when a request for data is made for a time just after the end of the time line of a given directory. If archives disappear while the context is open, then I believe that the errors which occur if/when we attempt to read the files will be sufficient.

Here are some algorithms for handling various events associated with a multi-archive context:

When a new PM_CONTEXT_ARCHIVE context is opened (pmNewContext(3)):
  The list of names (which may contain only one item) is examined
  For each item in the list
    If it is an individual archive
      add it to the active set (see below)
    else if it is a directory
      For each archive in the directory
        add it to the active set (see below)
      Mark the final archive in the time line of the directory
    else
      error

Adding an archive to the active set entails:
  reading the label to discover the start and end times
  adding the archive to the active set in temporal order while checking for overlaps
    any temporal overlap is an error

When we need to change archives in order to fulfil a request:
  If the request is for a time just beyond the end of the final archive within a directory (marked above)
    re-check the directory for new archives and add them to the active set (see above)
    re-discover which archive is the final one in the time line of the directory and mark it
  Determine which archive spans the time of the request
    optimized by searching the time line beginning at the currently active archive
  Switch to that archive
    if it has not been previously accessed
      check the consistency of the PMNS of the new archive with the existing global PMNS
        unmanageable differences are an error

If a request requires the entire PMNS of the context:
  re-check all directories (if any) for new archives
  read and check the PMNS of all previously unaccessed archives

If a request can be fulfilled within the currently active archive, then there is no multi-archive overhead.

These algorithms should minimize and even eliminate multi-archive overhead for most requests, which are usually for data in the same temporal neighbourhood as the previous request. Overhead only occurs when we need to transition to another archive in the set. They should also minimize the overhead associated with managing dynamic archive creation by tools like pmlogger. This is accomplished by exploiting the fact that each archive in the set is temporally distinct and that we only need to check for new archives when traversing past the end of the final archive in any given directory.

Questions, concerns, ideas, comments ..... please!

Dave