Hello All,
Most of you are probably aware that I have been (slowly) working on
multi-archive contexts for some time now. If so, then you are also
probably aware of my branch at
git://git.pcp.io/brolley/multi-archive where the current prototype
resides.
For those not aware: on the branch, most tools which formerly
allowed only a single -a option to specify an archive context now accept:
- more than one -a option
- comma-separated lists of archives on individual -a options
- specification of a directory as an archive (all archives
within the directory are identified)
- any combination of the above
The set of archives specified using these methods (for example,
-a logA -a logB,logC -a /some/dir) is then treated as a single
archive within the context, with no additional effort required on
the part of the tool.
From a PMAPI point of view, __pmOptAddArchive() now accumulates the
names provided on each call into a comma-separated list and
pmNewContext(3) now accepts a comma-separated list of archives.
Tools which use (opts->flags & PM_OPTFLAG_MULTI) continue to
work as before.
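For illustration, a PMAPI client under the prototype might open such
a context like this (a minimal sketch; the archive names and the
directory are hypothetical):

    #include <stdio.h>
    #include <pcp/pmapi.h>

    int
    main(void)
    {
        /* Under the prototype, pmNewContext(3) accepts a comma-separated
         * list of archives and/or directories of archives. */
        int ctx = pmNewContext(PM_CONTEXT_ARCHIVE,
                    "20140901.0,20140902.0,/var/log/pcp/pmlogger/myhost");

        if (ctx < 0) {
            fprintf(stderr, "pmNewContext: %s\n", pmErrStr(ctx));
            return 1;
        }
        /* ... fetch and interpolate as if this were a single archive ... */
        pmDestroyContext(ctx);
        return 0;
    }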
At this point, the prototype treats the list of archives as a single
logical archive within the context, ordering them and handling the
transitions from one to the next with no additional intervention
required by the PMAPI client.
Feedback on the existing prototype has been received from a few
(thanks!) and is still welcome. The comments fall mainly into these
categories:
- Scaling for large PCP installations, which may retain large
numbers of related archives
- The need to handle sets of archives which may be dynamically
changing; i.e., new archives may be appearing (via an active
pmlogger) and perhaps disappearing within a specified directory
while the context is open
- The requirement of some PMAPI functions to examine the entire
PMNS of a given context.
In addition to addressing the feedback received, there is further
work to be done, primarily implementing consistency checking among
the PMNS of the individual archives, to ensure that they can
represent a single context in a manageable way.
The following is an outline of a high-level design intended to
address all of the above. It describes the overall approach to
representing the multi-archive context as well as some high-level
implementation details. There are also some alternatives, to be
chosen based on feedback concerning the required level of
performance for various use cases.
Single PMNS for the entire context
This is needed for those APIs which must examine the entire PMNS of
the context. Examples include pmTraversePMNS(3) and pmLookup*(3).
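For example, traversing the namespace from the root must be able to
see the metrics of every archive in the set, so the complete global
PMNS is required. A minimal sketch (hypothetical archive names):

    #include <stdio.h>
    #include <pcp/pmapi.h>

    static void
    dometric(const char *name)
    {
        printf("%s\n", name);
    }

    int
    main(void)
    {
        int ctx = pmNewContext(PM_CONTEXT_ARCHIVE, "20140901.0,20140902.0");

        if (ctx < 0)
            return 1;
        /* Traversal from the root touches every name in the context,
         * so the PMNS of every archive in the set must be loaded. */
        pmTraversePMNS("", dometric);
        pmDestroyContext(ctx);
        return 0;
    }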
I propose that this PMNS be built up as each individual archive is
accessed. The main reason is that consistency checking can then also
be performed as each archive is accessed. In the case of a
consistency issue, it is then possible (even probable) that useful
data will have been provided to the client before the problem is
encountered. The label of each archive still needs to be examined
when the context is opened, in order to determine the ordering of
the archives in the overall time line, but it is not necessary to
examine the PMNS (.meta) of each archive until the metrics within it
are to be examined.
In the case of an API call, like pmTraversePMNS(3), we can bite the
bullet and complete the PMNS of the entire archive set as needed.
From an implementation point of view, the __pmLogCtl->l_pmns of
each individual archive will reference the global PMNS, instead of
each archive maintaining its own PMNS as is done today.
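As a rough sketch of the intended shape of that change (the type and
field names below are stand-ins, not the actual libpcp declarations):

    struct pmns;                      /* stand-in for the real PMNS type */

    struct log_ctl {                  /* stand-in for __pmLogCtl */
        struct pmns *l_pmns;          /* now a reference to the shared,
                                         context-wide PMNS rather than
                                         a per-archive namespace */
        /* ... per-archive .index/.meta/volume state ... */
    };

    struct multi_archive_ctx {
        struct pmns     *global_pmns; /* built up incrementally as each
                                         archive is first accessed */
        struct log_ctl **archives;    /* one entry per archive in the set */
        int              n_archives;
    };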
Scaling
The primary issue is resource management when scaling to PCP
installations for which individual directories may contain extremely
large numbers of related archives. In particular, we don't want to
keep large numbers of file descriptors open simultaneously. Of
secondary concern is the build-up of in-memory data for each archive
which has been accessed but which, in some usage scenarios, is
unlikely to be referenced again. These concerns apply mainly to
directories of archives; lists of individually named archives are
unlikely to present a problem.
The design of data structures and policies for retention could
potentially depend on what kinds of usage scenarios we envision. We
must also keep in mind that, in the case of directories of archives,
new archives could be dynamically appearing via an active pmlogger
or via some other means. They could also be dynamically
disappearing; however, this is just as easy to detect and should
probably be treated as an error situation.
In all use scenarios, we need to maintain the entire active set of
archives for the purpose of maintaining their order within the time
line. In particular, no two archives can overlap in time. We must
also be able to insert new archives into the correct position in the
time line. All of this requires that we, at a minimum, store the
start and end times of each archive in the active set.
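Concretely, the minimum per-archive bookkeeping might look something
like this (a sketch; the names are hypothetical):

    #include <sys/time.h>

    /* One record per archive in the active set, kept whether or not
     * the archive is currently open.  The start and end times come
     * from the archive label and drive the time line ordering and
     * the overlap checks. */
    struct active_entry {
        char           *name;   /* archive base name */
        struct timeval  start;  /* from the archive label */
        struct timeval  end;    /* time of the last record */
        int             final;  /* last in its directory's time line? */
    };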
Scaling Possibilities
1) Keep one archive open at a time with no caching of any data
   from previously accessed archives
   - must re-read .index and .meta each time we return to the
     same archive
   - can still avoid redoing consistency checks
   - no danger of potentially unused resources building up
   - optimized for single-direction traversal
   - potentially slow for tools which transition back and forth
     between archives
     - but no slower than the initial transition or than each
       transition in a uni-directional traversal
2) Keep one archive open at a time but retain all .index, .meta,
   caches and all other __pmLogCtl data for previously accessed
   archives
   - need only re-open .index and .meta files, no need to re-read
   - optimized for traversal back and forth between archives from
     beginning to end
   - prone to build-up of large amounts of potentially
     never-to-be-used-again data
3) Keep one archive open at a time but retain the .index, .meta,
   caches and other __pmLogCtl data for a limited number of
   previously accessed archives
   - keep a cache of this data for the most recently accessed
     archives; 2 or 3 previous archives might be sufficient
   - could leave the cached archives open, including fds, OR
     merely re-open their .index and .meta files with no need to
     re-read them
   - optimized for traversal back and forth between recent
     archives but would not slow down a uni-directional traversal
   - not prone to build-up of large amounts of potentially
     never-to-be-used-again data
My feeling is that 1) is the simplest and is optimized for what I
believe is the most common use case, which is to read the archives
in one direction, from beginning to end or vice versa. Changing
direction across archive boundaries would be slower than if we were
to cache some data, but no slower than a single-direction traversal
for any of the 3 suggestions. 1) could also be easily extended to
become 3) should we discover that the performance of re-crossing
archive boundaries is inadequate.
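Should 3) become necessary, the retention side could be as simple as
a small most-recently-used cache. A sketch, with hypothetical types
and a tunable size:

    #define CACHE_SIZE 3    /* 2 or 3 previous archives */

    struct log_ctl;         /* stand-in for the retained __pmLogCtl data */

    struct mru_cache {
        struct log_ctl *slot[CACHE_SIZE];
        int             used;
    };

    /* Retain ctl as most recently used, releasing the least recently
     * used entry when the cache is full. */
    static void
    cache_retain(struct mru_cache *c, struct log_ctl *ctl,
                 void (*release)(struct log_ctl *))
    {
        if (c->used == CACHE_SIZE)
            release(c->slot[--c->used]);    /* evict the oldest entry */
        for (int i = c->used; i > 0; i--)   /* shift entries down */
            c->slot[i] = c->slot[i-1];
        c->slot[0] = ctl;
        c->used++;
    }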
Dynamic Archive Management
Here is an outline of how the individual archives would be managed,
regardless of which scaling option is chosen.
I feel that we should cater to the possibility of new archives being
created within directories, but only at the end of the time line for
each directory (if any). pmlogger(1) would create new archives in
this way. I believe that handling the creation of new archives at
random points in the time line, whenever an arbitrary archive
boundary is crossed, would be a waste of time. The algorithms below
will check for new archives within a directory only when a request
is made for data at a time just after the end of that directory's
time line. If archives disappear while the context is open, then I
believe that the errors which occur if/when we attempt to read the
files will be sufficient.
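The directory re-check itself could be as simple as the following
sketch, which assumes archives are identified by their .meta files;
known() and add_archive() are hypothetical stand-ins for the active
set bookkeeping:

    #include <dirent.h>
    #include <stdio.h>
    #include <string.h>

    /* Re-scan an archive directory: any file ending in ".meta" names
     * an archive; base names not already known are handed to the
     * active set bookkeeping. */
    static void
    rescan_dir(const char *dir,
               int (*known)(const char *), void (*add_archive)(const char *))
    {
        DIR *d = opendir(dir);
        struct dirent *de;

        if (d == NULL)
            return;
        while ((de = readdir(d)) != NULL) {
            size_t len = strlen(de->d_name);
            if (len > 5 && strcmp(de->d_name + len - 5, ".meta") == 0) {
                char base[256];
                snprintf(base, sizeof(base), "%.*s",
                         (int)(len - 5), de->d_name);
                if (!known(base))
                    add_archive(base);
            }
        }
        closedir(d);
    }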
Here are some algorithms for handling various events associated with
a multi-archive context:
When a new PM_CONTEXT_ARCHIVE context is opened (pmNewContext(3)):
    The list of names (which may contain only one item) is examined
    For each item in the list
        If it is an individual archive, it is added to the active
            set (see below)
        else if it is a directory
            For each archive in the directory
                add it to the active set (see below)
            Mark the final archive in the time line of the directory
        else
            error
Adding an archive to the active set entails:
    reading the label to discover the start and end times
    adding the archive to the active set in temporal order while
        checking for overlaps
            any temporal overlap is an error
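In code, the insertion step might look like this, building on the
active_entry sketch above (a linear search is shown for clarity, and
the caller is assumed to have ensured capacity in the array):

    #include <string.h>

    /* Compare two timevals like strcmp: <0, 0 or >0. */
    static int
    tv_cmp(const struct timeval *a, const struct timeval *b)
    {
        if (a->tv_sec != b->tv_sec)
            return a->tv_sec < b->tv_sec ? -1 : 1;
        if (a->tv_usec != b->tv_usec)
            return a->tv_usec < b->tv_usec ? -1 : 1;
        return 0;
    }

    /* Insert 'ent' into the active set, ordered by start time.
     * Any temporal overlap with a neighbour is an error. */
    static int
    active_set_insert(struct active_entry *set, int *n,
                      const struct active_entry *ent)
    {
        int i = 0;

        while (i < *n && tv_cmp(&set[i].start, &ent->start) < 0)
            i++;
        if (i > 0 && tv_cmp(&set[i-1].end, &ent->start) > 0)
            return -1;              /* overlaps its predecessor */
        if (i < *n && tv_cmp(&ent->end, &set[i].start) > 0)
            return -1;              /* overlaps its successor */
        memmove(&set[i+1], &set[i], (*n - i) * sizeof(set[0]));
        set[i] = *ent;
        (*n)++;
        return 0;
    }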
When we need to change archives in order to fulfil a request:
    If the request is for a time just beyond the end of the final
        archive within a directory (marked above)
        re-check the directory for new archives and add them to
            the active set (see above)
        re-discover which archive is the final one in the time
            line of the directory and mark it
    Determine which archive spans the time of the request
        optimized by searching the time line beginning at the
            currently active archive
    Switch to that archive
        if it has not been previously done
            check the consistency of the PMNS of the new archive
                with the existing global PMNS
                    unmanageable differences are an error
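The step which determines the archive spanning the requested time
could then search outward from the currently active archive, as in
this sketch (reusing tv_cmp and active_entry from above):

    /* Find the index of the archive whose [start, end] interval spans
     * time t, searching outward from the current archive 'cur'.
     * Returns -1 if t falls in a gap or beyond the time line. */
    static int
    find_spanning(const struct active_entry *set, int n, int cur,
                  const struct timeval *t)
    {
        if (tv_cmp(t, &set[cur].start) >= 0) {
            for (int i = cur; i < n; i++)       /* search forward */
                if (tv_cmp(t, &set[i].end) <= 0)
                    return tv_cmp(t, &set[i].start) >= 0 ? i : -1;
        } else {
            for (int i = cur - 1; i >= 0; i--)  /* search backward */
                if (tv_cmp(t, &set[i].start) >= 0)
                    return tv_cmp(t, &set[i].end) <= 0 ? i : -1;
        }
        return -1;
    }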
If a request requires the entire PMNS of the context:
    re-check all directories (if any) for new archives
    read and check the PMNS of all previously unaccessed archives
If a request can be fulfilled within the currently active archive,
then there is no multi-archive overhead.
These algorithms should minimize, and often even eliminate, the
multi-archive overhead for most requests, which are usually for data
in the same temporal neighbourhood as the previous request. Overhead
occurs only when we need to transition to another archive in the
set. They should also minimize the overhead associated with managing
dynamic archive creation by tools like pmlogger. This is
accomplished by exploiting the fact that each archive in the set is
temporally distinct and that we only need to check for new archives
when traversing past the end of the final archive in any given
directory.
Questions, concerns, ideas, comments ..... please!
Dave