Ken McDonell wrote:
Doing temporal data reduction correctly for PCP archives has been an
itch I've had for about 12 years ... yep the itch predates pmlogmerge,
which predates pmlogextract, which predates pmlogreduce. They all
managed to not solve the problem in assorted creative ways.
So, attached are my initial thoughts for a new pmlogreduce.
Comments most welcome before I start hacking too seriously.
------------------------------------------------------------------------
Proposal for a replacement pmlogreduce
Ken McDonell
kenj@xxxxxxxxxxx
In the open source PCP distribution, the existing pmlogreduce tool is a
quick hack in response to:
1. failure of both pmlogmerge and pmlogextract to meet their original
slice-n-dice specifications,
2. expediency for the SGI NASmanager product to be able to support
PCP archives spanning days, weeks and months,
We effectively solved this by changing the nasavg PMDA to only use archives
to prime the graph history (and limiting the duration), but then switch to
live mode - i.e. the PMDA is also a live PCP client, running in what might
be called head-up-your-own-ass-mode :)
I guess this might enable another holy grail: derived metrics across more
than just the temporal domain. And even more strangely, archives containing
data from more than one host.
Also, pmid remapping or aliasing would be a good feature to have, but
maybe that's a job for a different tool.
More comments when I have more time ...
Cheers
-- Mark
3. getting the data semantics correct is at best hard, and in some
cases impossible when the temporal domain is compressed.
This document outlines a plan to rewrite pmlogreduce to address the
deficiencies of the current implementation.
Basics
* One input archive - from either pmlogger or pmlogextract.
Specifically, if you want to combine multiple archives and do data
reduction, you'll need to:
1. keep all the original archives
2. concatenate them (and possibly filter them, see below)
with pmlogextract
3. then use pmlogreduce to apply the temporal reduction
* One output archive.
* Focus on semantically correct data reduction in the temporal domain.
* We intend to preserve the semantics of pmlogger's output as much
as possible. In particular this means when the archive is
processed with any of the standard tools, the value reported at
time t is representative of the value that would have been
observed over the interval up to time t.
* The acid test of correctness should be that a reporting tool, e.g.
kmchart, should produce the same results with either the input
archive or the output archive when the reporting interval is set
to the same delta as was used to create the output archive from
pmlogreduce.
Some Things NOT Supported
* Filtering of instances or metrics - pmlogextract does a fine job
of this, and we're not going to make pmlogreduce even more
complicated to support this functionality.
* PMID re-mapping - if the PMID of a metric has the misfortune to
change over its life, pmlogextract will choke and we never get to
pmlogreduce. The right way to address this would be an extension
to pmlogextract or the binary PCP archive editor that has been
part written and part threatened (pmlogneurosurgeon?).
* Instance domain re-mapping - it seems the only sane assumption is
that the internal instance identifiers maintain constant semantics
for each instance domain over the duration processed by pmlogreduce.
* Changes in metric semantics. Many of these are impossible to
support, and the few that make sense require pmResult rewriting
and should probably be done in a steroid-enhanced version of
pmlogextract.
Since the variations that involve changes to metric semantics or metric
metadata would have to make it through pmlogextract, the problem really
belongs there, and pmlogreduce is effectively insulated from these ugly
issues by the "I only accept one input archive" assertion.
Some Things that WILL be Supported
The existing pmlogreduce attempts some of the list below, but most of
these features are either not implemented, or implemented incorrectly in
the current code.
* The temporal reduction is achieved by the -t delta command line
option. The output archive will contain observations at most once
per delta for each metric-instance pair in the input archive.
* The -A align command line option may be used to align the
observations in the output archive to natural time boundaries.
* The -S and -T command line options may be used to specify a
starting and/or ending time window on the input archive (and hence
the output archive).
* The -Z and -z command line options are supported to vary the
timezone interpretation of the -S and -T options.
* The size of the output archive may be limited with the -s command
line option.
* Multi-volume output archives will be supported through the -v
command line option and internal volume switching logic to ensure
the 32-bit offset limit of the temporal index is not exceeded.
* Counters will be rate converted (so they are mapped to INSTANTANEOUS
metrics, their units are changed so that the TIME dimension is
reduced by one, e.g. MBYTE -> MBYTE / SEC, and their TYPE is
converted to DOUBLE).
* Counters that wrap between consecutive observations in the input
archive will be treated as a single counter wrap and converted
accordingly. Note that if one or more MARK records separate the
consecutive observations, the wrap conversion will not be done.
* INSTANTANEOUS metrics with numeric value will be converted to a
time-average. For example, consider the input archive data below:
    Time   Value
      60      25
     120     100
     180      80
     240      20
Then for the interval 100-200, the output value computed by pmlogreduce
would be:
    (25*(120-100) + 100*(180-120) + 80*(200-180)) / 100 = 81
Alternatively, consider this to be the integral under the curve of the
value over a time interval, divided by the length of the time interval.
* Support for MARK records and missing data (at interval
boundaries). The notion of a confidence level will be introduced,
with a -k percent command line option. If the value for a
metric-instance is defined over at least percent of the interval,
then the corresponding value will be used as representative of the
value over the whole interval - which is like saying the missing
value was at the observed value for the remainder of the interval.
A likely default percent is 85.
* In the region of MARK records, the value will correctly be
interpreted as unknown between the last observation and the MARK,
and between the MARK and the first observation. The one exception
is DISCRETE metrics where a prior value is defined right up to the
MARK record.
* Dynamic instance domains will be supported.
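The counter and time-average conversions listed above can be sketched
roughly as follows. This is an illustration in Python, not the proposed
implementation: the names counter_rate, time_average and wrap_bits are
hypothetical, the 32-bit counter width is an assumption, and MARK records
are not modelled (the caller is assumed never to pass observations that
straddle a MARK). The default of 85 mirrors the likely -k default
mentioned above.

```python
from typing import List, Optional, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, value)


def counter_rate(prev: Sample, curr: Sample, wrap_bits: int = 32) -> float:
    """Rate-convert two consecutive COUNTER observations.

    A decrease is treated as exactly one wrap of an unsigned counter
    that is wrap_bits wide (the width is an assumption of this sketch).
    """
    (t0, v0), (t1, v1) = prev, curr
    delta = v1 - v0
    if delta < 0:
        delta += 2 ** wrap_bits  # single counter wrap
    return delta / (t1 - t0)


def time_average(samples: List[Sample], start: float, end: float,
                 percent: float = 85.0) -> Optional[float]:
    """Time-average an INSTANTANEOUS metric over [start, end].

    Each value is held until the next observation, as in the worked
    example. After the final observation the value is treated as
    unknown (an assumption). Returns None when the value is defined
    over less than `percent` of the interval.
    """
    samples = sorted(samples)
    area = 0.0      # integral of the value over the covered time
    covered = 0.0   # seconds of the interval with a defined value
    for (t, v), (t_next, _) in zip(samples, samples[1:]):
        lo, hi = max(t, start), min(t_next, end)
        if hi > lo:
            area += v * (hi - lo)
            covered += hi - lo
    if covered == 0 or 100.0 * covered / (end - start) < percent:
        return None
    # Dividing by the covered time extends the observed average to the
    # whole interval, which is the -k interpretation described above.
    return area / covered
```

With the example data, time_average([(60, 25), (120, 100), (180, 80),
(240, 20)], 100, 200) gives 81.0, matching the hand calculation.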
Some Open Questions
The following issues warrant some discussion before I make unilateral
decisions.
1. Output Window Clipping. In several useful deployments of
pmlogreduce one may wish to further restrict the temporal domain
by selecting some recurring periods to be included, and some to
be excluded. Examples might be between the hours 08:00 and 20:00
each day, and/or each day excluding Saturday and Sunday. There
are several problems here:
1. suitable command line syntax to specify this sort of clipping
2. what would the output archive contain - no pmResult, or
pmResult and no metrics (which is formally a MARK record)
for each delta in the "clipped" region
3. there is no real tool support to replay and/or report on an
archive of this style
2. Should DISCRETE metrics appear in the output only if there is a
value observed in the corresponding interval in the input archive?
The alternative is to have all metrics repeated in every pmResult
in the output archive.
3. For DISCRETE metrics, and all but the last value before a MARK
record or the end of the input archive for INSTANTANEOUS metrics,
consecutive identical values can be omitted without changing the
data semantics - is this worth it?
4. What to do with COUNTER metrics that have a TIME dimension other
than 0 or 1? I don't know that we have any such metrics, and I'm
not sure what the real semantics of data like this might be, but
it seems pretty obvious that "rate conversion" is not going to
make the semantics any more obvious!
5. For INSTANTANEOUS and DISCRETE metrics with non-numeric values, we
have to decide what to do if multiple observations appear in the
input archive within a single output archive time interval. Taking
the last observed value seems to be the least-worst thing to do.
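To make question 3 concrete, here is a minimal Python sketch of what
omitting consecutive identical values might look like. It is only an
illustration of the rule as worded in the question: drop_repeats is a
hypothetical name, and using None to stand in for a MARK record is an
assumption of the sketch, not how the archive format represents it.

```python
from typing import List, Optional, Tuple

Sample = Tuple[float, float]  # (timestamp, value)
MARK = None  # hypothetical stand-in for a MARK record in the stream


def drop_repeats(records: List[Optional[Sample]],
                 discrete: bool) -> List[Optional[Sample]]:
    """Drop observations whose value repeats the previously kept one.

    For INSTANTANEOUS metrics (discrete=False) the last observation
    before a MARK or before the end of the input is always kept.
    MARK records pass through and reset the comparison.
    """
    out: List[Optional[Sample]] = []
    last_value: object = object()  # sentinel equal to no real value
    for i, rec in enumerate(records):
        if rec is MARK:
            out.append(MARK)
            last_value = object()
            continue
        at_boundary = i + 1 == len(records) or records[i + 1] is MARK
        if rec[1] != last_value or (at_boundary and not discrete):
            out.append(rec)
            last_value = rec[1]
    return out
```

Whether the space saved justifies the extra logic is exactly the open
question; the sketch only shows that the rule itself is small.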
--
Mark Goodwin markgw@xxxxxxx
Engineering Manager for XFS and PCP Phone: +61-3-99631937
SGI Australian Software Group Cell: +61-4-18969583
-------------------------------------------------------------