On 19/11/15 05:57, Dave Brolley wrote:
Hi Ken (and All),
You may have seen the recent discussion regarding scaling, consistency
and support for dynamic behaviour within directories of archives. The
topic of how to treat archive boundaries with respect to scaling came up
as part of the discussion of whether and how to handle new archives
which may appear in the middle of the overall timeline while the context
is open.
Currently, the prototype treats archive boundaries as seamless. That is,
if we transition from Archive A to B while scaling some counter, the
last sample from A and the first sample from B will be interpolated as if
they came from the same archive.
It has been suggested that the boundary actually represents a break in
the logging and that it should be treated as a virtual MARK record. Now
that I think about it, I am leaning toward this interpretation, since
the boundary does indeed represent a gap during which no logging was
performed.
I am interested in your opinion and suggestions for this and also for
the discussion re: scaling, consistency and dynamic behaviour.
Apologies, Dave. I owe you a considered review of your recent postings
and the replies; I've just been a bit busy with other matters.
I think by "scaling" you are referring to interpolation. Although the
case for <mark> records is compelling for counter metrics, they are also
required for non-counter metrics (which are also subject to
interpolation when PM_MODE_INTERP is used with pmSetMode()).
When you stitch archives together it is (semantically) required to treat
the transition from one archive to the next as a discontinuity and this
requires a <mark> record to be inserted into the stream.
This is what pmlogextract does, and I think the implementation of a
multi-archive context should produce _exactly_ the same stream of data
as would be produced by processing the merged archive from pmlogextract
run over all of the archives in the multi-archive context (hint: serious
QA fodder here).
Even when processing the multi-archive context with mode PM_MODE_FORW or
PM_MODE_BACK I would expect to see the <mark> records at the boundaries
between the archives.
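To make the expected stream concrete, here is a minimal sketch in Python rather than against libpcp. Everything in it is an assumption for illustration: the MARK sentinel, the stitch() helper, the metric name "some.counter", integer millisecond timestamps, and the placeholder mark timestamp of one unit after the previous record. It also assumes the archives are time-ordered and do not overlap.

```python
MARK = "<mark>"  # hypothetical sentinel standing in for a real <mark> record


def stitch(archives):
    """Yield (timestamp_ms, values) records from a list of archives,
    inserting a (timestamp_ms, MARK) record at each archive boundary."""
    last_ts = None
    for records in archives:
        if last_ts is not None and records:
            # Placeholder timestamp: just after the previous archive's
            # last record (an assumption, not the real PCP rule).
            yield (last_ts + 1, MARK)
        for ts, values in records:
            yield (ts, values)
            last_ts = ts


# Two hypothetical archives with a gap between them.
archive_a = [(0, {"some.counter": 1000}), (10_000, {"some.counter": 2000})]
archive_b = [(100_000, {"some.counter": 50}), (110_000, {"some.counter": 150})]

forward = list(stitch([archive_a, archive_b]))
backward = forward[::-1]  # a backward reader sees the same mark at the boundary
```

A QA check in this spirit would compare the stitched stream record by record against the stream read from the merged archive, in both directions.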
As background, the reasons are:
1. stopping one archive and starting another is often associated with a
pmcd restart, which means some of the PMDAs will have reset their
counter metrics to zero and some of their non-counter metrics may have
assumed different values.
2. there may be a significant temporal gap between one archive and the
next (missing archives, pmlogger stopped working, pmcd stopped working,
lost the network connection between pmlogger and pmcd ...) so trying to
interpolate what happened in this region is little better than making
the numbers up.
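Both reasons can be illustrated with a small sketch. It is plain Python, not the libpcp interpolation code; the MARK sentinel, the interpolate() helper and the sample values are hypothetical, and timestamps are assumed to be strictly increasing integers in milliseconds.

```python
MARK = "<mark>"  # hypothetical sentinel standing in for a real <mark> record


def interpolate(stream, t):
    """Linearly interpolate a value at time t from a time-sorted list of
    (timestamp, value) pairs. Return None when t lies outside the stream
    or inside an interval bounded by a mark."""
    for (t0, v0), (t1, v1) in zip(stream, stream[1:]):
        if t0 <= t <= t1:
            if v0 is MARK or v1 is MARK:
                return None  # no sound value exists across a discontinuity
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    return None


# A counter that was reset by a pmcd restart between the two archives.
seamless = [(0, 1000), (10_000, 2000), (100_000, 50), (110_000, 150)]
marked = [(0, 1000), (10_000, 2000), (10_001, MARK), (100_000, 50), (110_000, 150)]

made_up = interpolate(seamless, 55_000)   # 1025.0: the counter appears to run backwards
no_value = interpolate(marked, 55_000)    # None: the gap is reported as a gap
```

Within either archive the two streams give identical answers; they differ only in the region between the archives, which is exactly where the data cannot be trusted.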
<mark> records were introduced to ensure sound data semantics in the
regions between archives, and this is what you should be aiming for.
See the pmlogextract code to see how to generate an appropriate
timestamp for the <mark> record.
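As a rough sketch of the idea only: one plausible convention is to place the <mark> a small fixed offset after the last record of the preceding archive. The mark_timestamp() helper and the 1 msec default offset below are assumptions for illustration, written in Python with a (seconds, microseconds) pair standing in for a struct timeval; the pmlogextract source remains the authority on the actual rule.

```python
USEC_PER_SEC = 1_000_000


def mark_timestamp(last_sec, last_usec, offset_usec=1000):
    """Return a (sec, usec) timestamp offset_usec after the given one,
    carrying any microsecond overflow into the seconds field."""
    usec = last_usec + offset_usec
    return (last_sec + usec // USEC_PER_SEC, usec % USEC_PER_SEC)


simple = mark_timestamp(100, 0)          # (100, 1000)
carried = mark_timestamp(100, 999_500)   # (101, 500)
```

The only subtle part is the carry: the microseconds field has to be normalised back into range when the offset pushes it past one second.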