
Re: [pcp] braindump on unified-context / live-logging

To: Dave Brolley <brolley@xxxxxxxxxx>
Subject: Re: [pcp] braindump on unified-context / live-logging
From: Nathan Scott <nathans@xxxxxxxxxx>
Date: Mon, 20 Jan 2014 17:16:30 -0500 (EST)
Cc: pcp@xxxxxxxxxxx
Delivered-to: pcp@xxxxxxxxxxx
In-reply-to: <52DD7596.3040306@xxxxxxxxxx>
References: <20140108013956.GG15448@xxxxxxxxxx> <52DD7596.3040306@xxxxxxxxxx>
Reply-to: Nathan Scott <nathans@xxxxxxxxxx>
Thread-index: EHulGzo5ch/QqFzUnzRKp6kOOKB8ig==
Thread-topic: braindump on unified-context / live-logging
Hi Dave,

----- Original Message -----
> Hi All,
> 
> It looks like I will be getting involved with this, so let me see if I
> can summarize my understanding of what has been discussed so far.
> 
> It looks like the big-picture direction of this is to allow tools to
> specify a time interval for their metrics which could include archived
> data and continue on into the 'live' domain without the tools needing to
> be aware of where the data originated and continues to originate. That
> is, a tool may want data from some starting time to some ending time or
> from some starting time and continuing on into the future and the tool
> should not have to be bothered with the details of which archives
> contain the previous data or how the future data will be obtained.

*nod*

> A unified context was suggested for local data, but seems to have been
> replaced

Personally, I was unconvinced that the recent round of suggestions was a
step forward, so for me definitely not "replaced" ... I do still have a
strong preference for Plan A; everything stated there matches my current
thinking re the ideal long term approach we should take ("Plan A" being:
http://oss.sgi.com/pipermail/pcp/2013-September/003963.html)

Also note the above mail does not mean "local data" only.  See the final
para there - "We do however have pmproxy up our sleeves...".  pmproxy is
a client, which creates __pmContexts - which can be unified contexts too.
Work remains to define protocol extensions that would suit querying that
historical data (which is true of all suggested approaches).

Also also, it might not have been clear in the original mail but dealing
with *multiple, potentially overlapping* archives for the same host is
also very much needed, and done transparently (as well as live/archive
switching).  And this is probably the most difficult part - archives can
overlap, some may be growing, new metrics can come along into existing
archives (pmlc), the overlaps can be temporal, and metrics, or a metric's
instances, will be in sometimes overlapping, other times disjoint sets -
tough problems lurking there that need to be worked through.  There can
also be quite a few archive files to deal with (in the order of 1000s
per host after a couple of years of data collection).

> by the notion of a 'live' archive mode in which data would be
> obtained across archives, if needed, and in which an active archive
> would continue to be read as data is added to it, if needed. Given a
> specified time window, the distinction between -h and -a becomes blurred
> in this world.
> 
> Since the data may not be local, an intermediate server has been
> suggested (either a new one or an extension to an existing server) which
> would handle the details of where to get data for a given time period
> for a given host, thus abstracting the idea of whether the data is local
> or remote.
> 
> Some details of existing and previously existing tools which could read
> active pmlogger archives have been given along with their caveats.

Max also had a suggestion around ordering log label writes so that the
updated log end time is only written to the log label once the data has
been flushed.  Pretty sure the code doesn't do that currently, not sure
if it's needed (given existing code in place now), but it sounded like a
good synchronisation option if tailing log data is still problematic (I
don't know the answer to that - we need to find out).

> Hopefully I have it right so far.
> 
> The area in which I will be getting involved, initially, will be the
> transition from archived to live data. So far, the discussion has
> focused on the difficulties of reading new data from an active archive.
> Maybe I missed it but the existence of an active archive suggests to me
> the existence of an active pmlogger which suggests a reachable pmcd
> (perhaps via pmproxy). Has anyone suggested simply switching to
> obtaining live data from the pmcd rather than trying to read data from
> the active archive?

Yes - that is (part of) Plan A... "Capturing end of archive (PM_ERR_EOL)
and transitioning to a separate, live context".  It's necessary for both
to be in place - think of tools like pmchart, where interaction with the
archive and fetching from it is very complex (new metrics are added to
the fetched set on-the-fly, and it may run for a long time with archives
actively growing/changing/appearing underneath it).

[by "both" above, I mean both reading an actively growing archive & then
transitioning to a real host context - and back again, in pmchart's case]

The new context type (unified, as opposed to one host or one archive)
will need to manage a set of archive/host contexts, IOW.  In impl.h,
__pmContext will need extension to manage multiple __pmArchCtl's (so,
c_archctl will need to become a tree structure with an efficient time-
range lookup) and ideally we will add to that tree dynamically, as new
archives appear.

> Perhaps the concern was in missing a metric value
> still cached in pmlogger and not yet written to the archive? I'm sure
> that there would be additional synchronization issues. I just wanted to
> make sure that the idea had not already been raised and dismissed.

Right.  In terms of synchronisation issues, see src/libpcp/src/logutil.c
__pmGetArchiveEnd which deals with many, many such issues already, and
also interp.c (mwahahahah!) __pmLogFetchInterp (ctxp->c_archctl->ac_end)
which uses that interface under the covers.

cheers.

--
Nathan
