RE: archive format history, was Re: [Bug 1046] pmlogger heavy duplicatio

To: "'Frank Ch. Eigler'" <fche@xxxxxxxxxx>, <pcp@xxxxxxxxxxx>
Subject: RE: archive format history, was Re: [Bug 1046] pmlogger heavy duplication in .meta output
From: "Ken McDonell" <kenj@xxxxxxxxxxxxxxxx>
Date: Wed, 5 Feb 2014 09:37:17 +1100
Delivered-to: pcp@xxxxxxxxxxx
In-reply-to: <20140203213412.GB6491@xxxxxxxxxx>
References: <bug-1046-936@xxxxxxxxxxxxxxxx/bugzilla/> <bug-1046-936-ayvjAepxjb@xxxxxxxxxxxxxxxx/bugzilla/> <20140203213412.GB6491@xxxxxxxxxx>
Thread-index: AQD6XDahYCyu5yw1Yqk+370zR3zIAwEjV14/AcAABeCcOBhdcA==
G'day Frank.

More History 101 below ... 8^)>

> > > - change libpcp/src/logmeta.c addindom and/or searchindom to merge
> > > rather than replace new instlist/namelist entries
> 
> > Are you suggesting reconstructing the temporal series of indom sets as
> > per the current format?
> 
> I was talking more about representation on disk, as opposed to a particular
> reading/caching algorithm.

Hmm ... then I don't understand what "merge" means in the original
suggestion.  Can you provide a bit more detail of your thoughts?

> Curious about another aspect of the history.  Can someone explain how
> come we ended up with a .meta file that's separate from the main volume?
> The records could be intermingled in one (the pmFetch PDU's might just
> need a tag prefix, or maybe not even that).  Both sets accumulate
> timewise.  If the .meta data is a small fraction of the total, it could
> be replicated in subsequent volumes without much fuss.
> So why a separate file?

Both the .meta and the .N files may grow over the period of pmlogger's life
(remember pmlc can dramatically affect what is being logged, although most
of the recent attention has been devoted to dynamic instance domains and
indom bloat in the .meta files ... something that was not possible in the
early days due to restrictions on the proc pmda, but that is another story).

Now, the only operating system I know (and love) that offers a file system
that supports multiple growing logical segments within the one file
container is the Michigan Terminal System (MTS) that was an OS/360
alternative in the days before Unix escaped from Bell Labs.  So with the
file systems we have to work with, the only option would be to have variant
records in the one file and intermix the metadata and pmResult data.  But
the metadata includes the PMNS and the metric descriptors and if there was a
single data stream all applications would have to read and process EVERY
record in the archive before they could answer the simplest questions about
the PMNS or the metric descriptors.  Given the expected relative size of the
metadata compared to the archive in toto (one, two, three or four orders of
magnitude smaller), it was felt that the cost of reading all of the data to
extract the metadata was excessive.
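
To make the cost argument concrete, here is a toy sketch in Python.  It is
not PCP code and the record layout (a length, a one-byte tag, a payload) is
invented purely for illustration; the names TAG_META, TAG_RESULT,
read_metadata and build_archives are all hypothetical.  It writes the same
metadata records once into an intermixed stream and once into a separate
metadata-only stream, then counts the bytes a reader has to consume to
recover the metadata from each.

```python
import io
import struct

# Hypothetical record layout (NOT the real PCP archive format):
# 4-byte big-endian payload length, 1-byte tag, then the payload.
TAG_META = 0    # stands in for PMNS / metric descriptor records
TAG_RESULT = 1  # stands in for pmResult records
HEADER = struct.Struct(">IB")


def write_record(stream, tag, payload):
    """Append one tagged record to the stream."""
    stream.write(HEADER.pack(len(payload), tag))
    stream.write(payload)


def read_metadata(stream):
    """Scan the whole stream; return (metadata payloads, bytes read)."""
    found = []
    bytes_read = 0
    while True:
        header = stream.read(HEADER.size)
        if len(header) < HEADER.size:
            break  # end of stream
        length, tag = HEADER.unpack(header)
        payload = stream.read(length)
        bytes_read += HEADER.size + length
        if tag == TAG_META:
            found.append(payload)
    return found, bytes_read


def build_archives(n_results=1000, result_size=200):
    """Build an intermixed stream and a metadata-only stream.

    The sizes are made up; one metadata record is emitted per 100 results.
    """
    intermixed = io.BytesIO()
    meta_only = io.BytesIO()
    for i in range(n_results):
        if i % 100 == 0:
            desc = b"desc-%d" % i
            write_record(intermixed, TAG_META, desc)
            write_record(meta_only, TAG_META, desc)
        write_record(intermixed, TAG_RESULT, bytes(result_size))
    intermixed.seek(0)
    meta_only.seek(0)
    return intermixed, meta_only


if __name__ == "__main__":
    intermixed, meta_only = build_archives()
    _, cost_intermixed = read_metadata(intermixed)
    _, cost_separate = read_metadata(meta_only)
    print("intermixed:", cost_intermixed, "bytes; separate:", cost_separate)
```

Both readers recover identical metadata, but the intermixed reader has to
walk past every result record to do so.  A reader could seek over the
payloads instead of reading them, but it would still have to visit every
record header in the stream.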

To expedite random access into the archives we have an index, the .index
file, and this also grows over the life of pmlogger.  If the metadata was
combined with the pmResult data then one needs to consider what to do with
the index data ... putting it in the one file is a waste of time unless you
have MTS and the ability to create a third growing segment in the file (if
you need to read the file to extract the index, you don't need the index!),
else put the index in another file, at which point the archive becomes
multi-file and a separate .meta file makes life only a very little more
difficult.
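
As an illustration of why the index only helps when it can be read on its
own, here is another toy sketch in Python.  Again this is not the PCP .index
format: the fixed-size record, the (timestamp, byte offset) index entries
and the function names write_archive and fetch_at_or_after are assumptions
made for the example.  The index is kept apart from the data, so a lookup
is a binary search over the small index followed by a seek and a short scan
in the data.

```python
import bisect
import io
import struct

# Hypothetical fixed-size data record: 8-byte timestamp + 8-byte value.
RECORD = struct.Struct(">dd")


def write_archive(samples, index_every=100):
    """Write samples to a data stream; return (data, index).

    index is a sorted list of (timestamp, byte_offset) pairs, one entry
    per index_every records, standing in for a separate index file.
    """
    data = io.BytesIO()
    index = []
    for n, (ts, value) in enumerate(samples):
        if n % index_every == 0:
            index.append((ts, data.tell()))
        data.write(RECORD.pack(ts, value))
    return data, index


def fetch_at_or_after(data, index, want_ts):
    """Return (record, records_scanned) for the first record whose
    timestamp is >= want_ts, or (None, records_scanned) if none exists."""
    timestamps = [ts for ts, _ in index]
    # Last index entry at or before want_ts; start of file if there is none.
    slot = bisect.bisect_right(timestamps, want_ts) - 1
    offset = index[slot][1] if slot >= 0 else 0
    data.seek(offset)
    scanned = 0
    while True:
        raw = data.read(RECORD.size)
        if len(raw) < RECORD.size:
            return None, scanned
        scanned += 1
        ts, value = RECORD.unpack(raw)
        if ts >= want_ts:
            return (ts, value), scanned


if __name__ == "__main__":
    samples = [(float(i), float(i * 2)) for i in range(1000)]
    data, index = write_archive(samples)
    print(fetch_at_or_after(data, index, 750.0))
```

The scan after the seek is bounded by the spacing between index entries
rather than by the size of the data.  If the index entries were instead
scattered through the data stream, finding them would mean reading the
stream, which is exactly the work the index is supposed to avoid.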

I still think this is the _right_ solution for Unix/Linux style file
systems.

Of course if there was a free, fast, multi-platform, reliable DBMS at the
time we designed the PCP archive format it may not have evolved as files in
the filesystem ... but it is not clear to me that this would have solved
more problems than it would have created (our data does not map well onto
the relational model, so we'd probably have ended up with binary blobs for all
the stuff that really mattered and the DB infrastructure would not have
provided much leverage).

> If we had the meta+metric data together, we could have instantly a
> pipe/streaming format - something we've wanted for special PMDA
> purposes, but could come in handy in other ways.

What are the "special PMDA purposes"?  The PMDA operates with a pull model,
so the data stream is defined by the client request sequence ... when being
read, not written, PCP archives are subjected to the same operational
semantics in terms of the client request sequence.  The "across the API"
behaviour for a client is the same for PMCD (and so PMDA) and archives and
this is an intermixed sequence of data types, but I don't think of this as
streaming or pipe semantics.

I am interested to hear more ....
