
Re: [pcp] Checking PCP archives - RFC

To: Nathan Scott <nathans@xxxxxxxxxx>
Subject: Re: [pcp] Checking PCP archives - RFC
From: Ken McDonell <kenj@xxxxxxxxxxxxxxxx>
Date: Wed, 22 May 2013 09:52:32 +1000
Cc: PCP Mailing List <pcp@xxxxxxxxxxx>
Delivered-to: pcp@xxxxxxxxxxx
In-reply-to: <141842009.8767308.1369177350957.JavaMail.root@xxxxxxxxxx>
References: <519AC94B.9020904@xxxxxxxxxxxxxxxx> <141842009.8767308.1369177350957.JavaMail.root@xxxxxxxxxx>
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130510 Thunderbird/17.0.6
On 22/05/13 09:02, Nathan Scott wrote:
> Hi Ken,
>
> ----- Original Message -----
>> The attached document spells out the case for pmlogcheck.
>>
>> I'd appreciate feedback.
>
> I think missing is discussion of the existing pmlogcheck(1) ...

I think you meant pmloglabel(1) ... good point, I had completely overlooked this, because the bug that triggered this had nothing to do with the label records.

> ... and how a
> new tool will supplant that guy, presumably keeping the existing code
> as a final pass?  pmloglabel also could get a mention - perhaps it'll
> become a wrapper tool, since I'd expect its label checks would become
> part of the bigger picture here.
>
> pmlogcheck uses the PMAPI in non-interp mode.  pmloglabel doesn't use
> the PMAPI at all for I/O, IIRC, and I think is more the model you're
> thinking here (does the file I/O directly to get labels).

pmlogcheck (the new tool) will use low-level I/O until it is sure the structure of the archive files is correct (pass 0), then use the PMAPI (or, more likely, the undocumented __pm* routines below the PMAPI) to check the semantics of the archive in the later passes.
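
A minimal sketch of what a pass 0 structural check could look like, written in Python rather than the C that pmlogcheck would actually use. It assumes each physical record is framed by a leading and a trailing 32-bit big-endian length that both count the whole record; that framing is an assumption to verify against libpcp, and `check_framing` and `make_record` are hypothetical names, not PCP routines.

```python
import struct

def check_framing(buf):
    """Walk length-framed records and report the first structural problem.

    Assumed layout (not confirmed here): [len][payload][len], where both
    lengths are 32-bit big-endian and include the two length words.
    """
    problems = []
    off = 0
    while off < len(buf):
        if len(buf) - off < 4:
            problems.append("offset %d: truncated header length" % off)
            break
        (head,) = struct.unpack_from(">i", buf, off)
        # A record must at least hold its two length words and fit in the file.
        if head < 8 or off + head > len(buf):
            problems.append("offset %d: bad record length %d" % (off, head))
            break
        (tail,) = struct.unpack_from(">i", buf, off + head - 4)
        if tail != head:
            problems.append(
                "offset %d: header length %d != trailer length %d"
                % (off, head, tail))
            break
        off += head
    return problems

def make_record(payload):
    """Build one framed record for testing (hypothetical helper)."""
    n = len(payload) + 8
    return struct.pack(">i", n) + payload + struct.pack(">i", n)
```

Stopping at the first problem is deliberate in this sketch: once the framing is wrong, later offsets cannot be trusted, which is the reason to finish the structural pass before handing the archive to higher-level routines.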

I'd tend to keep pmloglabel separate until I figure out if and how pmlogcheck should offer to repair damaged archives. Certainly the label checking part of pmloglabel would be replicated (even copied maybe) into pmlogcheck.

> ...  I'm looking forward to helping out; keen to learn more
> about some of the more ancient aspects of this code (metadata format &
> temporal index format & usage).

There's really nothing too tricky here ... but I'll take the opportunity to add more-verbose-than-usual comments (for me!) in pmlogcheck to document this sort of detail.

> On a somewhat related discussion for another time - I'll throw it out
> here just to get the hamster wheel spinning - I have had occasion to
> wonder whether we could index other things too, so not only for quick
> time-based lookups but also fast searching of other things like event
> records containing specific parameters, parameters in a range, etc.
> Big, big project - not sure if this can/should be shoe-horned into the
> existing archive format - so, a discussion for some other time.  ;)

Interesting ... I suspect this might be another optional go-faster index external to the existing files (or perhaps another section added to the end of the current temporal index).

> Do you imagine this tool might be able to *fix* corrupt archives too?
> pmloglabel does, and that was extremely handy at the time.  Even if a
> subset of the data was all that remained... can still be crucial if it
> is the only record one has of what happened in the past.

Not sure about this ... very few of the sorts of problems I expect to encounter are amenable to automatic repair (with the possible exception of rebuilding the .index file, which is feasible, but I'm not sure how useful that is until we actually have one that's missing). For example,
- timestamps going backwards
- pmid in pmResult not in PMNS
- instance in pmResult not in pmInDom at time of pmResult
- value encoding in pmResult does not match metric type from metadata
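
The four checks listed above can be sketched over simplified in-memory data. Everything here is an illustrative assumption: records are plain `(timestamp, pmid, inst, value)` tuples rather than real pmResult structures, the instance domain is treated as fixed rather than varying over time, and `check_semantics` is a hypothetical name, not a PCP routine.

```python
def check_semantics(records, pmns, indom, mtype):
    """Return (record index, problem) pairs for the four checks.

    records: list of (timestamp, pmid, inst, value); inst is None for
             metrics without an instance domain
    pmns:    set of known pmids
    indom:   dict pmid -> set of valid instance ids (static, simplified)
    mtype:   dict pmid -> expected Python type of the value
    """
    problems = []
    last = None
    for i, (stamp, pmid, inst, value) in enumerate(records):
        if last is not None and stamp < last:
            problems.append((i, "timestamp goes backwards"))
        last = stamp
        if pmid not in pmns:
            problems.append((i, "pmid not in PMNS"))
            continue  # no metadata to check the rest against
        if inst is not None and inst not in indom.get(pmid, set()):
            problems.append((i, "instance not in indom"))
        if not isinstance(value, mtype.get(pmid, object)):
            problems.append((i, "value encoding does not match metric type"))
    return problems
```

The sketch only detects and reports. None of these problems has an obvious single correct fix: a backwards timestamp, for instance, could mean either record is the wrong one.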

On the other hand, interactive repair could be very messy, both in terms of code complexity and UI confusion (anyone old enough to remember icheck/ncheck or even the early days of fsck will understand this).

I'll add repair to the TODO list for the moment.

Thanks for the feedback.
