Hi Ken,
----- Original Message -----
> > -----Original Message-----
> > pmlogger_merge: Warning: archive "20140123.09.55" is empty and will be skipped
> > Input archives to be merged:
> > 20140123.00.10
> > 20140123.16.25
> > pmlogextract: Error: __pmLogRead[log 20140123.00.10]: Corrupted record in
> > a PCP archive log
> > pmlogextract: Error occurred at byte offset 2690960 into a file of 2691072
> > bytes.
>
> Note, this is a place where we could be a whole lot smarter. The archive is
> not corrupted; it is simply truncated, and our library and tools could
> handle this in a more elegant manner (warn rather than error, for example).
> I don't think this would be a big piece of work; we'd just need a different
> error (not PM_ERR_LOGREC) returned from __pmLogRead() and then an audit of
> the callers of __pmLogRead() ... pmlogextract would be a good place to
> start ... 8^)
>
> I'd be willing to do this if it was considered a good idea.
Makes sense to me.
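To make the truncated-versus-corrupted distinction concrete, here is a rough Python sketch (not the libpcp code) of how a reader could tell the two apart. It assumes each archive record starts with a 32-bit big-endian length that counts the whole record and is repeated at the record's tail. That is my understanding of the on-disk format, but check it against the PCP archive format documentation before relying on it. The status names, including a separate TRUNCATED result, are hypothetical stand-ins for whatever new error __pmLogRead() would return instead of PM_ERR_LOGREC.

```python
import struct

# Hypothetical status values: per this thread, truncation currently surfaces
# as PM_ERR_LOGREC, the same as genuine corruption.
OK, EOF, TRUNCATED, CORRUPTED = "ok", "eof", "truncated", "corrupted"


def classify_record(buf: bytes, offset: int) -> str:
    """Classify the record starting at `offset` in `buf` (a whole volume file).

    Assumes each record begins with a 4-byte big-endian length, ends with an
    identical copy of it, and that the length counts the whole record.
    """
    remaining = len(buf) - offset
    if remaining == 0:
        return EOF  # clean end of file, nothing wrong
    if remaining < 4:
        return TRUNCATED  # file stops inside the head length
    (length,) = struct.unpack_from(">i", buf, offset)
    if length < 8:
        return CORRUPTED  # too short to hold its own head and tail lengths
    if length > remaining:
        return TRUNCATED  # head promises more bytes than the file holds
    (trailer,) = struct.unpack_from(">i", buf, offset + length - 4)
    return OK if trailer == length else CORRUPTED


def scan_volume(buf: bytes, start: int = 0):
    """Walk records from `start`; return (status, offset) where the walk stops."""
    offset = start
    while True:
        status = classify_record(buf, offset)
        if status != OK:
            return status, offset
        (length,) = struct.unpack_from(">i", buf, offset)
        offset += length
```

For the pmlogextract error above, 2691072 - 2690960 = 112 bytes remain after the failing offset. Assuming the earlier records are intact, if the head length there claims more than 112 bytes, this walk reports TRUNCATED rather than CORRUPTED, which is the case that could be a warning instead of an error.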
> I am not sure that pmlogger has been fixed to offer a mode that accepts
> additional write(2) calls as a trade-off for reducing the chance of log
> truncation ... this would reduce this case to being seen only when the
> system crashes (or fills up a filesystem), rather than also when pmlogger
> is killed by SIGKILL.
Those write(2) calls are not atomic either, of course. There are always going
to be trade-offs, and it's not clear there's anything terribly wrong with the
status quo for small archive sizes.
> [...]
> The empty log file was created by pmlogger just before it suffered infant
> mortality ... this is expected and does not break the merge and log rotate
> script.
Ah, interesting point! Thanks for the tip.
> > In terms of the cull/compress though, the last line is key ... it seems
> > overly drastic - but appears to be working as planned, Ken?
>
> The current logic is if you cannot merge for some reason, do not delete the
> input archives ... but since the delete is done in a later pass, the only
> safe thing to do is to abandon the delete ... and this takes out the
> compress handling as well.
>
> The truncated archive fix proposed above would band-aid over this. But we
> probably need to add communication between the merge and cull/compress
> passes to allow the merge to indicate some archives are problematic and
> should not be culled or compressed ... this would not be too hard to
> implement.
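One way the merge pass could tell the cull pass about problem archives, sketched in Python: the merge pass collects the base names it had trouble with (empty, truncated or unmergeable), and the cull pass holds back just those instead of abandoning the whole delete. The function name, the YYYYMMDD date-prefix comparison and the keep_problematic switch are all hypothetical, not anything in pmlogger_daily today. Setting keep_problematic=False gives the simpler option of culling everything past its date.

```python
def plan_cull(archives, cutoff, problematic, keep_problematic=True):
    """Split archive base names into (cull, keep) lists.

    archives:         base names such as "20140123.00.10" (YYYYMMDD prefix)
    cutoff:           "YYYYMMDD"; archives dated strictly before it are old
                      enough to cull (zero-padded dates compare correctly
                      as strings)
    problematic:      names the merge pass flagged as empty, truncated or
                      otherwise unmergeable
    keep_problematic: True holds flagged archives back; False culls them
                      along with everything else past the cutoff
    """
    cull, keep = [], []
    for name in archives:
        old_enough = name[:8] < cutoff
        held_back = keep_problematic and name in problematic
        if old_enough and not held_back:
            cull.append(name)
        else:
            keep.append(name)
    return cull, keep
```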
The question is whether the right thing to do is to keep these old, broken
archives, or cull them along with the old, unbroken ones. That's not clear,
to me anyway; in this case I'd been happily accepting that all data would be
gone after the cull time.
> > Self-correction would be good, once the problem archives have all scrolled
> > past their use-by date?
>
> So the second suggestion above would do this, I think.
If it can be done without being overly complex, that's fine, but otherwise I
think the (presumably simpler) alternative of culling all older files would
be fine too.
> > - from inspection of the pmlogger_daily script, we appear to always do a
> > logmerge, even if there is only one log - could this not simply be handled
> > via a mv(1) of the files? (avoiding the read/write I/O there entirely, for
> > the simple case of one archive for the previous day - i.e. no pmcd/pmlogger
> > restarts).
>
> Yep, that's an optimization that could be done.
Fabulous!
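Once empty archives have been skipped, the single-archive shortcut comes down to a small decision. A hypothetical Python sketch, with is_empty standing in for whatever emptiness check pmlogger_merge already does:

```python
def daily_action(inputs, is_empty):
    """Decide how to build one day's archive from its input archives.

    inputs:   base names of the day's archives
    is_empty: predicate standing in for the existing "archive is empty" check
    Returns (action, archives) where action is "nothing", "rename" or "merge".
    """
    usable = [a for a in inputs if not is_empty(a)]
    if not usable:
        return "nothing", []
    if len(usable) == 1:
        # No pmcd/pmlogger restarts: rename the files rather than
        # copying every byte through a merge.
        return "rename", usable
    return "merge", usable
```

With the 20140123 archives from the log above, the empty 09.55 archive drops out and two remain, so that day still needs a merge; a day with no restarts would take the rename path.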
> Probably safer to do a two-pass operation:
> 1. hard link each of the index, meta, 0, 1, ... files
> 2. rm the original ones
>
> This way you always have at least one complete archive, even if the system
> crashes or the script is interrupted/terminated.
Yep, sounds good.
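The two-pass link-then-remove approach, sketched in Python. It assumes an archive is the set of files <base>.index, <base>.meta and numbered data volumes <base>.0, <base>.1, ..., all in one directory (so os.link() stays on one filesystem). move_archive and archive_files are hypothetical names, not existing PCP tools. If pass 1 fails part way, the links it already made are removed so no half-built archive is left under the new name.

```python
import os
import re


def archive_files(directory, base):
    """Return the sorted suffixes of the files making up archive `base`.

    Assumes the layout base.index, base.meta, base.0, base.1, ...
    """
    pattern = re.compile(re.escape(base) + r"\.(index|meta|\d+)")
    suffixes = []
    for name in os.listdir(directory):
        m = pattern.fullmatch(name)
        if m:
            suffixes.append(m.group(1))
    return sorted(suffixes)


def move_archive(directory, old, new):
    """Rename archive `old` to `new` by hard-linking first, then removing."""
    suffixes = archive_files(directory, old)
    if not suffixes:
        raise FileNotFoundError(f"no files for archive {old!r}")
    linked = []
    try:
        # Pass 1: hard link every file under the new name. Until this
        # completes, the old archive is untouched.
        for sfx in suffixes:
            dst = os.path.join(directory, f"{new}.{sfx}")
            os.link(os.path.join(directory, f"{old}.{sfx}"), dst)
            linked.append(dst)
    except OSError:
        # Undo a partial pass 1 so no incomplete new archive is left behind.
        for dst in linked:
            os.remove(dst)
        raise
    # Pass 2: only now remove the originals; the new names already hold
    # a complete copy of the archive.
    for sfx in suffixes:
        os.remove(os.path.join(directory, f"{old}.{sfx}"))
    return suffixes
```

A crash between the passes leaves the same archive under both names, and a crash during pass 2 still leaves a complete copy under the new name, which is the property Ken describes.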
cheers.
--
Nathan