pcp

Re: pmlogger -u questions

To: Ken McDonell <kenj@xxxxxxxxxxxxxxxx>
Subject: Re: pmlogger -u questions
From: fche@xxxxxxxxxx (Frank Ch. Eigler)
Date: Sun, 13 Apr 2014 22:59:42 -0400
Cc: Nathan Scott <nathans@xxxxxxxxxx>, pcp@xxxxxxxxxxx
Delivered-to: pcp@xxxxxxxxxxx
In-reply-to: <534B4330.1060008@xxxxxxxxxxxxxxxx> (Ken McDonell's message of "Mon, 14 Apr 2014 12:08:48 +1000")
References: <01e901cf56df$4ce97de0$e6bc79a0$@internode.on.net> <1665962954.4723287.1397437104781.JavaMail.zimbra@xxxxxxxxxx> <534B4330.1060008@xxxxxxxxxxxxxxxx>
User-agent: Gnus/5.1008 (Gnus v5.10.8) Emacs/21.4 (gnu/linux)

kenj wrote:

> [...]
> With -u the chance becomes 0% of the archive appearing truncated in
> the absence of a system crash.

Well, almost -- with the fwrite(3) data going out in dribs & drabs
already, all those pre-fflush(3) moments can make the file appear
truncated to another reader.  (One might see that with an abort()
inserted between the fwrite()s and the fflush()es.)
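
A minimal sketch of that window, in case it helps.  This is not
pmlogger code: the record layout, the function names and the 64 KiB
buffer size are all made up for illustration, and how much data
reaches the file before fflush() depends on the stdio implementation.

```c
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Size of the file as another process would see it right now, or -1. */
long visible_size(const char *path)
{
    struct stat st;
    return stat(path, &st) == 0 ? (long)st.st_size : -1L;
}

/*
 * Write one hypothetical two-part record (header + payload) and report
 * how many bytes are visible in the file before and after fflush().
 * Returns 0 on success, -1 on error.
 */
int observe_flush_window(const char *path, long *before, long *after)
{
    const char header[] = "HDR:";
    const char payload[] = "metric-values";
    FILE *fp = fopen(path, "wb");

    if (fp == NULL)
        return -1;
    /* Full buffering: small writes stay in user space until flushed. */
    if (setvbuf(fp, NULL, _IOFBF, 64 * 1024) != 0) {
        fclose(fp);
        return -1;
    }
    if (fwrite(header, 1, strlen(header), fp) != strlen(header) ||
        fwrite(payload, 1, strlen(payload), fp) != strlen(payload)) {
        fclose(fp);
        return -1;
    }
    /* A reader looking now, or an abort() here, finds a short file. */
    *before = visible_size(path);
    if (fflush(fp) != 0) {
        fclose(fp);
        return -1;
    }
    *after = visible_size(path);
    return fclose(fp) == 0 ? 0 : -1;
}
```

With a buffer that large one would expect "before" to be smaller than
"after"; with a small buffer or large records, stdio may push part of
a record out on its own, which is the dribs & drabs case.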

The -u option is at least useful for a lesser level of correctness,
namely satisfying the pcp-archive.5 invariant that metadata must be
present for metric values in the .0 file, by fflush()ing the .meta
files before writing into the archive.
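
Roughly, the ordering looks like the sketch below.  The helper name
and the string "records" are hypothetical stand-ins, not the real
pmlogger routines or the real archive format; only the order of the
calls is the point.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not pmlogger's actual code: record a new metric's
 * descriptor in the metadata stream and push it to the kernel before any
 * value for that metric is written to the data volume.
 * Returns 0 on success, -1 on error.
 */
int log_new_metric(FILE *meta, FILE *vol, const char *desc, const char *value)
{
    if (fwrite(desc, 1, strlen(desc), meta) != strlen(desc))
        return -1;
    /* The metadata leaves the stdio buffer first ... */
    if (fflush(meta) != 0)
        return -1;
    /* ... so the data volume is never ahead of it for a reader. */
    if (fwrite(value, 1, strlen(value), vol) != strlen(value))
        return -1;
    return 0;
}
```

A reader that opens both files at any point then finds the descriptor
in the metadata file whenever the value is visible in the volume.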

This -u is not sufficient to protect the data from system crashes;
one'd need fsync(2) syscalls in there too.  It could be colocated with
the -u fflush()es, or left to the fche/fsync-prototype fclose().
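
For the colocated variant, a sketch of what each flush point would
have to do.  The helper name is hypothetical and this is not taken
from the prototype branch; it only pairs the two calls.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: make everything written to fp so far durable.
 * fflush() moves data from the stdio buffer to the kernel;
 * fsync() asks the kernel to commit it to stable storage.
 * Returns 0 on success, -1 on error (errno set by the failing call).
 */
int flush_and_sync(FILE *fp)
{
    if (fflush(fp) != 0)
        return -1;
    if (fsync(fileno(fp)) != 0)
        return -1;
    return 0;
}
```

Calling fsync() without the preceding fflush() would miss whatever is
still sitting in the stdio buffer, so the order matters.  The cost is
a synchronous disk commit at every call.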


> Not consistent if pmlogger dies or someone tries to read the archive
> while it is being written.  It is not an on-disk issue.  [...]

(It is, to the extent that some kernel-level write(2)s could occur in
sequences that are inconsistent.)


>> Not clear who (which tools?/code?) benefit from that, if anyone...?
>
> All the loggers run by pmlogger_{check,daily} are candidate beneficiaries.

... as is anyone who loves their data. :-) With the present scheme,
it's not hard to find pmlogger-generated archives that PMAPI refuses
to open.  I've got a bunch here, whether resulting from an untimely
pmlogger exit or a system crash/reboot.  (Note that our own tools
sometimes SIGKILL an intransigent pmlogger.)

(By the way, the same thing happens to systemd journals on my
machines/VMs with some regularity, and those become write-offs after a
"recovery" consisting of just moving the corrupt files out of place
and letting them rot until GC.)


- FChE
