Re: [pcp] pmlogger -u questions

To: Nathan Scott <nathans@xxxxxxxxxx>
Subject: Re: [pcp] pmlogger -u questions
From: Ken McDonell <kenj@xxxxxxxxxxxxxxxx>
Date: Mon, 14 Apr 2014 12:08:48 +1000
Cc: pcp@xxxxxxxxxxx
Delivered-to: pcp@xxxxxxxxxxx
In-reply-to: <1665962954.4723287.1397437104781.JavaMail.zimbra@xxxxxxxxxx>
References: <01e901cf56df$4ce97de0$e6bc79a0$@internode.on.net> <1665962954.4723287.1397437104781.JavaMail.zimbra@xxxxxxxxxx>
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.4.0
On 14/04/14 10:58, Nathan Scott wrote:
...
I can't remember all of the context of this one, I think it was Frank
who pointed out -u but the context is eluding me...

As the archives are being created, the stdio buffering in pmlogger (before we even get to the block layer / buffer cache in the kernel) means that if pmlogger dies unexpectedly, or someone else tries to read the archive as it is being written, there is a 99.95% chance (2047 in 2048) of the archive appearing to be corrupted. This assumes 8K stdio buffers, that we always write records that are a whole number of 4-byte words, and that my pending truncated != corrupted change has not been applied.

With -u, the chance of the archive appearing truncated becomes 0%, in the absence of a system crash.
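
For reference, the 2047-in-2048 figure can be reproduced with a few lines of arithmetic. This is only a sketch of one reading of the numbers: it assumes the 8K buffer and 4-byte word size stated above, and additionally assumes that the end of a record is equally likely to fall at any word offset within the buffer. The variable names are illustrative only.

```python
# Assumptions taken from the message: 8K stdio buffers, records are
# always a whole number of 4-byte words.
BUF_BYTES = 8192
WORD_BYTES = 4

# Number of word offsets within one stdio buffer at which a record could end.
slots = BUF_BYTES // WORD_BYTES

# Assumption (not stated in the message): every offset is equally likely,
# and only one of them lines up exactly with the buffer flush boundary.
p_misaligned = (slots - 1) / slots

print(f"{slots - 1} in {slots} = {p_misaligned:.2%}")
```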

This closes, but does not remove, the time window in which the physical files
are not consistent and aligned with the end of a logical archive record.


Is our concern about "not consistent" here on-disk or in-memory consistency?

Not consistent if pmlogger dies or someone tries to read the archive while it is being written. It is not an on-disk issue. It is an issue between pmlogger and the block layer / buffer cache of the kernel.

-u can be passed to all the pmloggers being managed by pmlogger_check and
friends by adding -u to each line in the control file.


Not clear who (which tools?/code?) benefit from that, if anyone...?

All the loggers run by pmlogger_{check,daily} are candidate beneficiaries.


As I see it, we have 3 options:
[...]
Thoughts? Comments?


I think we need some demonstrated, concrete, actual issues here - it's all a
bit too hypothetical at this stage - a (QA?) tool that attempts to read from
the end of a growing log file for starters, and the clear existence of some
problems, before we start fixing them.  Do we have such a QA tool already?
(reading the cases covered in pmGetArchiveEnd in libpcp suggests we might?)

The issue is real, but I'm obviously not explaining it well enough. I am not sure there will be QA coverage, because the issue is a design problem, rather than a bug in implementation.

Let me try with an annotated example.

pmlogger is running and at some point, does a pmFetch which returns a PDU of 8700 bytes and a change in an instance domain, so the new instance domain requires 400 bytes to be written to the .meta file.

Now with the default stdio buffering ...

1. the data PDU has to be split somewhere on an 8K boundary, so the head of the PDU is written to the .0 file and the tail of the PDU stays in the stdio buffer within pmlogger

2. either none or some of the new instance domain remains in the stdio buffers within pmlogger ... making the metadata either truncated or inconsistent with the archive data file
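
The two cases above can be modelled roughly as follows. This is a simplified sketch, not pmlogger or libc code: it assumes stdio pushes data to the file only in whole 8K chunks, and the function name `split_at_flush` and its parameter `logical_offset` (bytes pmlogger has already handed to stdio for that file) are hypothetical names chosen for illustration. The offsets used in the calls are also assumptions, since the example does not say how much each file already holds.

```python
def split_at_flush(logical_offset, record_bytes, buf_bytes=8192):
    """Return (bytes of this record visible in the file, bytes still buffered).

    Simplified model: data reaches the file only in whole buf_bytes chunks,
    so anything past the last chunk boundary stays in the stdio buffer
    inside the writing process until a later write or flush.
    """
    end = logical_offset + record_bytes
    # Highest file offset that has been pushed out as complete chunks.
    flushed_total = (end // buf_bytes) * buf_bytes
    on_disk = max(0, flushed_total - logical_offset)
    return on_disk, record_bytes - on_disk


# Case 1: 8700-byte data PDU written to an (assumed) empty .0 file.
print(split_at_flush(0, 8700))

# Case 2: 400-byte instance domain written to the .meta file, first with
# nothing buffered ahead of it, then with 8000 bytes already written.
print(split_at_flush(0, 400))
print(split_at_flush(8000, 400))
```

In this model the head of the PDU (8192 bytes) is in the .0 file while the 508-byte tail is still inside pmlogger, and the instance domain is either entirely buffered or split across the boundary, depending on where the .meta file's buffer happens to be.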

Neither of these problems is new. But as we inch towards a unified context of some sort, and before that towards pmNewContext accepting a directory name as an argument for PM_CONTEXT_ARCHIVE, it is going to become a more pressing matter.
