Re: pmmgr pmlogger default behaviour

To: pcp@xxxxxxxxxxx
Subject: Re: pmmgr pmlogger default behaviour
From: fche@xxxxxxxxxx (Frank Ch. Eigler)
Date: Wed, 05 Feb 2014 21:00:32 -0500
Delivered-to: pcp@xxxxxxxxxxx
In-reply-to: <1178788786.16735370.1391119719356.JavaMail.root@xxxxxxxxxx> (Nathan Scott's message of "Thu, 30 Jan 2014 17:08:39 -0500 (EST)")
References: <2108905700.15892281.1391033827018.JavaMail.root@xxxxxxxxxx> <1583484726.15908616.1391037005341.JavaMail.root@xxxxxxxxxx> <20140130181134.GA7584@xxxxxxxxxx> <1178788786.16735370.1391119719356.JavaMail.root@xxxxxxxxxx>
User-agent: Gnus/5.1008 (Gnus v5.10.8) Emacs/21.4 (gnu/linux)

Hi -


>> What kinds of file corruption can one expect to deal with on modern
>> systems, that would affect these files so badly?  Do we want to be in
>> the storage-safety department just on their account? 
>
> Oh all manner of corruption - and using a wide umbrella there for
> "corruption", covering accidentally overwriting the start of a file
> (which, bizarrely, happens more than I'd expect); fat-fingers on
> sysadmins - accidental file removal; 

Considering that the pcp log/archive files are world-readable, there
is little reason for a privileged user to be poking around in the
directories.  Fat-fingered sysadmins are in a class of risk that is
better served with backups or limited accounts, not inconveniencing
ordinary users.


> system crash with no data flush - tail of the file corruption, etc,
> etc.  The last one is the most common, I think.  [...]

The inputs to merging are cleaned up by pmmgr only after the merging
processes successfully complete.  If that completion is premature in
your sense, then we must add an fsync()-equivalent into the various
archive-generating tools at their exit/close time.  This is
independent of pmmgr.
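
To make "an fsync()-equivalent at exit/close time" concrete, here is a
minimal sketch in Python rather than in the C of the actual tools.  The
function name write_durably and its arguments are made up for
illustration; none of this is existing PCP code, and it assumes a
POSIX system where a directory can be opened and fsync'ed.

```python
import os

def write_durably(path, data):
    """Hypothetical helper: write data to path and force it to stable storage."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()              # push the userspace buffer down to the kernel
        os.fsync(f.fileno())   # ask the kernel to commit the file contents
    # Assumption: POSIX. Also sync the containing directory so the new
    # directory entry itself survives a crash.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Once a writer does something like this before exiting, "the process
completed successfully" also means "the data reached the disk", which
is the property the cleanup step relies on.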


>> [...] I know of no modern file processing tool that deliberately
>> slices up its own data, just to protect it from unspecified
>> hypothetical breakage.

> For example all modern filesystems do this sort of thing - putting
> multiple copies of critical data structures all over the place, and
> going to great lengths to keep them at arms length from each other
> for recovery purposes

We weren't talking about replicating or checksumming, which we don't
do ourselves.  We were talking about artificially slicing up data,
hoping to isolate damage to individual chunks.  Not many filesystems
do this, to my knowledge; in fact many deliberately put related data
(inodes+blocks/directory-entries) as close to each other as practical
for speed.  Databases don't do this either AFAIK - they assume
reliable storage.  I'm also unaware of analogous system-log-type data
being cut up this way specifically against fat-fingered sysadmins.


> There is a huge difference between losing one days worth of data vs
> two weeks+.  The + is because current pmmgr scheme loses more and
> more data as the collection period is increased, whereas thats not
> the case for pmlogger_daily.

No, the pmmgr scheme "loses" (present tense?!) no data: pmmgr instead
preserves data in its original form in case of errors.  What can lose
data is hypothetical manual administrative interference or a
hardware/OS failure, both of which are non-pcp-specific and thus have
traditional procedures for dealing with them.
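
The pattern in question (write a new merged file, and remove the
inputs only once that has succeeded) can be sketched as follows.  This
is an illustration in Python, not pmmgr's actual code; the function
merge_then_cleanup and the merge callable are hypothetical stand-ins
for whatever tool does the real merging.

```python
import os

def merge_then_cleanup(inputs, output, merge):
    """Hypothetical sketch: run merge(inputs, output), then delete the
    inputs only if the merge returned without raising."""
    try:
        merge(inputs, output)
    except Exception:
        # Merge failed: keep every original input untouched and
        # discard any partially written output.
        if os.path.exists(output):
            os.remove(output)
        return False
    # Merge succeeded: the inputs are now redundant and can go.
    for path in inputs:
        os.remove(path)
    return True
```

On failure the inputs are exactly as they were before the attempt, so
the worst case is an unmerged set of archives, not a lost one.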


> [...]  Something else that just occurred to me is that the pmmgr
> model of changing *every single archive* involved, *every single
> day* further increases the loss risk.  [...]

No, pmmgr does not "change" archives in the sense of rewriting or
modifying them.  pmmgr's "pmlogmerge" option creates *new ones*.
Writing them does cost additional disk I/O, but it adds no casual
corruption risk.  (If one cannot trust one's kernel to write new files
of a hundred megabytes once a day, without losing parts, one needs to
switch to Linux. :-)


This thread has hardly acknowledged the fact that there is a trade-off
being discussed here.  The positive side is that merging makes it easy
for a PCP user to use the data.  She doesn't have to remember how to
splice it together; to play with multiple -a options or
comma-separated -a suboptions; or clumsy GUI dialog boxes; or manual
pmlogextract steps.  It's already in one convenient chunk for instant
gratification.  It's as close to "grand-unified" as we can get today.

I believe there is value in making defaults convenient for new PCP
users.  It's the defaults of a new tool we're talking about, not
changing the behaviors of the ones that established installations have
gotten accustomed to (and whose administrators are quite capable of
overriding defaults).  In that context, let's not be afraid to
experiment with more sophisticated defaults in the future too, as long
as they maximize new-user utility.


- FChE