
RE: pcp updates - qa and pmlogmv

To: "'Frank Ch. Eigler'" <fche@xxxxxxxxxx>
Subject: RE: pcp updates - qa and pmlogmv
From: "Ken McDonell" <kenj@xxxxxxxxxxxxxxxx>
Date: Thu, 27 Mar 2014 14:46:39 +1100
Cc: <pcp@xxxxxxxxxxx>
Delivered-to: pcp@xxxxxxxxxxx
In-reply-to: <y0mppl97iff.fsf@xxxxxxxx>
References: <5331FF3C.20304@xxxxxxxxxxxxxxxx> <y0mppl97iff.fsf@xxxxxxxx>
Thread-index: AQJ5EVEfTMZP8uquxkUFyWPtek2heAJWw0rgmY4A2kA=
> -----Original Message-----
> From: Frank Ch. Eigler [mailto:fche@xxxxxxxxxx]
> Sent: Wednesday, 26 March 2014 2:22 PM
> ...
> Could you elaborate what kind of atomicity this is intended to guarantee?
> The man page says:
> 
>        Because PCP archives are important records of system activity, special
>        care is taken to ensure the integrity of an archive's files.  Should
>        any problem be encountered during the execution of pmlogmv, all the
>        files associated with oldname will be preserved, and no new files with
>        the newname prefix will be created.  In the event of a system crash, at
>        least one of oldname or newname will be preserved.
> 
> One might reword the last and second-last sentences to indicate that a
> system crash is not an example of "any problem", so weaker guarantees
> apply.

OK, I was most interested in "failures" of the form ... cannot create link,
cannot rm and SIGINT.

For system crashes, I'll reword to be less assertive ... if the buffer cache
is flushed on the way down, then the intent was that either the new or the
old archive would survive (there is a very remote chance that both would
survive).  I cannot guarantee a file system with snapshots or
multi-operation transactions, so here, like everywhere else, we're exposed
to file system semantics.  My goal was to ensure we did not clobber or
delete parts of the archive in the process of renaming, provided the system
is still running at the end of pmlogmv.
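For the record, the link-then-unlink idea can be sketched in shell.  This is a
minimal illustration, not the actual pmlogmv code, and the file names are
made up:

```shell
#!/bin/sh
# Minimal sketch of the link-then-unlink rename strategy.  Not the real
# pmlogmv: names and paths here are illustrative only.
set -e
dir=`mktemp -d`
echo "archive data" > "$dir/old.0"

# Phase 1: give the file its new name as a hard link.  A crash here
# leaves the old name (and possibly also the new one) intact.
ln "$dir/old.0" "$dir/new.0"

# Phase 2: drop the old name.  A crash after this leaves the new name.
# Between the two phases both names refer to the same inode, so at no
# point is the data reachable by neither name.
rm "$dir/old.0"

ls "$dir"           # only new.0 remains
rm -rf "$dir"
```

Because both phases operate on directory entries for the same inode, the data
blocks themselves are never copied or rewritten, which is where the "at least
one of oldname or newname" property comes from, modulo what the file system
actually gets to disk.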

> Looking at the script itself, it's not clear it can deliver even the first
> guarantee as is.  Maybe the most clear-cut problem is in the unlink stage
> at the end, wherein stat(1) Links:-count is used to validate the state for
> each input file.  But it appears subject to the common TOCTTOU
> vulnerability: a race between the time link-counts are checked and the
> results used.  If multiple pmlogmv scripts run concurrently with the same
> inputs, the results are indeterminate.

I want to be pragmatic here.

The most common use case will be for pmlogger-daily where there is already a
directory-level lock to prevent bad concurrency from the pmlogger-check and
pmlogger-daily scripts.  So I don't think anything special is needed for
this case.
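As an aside, a directory-level lock of that general kind can be sketched with
mkdir(1), whose create-or-fail behaviour is atomic on a local file system.
This is a generic illustration only; the lock path is made up and the real
pmlogger scripts use their own locking convention:

```shell
#!/bin/sh
# Generic sketch of a directory-level lock: mkdir either creates the
# directory or fails, atomically, so only one instance can hold it.
# The lock path is hypothetical -- not what the pmlogger scripts use.
lockdir="${TMPDIR-/tmp}/pmlogmv-demo.lock"

if mkdir "$lockdir" 2>/dev/null
then
    trap 'rmdir "$lockdir"' 0
    echo "lock acquired; safe to rename the archive files"
    # ... rename work would go here ...
else
    echo "another instance holds the lock; try again later" >&2
    exit 1
fi
```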

For interactive ad hoc use, the risk is no different to multiple
users/processes executing rm and mv at the same time ... I think this is a
case of don't point loaded weapons at your feet, not a case for genetic
engineering to remove feet from humans.

> Another scenario is if a detected error occurs during the unlink loop, it
> appears possible for some files to disappear: say the first two files from
> $tmp.old were successfully handled (including the unlink of the $old
> name), but then a failure occurred for the third.  At this point, _cleanup
> nukes all the $tmp.new files, leaving no trace of the first two at all.
> You might need to track more accurate transaction-log or undo-log type
> data, not just the $tmp.old / $tmp.new lists, to correct this part.

OK, this is indeed a design/implementation bug.  In an earlier version I
used $tmp.old as the driver and reversed the links if the old file did not
exist, then cleaned up the $tmp.new files if they existed.  In an overly
aggressive optimization I removed the first part and did not have qa
coverage for the case you described (or did not notice that the result had
changed and was wrong).  I'll fix this.
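One shape the fix could take, sketched below with illustrative names (this is
not the actual pmlogmv implementation), is to append each completed rename to
an undo log and have the error path re-link the old names from the surviving
new links before discarding them:

```shell
#!/bin/sh
# Sketch of undo-log based cleanup for a multi-file rename.  File names
# and the rollback helper are illustrative, not the real pmlogmv code.
dir=`mktemp -d`
undo="$dir/undo.log"
: > "$undo"

# Undo every completed step: restore the old name from the surviving
# new link, then remove the new name.
rollback()
{
    while read old new
    do
        [ -f "$old" ] || ln "$new" "$old"
        rm -f "$new"
    done < "$undo"
}

for suff in 0 meta index
do
    echo data > "$dir/old.$suff"
done

for suff in 0 meta index
do
    old="$dir/old.$suff" new="$dir/new.$suff"
    if ln "$old" "$new"
    then
        rm "$old"
        echo "$old $new" >> "$undo"    # this step is now undoable
    else
        rollback
        exit 1
    fi
done
echo "rename complete"
rm -rf "$dir"
```

A failure part-way through then restores every already-renamed file to its old
name instead of deleting it, which is the behaviour the earlier version of the
script had.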

> (There might be other problems, these are just two that jumped out.)

Code reviews and feedback are always welcome, thanks.
