xfs
[Top] [All Lists]

Re: XFS write cache flush policy

To: Matthias Schniedermeyer <ms@xxxxxxx>
Subject: Re: XFS write cache flush policy
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Thu, 20 Dec 2012 09:43:28 +1100
Cc: Lin Li <sdeber@xxxxxxxxx>, Eric Sandeen <sandeen@xxxxxxxxxxx>, xfs@xxxxxxxxxxx
In-reply-to: <20121219010445.GA24313@xxxxxxx>
References: <20121214111924.GA4762@xxxxxxx> <20121215221622.GF9806@dastard> <20121216103025.GA14880@xxxxxxx> <20121216111046.GA16756@xxxxxxx> <20121216204847.GN9806@dastard> <20121216232251.GA20370@xxxxxxx> <20121217232441.GA5031@dastard> <20121218003438.GB30736@xxxxxxx> <20121218202914.GC15182@dastard> <20121219010445.GA24313@xxxxxxx>
User-agent: Mutt/1.5.21 (2010-09-15)
On Wed, Dec 19, 2012 at 02:04:45AM +0100, Matthias Schniedermeyer wrote:
> On 19.12.2012 07:29, Dave Chinner wrote:
> > On Tue, Dec 18, 2012 at 01:34:38AM +0100, Matthias Schniedermeyer wrote:
> > > On 18.12.2012 10:24, Dave Chinner wrote:
> > > > 
> > > > diff --git a/fs/xfs/xfs_sync.c b/fs/xfs/xfs_sync.c
> > > > index 9500caf..7bf85e8 100644
> > > > --- a/fs/xfs/xfs_sync.c
> > > > +++ b/fs/xfs/xfs_sync.c
> > > > @@ -400,7 +400,7 @@ xfs_sync_worker(
> > > >          * cancel_delayed_work_sync on this work queue before tearing 
> > > > down
> > > >          * the ail and the log in xfs_log_unmount.
> > > >          */
> > > > -       if (!(mp->m_super->s_flags & MS_ACTIVE) &&
> > > > +       if ((mp->m_super->s_flags & MS_ACTIVE) &&
> > > >             !(mp->m_flags & XFS_MOUNT_RDONLY)) {
> > > >                 /* dgc: errors ignored here */
> > > >                 if (mp->m_super->s_writers.frozen == SB_UNFROZEN &&
> > > > 
> > > > 
> > > 
> > > This also appears to fix the other case.
> > > When the activity ceases sharply and the log is still not written after 
> > > minutes.
> > > 
> > > After writing 10 files, waiting a minute, yanking ... all 10 files where 
> > > there.
> > > So the OP-case MIGHT have been this same error.
> > > But that's the amateuer talking again.
> > 
> > I kinda deserved that, didn't I? ;)
> > 
> > But now I understand the problem, I agree with you that the OP was
> > probably seeing the same bug. I understand the cause, and can
> > explain exactly how it would cause both sets of symptoms reported...
> 
> Great.
> 
> That means less lost time in the future, when a USB-disc "decides" to go 
> MIA.
> The record was about 45 minutes lost, or something over 200GB just 
> going up in smoke (without the smoke).
> 
> At least until such a bug is reintroduced in the future.
> This Bug was introduced in 3.5(*) and existed up to 3.7 and if i 
> understand you correctly was fixed more or less by accident for 3.8.
.....
> git blame dates the lines at 2012-05-21

Sure, and the review discussion of this very change identified the
whole startup/shutdown code as needing to be rewritten to fix a
systemic problems that we'd been patching repeatedly:

http://oss.sgi.com/archives/xfs/2012-06/msg00064.html

IOWs, while the bug was introduced by accident due to the code being
nasty and difficult to follow, it wasn't fixed by accident. It was
fixed as a result of review, reflection and redesign - three
fundamental properties of the iterative engineering process we use
here...

> I'd say there is definitely something amiss in the test-suite, this is 
> basic functionality that appears untested to me. (I don't know what the 
> test-suite contains, only that it exists)

This failure behaviour is definitely tested - the tests that were
triggering all the unmount problems that we were trying to fix were
the ones that simulate filesystem shutdowns and check that the data
is still there after the filesystem is remounted. i.e. very similar
tests to what you are running here.

The thing is, no matter how much testing we do, there are always
cases that will elude us. In this case, it's the fact that you have
to slowly create files and write lots of data to trigger the
problem. If you create enough files quickly enough, then the log
gets written automatically, and the problem you are seeing is not
exposed.

Many of these tests were written before delayed logging existed, so
file creations were comparitively slow compared to now, so the
balance of tests changes as we change the kernel code. That is, a
test that once may have exposed this problem due to the timing of
operations may not expose it now because the timing is different.
Hence what we once tested may not get exercised effectively now.
Once we identify issues like this, we generally write new tests to
reproduce and cover the issue...

That's the reality of the situation - testing is never going to be
100% reliable, even if we have tests that specifically cover
behaviour like this....

> At least i'd count a dropped connection or power failure (The only 
> difference is that in the latter case the cache MAY get dropped, 
> otherwise i'd say both cases are basically the same) among the basic 
> functionality that should be assured by a journaling fileystem.

A journalling filesystem doesn't guarantee that you won't lose any
data on crash, power fail or permanent IO errors. All journalling
guarantees is that the filesystem is *consistent* after recovery.
i.e. you don't have to run xfs_repair after such a failure to
ensure it is not corrupted. In all your testing, you have not seen
the filesystem become corrupted, so the journalling has fulfilled
it's guarantee.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx

<Prev in Thread] Current Thread [Next in Thread>