Re: XFS write cache flush policy

To: Matthias Schniedermeyer <ms@xxxxxxx>
Subject: Re: XFS write cache flush policy
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Sun, 16 Dec 2012 09:16:22 +1100
Cc: Eric Sandeen <sandeen@xxxxxxxxxxx>, Lin Li <sdeber@xxxxxxxxx>, xfs@xxxxxxxxxxx
In-reply-to: <20121214111924.GA4762@xxxxxxx>
References: <CAA_rkDfFUmZzT_kMznsTSNVxdfqfmz=bmJ400wdBOzocgP32eA@xxxxxxxxxxxxxx> <20121208192927.GA17875@xxxxxxx> <20121210005820.GG15784@dastard> <20121210091239.GA21114@xxxxxxx> <50C64C17.9080206@xxxxxxxxxxx> <20121214111924.GA4762@xxxxxxx>
User-agent: Mutt/1.5.21 (2010-09-15)

On Fri, Dec 14, 2012 at 12:19:24PM +0100, Matthias Schniedermeyer wrote:
> > >> http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F
> > >>
> > >> Basically, you have an IO error situation, and you have dm-crypt
> > >> in-between buffering an unknown amount of changes. In my experience,
> > >> data loss events are rarely filesystem problems when USB drives or
> > >> dm-crypt is involved...
> > > 
> > > I don't know the inner workings of dm-*, but shouldn't it behave 
> > > transparently and rely on the block-layer for buffering?
> > 
> > I think that's partly why Dave asked you to test it, to check
> > that theory ;)
> To test that theory.
> Technically this is another machine than the original but i tried to 
> recreate as much from the original circumstances as possible.
> Kernel is 3.6.7
> First i recreated the circumstances.
> I plugged a HDD i'm throwing out into the enclosure that was the most 
> problematic, created the dm-crypt-layer & filesystem as reported and 
> started copying.
> In all testes i didn't supply any mount-options!

That's one for the spell-check fail file.... :P

> 1)
> After a few minutes i "emulated" the problem by unplugging the cable.
> At that point about 40 files were copied, but only 25 were there after 
> i replugged the cable.

Ok, so you've basically got a situation where there is a single
directory block being modified repeatedly as new files are created.
Which means that there are a significant number of changes being
aggregated in memory and may not have been written to the log. When
you pull the cable, those changes are lost because they can't be
written to disk any more.

> 2)
> BUT the directory-structure had changed in the meantime, the first 22 
> files were in an other directory i didn't have the first time.

I don't understand what you are saying here. Can you please add "ls
-l" output of the directory structure before and after so that I can
observe exactly what you are trying to describe?
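
One way to capture that is to save a recursive listing before the
unplug and another after remounting, then diff the two. This is only
a sketch: the function names and paths are made up for illustration,
and the "before" listing has to be written somewhere other than the
disk under test.

```shell
#!/bin/sh
# Hypothetical helpers; names and paths are illustrative only.

# Save a recursive long listing of directory $1 into file $2.
snapshot() {
    ls -lR "$1" > "$2"
}

# Show what changed between two saved listings.
# diff exits with status 1 when the listings differ; that is expected.
compare() {
    diff -u "$1" "$2"
}
```

For example, run "snapshot /mnt/usb /root/before.txt" just before
pulling the cable, "snapshot /mnt/usb /root/after.txt" once the
filesystem is mounted again, and attach the output of
"compare /root/before.txt /root/after.txt" (all three paths are
placeholders).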

> In the 
> first test all >=200 files were in the same directory.
> So i retested by just copying the directory with which i had my original 
> trouble.
> This time i used a timer and after a little over 5 minutes 23 files were 
> copied, after replugging only the same 3 files as from the first try 
> were retained.

Which usually means one of two things:
        1. the metadata changes never got written to the log; or
        2. log recovery discarded them.


> 6)
> Same as 5)
> But this time i issued a 'sync' at about the halfway-point.
> This time a total of 13 files were retained, a ls -l just before the 
> sync showed 12 files. But the sync took 20 seconds, so the 13th file 
> must have been completed in the time between start/finished of the sync 
> command.

Which doesn't rule out either possibility. Log recovery can discard
transactions because of many reasons, one of them being a 19 year
old bug that isn't yet fixed in 3.7 (only just merged into

Given that the same directory blocks are being continually
rewritten, it's entirely possible that this can occur - they keep
getting moved forward in the log, past the push target that is set
every 30s.

What you need to do is mount the filesystem after replugging it with
the mount options "-o ro,norecovery" to see what is on disk before
log recovery is run. If the files are all there, then it's a log
write or log recovery problem. If the files are not present, then the
metadata has not been written to disk and they aren't in the log,

If the files are on disk prior to log recovery running, then you need
to dump the log to a file using xfs_logprint and send it to me so I
can analyse the content of the log.
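
As a rough sketch, the two steps above could look like the script
below. The device, mount point and output file are hypothetical
placeholders, and the -C (copy log to file) option is my reading of
the xfs_logprint man page, so check it against your xfsprogs version.
The script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only. DEV, MNT and LOGCOPY are hypothetical; substitute your own.
DEV=${DEV:-/dev/mapper/usbcrypt}
MNT=${MNT:-/mnt/inspect}
LOGCOPY=${LOGCOPY:-/root/xfs-log.img}

# Print each command by default; set DRY_RUN=0 (as root) to execute.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Look at the on-disk state without replaying the log.
inspect_before_recovery() {
    run mount -t xfs -o ro,norecovery "$DEV" "$MNT"
    run ls -lR "$MNT"
    run umount "$MNT"
}

# Copy the raw log to a file before any normal mount replays it.
# -C is an assumption taken from the man page; verify before relying on it.
save_log() {
    run xfs_logprint -C "$LOGCOPY" "$DEV"
}

inspect_before_recovery
save_log
```

Do this before mounting the filesystem normally, because a normal
mount runs log recovery and changes what is in the log.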

If the files are not in the directory on disk before recovery, then
I suspect we are going to need an event trace to determine the
sequence of events leading up to the unplug event. Having a copy of the
log would be handy in that case, too.

> In conclusion the amateur in me says:
> The data is never sent to the drive, as all these tests DON'T include a 
> power-failure, only connection failure.


Like most amateurs you've jumped to the obvious conclusion without
considering all the other possibilities that could give the same

There's a reason that we ask for specific information in
bug reports - paraphrasing or describing problems by words is
ambiguous and tainted by your perception of what the problem is. Not
to mention that what a user thinks is irrelevant is often exactly
the critical detail an expert is looking to find.

What I'm saying is that playing the "armchair expert" simply makes
it harder for the real experts (us) to understand what your problem
is.  You may be right in the end that metadata hasn't been written,
but we have to understand *why* the metadata wasn't written to be
able to fix the problem, and that takes a whole lot more analysis
than just guessing...


Dave Chinner
