Re: XFS write cache flush policy

To: Eric Sandeen <sandeen@xxxxxxxxxxx>
Subject: Re: XFS write cache flush policy
From: Matthias Schniedermeyer <ms@xxxxxxx>
Date: Fri, 14 Dec 2012 12:19:24 +0100
Cc: Dave Chinner <david@xxxxxxxxxxxxx>, Lin Li <sdeber@xxxxxxxxx>, xfs@xxxxxxxxxxx
In-reply-to: <50C64C17.9080206@xxxxxxxxxxx>
References: <CAA_rkDfFUmZzT_kMznsTSNVxdfqfmz=bmJ400wdBOzocgP32eA@xxxxxxxxxxxxxx> <20121208192927.GA17875@xxxxxxx> <20121210005820.GG15784@dastard> <20121210091239.GA21114@xxxxxxx> <50C64C17.9080206@xxxxxxxxxxx>
User-agent: Mutt/1.5.21 (2010-09-15)
> >> http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F
> >>
> >> Basically, you have an IO error situation, and you have dm-crypt
> >> in-between buffering an unknown amount of changes. In my experience,
> >> data loss events are rarely filesystem problems when USB drives or
> >> dm-crypt is involved...
> > 
> > I don't know the inner workings of dm-*, but shouldn't it behave 
> > transparently and rely on the block-layer for buffering?
> 
> I think that's partly why Dave asked you to test it, to check
> that theory ;)

To test that theory.

Technically this is a different machine from the original, but I tried to 
recreate as much of the original circumstances as possible.
Kernel is 3.6.7

First I recreated the circumstances.
I plugged an HDD I'm throwing out into the enclosure that was the most 
problematic, created the dm-crypt layer & filesystem as reported and 
started copying.

In all tests I didn't supply any mount options!

1)
After a few minutes I "emulated" the problem by unplugging the cable.
At that point about 40 files had been copied, but only 25 were there after 
I replugged the cable.

2)
BUT the directory structure had changed in the meantime: the first 22 
files were in another directory that I didn't have the first time. In the 
first test all >=200 files were in the same directory.

So I retested by copying just the directory with which I had my original 
trouble.
This time I used a timer, and after a little over 5 minutes 23 files had 
been copied. After replugging, only the same 3 files as in the first try 
were retained.

3)
This time I ditched the dm-crypt layer.
I mkfs'ed with the same parameters on a plain 100GB partition.

I copied the same files as in 2). After 5 minutes 24 files had been copied, 
and after replugging the same 3 files were retained.


At this point the amateur in me says: dm-crypt is "transparent".

A new kernel was released, so a retry with 3.7.0/plain-partition.

4)
Same as 3)

The only difference is that 3.7.0 appears to be much quicker to pass on 
the error: with 3.6.7 the rsync process was "happily" proceeding until I 
manually cancelled it a few seconds after unplugging the cable.
With 3.7.0 it immediately stopped with an Input/Output error.

5)
Same as 3/4)

A second before unplugging I 'ls -l'ed the directory; all files copied 
were visible at that point.

6)
Same as 5)

But this time I issued a 'sync' at about the halfway point.
This time a total of 13 files were retained; an ls -l just before the 
sync showed 12 files. But the sync took 20 seconds, so the 13th file 
must have been completed between the start and the finish of the sync 
command.
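
The effect of the 'sync' in test 6 suggests that explicitly flushing each 
file as it is copied would make completed files survive an unplug. The 
following is only a minimal sketch of that idea in Python, not what rsync 
does; the helper name copy_durably is made up for illustration, and it 
assumes a Linux system where a directory can be opened and fsync'ed.

```python
import os
import shutil


def copy_durably(src, dst):
    """Copy src to dst and ask the kernel to flush it before returning.

    Hypothetical helper for illustration only.
    """
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
        fout.flush()             # push Python's userspace buffer to the kernel
        os.fsync(fout.fileno())  # ask the kernel to write the file's data out

    # Also fsync the containing directory so the new directory entry
    # itself is written out, not just the file contents.
    dir_fd = os.open(os.path.dirname(os.path.abspath(dst)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Flushing after every file trades throughput for durability, so a copy done 
this way would likely be noticeably slower than the unflushed copies in 
the tests above.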


In conclusion, the amateur in me says:
The data is never sent to the drive, as all these tests DON'T include a 
power failure, only a connection failure.
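
One way this conclusion might be checked is to watch how much data the 
kernel still holds in memory while the copy runs. On Linux, /proc/meminfo 
reports "Dirty" (memory waiting to be written back) and "Writeback" 
(memory actively being written back). The sketch below parses those two 
fields; the function name pending_writeback_kb is made up for 
illustration, and the numbers are system-wide, not per device.

```python
def pending_writeback_kb(meminfo_text):
    """Return (dirty_kb, writeback_kb) parsed from /proc/meminfo-style text.

    Hypothetical helper for illustration only.
    """
    values = {}
    for line in meminfo_text.splitlines():
        key, sep, rest = line.partition(":")
        # Exact key match, so e.g. "WritebackTmp" is not picked up.
        if sep and key in ("Dirty", "Writeback"):
            values[key] = int(rest.split()[0])
    return values.get("Dirty", 0), values.get("Writeback", 0)
```

Calling it on the contents of /proc/meminfo every few seconds during the 
copy would show whether the amount of dirty data keeps growing instead of 
being written out.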





-- 

Matthias
