
Re: XFS write cache flush policy

To: Matthias Schniedermeyer <ms@xxxxxxx>
Subject: Re: XFS write cache flush policy
From: Ric Wheeler <rwheeler@xxxxxxxxxx>
Date: Fri, 14 Dec 2012 13:57:11 +0000
Cc: Eric Sandeen <sandeen@xxxxxxxxxxx>, Lin Li <sdeber@xxxxxxxxx>, xfs@xxxxxxxxxxx
In-reply-to: <20121214111924.GA4762@xxxxxxx>
References: <CAA_rkDfFUmZzT_kMznsTSNVxdfqfmz=bmJ400wdBOzocgP32eA@xxxxxxxxxxxxxx> <20121208192927.GA17875@xxxxxxx> <20121210005820.GG15784@dastard> <20121210091239.GA21114@xxxxxxx> <50C64C17.9080206@xxxxxxxxxxx> <20121214111924.GA4762@xxxxxxx>
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/17.0 Thunderbird/17.0

On 12/14/2012 11:19 AM, Matthias Schniedermeyer wrote:
http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F

Basically, you have an IO error situation, and you have dm-crypt
in-between buffering an unknown amount of changes. In my experience,
data loss events are rarely filesystem problems when USB drives or
dm-crypt are involved...
I don't know the inner workings of dm-*, but shouldn't it behave
transparently and rely on the block layer for buffering?
I think that's partly why Dave asked you to test it, to check
that theory ;)
To test that theory.

Technically this is a different machine than the original, but I tried to
recreate as much of the original circumstances as possible.
Kernel is 3.6.7

First I recreated the circumstances.
I plugged a HDD I'm throwing out into the enclosure that was the most
problematic, created the dm-crypt layer & filesystem as reported and
started copying.

In all tests I didn't supply any mount options!

1)
After a few minutes I "emulated" the problem by unplugging the cable.
At that point about 40 files were copied, but only 25 were there after
I replugged the cable.

Just a note - depending on the drive and its firmware, unplugging a cable is *not* the same as a power loss since the firmware detects the loss of link and immediately writes back any volatile cache data to platter (and it has power, so that is easy for it to do :)).

You really should drop power to the enclosure to get a "mean" test :)

Ric


2)
BUT the directory structure had changed in the meantime; the first 22
files were in another directory I didn't have the first time. In the
first test all >=200 files were in the same directory.

So I retested by copying just the directory with which I had my original
trouble.
This time I used a timer, and after a little over 5 minutes 23 files were
copied; after replugging, only the same 3 files as from the first try
were retained.

3)
This time I ditched the dm-crypt layer.
I mkfs'ed with the same parameters on a plain 100GB partition.

Copied the same files as in 2), after 5 minutes 24 files were copied and
after re-plugging the same 3 files were retained.


At this point the amateur in me says: dm-crypt is "transparent".

A new kernel was released, so a retry with 3.7.0/plain-partition.

4)
Same as 3)

The only difference is that 3.7.0 appears to be much quicker to pass on
the error: with 3.6.7 the rsync process was "happily" proceeding until I
manually cancelled it a few seconds after unplugging the cable.
With 3.7.0 it immediately stopped with an Input/Output error.

5)
Same as 3/4)

A second before unplugging I 'ls -l'ed the directory; all files copied
were visible at that point.

6)
Same as 5)

But this time I issued a 'sync' at about the halfway point.
This time a total of 13 files were retained; an 'ls -l' just before the
sync showed 12 files. But the sync took 20 seconds, so the 13th file
must have been completed between the start and the finish of the sync
command.
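
Test 6 suggests that an explicit flush is what gets the files onto the disk. As a rough illustration only (this is not what rsync or the tests above did), a copy that asks for a flush per file could look like the sketch below. The function name copy_durably is made up for this example, and whether the data really reaches the platter still depends on the drive, the enclosure and the write cache behaviour discussed in this thread.

```python
import os
import shutil


def copy_durably(src, dst):
    """Copy src to dst and ask the kernel to flush the result (hypothetical helper)."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
        fout.flush()             # push Python's userspace buffer to the kernel
        os.fsync(fout.fileno())  # ask the kernel to write the file's data to the device

    # Also fsync the containing directory so the new directory entry is flushed.
    dir_fd = os.open(os.path.dirname(os.path.abspath(dst)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

This trades throughput for a smaller window of unwritten data: each file is flushed individually instead of waiting for the kernel's periodic writeback or one large 'sync'.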


In conclusion, the amateur in me says:
The data is never sent to the drive, as all these tests DON'T include a
power failure, only a connection failure.
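
One way to check the "never sent to the drive" guess while a copy is running would be to watch how much data the kernel is still holding back. The sketch below parses the Dirty and Writeback counters from the text of /proc/meminfo. The function name pending_writeback_kib is made up for this example, and the counters are system-wide, so they only give a rough hint on a machine that is otherwise idle.

```python
def pending_writeback_kib(meminfo_text):
    """Return the Dirty and Writeback counters (in kB) found in /proc/meminfo text."""
    wanted = {"Dirty", "Writeback"}
    result = {}
    for line in meminfo_text.splitlines():
        # Lines look like "Dirty:              1234 kB"
        name, sep, rest = line.partition(":")
        if sep and name in wanted:
            result[name] = int(rest.split()[0])
    return result
```

It can be fed with open("/proc/meminfo").read() just before unplugging. A large Dirty value at that moment would be consistent with the files still sitting in the page cache.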





