
XFS file corruption bug - more info

To: linux-xfs@xxxxxxxxxxx
Subject: XFS file corruption bug - more info
From: James Foris <jforis@xxxxxxxxx>
Date: Wed, 16 Mar 2005 22:29:34 -0600
Sender: linux-xfs-bounce@xxxxxxxxxxx
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.3) Gecko/20041116
We were not able to complete all the tests we wanted today, but we did uncover
some critical information on how to reproduce the problem, so here are some
further clarifications.


1. The bad blocks were never written, as opposed to being zeroed; we have been
using new data patterns on each pass and can see that the bad blocks are from the
previous pass when this corruption happens.
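As an illustration of the check in point 1, here is a minimal sketch of how a per-pass data pattern lets you tell "never written" apart from "zeroed". The pattern scheme (pattern_block), the 4096-byte block size, and all function names are assumptions made up for this example; they are not taken from the actual test scripts.

```python
import hashlib

BLOCK = 4096  # assumed block size; the actual value is not stated


def pattern_block(pass_no, index, size=BLOCK):
    """Hypothetical filler: bytes that depend on both the pass and the block index."""
    seed = hashlib.sha256(f"{pass_no}:{index}".encode()).digest()
    reps = size // len(seed) + 1
    return (seed * reps)[:size]


def classify_blocks(data, pass_no, size=BLOCK):
    """Label each block as 'current', 'stale' (previous pass), 'zeroed', or 'other'."""
    labels = []
    n_blocks = (len(data) + size - 1) // size
    for index in range(n_blocks):
        chunk = data[index * size:(index + 1) * size]
        n = len(chunk)
        if chunk == pattern_block(pass_no, index, size)[:n]:
            labels.append("current")
        elif chunk == pattern_block(pass_no - 1, index, size)[:n]:
            labels.append("stale")   # old data survived: block was never written
        elif chunk == bytes(n):
            labels.append("zeroed")
        else:
            labels.append("other")
    return labels
```

A block labelled "stale" carries the previous pass's pattern, which is the signature described above; a "zeroed" block would point to a different failure mode.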



2. The test files are created by copying a reference file with a "cp" command
from a shell script. Since "cp" does ordinary buffered writes, this probably
eliminates DIRECT_IO issues from consideration.



3. A key requirement we learned today is that multiple writers must be used to create the problem.

If only a single write process is used, no corruption is found when we check the files.
When two or more (we typically use 5) are creating files in the "md0" partition at the
same time, then the problem appears.


In practice, we have a script that creates directories then copies/creates a fixed number of files
into each. It is invoked with a directory name, and the name of a reference file; 5 copies of the
script are executed simultaneously with different target directory names.


If only a single copy of this writer script is executed, we do not see corruption.
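For anyone who wants to try the same access pattern, a rough sketch of the multi-writer setup follows. It is not our script: the directory and file names, the file count per directory, and the use of Python threads that each run "cp" in sequence (rather than 5 separate shell script processes) are all assumptions for illustration.

```python
import filecmp
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def write_dir(target, reference, n_files):
    """One writer: create a directory and copy the reference file into it n_files times."""
    target.mkdir(parents=True, exist_ok=True)
    copies = []
    for i in range(n_files):
        dest = target / f"file{i:04d}"  # hypothetical naming scheme
        subprocess.run(["cp", str(reference), str(dest)], check=True)
        copies.append(dest)
    return copies


def run_writers(base, reference, writers=5, n_files=100):
    """Run several writers at once, each with its own target directory."""
    targets = [base / f"writer{w}" for w in range(writers)]
    with ThreadPoolExecutor(max_workers=writers) as pool:
        results = pool.map(lambda t: write_dir(t, reference, n_files), targets)
        return [path for copies in results for path in copies]


def find_mismatches(copies, reference):
    """Return the copies whose contents differ byte-for-byte from the reference."""
    return [p for p in copies
            if not filecmp.cmp(str(reference), str(p), shallow=False)]
```

Calling run_writers(base, reference, writers=1) gives the single-writer case for comparison. Note that a comparison done right after the copy may be served from the page cache rather than from disk, so a real check would probably want to remount or otherwise drop cached data first.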


4. We have confirmed that while the first chunk is never corrupted, all other chunks
may show corruption.


For a 516K file and a 128K chunk size, the first 128K is never affected and corruption
may be seen anywhere else in the file (chunk 2, 3, or 4). (We are using 512+4 K to make
sure that different file alignments relative to file system chunk boundaries are tested.)
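To make the chunk numbering in point 4 concrete, here is a small sketch that reports which chunks of a copy differ from the reference. The function name and the 1-based numbering are assumptions for this example, with the 128K chunk size from above as the default.

```python
KIB = 1024


def differing_chunks(reference, copy, chunk_size=128 * KIB):
    """Return the 1-based numbers of the chunks in which copy differs from reference."""
    length = max(len(reference), len(copy))
    bad = []
    for start in range(0, length, chunk_size):
        end = start + chunk_size
        if reference[start:end] != copy[start:end]:
            bad.append(start // chunk_size + 1)
    return bad
```

With a 516K file and this numbering, bytes 0 to 128K are chunk 1, chunks 2 through 4 cover 128K to 512K, and the trailing 4K lands in a short fifth chunk.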



5. We have switched to running all tests on a fairly stock 2.6.11 kernel. We do add
some of the Fedora Core patches that affect the RPM build environment and device
driver updates. Any FC patch that affects core systems (execshield, 4K stacks,
net/disk dump, etc.) is not applied. We have learned to be very conservative about this,
having had things like Java broken by their changes.



Tomorrow (I hope) we will address more chunk sizes and minimum partition sizes.
Also, we will check behavior of the JFS/RAID0 file system combination.


As always, any ideas/questions are welcome.

Jim Foris




