On Mon, Sep 22, 2014 at 03:40:36PM -0400, Brian Foster wrote:
> XFS had a data corruption problem where writeback of pages to unwritten
> extents would fail to run unwritten extent conversion at I/O completion.
> This causes subsequent reads of written, but unconverted regions to
> return zeroes. This occurs on sub-page block size filesystems when
> writeback contends for the inode lock (e.g., with a file writer).
>
> Add a test that creates the conditions to reproduce the data corruption
> and detect it by looking for unwritten extents after all said extents
> have been overwritten.
>
> Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx>
I still think the error handling for the unwritten extent case is
wrong. Failure debug is exactly what $seqres.full is for, so that's
where failure information should go. If we detect a failure case
and have to abort immediately, then _fail() should be used. And
_fail() leaves a message to look at $seqres.full for details.
So:
> + # Check for unwritten extents. We should have none since we wrote over
> + # the entire preallocated region and ran fsync.
> + $XFS_IO_PROG \
> + -c "fiemap -v" \
> + $SCRATCH_MNT/file | _filter_fiemap | \
> + grep unwritten >> $seqres.full 2>&1
> + if [ $? == 0 ]; then
> + # data corruption! dump the extent list and break out to dump
> + # the file content
> + $XFS_IO_PROG -c "fiemap -v" $SCRATCH_MNT/file
> + break
> + fi
Can simply be:
$XFS_IO_PROG -c "fiemap -v" $SCRATCH_MNT/file | \
tee -a $seqres.full | \
filter_fiemap | grep unwritten
[ $? == 0 ] && _fail "Unwritten extents found!"
and will result in the output:
generic/032 0s.... [failed, exit status 1] - output mismatch (see
....results/generic/032.bad)
And results/generic/032.bad will contain:
....
Unwritten extents found!
(see ..../results/generic/032.full for details)
And the complete fiemap output will be in results/generic/032.full.
> +done
> +
> +echo $iters iterations
> +
> +kill $syncpid
> +wait
> +
> +# clear page cache and dump the file
> +_scratch_remount
> +hexdump $SCRATCH_MNT/file
> +
> +_scratch_unmount
No need to unmount. check does that when checking the filesystem,
and if not the next _require_scratch call will do it....
Cheers,
Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx
|