[problem] xfstests generic/311 unreliable...

To: xfs@xxxxxxxxxxx
Subject: [problem] xfstests generic/311 unreliable...
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Tue, 7 May 2013 17:11:02 +1000
Delivered-to: xfs@xxxxxxxxxxx
User-agent: Mutt/1.5.21 (2010-09-15)

Hi Josef,

I was just looking at generic/311, and I think there's something
fundamentally wrong with the way it is checking the scratch device.

You reported it was failing for internal test 19 on XFS, but I'm
seeing it fail after the first test or two, randomly. It has never
made it past test 3. So I had a bit of a closer look at its
structure. Essentially it is doing this (and the contents seen by
each step):

scratch dev + mkfs
overlay dm-flakey
mount/write/kill/unmount dm-flakey

All good up to here. Now, you call _check_scratch_fs, which sees:

scratch dev + check

i.e. it's not seeing all the changes written to dm-flakey and so
xfs_check is seeing corruption.

After I realised this was stacking block devices and checking the
underlying block device, the cause was pretty obvious: scratch-dev
and dm-flakey have different address spaces, so changes written
through one address space will not be seen through the other address
space if there is stale cached data in the original address space.

And that's exactly what is happening. This patch:

--- a/tests/generic/311
+++ b/tests/generic/311
@@ -79,6 +79,7 @@ _mount_flakey()
+       echo 3 > /proc/sys/vm/drop_caches

Makes the problem go away for xfs_check. But really, I don't like
the assumption that the test is built on - that writes through one
block device are visible through another. It's just asking for weird

Is there some way that you can restructure this test so it doesn't
have this problem (e.g. do everything on dm-flakey)?
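
As a rough sketch of what I mean, and not a tested change to the test
itself: the helper below checks the filesystem through the dm-flakey
device, so the writes and the check share one address space. The second
helper is a narrower fallback that flushes and invalidates only the
scratch device's buffer cache rather than dropping every cache on the
system. The device paths, the DRY_RUN switch and all of the function
names are hypothetical, and I'm assuming the dm-flakey device is still
set up when the check runs. xfs_repair -n is used here as the
no-modify check.

```shell
#!/bin/bash
# Sketch only. The default device paths and every function name here are
# hypothetical; none of them are taken from generic/311.
SCRATCH_DEV=${SCRATCH_DEV:-/dev/sdb1}
FLAKEY_DEV=${FLAKEY_DEV:-/dev/mapper/flakey-test}

# Run a command, or only print it when DRY_RUN=1 (useful without root).
run() {
	if [ "${DRY_RUN:-0}" = "1" ]; then
		echo "$*"
	else
		"$@"
	fi
}

# Preferred: check through the device the filesystem was written through,
# so the check sees the same cached state as the writes did.
check_fs_on_flakey() {
	run xfs_repair -n "$FLAKEY_DEV"
}

# Fallback: flush and invalidate the scratch device's own buffer cache
# first, then check the scratch device directly.
check_fs_on_scratch() {
	run blockdev --flushbufs "$SCRATCH_DEV" &&
	run xfs_repair -n "$SCRATCH_DEV"
}
```

Running with DRY_RUN=1 only prints the commands, which is enough to see
which device each check would actually read from.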


Dave Chinner