[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [BUG] 2.6.0-test4-mm1: NFS+XFS=data corruption



Steve Lord <lord@sgi.com> wrote:
>
> > > Is this enough information to help find the cause of the bug? If not,
>  > > it might be several days (if I'm unlucky, maybe even a week or two)
>  > > before I have time to do anything more...
>  > > 
>  > 
>  > -mm kernels have O_DIRECT-for-NFS patches in them.  And some versions of
>  > RPM use O_DIRECT.  Whether O_DIRECT makes any difference at the server end
>  > I do not know, but it would be useful if you could repeat the test on stock
>  > 2.6.0-test4.
>  > 
>  > Alternatively, run
>  > 
>  > 	export LD_ASSUME_KERNEL=2.2.5
>  > 
>  > before running RPM.  I think that should tell RPM to not try O_DIRECT.
> 
>  I doubt the NFS client is O_DIRECT capable here, I have run some rpm
>  builds over nfs to 2.6.0-test4 and an xfs filesystem, everything is
>  behaving so far. I will try mm1 tomorrow.
> 
>  Do we know if this NFS V3 or V2 by the way?

OK, sorry for the noise.  It appears that this is due to the AIO patches in
-mm.  fsx-linux fails instantly on nfsv3 to localhost on XFS.  It's OK on
ext2 for some reason.

Binary searching reveals that the offending patch is
O_SYNC-speedup-nolock-fix.patch

testcase:

	mkfs.xfs -f /dev/hda5
	mount /dev/hda5 /mnt/hda5
	chmod a+rw /mnt/hda5
	service nfs start
	mount localhost:/mnt/hda5 /mnt/localhost
	cd /mnt/localhost
	fsx-linux foo


truncating to largest ever: 0x13e76
READ BAD DATA: offset = 0x18f13, size = 0xee06, fname = foo
OFFSET  GOOD    BAD     RANGE
0x26000 0x02eb  0x0000  0x    0
operation# (mod 256) for the bad data unknown, check HOLE and EXTEND ops
0x26001 0xeb02  0x0000  0x    1
operation# (mod 256) for the bad data unknown, check HOLE and EXTEND ops
0x26002 0x0228  0x0000  0x    2
operation# (mod 256) for the bad data unknown, check HOLE and EXTEND ops