
Re: XFS CRC errors after a crash

To: Jan Kara <jack@xxxxxxx>
Subject: Re: XFS CRC errors after a crash
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Thu, 26 Jun 2014 07:59:52 +1000
Cc: xfs@xxxxxxxxxxx
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <20140625164939.GA27620@xxxxxxxxxxxxx>
References: <20140625164939.GA27620@xxxxxxxxxxxxx>
User-agent: Mutt/1.5.21 (2010-09-15)
On Wed, Jun 25, 2014 at 06:49:39PM +0200, Jan Kara wrote:
>   Hello,
> 
>   so I've been crash-testing XFS (just killing KVM with XFS filesystem
> mounted) a bit with V5 superblock enabled in 3.16-rc1 and I can pretty
> easily hit CRC mismatches after that. Kernel complains like:

Yes, we had that reported yesterday by another person, using
VirtualBox. I've been unable to reproduce it on my local KVM VMs, so
I'm wondering what your KVM configuration is?

> [518184.794175] XFS (sdb3): Mounting V5 Filesystem
> [518184.902898] XFS (sdb3): Starting recovery (logdev: internal)
> [518187.118860] XFS (sdb3): Metadata CRC error detected at xfs_agf_read_verify+0x5a/0x100 [xfs], block 0x1
> [518187.118870] XFS (sdb3): Unmount and run xfs_repair
> [518187.118875] XFS (sdb3): First 64 bytes of corrupted metadata buffer:
> [518187.118882] ffff880136ffd600: 58 41 47 46 00 00 00 01 00 00 00 00 00 0f aa 40  XAGF...........@
> [518187.118887] ffff880136ffd610: 00 02 6d 53 00 02 77 f8 00 00 00 00 00 00 00 01  ..mS..w.........
> [518187.118891] ffff880136ffd620: 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 03  ................
> [518187.118895] ffff880136ffd630: 00 00 00 04 00 08 81 d0 00 08 81 a7 00 00 00 00  ................
> [518187.118923] XFS (sdb3): metadata I/O error: block 0x1 ("xfs_trans_read_buf_map") error 74 numblks 1
> 
> So it seems like the checksum doesn't get updated properly in all cases.
> Looking into the logdump, there don't seem to be any modifications for
> this AGF block in the unreplayed part of the log, but there are some
> modifications in the older parts of the log - the latest LSN where block 1
> was updated is 1,4639 (and the buffer contents in the log correspond to the
> data I see in block 1). However, the lsn field in the AGF structure in that
> block shows 1,3616, so that really seems stale (and I've checked, and in
> that transaction the block was modified as well).
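
For readers following along, the 64 bytes in the dump above can be decoded by hand. The sketch below is illustrative only: it assumes the start of the on-disk AGF header is sixteen big-endian 32-bit fields, as in kernels of this era, and the field names in `FIELDS` are shorthand labels chosen for this example rather than the kernel's identifiers. The lsn and crc fields sit beyond the first 64 bytes, so they cannot be read from this dump.

```python
import struct

# First 64 bytes of the AGF buffer, copied from the dmesg dump above.
raw = bytes.fromhex(
    "58414746 00000001 00000000 000faa40"
    "00026d53 000277f8 00000000 00000001"
    "00000001 00000000 00000000 00000003"
    "00000004 000881d0 000881a7 00000000"
)

# Assumed layout: a 4-byte magic followed by fifteen big-endian u32 fields.
# The names are illustrative labels, not the kernel's struct member names.
FIELDS = [
    "magicnum", "versionnum", "seqno", "length",
    "bno_root", "cnt_root", "spare0",
    "bno_level", "cnt_level", "spare1",
    "flfirst", "fllast", "flcount",
    "freeblks", "longest", "btreeblks",
]

agf = dict(zip(FIELDS, struct.unpack(">4s15I", raw)))

for name in FIELDS:
    print(f"{name:>10}: {agf[name]!r}")
```

Under that assumed layout the header looks internally sane (valid magic, and a free list running from slot 0 to slot 3 with a count of 4), which fits the observation that the data itself is fine and only the LSN and CRC are out of step.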

That tallies with what has been reported, although in that case it
was the AGI block. What I know so far is that the CRC matches the
version of the structure logged at the apparent LSN, but the data is
more recent.
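
To make the "stale LSN" comparison concrete, here is a minimal sketch. It assumes an XFS LSN is a 64-bit value with the log cycle in the upper 32 bits and the log block in the lower 32 bits, so that "1,3616" means cycle 1, block 3616. The helper names `make_lsn` and `split_lsn` are hypothetical and exist only for this illustration.

```python
def make_lsn(cycle: int, block: int) -> int:
    """Pack a (cycle, block) pair into one 64-bit LSN (assumed layout)."""
    return (cycle << 32) | block


def split_lsn(lsn: int) -> tuple[int, int]:
    """Unpack a 64-bit LSN back into its (cycle, block) pair."""
    return lsn >> 32, lsn & 0xFFFFFFFF


# Values quoted in the report: the LSN stamped in the on-disk AGF,
# and the LSN of the last logged modification of that block.
on_disk_lsn = make_lsn(1, 3616)
last_logged_lsn = make_lsn(1, 4639)

# Because the cycle occupies the high bits, plain integer comparison
# orders LSNs first by cycle and then by block.
is_stale = on_disk_lsn < last_logged_lsn
print(split_lsn(on_disk_lsn), split_lsn(last_logged_lsn), is_stale)
```

With these values the on-disk LSN compares lower than the last logged one, which is the mismatch described above: the contents correspond to the later change while the stamp, and the CRC that goes with it, correspond to the earlier one.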

Now, the only way I can see the data getting updated without the LSN
being updated is through log recovery; the analysis is here:

http://oss.sgi.com/pipermail/xfs/2014-June/036938.html

At the bottom of that email is a request for information from a
reproduction cycle. Can you run that cycle and provide the metadumps
and dmesg from when a problem is first found?

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
