
Re: XFS: Internal error XFS_WANT_CORRUPTED_RETURN

To: Eric Sandeen <sandeen@xxxxxxxxxxx>
Subject: Re: XFS: Internal error XFS_WANT_CORRUPTED_RETURN
From: Dave Jones <davej@xxxxxxxxxx>
Date: Thu, 12 Dec 2013 11:20:36 -0500
Cc: Dave Chinner <david@xxxxxxxxxxxxx>, xfs@xxxxxxxxxxx
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <52A9E0EF.1000206@xxxxxxxxxxx>
References: <20131211172725.GA4606@xxxxxxxxxx> <20131211230128.GM10988@dastard> <52A9E0EF.1000206@xxxxxxxxxxx>
User-agent: Mutt/1.5.21 (2010-09-15)
On Thu, Dec 12, 2013 at 10:14:39AM -0600, Eric Sandeen wrote:
 > On 12/11/13, 5:01 PM, Dave Chinner wrote:
 > > On Wed, Dec 11, 2013 at 12:27:25PM -0500, Dave Jones wrote:
 > >> Powered up my desktop this morning and noticed I couldn't cd into ~/Mail
 > >> dmesg didn't look good.  "XFS: Internal error XFS_WANT_CORRUPTED_RETURN"
 > >> http://codemonkey.org.uk/junk/xfs-1.txt
 > > 
 > > They came from xfs_dir3_block_verify() on read IO completion, which
 > > indicates that the corruption was on disk and in the directory
 > > structure. Yeah, definitely a verifier error:
 > > 
 > > XFS (sda3): metadata I/O error: block 0x2e790 ("xfs_trans_read_buf_map") 
 > > error 117 numblks 8
 > > 
 > > Are you running a CRC enabled filesystem? (i.e. mkfs.xfs -m crc=1)
 > > 
 > > Is there any evidence that this verifier has fired in the past on
 > > write? If not, then it's a good chance that it's a media error
 > > causing this, because the same verifier runs when the metadata is
 > > written to ensure we are not writing bad stuff to disk.
 > 
 > Dave C, have you given any thought to how to make the verifier errors more
 > actionable?  If davej throws up his hands, the rest of the world is obviously
 > in trouble.  ;)
 > 
 > To the inexperienced this looks like a "crash" thanks to the backtrace.
 > I do understand that it's necessary for bug reports, but I wonder if we
 > could preface it with something informative or instructive.
 > 
 > We also don't get a block number or inode number, although you or I can
 > dig the inode number out of the hexdump, in this case.
 > 
 > We also don't get any details of what the values in the failed check were;
 > not from the check macro itself or from the hexdump, necessarily, since
 > it only prints the first handful of bytes.
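
For anyone else staring at a line like the "metadata I/O error" one quoted above, here is a rough sketch of pulling the useful fields out of it. The function name parse_xfs_io_error and the regex are made up for illustration and are not part of any XFS tool. Error 117 is EUCLEAN on Linux, which XFS uses as EFSCORRUPTED. The byte offset assumes the block number and numblks are counted in 512-byte units, so treat that part as an assumption to verify against your own setup.

```python
import re

# Linux errno 117 is EUCLEAN ("Structure needs cleaning"); XFS reports
# metadata corruption (EFSCORRUPTED) with this value.
EUCLEAN = 117

# Assumption: block numbers and numblks are in 512-byte units.
UNIT_BYTES = 512

# Hypothetical pattern matching the log line quoted in this thread.
# \s+ before "error" tolerates the line wrap added by the mail client.
LINE_RE = re.compile(
    r'XFS \((?P<dev>[^)]+)\): metadata I/O error: '
    r'block (?P<block>0x[0-9a-fA-F]+) \("(?P<func>[^"]+)"\)\s+'
    r'error (?P<error>\d+) numblks (?P<numblks>\d+)'
)


def parse_xfs_io_error(text):
    """Return the fields of an XFS metadata I/O error line, or None."""
    m = LINE_RE.search(text)
    if m is None:
        return None
    block = int(m.group("block"), 16)
    numblks = int(m.group("numblks"))
    error = int(m.group("error"))
    return {
        "device": m.group("dev"),
        "function": m.group("func"),
        "block": block,
        "numblks": numblks,
        "error": error,
        "is_corruption": error == EUCLEAN,
        # Offsets relative to the start of the named device, under the
        # 512-byte-unit assumption above.
        "byte_offset": block * UNIT_BYTES,
        "byte_length": numblks * UNIT_BYTES,
    }


sample = (
    'XFS (sda3): metadata I/O error: block 0x2e790 '
    '("xfs_trans_read_buf_map") \nerror 117 numblks 8'
)
info = parse_xfs_io_error(sample)
```

With the line from this thread, info["block"] comes out as 190352 and info["is_corruption"] is true, so this is the verifier rejecting what it read rather than the device returning a plain read failure.
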

This morning, the same SSD spewed a bunch of other errors when find ran over a
kernel tree.

http://paste.fedoraproject.org/61189/38686344

In that case I did get a block number (the irony of the failing block # is not
lost on me).

As soon as the new one arrives, I'll try some destructive tests on the failing 
one.

I'm just happy it's stayed alive long enough for me to get the data off it.
When my Intel SSD failed earlier this year it was just a brick.

        Dave
