Re: Weird XFS Corruption Error

To: Sascha Askani <saskani@xxxxxxxxx>
Subject: Re: Weird XFS Corruption Error
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Sat, 25 Jan 2014 08:52:57 +1100
Cc: xfs@xxxxxxxxxxx
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <FF1E62DE-5CFC-469B-BBF7-F5AB04AD4C0C@xxxxxxxxx>
References: <CDBC891F-BCF6-4B9B-ADDB-9E143973D188@xxxxxxxxx> <20140122233141.GI27606@dastard> <FF1E62DE-5CFC-469B-BBF7-F5AB04AD4C0C@xxxxxxxxx>
User-agent: Mutt/1.5.21 (2010-09-15)
On Fri, Jan 24, 2014 at 08:56:32AM +0100, Sascha Askani wrote:
> Hi Dave, 
> 
> thanks for your reply and I'm sorry for the delayed answer…
> 
> Am 23.01.2014 um 00:31 schrieb Dave Chinner <david@xxxxxxxxxxxxx>:
> 
> > On Wed, Jan 22, 2014 at 05:09:10PM +0100, Sascha Askani wrote:
> > 
> > So, an inode extent map btree block failed verification for some
> > reason. Hmmm - there should have been 4 lines of hexdump output
> > there as well. Can you post that as well? Or have you modified
> > /proc/sys/fs/xfs/error_level to have a value of 0 so it is not
> > emitted?
> > 
> 
> /proc/sys/fs/xfs/error_level is set to 3, sorry for not including this in my 
> original post, the Hexdump is pretty "boring" (or interesting, depending on 
> your point of view):
> 
> [964197.435322] ffff881f8e59b000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> [964197.862037] ffff881f8e59b010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> [964198.288694] ffff881f8e59b020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> [964198.712093] ffff881f8e59b030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................

Yeah, that confirms what I suspected - the buffer has been
overwritten with zeros. That tends to imply *something* has zeroed
the start of the block device, and that's the cause of all the
problems.
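
As a rough illustration of that check, here is a minimal Python sketch that tests whether a run of bytes is entirely zero. The helper names (is_all_zero, region_is_zeroed) are made up for this example, and the "XFSB" comparison only reflects the fact that an intact primary superblock starts with that magic; it is not a substitute for xfs_repair or xfs_db.

```python
def is_all_zero(data: bytes) -> bool:
    """Return True if data is non-empty and every byte is 0x00."""
    return len(data) > 0 and data.count(0) == len(data)


def region_is_zeroed(path: str, offset: int = 0, length: int = 64) -> bool:
    """Read `length` bytes at `offset` from a file or block device
    (hypothetical helper; needs read permission on `path`) and report
    whether that region is all zeros."""
    with open(path, "rb") as f:
        f.seek(offset)
        return is_all_zero(f.read(length))


# The 64 bytes from the hexdump quoted above were all 0x00.
dumped = bytes(64)
print(is_all_zero(dumped))

# For contrast: a buffer that begins with the XFS superblock magic.
print(is_all_zero(b"XFSB" + bytes(60)))
```

Pointing region_is_zeroed at the device node (read-only) would show whether the first sectors are still zeroed; the path to use depends entirely on the local setup.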

> > Oh, wow. Ok, if the primary superblock is gone, along with metadata
> > in the first few blocks of the filesystem, then something has
> > overwritten the start of the block device the filesystem is on.
> > 
> >> 2. mounted the filesystem, which gave me a "Structure needs cleaning" 
> >> after a couple of seconds
> >> 3. tried mounting again for good measure, same error "Structure needs 
> >> cleaning"
> > 
> > Right - the kernel can't read a valid superblock, either.
> 
> Just seen these messages in the log which were emitted when trying to mount 
> the FS:
> 
> [964606.038733] XFS (dm-8): metadata I/O error: block 0x200 ("xlog_recover_do..(read#2)") error 117 numblks 16
> [964606.515048] XFS (dm-8): log mount/recovery failed: error 117
> [964606.515386] XFS (dm-8): log mount failed

Yup, that's trying to read an inode cluster. It's also right near
the start of the filesystem (0x200 * 512 bytes = 256k into the
filesystem). So log recovery is trying to replay an inode change and
finding the inodes that underlie the change in the log are corrupt.
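
The offset arithmetic can be sketched in a few lines of Python. This assumes the block number in the log message is in 512-byte units, as in the calculation above, and additionally assumes that numblks uses the same units; the function name is made up for illustration.

```python
SECTOR_SIZE = 512  # bytes per block, as used in the calculation above


def blocks_to_bytes(blocks: int, sector_size: int = SECTOR_SIZE) -> int:
    """Convert a count of 512-byte blocks into bytes."""
    return blocks * sector_size


# "block 0x200" from the metadata I/O error message
offset = blocks_to_bytes(0x200)
print(offset, offset // 1024)   # 262144 bytes = 256 KiB

# "numblks 16", assuming the same 512-byte units
length = blocks_to_bytes(16)
print(length, length // 1024)   # 8192 bytes = 8 KiB
```

Under those assumptions the failed read covers 8 KiB starting 256 KiB into the filesystem.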

This really looks like something outside the filesystem caused the
problem. It's probably too late to find out what caused it either,
but I'd be checking with your HW vendor(s) about known problems with
their hardware/firmware....

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
