
To: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Subject: Re: xfs_repair fails with corrupt dinode 17491441757, extent total = 1, nblocks = 0. This is a bug.
From: Arkadiusz Miśkiewicz <arekm@xxxxxxxx>
Date: Thu, 3 Nov 2011 11:40:46 +0100
Cc: xfs@xxxxxxxxxxx
In-reply-to: <20111103102609.GA12066@xxxxxxxxxxxxx>
References: <201110311156.20421.arekm@xxxxxxxx> <20111103102609.GA12066@xxxxxxxxxxxxx>
User-agent: KMail/1.13.7 (Linux/3.1.0-final-dirty; KDE/4.7.3; x86_64; ; )
On Thursday 03 of November 2011, Christoph Hellwig wrote:
> On Mon, Oct 31, 2011 at 11:56:20AM +0100, Arkadiusz Miśkiewicz wrote:
> > xfs_repair version 3.1.6
> > 
> > disconnected inode 17491441754, moving to lost+found
> > disconnected inode 17491441755, moving to lost+found
> > disconnected inode 17491441756, moving to lost+found
> > disconnected inode 17491441757, moving to lost+found
> > corrupt dinode 17491441757, extent total = 1, nblocks = 0.  This is a
> > bug. Please capture the filesystem metadata with xfs_metadump and
> > report it to xfs@xxxxxxxxxxxx
> > cache_node_purge: refcount was 1, not zero (node=0x21450c90)
> > 
> > fatal error -- 117 - couldn't iget disconnected inode
> > 
> > 30GB metadump image (6.1GB compressed) of a ~7TB real partition
> > http://ixion.pld-linux.org/~arekm/lv_storage1.metadump.xz
> > 
> > You need ~8-12GB of memory for xfs_repair on this.
> > 
> > I can also provide ssh access to the system with this image and all
> > needed stuff, so you don't need to download it or waste your own resources.
> 
> I think I understand the problem: we found a disconnected inode,
> which we try to move to lost+found.  For some reason the inode
> is found to be incorrect by xfs_iformat, so iget bails out.
> 
> The fix will be to do a pass over the inodes we want to move
> to correct such inconsistencies and/or junk them.  I'll try to prepare
> a fix as soon as I get some time, but I'm fairly busy at the moment.
> 
> Btw, what did you do to the fs?  Having the total blocks out of sync
> with the numbers in the data and attribute forks seems like an extremely
> unusual error case.

Well,

This server has 16 assorted SATA disks connected to an art-of-crap controller,
a Promise SuperTrak EX16350.

The system exhibits funny issues with the intel_idle driver
(https://lkml.org/lkml/2011/10/28/270).

It has only 8GB of RAM, which xfs_repair eats for breakfast, causing the
watchdog to reboot the machine while xfs_repair was in progress (it would be
nice if repair could estimate the RAM it needs before it is too late).

All these issues combined caused a few reboots, some of them in the middle of
an xfs_repair run. Most likely all of that caused this corruption.

-- 
Arkadiusz Miśkiewicz        PLD/Linux Team
arekm / maven.pl            http://ftp.pld-linux.org/
