

To: Marc Lehmann <schmorp@xxxxxxxxxx>
Subject: Re: xfs_repair 3.1.4/3.1.5: fatal error -- couldn't malloc dir2 buffer data
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Mon, 8 Aug 2011 10:29:46 +1000
Cc: xfs@xxxxxxxxxxx
In-reply-to: <20110806233913.GH3162@dastard>
References: <20110806121728.GA20341@xxxxxxxxxx> <20110806141241.GF3162@dastard> <20110806175428.GA1900@xxxxxxxxxx> <20110806233913.GH3162@dastard>
User-agent: Mutt/1.5.21 (2010-09-15)
On Sun, Aug 07, 2011 at 09:39:13AM +1000, Dave Chinner wrote:
> On Sat, Aug 06, 2011 at 07:54:28PM +0200, Marc Lehmann wrote:
> > On Sun, Aug 07, 2011 at 12:12:41AM +1000, Dave Chinner 
> > <david@xxxxxxxxxxxxx> wrote:
> > > > this is 3.1.5 - 3.1.4 simply segfaults. using ltrace shows this as
> > > > last call to malloc:
> > > > 
> > > >    malloc(18446744073708732928)                                  = NULL
> > > > 
> > > > I think that's a bit unreasonable of xfs_repair :)
> > > 
> > > Can you share a metadump of the image in question?
> > 
> > I can, but unfortunately, it's fixed itself in the meantime:
> > 
> > I wanted to make a copy of the image, and mounted it read-write. I stat'ed
> > all files inside (which worked) and then rsynced all files out.
> > 
> > Then I unmounted it and re-ran xfs_repair
> > (http://ue.tst.eu/3cbc07150eb6b69c63361937c6c3044f.txt) which got much
> > farther, but failed with the same error.
> 
> Looks like corrupt directory blocks are causing it.
> 
> > Then I re-ran xfs_repair one last time, which ran through without any 
> > "error"
> > messages.
> > 
> > An xfs_metadump -o is here (gzipped):
> > http://data.plan9.de/smoker-chroot.bin.gz
> 
> I'll have a look at it.

$ sudo xfs_repair -V  -f /vm-images/busted.img 
xfs_repair version 3.1.5
$ sudo xfs_repair  -f /vm-images/busted.img 
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - agno = 15
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 11
        - agno = 1
        - agno = 3
        - agno = 2
        - agno = 4
        - agno = 10
        - agno = 7
        - agno = 5
        - agno = 6
        - agno = 8
        - agno = 9
        - agno = 12
        - agno = 13
        - agno = 15
        - agno = 14
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
done
$

Yup, there's nothing wrong with the filesystem in the image you
posted.
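
For what it's worth, the absurd malloc argument quoted above is consistent
with a negative length being converted to an unsigned 64-bit size_t:
18446744073708732928 is exactly 2^64 - 818688, i.e. the bit pattern of
-818688, which suggests xfs_repair computed a bogus negative buffer size
from the corrupt directory metadata. A quick check:

```shell
# Converting -818688 to a 64-bit unsigned value reproduces the quoted size
printf '%u\n' $(( -818688 ))    # prints 18446744073708732928
```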

I need an image of the broken filesystem to be able to find the bug
in xfs_repair. Next time it breaks, can you post the image of the
broken fs? i.e. run xfs_repair -n first to see if it will fail
without trying to repair the corruption it encounters, then take a
metadump before really trying to repair the problem...
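
Something like this (the device path is just an example):

```shell
# 1. Dry run: report corruption without modifying the filesystem
xfs_repair -n /dev/sdXN

# 2. If it fails, capture the metadata first (-o disables filename obfuscation)
xfs_metadump -o /dev/sdXN broken-fs.metadump
gzip broken-fs.metadump

# 3. Only then attempt the real repair
xfs_repair /dev/sdXN
```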

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
