Re: Bug in xfs_repair

To: Erik Tews <erik@xxxxxxxxxxxxxxxxx>
Subject: Re: Bug in xfs_repair
From: Nathan Scott <nathans@xxxxxxx>
Date: Tue, 17 Jun 2003 16:56:43 +1000
Cc: linux-xfs@xxxxxxxxxxx
In-reply-to: <20030616205718.GA6783@debian.franken.de>
References: <20030616205718.GA6783@debian.franken.de>
Sender: linux-xfs-bounce@xxxxxxxxxxx
User-agent: Mutt/1.5.3i
On Mon, Jun 16, 2003 at 10:57:18PM +0200, Erik Tews wrote:
> Hi

hi there,

> I think I found a bug in xfs_repair. It segfaults when I run it on a

Yep, that looks like a new bug.

> filesystem which is a little bit corrupt. After I ran it with efence, I
> got these backtraces. I think it cannot handle this block-out-of-range
> condition correctly. I have attached all the important information.

Hmm.. these are always difficult to diagnose when I haven't got the
filesystem right in front of me (so I can sit in gdb and xfs_repair
at the same time for interactive debugging).

What is almost certainly happening is we are moving past the end of
the buffer we've read in (i.e. the one pointed to by "ablock" below).
This is likely because of either corruption in values in the buffer
itself which repair has not catered for, or corruption of some other
related control value we're using (many of these will be hanging off
the "mp" variable you see in the stack trace).

If you can figure out what's causing the pointer to walk past the end
of the buffer, you've nailed the problem.
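
As a rough illustration of the kind of sanity check that would stop such
a walk, here is a minimal sketch. It is not taken from the xfs_repair
source: the function name, and the idea that a block is a fixed-size
header followed by fixed-size records, are assumptions made for the
example, and the sizes used in the usage note are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns 1 if "numrecs" records of "rec_size"
 * bytes, placed after a header of "hdr_size" bytes, fit inside a block
 * of "blocksize" bytes, and 0 otherwise. A caller would check this
 * before walking the records, so a corrupt count cannot push the
 * record pointer past the end of the buffer.
 */
int recs_fit_in_block(uint32_t numrecs, uint32_t hdr_size,
                      uint32_t rec_size, uint32_t blocksize)
{
    assert(rec_size > 0);           /* avoid dividing by zero below */

    if (blocksize < hdr_size)
        return 0;                   /* block cannot even hold the header */

    /* Division form avoids overflow in numrecs * rec_size. */
    return numrecs <= (blocksize - hdr_size) / rec_size;
}
```

With an assumed 16-byte header, 8-byte records and a 4096-byte block,
at most 510 records fit, so a record count of 511 read from the block
would be rejected rather than walked.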

Hmmm, what else?  From the trace we can see we're walking the by-block
freespace btree in the ninth allocation group, and in particular we're
up to (ag-relative) block number 338 (you can use the xfs_db "convert"
command to get the real disk address - iirc, there's an example on the
man page describing how to do that).  For a quick fix, you may be able
to zero that block using xfs_db/dd, but it'd be even better to figure
out the underlying cause of the segv...
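
For reference, the arithmetic behind that conversion can be sketched as
below. This is only an illustration, not the xfs_db implementation: the
function name is made up, and it assumes the result is wanted in
512-byte units relative to the start of the data device. The AG size
and filesystem block size have to come from the superblock (sb_agblocks
and sb_blocksize); the values in the usage note are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define BASIC_BLOCK_SIZE 512u   /* disk addresses are in 512-byte units */

/*
 * Hypothetical helper: convert an (AG number, AG-relative block) pair
 * into a disk address in 512-byte units. "agblocks" is the number of
 * filesystem blocks per allocation group and "blocksize" is the
 * filesystem block size in bytes.
 */
uint64_t ag_block_to_daddr(uint32_t agno, uint32_t agbno,
                           uint32_t agblocks, uint32_t blocksize)
{
    assert(blocksize >= BASIC_BLOCK_SIZE);
    assert(blocksize % BASIC_BLOCK_SIZE == 0);
    assert(agbno < agblocks);   /* block must lie inside its AG */

    /* Linear filesystem block number, counted from the start of AG 0. */
    uint64_t linear = (uint64_t)agno * agblocks + agbno;

    /* Scale filesystem blocks to 512-byte units. */
    return linear * (blocksize / BASIC_BLOCK_SIZE);
}
```

For example, if the filesystem had 1000 blocks per AG and 4096-byte
blocks (made-up numbers), block 338 of AG 9 would come out as
(9 * 1000 + 338) * 8 = 74704. The xfs_db "convert" command remains the
safer way to get the real figure, since it reads the geometry itself.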

> (gdb) bt
> #0  0x0807918f in scanfunc_bno (ablock=0x405bd000, level=0, bno=338, agno=9, 
> suspect=0, isroot=1) at scan.c:569
> #1  0x0807760a in scan_sbtree (root=338, nlevels=1, agno=9, suspect=0, 
> func=0x8078d05 <scanfunc_bno>, isroot=1) at scan.c:84
> #2  0x0807b401 in scan_ag (agno=9) at swab.h:125
> #3  0x0806570f in phase2 (mp=0xbfffdc70) at phase2.c:149
> #4  0x0807c9e9 in main (argc=2, argv=0xbfffdc70) at xfs_repair.c:506
> ...
> Phase 2 - using internal log
>         - zero log...
>         - scan filesystem freespace and inode maps...


cheers.

-- 
Nathan

