
Re: xfs_repair crashing (versions 3.1.4 and 3.1.5)

To: Eric Sandeen <sandeen@xxxxxxxxxxx>
Subject: Re: xfs_repair crashing (versions 3.1.4 and 3.1.5)
From: Anisse Astier <anisse@xxxxxxxxx>
Date: Fri, 22 Apr 2011 13:09:20 +0200
Cc: Dave Chinner <david@xxxxxxxxxxxxx>, xfs@xxxxxxxxxxx
In-reply-to: <4DB084CE.8020600@xxxxxxxxxxx>
References: <BANLkTikh1i3aYgXNZut+AGT-1kz=aqv-Eg@xxxxxxxxxxxxxx> <20110419082705.GI23985@dastard> <20110419130737.45beb611@xxxxxxxxxxxxxxxxx> <4DB084CE.8020600@xxxxxxxxxxx>
On Thu, 21 Apr 2011 14:26:06 -0500, Eric Sandeen <sandeen@xxxxxxxxxxx> wrote :

> On 4/19/11 6:07 AM, Anisse Astier wrote:
> > On Tue, 19 Apr 2011 18:27:05 +1000, Dave Chinner <david@xxxxxxxxxxxxx> wrote :
> > 
> >> On Mon, Apr 18, 2011 at 09:24:22PM +0200, Anisse Astier wrote:
> >>> directory flags set on non-directory inode 2283178100, would fix bad flags.
> >>> bad key in bmbt root (is 73434, would reset to 74194) in inode 2283178100 data fork
> >>> bad fwd (right) sibling pointer (saw 145202888 should be NULLDFSBNO)
> >>> Segmentation fault
> >>
> >> Hmmm. The very next line doesn't appear before the segfault, making
> >> me think that it's the printf that is causing it to crash.
> >>
> >>         if (check_dups == 0 &&
> >>                 cursor.level[0].right_fsbno != NULLDFSBNO)  {
> >>                 do_warn(
> >>         _("bad fwd (right) sibling pointer (saw %llu should be NULLDFSBNO)\n"),
> >>                         cursor.level[0].right_fsbno);
> >>
> >> We get this line of output.
> >>
> >>                 do_warn(
> >>         _("\tin inode %u (%s fork) bmap btree block %llu\n"),
> >>                         XFS_AGINO_TO_INO(mp, agno, ino), forkname,
> >>                         cursor.level[0].fsbno);
> >>
> >> But not this one. I wonder if passing a 64bit number to a %u format
> >> string (should be %llu) causes problems on ARM? All the variables
> >> are valid as they are printed or accessed elsewhere in the function,
> >> so that's the only thing I can think of without a stack trace to
> >> tell me otherwise....
> > 
> > I have no idea. I did not succeed in getting a stacktrace. CPU is an
> > ARM9, and I used Debian armel squeeze & wheezy  xfsprogs binaries.
> 
> Perhaps you could try removing or fixing the printf Dave suspects,
> rebuild repair, and run it again?

Yep, I figured as much; it just took me a while to get another system
capable of building xfsprogs up and running.

Now that I have that, and have commented out the second do_warn,
xfs_repair keeps running past the point where it previously failed:
[…]
        - agno = 17
bad key in bmbt root (is 73434, would reset to 74194) in inode 2283178100 data fork
bad fwd (right) sibling pointer (saw 145202888 should be NULLDFSBNO)
bad data fork in inode 2283178100
would have cleared inode 2283178100
        - agno = 18
[…] (ongoing)

Once this is done, I'll test with %llu instead of %u.
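
For reference, a minimal sketch of what the %llu variant could look like.
It assumes both the inode number and the block number are 64-bit values.
The helper name format_bmbt_warning and the use of snprintf in place of
do_warn are purely illustrative; this is not xfsprogs code. On ARM EABI,
64-bit variadic arguments are 8-byte aligned, so a %u that consumes only
4 bytes could leave the following %s reading a bogus pointer. That would
be consistent with the segfault, but it is not confirmed yet.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for the second do_warn() call (not xfsprogs code).
 * Every 64-bit argument is cast to unsigned long long and printed with
 * %llu, so each conversion consumes exactly the bytes that were passed.
 */
int format_bmbt_warning(char *buf, size_t len, uint64_t ino,
                        const char *forkname, uint64_t fsbno)
{
    return snprintf(buf, len,
                    "\tin inode %llu (%s fork) bmap btree block %llu\n",
                    (unsigned long long)ino, forkname,
                    (unsigned long long)fsbno);
}
```

Called with ino = 2283178100, forkname = "data" and some block number, it
writes the full "in inode ... bmap btree block ..." line into buf.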

But please be patient, it's a 900GB filesystem (half-full) with just an 800
MHz ARM9 processor doing the work, so xfs_repair takes hours to complete.
Plus I won't have time to do many tests before next week.

To be continued.

Anisse
