Re: xfs_repair crashing (versions 3.1.4 and 3.1.5)

To: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Subject: Re: xfs_repair crashing (versions 3.1.4 and 3.1.5)
From: Anisse Astier <anisse@xxxxxxxxx>
Date: Wed, 4 May 2011 12:24:03 +0200
Cc: Eric Sandeen <sandeen@xxxxxxxxxxx>, xfs@xxxxxxxxxxx, Dave Chinner <david@xxxxxxxxxxxxx>
In-reply-to: <20110504091141.GA30330@xxxxxxxxxxxxx>
References: <BANLkTikh1i3aYgXNZut+AGT-1kz=aqv-Eg@xxxxxxxxxxxxxx> <20110419082705.GI23985@dastard> <20110419130737.45beb611@xxxxxxxxxxxxxxxxx> <4DB084CE.8020600@xxxxxxxxxxx> <20110422130920.7be686c6@xxxxxxxxxxxxxxxxx> <20110504091141.GA30330@xxxxxxxxxxxxx>
On Wed, May 4, 2011 at 11:11 AM, Christoph Hellwig <hch@xxxxxxxxxxxxx> wrote:
> On Fri, Apr 22, 2011 at 01:09:20PM +0200, Anisse Astier wrote:
>> Yep, I figured that much, it just took me a while to get up & running
>> another system capable of building xfsprogs.
>>
>> Now that I have that, and have commented out the do_warn, xfs_repair is
>> still running past the previous failing point:
>> [...]
>>         - agno = 17
>> bad key in bmbt root (is 73434, would reset to 74194) in inode 2283178100 data fork
>> bad fwd (right) sibling pointer (saw 145202888 should be NULLDFSBNO)
>> bad data fork in inode 2283178100
>> would have cleared inode 2283178100
>>         - agno = 18
>> [...] (ongoing)
>>
>> Once this is done, I'll test with %llu instead of %u.
>>
>> But please be patient, it's a 900GB filesystem (half-full) with just an 800
>> MHz ARM9 processor doing the work, so xfs_repair takes hours to complete.
>> Plus I won't have time to do many tests before next week.
>>
>> To be continued.
>
> Any updates?

Well, Dave had it all figured out, and replacing %u with %llu does indeed
fix the problem.

Just for future reference, here is the stack trace of the crashing process:
#0  strlen () at ../ports/sysdeps/arm/strlen.S:29
#1  0x40204f78 in _IO_vfprintf_internal (s=0xbe9a9730,
format=0xbe9a7676 "27", ap=<value optimized out>) at vfprintf.c:1614
#2  0x40205f70 in buffered_vfprintf (s=0x402f2668, format=0x88168874
<Address 0x88168874 out of bounds>, args=...) at vfprintf.c:2254
#3  0x40201a44 in _IO_vfprintf_internal (s=0x402f2668, format=0x7c198
"\tin inode %u (%s fork) bmap btree block %llu\n", ap=<value optimized
out>) at vfprintf.c:1306
#4  0x0003cd48 in do_warn (msg=0x7b4dc "data") at xfs_repair.c:379
#5  0x00017088 in process_btinode (mp=<value optimized out>, agno=17,
ino=<value optimized out>, dip=<value optimized out>, type=34387,
dirty=0xbe9aa418, tot=0x5,
    nex=0xbe9aa418, blkmapp=0xbe9aa2d8, whichfork=-1097162040,
check_dups=-1097162016) at dinode.c:1284
#6  0x00017a04 in process_inode_data_fork (mp=<value optimized out>,
agno=17, ino=1476724, dino=0x1db7800, type=5, dirty=0xbe9aa418,
totblocks=0xbe9aa2d8, nextents=0xbe9aa2c8,
    dblkmap=0xbe9aa2e0, check_dups=0) at dinode.c:2048
#7  0x0001a5f0 in process_dinode_int (mp=<value optimized out>,
dino=0x1db7800, agno=<value optimized out>, ino=<value optimized out>,
was_free=0, dirty=0x1ad34, used=0x0,
    verify_mode=-1097161704, uncertain=0, ino_discovery=1,
check_dups=0, extra_attr_check=1, isa_dir=0x0, parent=0xbe9aa408) at
dinode.c:2631
#8  0x0001ad34 in process_dinode (mp=0x7c198, dino=0x1b,
agno=2283178100, ino=0, was_free=0, dirty=0xbe9aa418, used=0xbe9aa41c,
ino_discovery=1, check_dups=0,
    extra_attr_check=1, isa_dir=0xbe9aa414, parent=0xbe9aa408) at dinode.c:2773
#9  0x00010630 in process_inode_chunk (mp=0xbe9aa508, agno=17,
num_inos=<value optimized out>, first_irec=<value optimized out>,
ino_discovery=1, check_dups=0,
    extra_attr_check=1, bogus=0x0) at dino_chunks.c:777
#10 0x000110ec in process_aginodes (mp=0xbe9aa508, pf_args=0xed5c8,
agno=17, ino_discovery=1, check_dups=0, extra_attr_check=1) at
dino_chunks.c:1024
#11 0x00028724 in process_ag_func (wq=0x400608, agno=17, arg=0xed5c8)
at phase3.c:154
#12 0x00028e24 in process_ags (mp=0xbe9aa508) at phase3.c:193
#13 phase3 (mp=0xbe9aa508) at phase3.c:232
#14 0x0003ddd8 in main (argc=<value optimized out>, argv=<value
optimized out>) at xfs_repair.c:712

>
> In the meantime I cooked up a little patch (below) to add format string
> checking to the repair-internal varargs printing helpers, which produces
> a lot of warnings.  A lot of that is different underlying types for
> fixed-size 64-bit types, but there are quite a few legit errors there as
> well.
>
>
> Index: xfsprogs-dev/repair/err_protos.h
> ===================================================================
> --- xfsprogs-dev.orig/repair/err_protos.h       2011-04-22 12:45:25.018475622 +0200
> +++ xfsprogs-dev/repair/err_protos.h    2011-04-22 12:47:22.014508467 +0200
> @@ -17,10 +17,14 @@
>  */
>
>  /* abort, internal error */
> -void  __attribute__((noreturn)) do_abort(char const *, ...);
> +void  __attribute__((noreturn)) do_abort(char const *, ...)
> +       __attribute__((format(printf,1,2)));
>  /* abort, system error */
> -void  __attribute__((noreturn)) do_error(char const *, ...);
> +void  __attribute__((noreturn)) do_error(char const *, ...)
> +       __attribute__((format(printf,1,2)));
>  /* issue warning */
> -void do_warn(char const *, ...);
> +void do_warn(char const *, ...)
> +       __attribute__((format(printf,1,2)));
>  /* issue log message */
> -void do_log(char const *, ...);
> +void do_log(char const *, ...)
> +       __attribute__((format(printf,1,2)));
>

I'll give it a try.

Anisse
