Re: xfs_repair crashing (versions 3.1.4 and 3.1.5)

To: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Subject: Re: xfs_repair crashing (versions 3.1.4 and 3.1.5)
From: Anisse Astier <anisse@xxxxxxxxx>
Date: Fri, 6 May 2011 00:46:44 +0200
Cc: Eric Sandeen <sandeen@xxxxxxxxxxx>, xfs@xxxxxxxxxxx, Dave Chinner <david@xxxxxxxxxxxxx>
In-reply-to: <BANLkTinCh1xrSv_ZFJin-OcD1xQb6V+EDw@xxxxxxxxxxxxxx>
References: <BANLkTikh1i3aYgXNZut+AGT-1kz=aqv-Eg@xxxxxxxxxxxxxx> <20110419082705.GI23985@dastard> <20110419130737.45beb611@xxxxxxxxxxxxxxxxx> <4DB084CE.8020600@xxxxxxxxxxx> <20110422130920.7be686c6@xxxxxxxxxxxxxxxxx> <20110504091141.GA30330@xxxxxxxxxxxxx> <BANLkTinCh1xrSv_ZFJin-OcD1xQb6V+EDw@xxxxxxxxxxxxxx>
On Wed, May 4, 2011 at 12:24 PM, Anisse Astier <anisse@xxxxxxxxx> wrote:
> On Wed, May 4, 2011 at 11:11 AM, Christoph Hellwig <hch@xxxxxxxxxxxxx> wrote:
>> On Fri, Apr 22, 2011 at 01:09:20PM +0200, Anisse Astier wrote:
>>> Yep, I figured that much, it just took me a while to get up & running
>>> another system capable of building xfsprogs.
>>>
>>> Now that I have that, and that I commented the do_warn, xfs_repair is
>>> still running after the previous failing point:
>>> [...]
>>>         - agno = 17
>>> bad key in bmbt root (is 73434, would reset to 74194) in inode 2283178100 data fork
>>> bad fwd (right) sibling pointer (saw 145202888 should be NULLDFSBNO)
>>> bad data fork in inode 2283178100
>>> would have cleared inode 2283178100
>>>         - agno = 18
>>> [...] (ongoing)
>>>
>>> Once this is done, I'll test with %llu instead of %u.
>>>
>>> But please be patient, it's a 900GB filesystem (half-full) with just an 800
>>> MHz ARM9 processor doing the work, so xfs_repair takes hours to complete.
>>> Plus I won't have time to do many tests before next week.
>>>
>>> To be continued.
>>
>> Any updates?
>
> Well, Dave had it all figured out: replacing %u with %llu does indeed
> fix the problem.
>
> Just for future reference, here is the stack trace of the crashing process:
> #0  strlen () at ../ports/sysdeps/arm/strlen.S:29
> #1  0x40204f78 in _IO_vfprintf_internal (s=0xbe9a9730,
> format=0xbe9a7676 "27", ap=<value optimized out>) at vfprintf.c:1614
> #2  0x40205f70 in buffered_vfprintf (s=0x402f2668, format=0x88168874
> <Address 0x88168874 out of bounds>, args=...) at vfprintf.c:2254
> #3  0x40201a44 in _IO_vfprintf_internal (s=0x402f2668, format=0x7c198
> "\tin inode %u (%s fork) bmap btree block %llu\n", ap=<value optimized
> out>) at vfprintf.c:1306
> #4  0x0003cd48 in do_warn (msg=0x7b4dc "data") at xfs_repair.c:379
> #5  0x00017088 in process_btinode (mp=<value optimized out>, agno=17,
> ino=<value optimized out>, dip=<value optimized out>, type=34387,
> dirty=0xbe9aa418, tot=0x5,
>    nex=0xbe9aa418, blkmapp=0xbe9aa2d8, whichfork=-1097162040,
> check_dups=-1097162016) at dinode.c:1284
> #6  0x00017a04 in process_inode_data_fork (mp=<value optimized out>,
> agno=17, ino=1476724, dino=0x1db7800, type=5, dirty=0xbe9aa418,
> totblocks=0xbe9aa2d8, nextents=0xbe9aa2c8,
>    dblkmap=0xbe9aa2e0, check_dups=0) at dinode.c:2048
> #7  0x0001a5f0 in process_dinode_int (mp=<value optimized out>,
> dino=0x1db7800, agno=<value optimized out>, ino=<value optimized out>,
> was_free=0, dirty=0x1ad34, used=0x0,
>    verify_mode=-1097161704, uncertain=0, ino_discovery=1,
> check_dups=0, extra_attr_check=1, isa_dir=0x0, parent=0xbe9aa408) at
> dinode.c:2631
> #8  0x0001ad34 in process_dinode (mp=0x7c198, dino=0x1b,
> agno=2283178100, ino=0, was_free=0, dirty=0xbe9aa418, used=0xbe9aa41c,
> ino_discovery=1, check_dups=0,
>    extra_attr_check=1, isa_dir=0xbe9aa414, parent=0xbe9aa408) at dinode.c:2773
> #9  0x00010630 in process_inode_chunk (mp=0xbe9aa508, agno=17,
> num_inos=<value optimized out>, first_irec=<value optimized out>,
> ino_discovery=1, check_dups=0,
>    extra_attr_check=1, bogus=0x0) at dino_chunks.c:777
> #10 0x000110ec in process_aginodes (mp=0xbe9aa508, pf_args=0xed5c8,
> agno=17, ino_discovery=1, check_dups=0, extra_attr_check=1) at
> dino_chunks.c:1024
> #11 0x00028724 in process_ag_func (wq=0x400608, agno=17, arg=0xed5c8)
> at phase3.c:154
> #12 0x00028e24 in process_ags (mp=0xbe9aa508) at phase3.c:193
> #13 phase3 (mp=0xbe9aa508) at phase3.c:232
> #14 0x0003ddd8 in main (argc=<value optimized out>, argv=<value
> optimized out>) at xfs_repair.c:712
>
>>
>> In the meantime I cooked up a little patch (below) to add format string
>> checking to the repair-internal varargs printing helpers, which produces
>> a lot of warnings.  A lot of that is different underlying types for
>> fixed-size 64-bit types, but there's quite a few legit errors there as
>> well.
>>
>>
>> Index: xfsprogs-dev/repair/err_protos.h
>> ===================================================================
>> --- xfsprogs-dev.orig/repair/err_protos.h       2011-04-22 12:45:25.018475622 +0200
>> +++ xfsprogs-dev/repair/err_protos.h    2011-04-22 12:47:22.014508467 +0200
>> @@ -17,10 +17,14 @@
>>  */
>>
>>  /* abort, internal error */
>> -void  __attribute__((noreturn)) do_abort(char const *, ...);
>> +void  __attribute__((noreturn)) do_abort(char const *, ...)
>> +       __attribute__((format(printf,1,2)));
>>  /* abort, system error */
>> -void  __attribute__((noreturn)) do_error(char const *, ...);
>> +void  __attribute__((noreturn)) do_error(char const *, ...)
>> +       __attribute__((format(printf,1,2)));
>>  /* issue warning */
>> -void do_warn(char const *, ...);
>> +void do_warn(char const *, ...)
>> +       __attribute__((format(printf,1,2)));
>>  /* issue log message */
>> -void do_log(char const *, ...);
>> +void do_log(char const *, ...)
>> +       __attribute__((format(printf,1,2)));
>>
>
> I'll give it a try.
Before:

Building repair
    [DEP]
    [CC]     agheader.o
    [CC]     attr_repair.o
    [CC]     avl.o
    [CC]     avl64.o
    [CC]     bmap.o
    [CC]     btree.o
    [CC]     dino_chunks.o
    [CC]     dinode.o
    [CC]     dir.o
    [CC]     dir2.o
    [CC]     globals.o
    [CC]     incore.o
    [CC]     incore_bmc.o
    [CC]     init.o
    [CC]     incore_ext.o
    [CC]     incore_ino.o
    [CC]     phase1.o
    [CC]     phase2.o
    [CC]     phase3.o
    [CC]     phase4.o
    [CC]     phase5.o
    [CC]     phase6.o
    [CC]     phase7.o
    [CC]     progress.o
    [CC]     prefetch.o
    [CC]     rt.o
    [CC]     sb.o
    [CC]     scan.o
    [CC]     threads.o
    [CC]     versions.o
    [CC]     xfs_repair.o
    [LD]     xfs_repair

After:

Building repair
    [DEP]
    [CC]     agheader.o
    [CC]     attr_repair.o
    [CC]     avl.o
    [CC]     avl64.o
    [CC]     bmap.o
bmap.c: In function 'blkmap_getn':
bmap.c:145: warning: format '%u' expects type 'unsigned int', but
argument 2 has type 'xfs_dfilblks_t'
    [CC]     btree.o
    [CC]     dino_chunks.o
    [CC]     dinode.o
dinode.c: In function 'process_btinode':
dinode.c:1272: warning: format '%d' expects type 'int', but argument 4
has type '__uint64_t'
dinode.c:1287: warning: format '%u' expects type 'unsigned int', but
argument 2 has type 'long long unsigned int'
    [CC]     dir.o
    [CC]     dir2.o
    [CC]     globals.o
    [CC]     incore.o
    [CC]     incore_bmc.o
    [CC]     init.o
    [CC]     incore_ext.o
    [CC]     incore_ino.o
    [CC]     phase1.o
    [CC]     phase2.o
    [CC]     phase3.o
    [CC]     phase4.o
    [CC]     phase5.o
    [CC]     phase6.o
phase6.c: In function 'longform_dir2_entry_check':
phase6.c:2479: warning: format '%u' expects type 'unsigned int', but
argument 2 has type 'xfs_fsize_t'
phase6.c: In function 'shortform_dir_entry_check':
phase6.c:2815: warning: too many arguments for format
    [CC]     phase7.o
    [CC]     progress.o
    [CC]     prefetch.o
    [CC]     rt.o
    [CC]     sb.o
sb.c: In function 'get_sb':
sb.c:491: warning: too many arguments for format
    [CC]     scan.o
scan.c: In function 'scanfunc_allocbt':
scan.c:567: warning: format '%d' expects type 'int', but argument 4
has type 'const char *'
scan.c:573: warning: format '%d' expects type 'int', but argument 4
has type 'const char *'
scan.c: In function 'validate_agf':
scan.c:1114: warning: format '%u' expects type 'unsigned int', but
argument 3 has type '__uint64_t'
    [CC]     threads.o
    [CC]     versions.o
    [CC]     xfs_repair.o
xfs_repair.c: In function 'calc_mkfs':
xfs_repair.c:457: warning: format '%lu' expects type 'long unsigned
int', but argument 4 has type 'xfs_agino_t'
xfs_repair.c:462: warning: format '%lu' expects type 'long unsigned
int', but argument 2 has type 'xfs_agino_t'
xfs_repair.c:466: warning: format '%lu' expects type 'long unsigned
int', but argument 2 has type 'xfs_agino_t'
xfs_repair.c:480: warning: format '%lu' expects type 'long unsigned
int', but argument 4 has type 'xfs_agino_t'
xfs_repair.c:485: warning: format '%lu' expects type 'long unsigned
int', but argument 2 has type 'xfs_agino_t'
xfs_repair.c:489: warning: format '%lu' expects type 'long unsigned
int', but argument 2 has type 'xfs_agino_t'
xfs_repair.c:503: warning: format '%lu' expects type 'long unsigned
int', but argument 4 has type 'xfs_agino_t'
xfs_repair.c:508: warning: format '%lu' expects type 'long unsigned
int', but argument 2 has type 'xfs_agino_t'
xfs_repair.c:512: warning: format '%lu' expects type 'long unsigned
int', but argument 2 has type 'xfs_agino_t'
    [LD]     xfs_repair


Stating the obvious here, but we can see that the suspected cause of the
crash in dinode.c (%u instead of %llu) is preceded by a similar error
(%d instead of %llu) in the same function.
The other warnings need fixing as well.
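
To illustrate the kind of fix involved, here is a minimal sketch, not the
actual xfsprogs code: warn_to_buf and format_bmbt_warning are hypothetical
names, and the 64-bit argument types are an assumption. With the format
attribute in place, gcc checks every conversion against its argument, so
writing %u for a 64-bit value would draw the same kind of warning as above.
On a 32-bit target such a mismatch can make printf fetch the following %s
argument from the wrong slot, which would be consistent with the strlen
crash in the stack trace.

```c
#include <inttypes.h>
#include <stdarg.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for a do_warn-style helper. It formats into a
 * caller-supplied buffer so the result can be inspected. The format
 * attribute tells gcc that argument 3 is a printf format string and
 * that the variadic arguments start at position 4.
 */
int warn_to_buf(char *buf, size_t len, const char *fmt, ...)
	__attribute__((format(printf, 3, 4)));

int warn_to_buf(char *buf, size_t len, const char *fmt, ...)
{
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = vsnprintf(buf, len, fmt, ap);
	va_end(ap);
	return n;
}

/*
 * Assumed types: both the inode number and the block number are 64 bits
 * wide. Casting to unsigned long long and using %llu keeps the format
 * and the arguments in agreement on 32-bit and 64-bit targets alike.
 */
int format_bmbt_warning(char *buf, size_t len, uint64_t ino,
			const char *fork, uint64_t bno)
{
	return warn_to_buf(buf, len,
		"in inode %llu (%s fork) bmap btree block %llu",
		(unsigned long long)ino, fork, (unsigned long long)bno);
}
```

An alternative to the casts is the PRIu64 macro from <inttypes.h>, used with
arguments declared as uint64_t; which of the two fits better depends on how
the underlying types are defined on each platform.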

Anisse
