xfs_repair crashing (versions 3.1.4 and 3.1.5)
Anisse Astier
anisse at astier.eu
Thu May 5 17:46:44 CDT 2011
On Wed, May 4, 2011 at 12:24 PM, Anisse Astier <anisse at astier.eu> wrote:
> On Wed, May 4, 2011 at 11:11 AM, Christoph Hellwig <hch at infradead.org> wrote:
>> On Fri, Apr 22, 2011 at 01:09:20PM +0200, Anisse Astier wrote:
>>> Yep, I figured that much, it just took me a while to get up & running
>>> another system capable of building xfsprogs.
>>>
>>> Now that I have that, and have commented out the do_warn, xfs_repair is
>>> still running past the previous point of failure:
>>> [...]
>>> - agno = 17
>>> bad key in bmbt root (is 73434, would reset to 74194) in inode 2283178100 data fork
>>> bad fwd (right) sibling pointer (saw 145202888 should be NULLDFSBNO)
>>> bad data fork in inode 2283178100
>>> would have cleared inode 2283178100
>>> - agno = 18
>>> [...] (ongoing)
>>>
>>> Once this is done, I'll test with %llu instead of %u.
>>>
>>> But please be patient, it's a 900GB filesystem (half-full) with just an 800
>>> MHz ARM9 processor doing the work, so xfs_repair takes hours to complete.
>>> Plus I won't have time to do many tests before next week.
>>>
>>> To be continued.
>>
>> Any updates?
>
> Well, Dave had it all figured out, and replacing %u with %llu does indeed
> fix the problem.
>
> Just for future reference, the stack trace of the crashing process:
> #0 strlen () at ../ports/sysdeps/arm/strlen.S:29
> #1 0x40204f78 in _IO_vfprintf_internal (s=0xbe9a9730,
> format=0xbe9a7676 "27", ap=<value optimized out>) at vfprintf.c:1614
> #2 0x40205f70 in buffered_vfprintf (s=0x402f2668, format=0x88168874
> <Address 0x88168874 out of bounds>, args=...) at vfprintf.c:2254
> #3 0x40201a44 in _IO_vfprintf_internal (s=0x402f2668, format=0x7c198
> "\tin inode %u (%s fork) bmap btree block %llu\n", ap=<value optimized
> out>) at vfprintf.c:1306
> #4 0x0003cd48 in do_warn (msg=0x7b4dc "data") at xfs_repair.c:379
> #5 0x00017088 in process_btinode (mp=<value optimized out>, agno=17,
> ino=<value optimized out>, dip=<value optimized out>, type=34387,
> dirty=0xbe9aa418, tot=0x5,
> nex=0xbe9aa418, blkmapp=0xbe9aa2d8, whichfork=-1097162040,
> check_dups=-1097162016) at dinode.c:1284
> #6 0x00017a04 in process_inode_data_fork (mp=<value optimized out>,
> agno=17, ino=1476724, dino=0x1db7800, type=5, dirty=0xbe9aa418,
> totblocks=0xbe9aa2d8, nextents=0xbe9aa2c8,
> dblkmap=0xbe9aa2e0, check_dups=0) at dinode.c:2048
> #7 0x0001a5f0 in process_dinode_int (mp=<value optimized out>,
> dino=0x1db7800, agno=<value optimized out>, ino=<value optimized out>,
> was_free=0, dirty=0x1ad34, used=0x0,
> verify_mode=-1097161704, uncertain=0, ino_discovery=1,
> check_dups=0, extra_attr_check=1, isa_dir=0x0, parent=0xbe9aa408) at
> dinode.c:2631
> #8 0x0001ad34 in process_dinode (mp=0x7c198, dino=0x1b,
> agno=2283178100, ino=0, was_free=0, dirty=0xbe9aa418, used=0xbe9aa41c,
> ino_discovery=1, check_dups=0,
> extra_attr_check=1, isa_dir=0xbe9aa414, parent=0xbe9aa408) at dinode.c:2773
> #9 0x00010630 in process_inode_chunk (mp=0xbe9aa508, agno=17,
> num_inos=<value optimized out>, first_irec=<value optimized out>,
> ino_discovery=1, check_dups=0,
> extra_attr_check=1, bogus=0x0) at dino_chunks.c:777
> #10 0x000110ec in process_aginodes (mp=0xbe9aa508, pf_args=0xed5c8,
> agno=17, ino_discovery=1, check_dups=0, extra_attr_check=1) at
> dino_chunks.c:1024
> #11 0x00028724 in process_ag_func (wq=0x400608, agno=17, arg=0xed5c8)
> at phase3.c:154
> #12 0x00028e24 in process_ags (mp=0xbe9aa508) at phase3.c:193
> #13 phase3 (mp=0xbe9aa508) at phase3.c:232
> #14 0x0003ddd8 in main (argc=<value optimized out>, argv=<value
> optimized out>) at xfs_repair.c:712
>
>>
>> In the meantime I cooked up a little patch (below) to add format string
>> checking to the repair-internal varargs printing helpers, which produces
>> a lot of warnings. A lot of that is different underlying types for
>> fixed-size 64-bit types, but there are quite a few legit errors there as
>> well.
>>
>>
>> Index: xfsprogs-dev/repair/err_protos.h
>> ===================================================================
>> --- xfsprogs-dev.orig/repair/err_protos.h 2011-04-22 12:45:25.018475622 +0200
>> +++ xfsprogs-dev/repair/err_protos.h 2011-04-22 12:47:22.014508467 +0200
>> @@ -17,10 +17,14 @@
>> */
>>
>> /* abort, internal error */
>> -void __attribute__((noreturn)) do_abort(char const *, ...);
>> +void __attribute__((noreturn)) do_abort(char const *, ...)
>> + __attribute__((format(printf,1,2)));
>> /* abort, system error */
>> -void __attribute__((noreturn)) do_error(char const *, ...);
>> +void __attribute__((noreturn)) do_error(char const *, ...)
>> + __attribute__((format(printf,1,2)));
>> /* issue warning */
>> -void do_warn(char const *, ...);
>> +void do_warn(char const *, ...)
>> + __attribute__((format(printf,1,2)));
>> /* issue log message */
>> -void do_log(char const *, ...);
>> +void do_log(char const *, ...)
>> + __attribute__((format(printf,1,2)));
>>
>
> I'll give it a try.
Before:
Building repair
[DEP]
[CC] agheader.o
[CC] attr_repair.o
[CC] avl.o
[CC] avl64.o
[CC] bmap.o
[CC] btree.o
[CC] dino_chunks.o
[CC] dinode.o
[CC] dir.o
[CC] dir2.o
[CC] globals.o
[CC] incore.o
[CC] incore_bmc.o
[CC] init.o
[CC] incore_ext.o
[CC] incore_ino.o
[CC] phase1.o
[CC] phase2.o
[CC] phase3.o
[CC] phase4.o
[CC] phase5.o
[CC] phase6.o
[CC] phase7.o
[CC] progress.o
[CC] prefetch.o
[CC] rt.o
[CC] sb.o
[CC] scan.o
[CC] threads.o
[CC] versions.o
[CC] xfs_repair.o
[LD] xfs_repair
After:
Building repair
[DEP]
[CC] agheader.o
[CC] attr_repair.o
[CC] avl.o
[CC] avl64.o
[CC] bmap.o
bmap.c: In function 'blkmap_getn':
bmap.c:145: warning: format '%u' expects type 'unsigned int', but
argument 2 has type 'xfs_dfilblks_t'
[CC] btree.o
[CC] dino_chunks.o
[CC] dinode.o
dinode.c: In function 'process_btinode':
dinode.c:1272: warning: format '%d' expects type 'int', but argument 4
has type '__uint64_t'
dinode.c:1287: warning: format '%u' expects type 'unsigned int', but
argument 2 has type 'long long unsigned int'
[CC] dir.o
[CC] dir2.o
[CC] globals.o
[CC] incore.o
[CC] incore_bmc.o
[CC] init.o
[CC] incore_ext.o
[CC] incore_ino.o
[CC] phase1.o
[CC] phase2.o
[CC] phase3.o
[CC] phase4.o
[CC] phase5.o
[CC] phase6.o
phase6.c: In function 'longform_dir2_entry_check':
phase6.c:2479: warning: format '%u' expects type 'unsigned int', but
argument 2 has type 'xfs_fsize_t'
phase6.c: In function 'shortform_dir_entry_check':
phase6.c:2815: warning: too many arguments for format
[CC] phase7.o
[CC] progress.o
[CC] prefetch.o
[CC] rt.o
[CC] sb.o
sb.c: In function 'get_sb':
sb.c:491: warning: too many arguments for format
[CC] scan.o
scan.c: In function 'scanfunc_allocbt':
scan.c:567: warning: format '%d' expects type 'int', but argument 4
has type 'const char *'
scan.c:573: warning: format '%d' expects type 'int', but argument 4
has type 'const char *'
scan.c: In function 'validate_agf':
scan.c:1114: warning: format '%u' expects type 'unsigned int', but
argument 3 has type '__uint64_t'
[CC] threads.o
[CC] versions.o
[CC] xfs_repair.o
xfs_repair.c: In function 'calc_mkfs':
xfs_repair.c:457: warning: format '%lu' expects type 'long unsigned
int', but argument 4 has type 'xfs_agino_t'
xfs_repair.c:462: warning: format '%lu' expects type 'long unsigned
int', but argument 2 has type 'xfs_agino_t'
xfs_repair.c:466: warning: format '%lu' expects type 'long unsigned
int', but argument 2 has type 'xfs_agino_t'
xfs_repair.c:480: warning: format '%lu' expects type 'long unsigned
int', but argument 4 has type 'xfs_agino_t'
xfs_repair.c:485: warning: format '%lu' expects type 'long unsigned
int', but argument 2 has type 'xfs_agino_t'
xfs_repair.c:489: warning: format '%lu' expects type 'long unsigned
int', but argument 2 has type 'xfs_agino_t'
xfs_repair.c:503: warning: format '%lu' expects type 'long unsigned
int', but argument 4 has type 'xfs_agino_t'
xfs_repair.c:508: warning: format '%lu' expects type 'long unsigned
int', but argument 2 has type 'xfs_agino_t'
xfs_repair.c:512: warning: format '%lu' expects type 'long unsigned
int', but argument 2 has type 'xfs_agino_t'
[LD] xfs_repair
Stating the obvious here, but we can see that the supposed cause of the
crash in dinode.c (%u instead of %llu) is preceded by a similar error
(%d instead of %llu).
There are also other warnings that need fixing.
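
For illustration, here is a minimal sketch of the kind of fix involved. It
is not the actual xfsprogs change: warn_to_buf() and report_bmbt_block() are
hypothetical names made up for this example, standing in for a helper like
do_warn() and one of its callers. It assumes gcc-style attributes. The
helper carries the same format(printf, ...) attribute as in the patch above,
and the caller prints 64-bit values with %llu and an explicit cast, so the
format string and the arguments stay in step on 32-bit targets as well.

```c
#include <stdarg.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for a repair-internal helper such as do_warn().
 * It formats into a caller-supplied buffer so the result can be inspected.
 * The format attribute asks gcc to check the arguments against the format
 * string (format string is parameter 3, variadic arguments start at 4).
 */
int warn_to_buf(char *buf, size_t len, const char *fmt, ...)
	__attribute__((format(printf, 3, 4)));

int warn_to_buf(char *buf, size_t len, const char *fmt, ...)
{
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = vsnprintf(buf, len, fmt, ap);
	va_end(ap);
	return n;
}

/* Hypothetical caller, using the message text seen in the backtrace. */
int report_bmbt_block(char *buf, size_t len, unsigned long long ino,
		      const char *forkname, unsigned long long bno)
{
	/*
	 * Passing a 64-bit value for a %u conversion is undefined behaviour;
	 * on a 32-bit target it can throw off the arguments that follow, so
	 * a later %s may end up reading a bogus pointer. Using %llu with an
	 * explicit cast avoids the mismatch whatever the underlying typedef.
	 */
	return warn_to_buf(buf, len,
			   "\tin inode %llu (%s fork) bmap btree block %llu\n",
			   (unsigned long long)ino, forkname,
			   (unsigned long long)bno);
}
```

With the attribute in place, changing the first %llu back to %u in this
sketch should make gcc emit a warning like the ones in the build log above
(given -Wformat, which -Wall enables).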
Anisse