Bugzilla – Bug 712
2.6.17.7 XFS internal error xfs_da_do_buf
Last modified: 2009-06-22 06:09:09 CDT
I found that XFS is unstable on the recent 2.6.17.7 kernel. Below are the results of one test: untarring a big file onto an XFS volume. Please look at the results, and ask if any additional information is required.

Hardware: Production board based on iop80321 Xscale CPU (ARM).
Software: I am using vanilla linux-2.6.17.7, with a toolchain based on gcc 3.4.

Steps to reproduce:
1) mkfs.xfs /dev/xxxx
2) mount /dev/xxxx /mnt
3) cd /mnt ; tar -xf /volume/somestuff

Then I get an I/O error, with output like this:

0x0: 58 44 00 00 00 12 0f a6 00 00 00 00 72 75 00 00
Filesystem "dm-14": XFS internal error xfs_da_do_buf(2) at line 2212 of file fs/xfs/xfs_da_btree.c. Caller 0xc0159a0c
[<c0028a4c>] (dump_stack+0x0/0x14) from [<c016cf74>] (xfs_error_report+0x54/0x64)
[<c016cf20>] (xfs_error_report+0x0/0x64) from [<c016d074>] (xfs_corruption_error+0xf0/0x108)
[<c016cf84>] (xfs_corruption_error+0x0/0x108) from [<c0159854>] (xfs_da_do_buf+0x678/0x798)
[<c01591dc>] (xfs_da_do_buf+0x0/0x798) from [<c0159a0c>] (xfs_da_read_buf+0x3c/0x44)
[<c01599d4>] (xfs_da_read_buf+0x4/0x44) from [<c015f340>] (xfs_dir2_block_addname+0x68/0x998)
[<c015f2d8>] (xfs_dir2_block_addname+0x0/0x998) from [<c0167644>] (xfs_dir2_sf_addname+0x1e8/0x918)
[<c016745c>] (xfs_dir2_sf_addname+0x0/0x918) from [<c015ec04>] (xfs_dir2_createname+0xb8/0x140)
[<c015eb4c>] (xfs_dir2_createname+0x0/0x140) from [<c0195404>] (xfs_symlink+0x804/0x9f4)
[<c0194c00>] (xfs_symlink+0x0/0x9f4) from [<c019f768>] (xfs_vn_symlink+0xbc/0x178)
[<c019f6ac>] (xfs_vn_symlink+0x0/0x178) from [<c0089540>] (vfs_symlink+0x80/0xc0)
[<c00894c0>] (vfs_symlink+0x0/0xc0) from [<c008961c>] (sys_symlinkat+0x9c/0xf8)
r7 = CFCC1F30 r6 = C7771000 r5 = C46BAB60 r4 = C46BAB60
[<c0089580>] (sys_symlinkat+0x0/0xf8) from [<c0089690>] (sys_symlink+0x18/0x1c)
r8 = C0023EE4 r7 = 00000053 r6 = 00047580 r5 = 00047530 r4 = 00047584
[<c0089678>] (sys_symlink+0x0/0x1c) from [<c0023d40>] (ret_fast_syscall+0x0/0x2c)
Filesystem "dm-14": XFS internal error xfs_trans_cancel at line 1150 of file fs/xfs/xfs_trans.c. Caller 0xc019558c
[<c0028a4c>] (dump_stack+0x0/0x14) from [<c016cf74>] (xfs_error_report+0x54/0x64)
[<c016cf20>] (xfs_error_report+0x0/0x64) from [<c018b6a4>] (xfs_trans_cancel+0x7c/0x12c)
[<c018b628>] (xfs_trans_cancel+0x0/0x12c) from [<c019558c>] (xfs_symlink+0x98c/0x9f4)
r7 = CFCC1DBC r6 = CFCC1E80 r5 = 000003DE r4 = 00000000
[<c0194c00>] (xfs_symlink+0x0/0x9f4) from [<c019f768>] (xfs_vn_symlink+0xbc/0x178)
[<c019f6ac>] (xfs_vn_symlink+0x0/0x178) from [<c0089540>] (vfs_symlink+0x80/0xc0)
[<c00894c0>] (vfs_symlink+0x0/0xc0) from [<c008961c>] (sys_symlinkat+0x9c/0xf8)
r7 = CFCC1F30 r6 = C7771000 r5 = C46BAB60 r4 = C46BAB60
[<c0089580>] (sys_symlinkat+0x0/0xf8) from [<c0089690>] (sys_symlink+0x18/0x1c)
r8 = C0023EE4 r7 = 00000053 r6 = 00047580 r5 = 00047530 r4 = 00047584
[<c0089678>] (sys_symlink+0x0/0x1c) from [<c0023d40>] (ret_fast_syscall+0x0/0x2c)
xfs_force_shutdown(dm-14,0x8) called from line 1151 of file fs/xfs/xfs_trans.c. Return address = 0xc01a2c4c
Filesystem "dm-14": Corruption of in-memory data detected. Shutting down filesystem: dm-14

NOTE: the hardware is not responsible for the corruption, i.e. ext3 works fine.

xfs_repair(2.8.4) -n
... ...
entry "less" in directory inode 2009243 points to free inode 2025152, would junk entry
entry "seq" in directory inode 2009243 points to free inode 2025153, would junk entry
entry "test" in directory inode 2009243 points to free inode 2025154, would junk entry
entry "tr" in directory inode 2009243 points to free inode 2025155, would junk entry
entry "subst" in directory inode 2009243 points to free inode 2025156, would junk entry
entry "uuidgen" in directory inode 2009243 points to free inode 2025157, would junk entry
entry "netcat" in directory inode 2009243 points to free inode 2025158, would junk entry
entry "wget" in directory inode 2009243 points to free inode 2025159, would junk entry
entry "which" in directory inode 2009243 points to free inode 2025160, would junk entry
entry "bc" in directory inode 2009243 points to free inode 2025161, would junk entry
entry "chrootuid" in directory inode 2009243 points to free inode 2025162, would junk entry
entry "killall" in directory inode 2009243 points to free inode 2025163, would junk entry
entry "diff" in directory inode 2009243 points to free inode 2025164, would junk entry
- traversals finished ...
- moving disconnected inodes to lost+found ...
disconnected dir inode 392012, would move to lost+found
disconnected dir inode 556954, would move to lost+found
disconnected dir inode 2009243, would move to lost+found
Phase 7 - verify link counts...
would have reset inode 392004 nlinks from 13 to 9
would have reset inode 556955 nlinks from 4 to 3
would have reset inode 854746 nlinks from 3 to 2
No modify flag set, skipping filesystem flush and exiting.

xfs_repair -L /dev/xxxx
entry "which" at block 0 offset 256 in directory inode 2009243 references free inode 2025160
clearing inode number in entry at offset 256...
entry "bc" at block 0 offset 272 in directory inode 2009243 references free inode 2025161
clearing inode number in entry at offset 272...
entry "chrootuid" at block 0 offset 288 in directory inode 2009243 references free inode 2025162
clearing inode number in entry at offset 288...
entry "killall" at block 0 offset 312 in directory inode 2009243 references free inode 2025163
clearing inode number in entry at offset 312...
entry "diff" at block 0 offset 336 in directory inode 2009243 references free inode 2025164
clearing inode number in entry at offset 336...
no .. entry for directory 2009243
Phase 5 - rebuild AG headers and trees...
- reset superblock...
Phase 6 - check inode connectivity...
- resetting contents of realtime bitmap and summary inodes
- ensuring existence of lost+found directory
- traversing filesystem starting at / ...
rebuilding directory inode 128
rebuilding directory inode 42065
rebuilding directory inode 392004
rebuilding directory inode 1401147
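The 16-byte dump printed just before the xfs_da_do_buf(2) error in the report can be inspected by hand. The sketch below is only an illustration, not part of the original report: it parses that dump line and compares it against the XFS directory and da-btree magic numbers. The magic values and the field offsets (a 32-bit magic at byte 0 for directory data blocks, a 16-bit magic at byte 8 for da-btree block headers) are written from memory of the XFS on-disk headers and should be checked against your kernel tree. The helper names `parse_dump_line` and `classify` are made up for this example.

```python
import struct

# Assumption: magic values as recalled from the XFS on-disk headers;
# verify against fs/xfs in the kernel tree you are debugging.
DIR2_MAGICS_32 = {
    0x58443242: "XFS_DIR2_BLOCK_MAGIC (XD2B)",
    0x58443244: "XFS_DIR2_DATA_MAGIC (XD2D)",
    0x58443246: "XFS_DIR2_FREE_MAGIC (XD2F)",
}
DA_MAGICS_16 = {
    0xFEBE: "XFS_DA_NODE_MAGIC",
    0xD2F1: "XFS_DIR2_LEAF1_MAGIC",
    0xD2FF: "XFS_DIR2_LEAFN_MAGIC",
    0xFBEE: "XFS_ATTR_LEAF_MAGIC",
}


def parse_dump_line(line):
    """Turn a kernel hex dump line such as '0x0: 58 44 ...' into bytes."""
    _, _, hex_part = line.partition(":")
    return bytes(int(tok, 16) for tok in hex_part.split())


def classify(buf):
    """Return (magic32, magic16, name); name is None if nothing matches.

    XFS metadata is big-endian on disk, hence the '>' format codes.
    """
    magic32 = struct.unpack_from(">I", buf, 0)[0]  # directory data/block/free header
    magic16 = struct.unpack_from(">H", buf, 8)[0]  # da-btree block info header
    name = DIR2_MAGICS_32.get(magic32) or DA_MAGICS_16.get(magic16)
    return magic32, magic16, name


# The dump line from the report above.
dump = "0x0: 58 44 00 00 00 12 0f a6 00 00 00 00 72 75 00 00"
magic32, magic16, name = classify(parse_dump_line(dump))
print(f"magic32=0x{magic32:08x} magic16=0x{magic16:04x} -> {name or 'no known magic'}")
# prints: magic32=0x58440000 magic16=0x0000 -> no known magic
```

Under these assumptions the buffer starts with "XD" but the next two bytes are zero, so it matches none of the listed magics. That would be consistent with a failed magic check, although the sketch cannot say how the block came to look like this.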
| Production board based on iop80321 Xscale CPU (ARM).

Is this reproducible on other platforms? We've seen problems with compilers on ARM in the past, not properly manipulating 64-bit numbers... if this can't be reproduced anywhere else, odds are good that this is another ARM platform/compiler bug.

cheers.
Nathan, let me add my two cents. I run Linux on the AMD64 platform, and I see the same behavior as described in this bug. I am on kernel 2.6.17.8, gcc 4.1.1. I have experienced what amounts to a near-complete data loss as a result of this bug. All attempts at recovery have been unsuccessful so far; FS recovery using xfs_repair results in errors. Now before you ask, I'm not a novice. I had a RAID1 setup. The mirror is corrupted too, obviously. I had backups as well. The point is, my love affair with XFS is *OVER*. It is clear that she is not a partner that is safe to be with, she carries and spreads disease.

I was attempting to copy stuff off onto another volume, which eventually resulted in:

Filesystem "dm-14": XFS internal error xfs_da_do_buf(2) at line 2212 of file fs/xfs/xfs_da_btree.c. Caller 0xc0159a0c

The hex value I saw was different, and so was the line number: instead of the "line 2212" above, for me it was 2128.
Same here too.

Nov 9 05:25:46 asterisk kernel: Filesystem "md0": XFS internal error xfs_da_do_buf(1) at line 1992 of file fs/xfs/xfs_da_btree.c. Caller 0xc01ebae8
Nov 9 05:25:46 asterisk kernel: [xfs_da_do_buf+1710/2144] xfs_da_do_buf+0x6ae/0x860
Nov 9 05:25:46 asterisk kernel: [xfs_da_read_buf+72/96] xfs_da_read_buf+0x48/0x60
Nov 9 05:25:46 asterisk kernel: [xfs_trans_log_buf+94/144] xfs_trans_log_buf+0x5e/0x90
Nov 9 05:25:46 asterisk kernel: [xfs_dir2_data_log_unused+81/112] xfs_dir2_data_log_unused+0x51/0x70
Nov 9 05:25:46 asterisk kernel: [xfs_da_read_buf+72/96] xfs_da_read_buf+0x48/0x60
Nov 9 05:25:46 asterisk kernel: [xfs_dir2_leafn_remove+593/1040] xfs_dir2_leafn_remove+0x251/0x410
Nov 9 05:25:46 asterisk kernel: [xfs_dir2_leafn_remove+593/1040] xfs_dir2_leafn_remove+0x251/0x410
Nov 9 05:25:46 asterisk kernel: [xfs_dir2_node_removename+157/224] xfs_dir2_node_removename+0x9d/0xe0
Nov 9 05:25:46 asterisk kernel: [xfs_dir_removename+246/272] xfs_dir_removename+0xf6/0x110
Nov 9 05:25:46 asterisk kernel: [kmem_zone_zalloc+38/80] kmem_zone_zalloc+0x26/0x50
Nov 9 05:25:46 asterisk kernel: [xfs_trans_ijoin+43/128] xfs_trans_ijoin+0x2b/0x80
Nov 9 05:25:46 asterisk kernel: [xfs_remove+535/1120] xfs_remove+0x217/0x460
Nov 9 05:25:46 asterisk kernel: [xfs_vn_permission+0/32] xfs_vn_permission+0x0/0x20
Nov 9 05:25:46 asterisk kernel: [permission+194/256] permission+0xc2/0x100
Nov 9 05:25:46 asterisk kernel: [__link_path_walk+3308/3328] __link_path_walk+0xcec/0xd00
Nov 9 05:25:46 asterisk kernel: [xfs_ifree+166/208] xfs_ifree+0xa6/0xd0
Nov 9 05:25:46 asterisk kernel: [xfs_vn_unlink+35/96] xfs_vn_unlink+0x23/0x60
Nov 9 05:25:46 asterisk kernel: [mntput_no_expire+27/144] mntput_no_expire+0x1b/0x90
Nov 9 05:25:46 asterisk kernel: [link_path_walk+107/192] link_path_walk+0x6b/0xc0
Nov 9 05:25:46 asterisk kernel: [xfs_trans_unlocked_item+56/96] xfs_trans_unlocked_item+0x38/0x60
Nov 9 05:25:46 asterisk kernel: [xfs_access+63/80] xfs_access+0x3f/0x50
Nov 9 05:25:46 asterisk kernel: [xfs_vn_permission+0/32] xfs_vn_permission+0x0/0x20
Nov 9 05:25:46 asterisk kernel: [xfs_vn_permission+15/32] xfs_vn_permission+0xf/0x20
Nov 9 05:25:46 asterisk kernel: [permission+194/256] permission+0xc2/0x100
Nov 9 05:25:46 asterisk kernel: [vfs_unlink+219/256] vfs_unlink+0xdb/0x100
Nov 9 05:25:46 asterisk kernel: [do_unlinkat+149/272] do_unlinkat+0x95/0x110
Nov 9 05:25:46 asterisk kernel: [sys_getdents64+179/192] sys_getdents64+0xb3/0xc0
Nov 9 05:25:46 asterisk kernel: [syscall_call+7/11] syscall_call+0x7/0xb

SMART says the disks are OK. I have a RAID1 setup, which is OK (not corrupted); only the underlying system gave me this.

2.6.18.1, I upgraded to this version yesterday from 2.6.17.6
Debian stable, i386

# gcc -v
Reading specs from /usr/lib/gcc-lib/i486-linux/3.3.5/specs
Configured with: ../src/configure -v --enable-languages=c,c++,java,f77,pascal,objc,ada,treelang --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-gxx-include-dir=/usr/include/c++/3.3 --enable-shared --enable-__cxa_atexit --with-system-zlib --enable-nls --without-included-gettext --enable-clocale=gnu --enable-debug --enable-java-gc=boehm --enable-java-awt=xlib --enable-objc-gc i486-linux
Thread model: posix
gcc version 3.3.5 (Debian 1:3.3.5-13)
Try running xfs_repair from the latest xfsprogs (2.8.16) and report the errors found. You may have problems from a bug in the 2.6.17.1 through 2.6.17.6 kernels.
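When reporting what xfs_repair finds, a short summary can be easier to read than the full output. The following is a rough sketch, assuming the message wording matches the `xfs_repair -n` output quoted earlier in this bug; other xfsprogs versions may phrase these lines differently. The function name `summarize` and the sample text are illustrative only.

```python
import re
from collections import Counter

# Patterns follow the wording of the xfs_repair -n output quoted in this bug
# (assumption: other xfsprogs versions may word these messages differently).
JUNK_RE = re.compile(
    r'entry "(?P<name>[^"]+)" in directory inode (?P<dir>\d+) '
    r"points to free inode (?P<ino>\d+), would junk entry"
)
DISCONNECTED_RE = re.compile(r"disconnected dir inode (\d+), would move to lost\+found")
NLINK_RE = re.compile(r"would have reset inode (\d+) nlinks from (\d+) to (\d+)")


def summarize(text):
    """Tally the problems reported by a dry run (xfs_repair -n)."""
    junk = Counter(m.group("dir") for m in JUNK_RE.finditer(text))
    return {
        "junk_entries_per_dir": dict(junk),
        "disconnected_dirs": DISCONNECTED_RE.findall(text),
        "nlink_resets": NLINK_RE.findall(text),
    }


# A few lines taken from the output in the original report.
sample = (
    'entry "less" in directory inode 2009243 points to free inode 2025152, would junk entry\n'
    'entry "seq" in directory inode 2009243 points to free inode 2025153, would junk entry\n'
    "disconnected dir inode 392012, would move to lost+found\n"
    "would have reset inode 392004 nlinks from 13 to 9\n"
)
print(summarize(sample))
```

Because the patterns are matched with `finditer` and `findall` over the whole text, the same function also works on output whose line breaks were lost when it was pasted.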
Sorry this has been unresolved for so long; is anyone still seeing this behavior? Pavel, you may have been hitting a problem specific to arm, which I hope I fixed a few releases back. Any idea if this is working for you now?
Closed due to lack of feedback. Please re-open if you have new information.