Bug 712 - 2.6.17.7 XFS internal error xfs_da_do_buf
Status: RESOLVED WORKSFORME
Product: XFS
Classification: Unclassified
Component: XFS kernel code
Version: unspecified
Hardware: Other Linux
Importance: P2 critical
Target Milestone: ---
Assigned To: XFS power people
Reported: 2006-07-29 10:09 CDT by Pavel Mironchik
Modified: 2009-06-22 06:09 CDT (History)

Description Pavel Mironchik 2006-07-29 10:09:35 CDT
I found that XFS is unstable on the recent 2.6.17.7 kernel. Below are the
results of one test: untarring a large file onto an XFS volume.
Please look at the results, and ask if any additional information is required.

Hardware:
 Production board based on iop80321 Xscale CPU (ARM).
Software:
 I am using vanilla linux-2.6.17.7 with a toolchain based on gcc 3.4.

Steps to reproduce:
1) mkfs.xfs /dev/xxxx
2) mount /dev/xxxx /mnt
3) cd /mnt ; tar -xf /volume/somestuff
Then an I/O error occurs at this point.

The kernel log shows:
0x0: 58 44 00 00 00 12 0f a6 00 00 00 00 72 75 00 00 
Filesystem "dm-14": XFS internal error xfs_da_do_buf(2) at line 2212 of file
fs/xfs/xfs_da_btree.c.  Caller 0xc0159a0c
[<c0028a4c>] (dump_stack+0x0/0x14) from [<c016cf74>] (xfs_error_report+0x54/0x64)
[<c016cf20>] (xfs_error_report+0x0/0x64) from [<c016d074>]
(xfs_corruption_error+0xf0/0x108)
[<c016cf84>] (xfs_corruption_error+0x0/0x108) from [<c0159854>]
(xfs_da_do_buf+0x678/0x798)
[<c01591dc>] (xfs_da_do_buf+0x0/0x798) from [<c0159a0c>] (xfs_da_read_buf+0x3c/0x44)
[<c01599d4>] (xfs_da_read_buf+0x4/0x44) from [<c015f340>]
(xfs_dir2_block_addname+0x68/0x998)
[<c015f2d8>] (xfs_dir2_block_addname+0x0/0x998) from [<c0167644>]
(xfs_dir2_sf_addname+0x1e8/0x918)
[<c016745c>] (xfs_dir2_sf_addname+0x0/0x918) from [<c015ec04>]
(xfs_dir2_createname+0xb8/0x140)
[<c015eb4c>] (xfs_dir2_createname+0x0/0x140) from [<c0195404>]
(xfs_symlink+0x804/0x9f4)
[<c0194c00>] (xfs_symlink+0x0/0x9f4) from [<c019f768>] (xfs_vn_symlink+0xbc/0x178)
[<c019f6ac>] (xfs_vn_symlink+0x0/0x178) from [<c0089540>] (vfs_symlink+0x80/0xc0)
[<c00894c0>] (vfs_symlink+0x0/0xc0) from [<c008961c>] (sys_symlinkat+0x9c/0xf8)
 r7 = CFCC1F30  r6 = C7771000  r5 = C46BAB60  r4 = C46BAB60
[<c0089580>] (sys_symlinkat+0x0/0xf8) from [<c0089690>] (sys_symlink+0x18/0x1c)
 r8 = C0023EE4  r7 = 00000053  r6 = 00047580  r5 = 00047530
 r4 = 00047584 
[<c0089678>] (sys_symlink+0x0/0x1c) from [<c0023d40>] (ret_fast_syscall+0x0/0x2c)
Filesystem "dm-14": XFS internal error xfs_trans_cancel at line 1150 of file
fs/xfs/xfs_trans.c.  Caller 0xc019558c
[<c0028a4c>] (dump_stack+0x0/0x14) from [<c016cf74>] (xfs_error_report+0x54/0x64)
[<c016cf20>] (xfs_error_report+0x0/0x64) from [<c018b6a4>]
(xfs_trans_cancel+0x7c/0x12c)
[<c018b628>] (xfs_trans_cancel+0x0/0x12c) from [<c019558c>]
(xfs_symlink+0x98c/0x9f4)
 r7 = CFCC1DBC  r6 = CFCC1E80  r5 = 000003DE  r4 = 00000000
[<c0194c00>] (xfs_symlink+0x0/0x9f4) from [<c019f768>] (xfs_vn_symlink+0xbc/0x178)
[<c019f6ac>] (xfs_vn_symlink+0x0/0x178) from [<c0089540>] (vfs_symlink+0x80/0xc0)
[<c00894c0>] (vfs_symlink+0x0/0xc0) from [<c008961c>] (sys_symlinkat+0x9c/0xf8)
 r7 = CFCC1F30  r6 = C7771000  r5 = C46BAB60  r4 = C46BAB60
[<c0089580>] (sys_symlinkat+0x0/0xf8) from [<c0089690>] (sys_symlink+0x18/0x1c)
 r8 = C0023EE4  r7 = 00000053  r6 = 00047580  r5 = 00047530
 r4 = 00047584 
[<c0089678>] (sys_symlink+0x0/0x1c) from [<c0023d40>] (ret_fast_syscall+0x0/0x2c)
xfs_force_shutdown(dm-14,0x8) called from line 1151 of file fs/xfs/xfs_trans.c.
 Return address = 0xc01a2c4c
Filesystem "dm-14": Corruption of in-memory data detected.  Shutting down
filesystem: dm-14
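
A note on reading the hex dump line at the top of this log: the first bytes of an XFS directory block normally hold a magic number. The Python sketch below compares the dumped bytes with the version-2 directory magic values. The table is quoted from memory of the XFS headers and is an assumption to be checked against the kernel source in use; `classify_header` is a hypothetical helper name, not part of any XFS tool.

```python
import struct

# Magic numbers for XFS version-2 directory blocks (assumed values; verify
# against the XFS headers of the kernel being debugged).
DIR2_MAGICS = {
    0x58443242: "XD2B (single-block directory)",
    0x58443244: "XD2D (directory data block)",
    0x58443246: "XD2F (directory free-space block)",
}

def classify_header(hex_dump: str) -> str:
    """Describe the first four bytes of a space-separated hex dump."""
    raw = bytes.fromhex(hex_dump.replace(" ", ""))
    # XFS metadata is stored big-endian on disk.
    (magic,) = struct.unpack(">I", raw[:4])
    name = DIR2_MAGICS.get(magic)
    if name:
        return f"0x{magic:08x}: {name}"
    return f"0x{magic:08x}: not a known dir2 magic"

# Bytes copied from the "0x0:" line of the log above.
dump = "58 44 00 00 00 12 0f a6 00 00 00 00 72 75 00 00"
print(classify_header(dump))
```

For the dump above the leading "XD" bytes (0x58 0x44) are present but the next two bytes are zero, so the value matches none of the listed magics. That is consistent with the corruption error being raised, but it does not by itself identify the cause.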

NOTE: the hardware is not responsible for the corruption, i.e. ext3 works fine.

Output of xfs_repair (2.8.4) -n:

...
...
entry "less" in directory inode 2009243 points to free inode 2025152, would junk
entry
entry "seq" in directory inode 2009243 points to free inode 2025153, would junk
entry
entry "test" in directory inode 2009243 points to free inode 2025154, would junk
entry
entry "tr" in directory inode 2009243 points to free inode 2025155, would junk entry
entry "subst" in directory inode 2009243 points to free inode 2025156, would
junk entry
entry "uuidgen" in directory inode 2009243 points to free inode 2025157, would
junk entry
entry "netcat" in directory inode 2009243 points to free inode 2025158, would
junk entry
entry "wget" in directory inode 2009243 points to free inode 2025159, would junk
entry
entry "which" in directory inode 2009243 points to free inode 2025160, would
junk entry
entry "bc" in directory inode 2009243 points to free inode 2025161, would junk entry
entry "chrootuid" in directory inode 2009243 points to free inode 2025162, would
junk entry
entry "killall" in directory inode 2009243 points to free inode 2025163, would
junk entry
entry "diff" in directory inode 2009243 points to free inode 2025164, would junk
entry
        - traversals finished ... 
        - moving disconnected inodes to lost+found ... 
disconnected dir inode 392012, would move to lost+found
disconnected dir inode 556954, would move to lost+found
disconnected dir inode 2009243, would move to lost+found
Phase 7 - verify link counts...
would have reset inode 392004 nlinks from 13 to 9
would have reset inode 556955 nlinks from 4 to 3
would have reset inode 854746 nlinks from 3 to 2
No modify flag set, skipping filesystem flush and exiting.
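
To summarize a long `xfs_repair -n` report like the one above, the dangling entries can be grouped by directory inode. This is a minimal Python sketch that assumes the message wording shown in this report (other xfsprogs versions may word it differently); `dangling_entries` is a hypothetical helper name.

```python
import re
from collections import defaultdict

# Matches the "points to free inode" messages as they appear in this report.
ENTRY_RE = re.compile(
    r'entry "([^"]+)" in directory inode (\d+) points to free inode (\d+)'
)

def dangling_entries(report: str) -> dict:
    """Map directory inode -> list of (entry name, free inode) pairs."""
    # Collapse whitespace first so messages wrapped across lines still match.
    text = " ".join(report.split())
    result = defaultdict(list)
    for name, dir_ino, free_ino in ENTRY_RE.findall(text):
        result[int(dir_ino)].append((name, int(free_ino)))
    return dict(result)

# A few lines copied from the report above, including the wrapped ones.
sample = '''entry "less" in directory inode 2009243 points to free inode 2025152, would junk
entry
entry "seq" in directory inode 2009243 points to free inode 2025153, would junk
entry
entry "tr" in directory inode 2009243 points to free inode 2025155, would junk entry
'''

for dir_ino, entries in dangling_entries(sample).items():
    print(dir_ino, len(entries), [name for name, _ in entries])
```

In this report every such entry belongs to directory inode 2009243, which is also one of the directories listed as disconnected.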


xfs_repair -L /dev/xxxx
entry "which" at block 0 offset 256 in directory inode 2009243 references free
inode 2025160
        clearing inode number in entry at offset 256...
entry "bc" at block 0 offset 272 in directory inode 2009243 references free
inode 2025161
        clearing inode number in entry at offset 272...
entry "chrootuid" at block 0 offset 288 in directory inode 2009243 references
free inode 2025162
        clearing inode number in entry at offset 288...
entry "killall" at block 0 offset 312 in directory inode 2009243 references free
inode 2025163
        clearing inode number in entry at offset 312...
entry "diff" at block 0 offset 336 in directory inode 2009243 references free
inode 2025164
        clearing inode number in entry at offset 336...
no .. entry for directory 2009243
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - ensuring existence of lost+found directory
        - traversing filesystem starting at / ... 
rebuilding directory inode 128
rebuilding directory inode 42065
rebuilding directory inode 392004
rebuilding directory inode 1401147
#
Comment 1 Nathan Scott 2006-07-31 16:08:41 CDT
| Production board based on iop80321 Xscale CPU (ARM).

Is this reproducible on other platforms?  We've seen problems with
compilers on ARM in the past, not properly manipulating 64 bit
numbers... if this can't be reproduced anywhere else, odds are good
that this is another ARM platform/compiler bug.

cheers.
Comment 2 Aaron Kulbe 2006-08-14 09:26:20 CDT
Nathan, let me add my two cents.

I run Linux on the AMD64 platform.  I see the same behavior as described in this
bug.

I am on kernel 2.6.17.8
gcc 4.1.1

I have experienced what amounts to a near complete data loss as a result of this
bug.  All attempts at recovery have been unsuccessful so far.  FS recovery using
xfs_repair results in errors.

Now before you ask, I'm not a novice. I had a RAID1 setup.  Mirror is corrupted
too, obviously. I had backups as well.  The point is, my love affair with XFS is
*OVER*.  It is clear that she is not a partner that is safe to be with, she
carries and spreads disease.

I was attempting to copy stuff off onto another volume, which eventually
resulted in:

Filesystem "dm-14": XFS internal error xfs_da_do_buf(2) at line 2212 of file
fs/xfs/xfs_da_btree.c.  Caller 0xc0159a0c

(The hex value I saw was different, as was the "line 2212" before it; for me it
was line 2128.)
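
Since the line number and caller address differ between kernel builds, it can help to extract those fields from each report before comparing them. The following Python sketch assumes the "XFS internal error" message format seen in this bug; `XfsError` and `parse_xfs_errors` are hypothetical names used for illustration only.

```python
import re
from typing import List, NamedTuple, Optional

class XfsError(NamedTuple):
    device: str
    function: str
    code: Optional[int]  # the number in parentheses, when present
    line: int
    source: str
    caller: str

ERROR_RE = re.compile(
    r'Filesystem "([^"]+)": XFS internal error (\w+)(?:\((\d+)\))? '
    r'at line (\d+) of file (\S+?)\.\s+Caller (0x[0-9a-fA-F]+)'
)

def parse_xfs_errors(log: str) -> List[XfsError]:
    """Extract every XFS internal error header found in a kernel log."""
    # Collapse whitespace so headers wrapped across lines still match.
    text = " ".join(log.split())
    return [
        XfsError(dev, func, int(code) if code else None, int(line), src, caller)
        for dev, func, code, line, src, caller in ERROR_RE.findall(text)
    ]

# Two headers copied from the original description.
log = '''Filesystem "dm-14": XFS internal error xfs_da_do_buf(2) at line 2212 of file
fs/xfs/xfs_da_btree.c.  Caller 0xc0159a0c
Filesystem "dm-14": XFS internal error xfs_trans_cancel at line 1150 of file
fs/xfs/xfs_trans.c.  Caller 0xc019558c
'''

for err in parse_xfs_errors(log):
    print(err.function, err.code, err.source, err.line, err.caller)
```

Comparing the function name and the code in parentheses is more meaningful across kernel versions than comparing line numbers or caller addresses.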



Comment 3 Juraj Bednar 2006-11-09 01:25:46 CST
Same here too.

Nov  9 05:25:46 asterisk kernel: Filesystem "md0": XFS internal error
xfs_da_do_buf(1) at line 1992 of file fs/xfs/xfs_da_btree.c.  Caller 0xc01ebae8
Nov  9 05:25:46 asterisk kernel:  [xfs_da_do_buf+1710/2144]
xfs_da_do_buf+0x6ae/0x860
Nov  9 05:25:46 asterisk kernel:  [xfs_da_read_buf+72/96] xfs_da_read_buf+0x48/0x60
Nov  9 05:25:46 asterisk kernel:  [xfs_trans_log_buf+94/144]
xfs_trans_log_buf+0x5e/0x90
Nov  9 05:25:46 asterisk kernel:  [xfs_dir2_data_log_unused+81/112]
xfs_dir2_data_log_unused+0x51/0x70
Nov  9 05:25:46 asterisk kernel:  [xfs_da_read_buf+72/96] xfs_da_read_buf+0x48/0x60
Nov  9 05:25:46 asterisk kernel:  [xfs_dir2_leafn_remove+593/1040]
xfs_dir2_leafn_remove+0x251/0x410
Nov  9 05:25:46 asterisk kernel:  [xfs_dir2_leafn_remove+593/1040]
xfs_dir2_leafn_remove+0x251/0x410
Nov  9 05:25:46 asterisk kernel:  [xfs_dir2_node_removename+157/224]
xfs_dir2_node_removename+0x9d/0xe0
Nov  9 05:25:46 asterisk kernel:  [xfs_dir_removename+246/272]
xfs_dir_removename+0xf6/0x110
Nov  9 05:25:46 asterisk kernel:  [kmem_zone_zalloc+38/80]
kmem_zone_zalloc+0x26/0x50
Nov  9 05:25:46 asterisk kernel:  [xfs_trans_ijoin+43/128] xfs_trans_ijoin+0x2b/0x80
Nov  9 05:25:46 asterisk kernel:  [xfs_remove+535/1120] xfs_remove+0x217/0x460
Nov  9 05:25:46 asterisk kernel:  [xfs_vn_permission+0/32]
xfs_vn_permission+0x0/0x20
Nov  9 05:25:46 asterisk kernel:  [permission+194/256] permission+0xc2/0x100
Nov  9 05:25:46 asterisk kernel:  [__link_path_walk+3308/3328]
__link_path_walk+0xcec/0xd00
Nov  9 05:25:46 asterisk kernel:  [xfs_ifree+166/208] xfs_ifree+0xa6/0xd0
Nov  9 05:25:46 asterisk kernel:  [xfs_vn_unlink+35/96] xfs_vn_unlink+0x23/0x60
Nov  9 05:25:46 asterisk kernel:  [mntput_no_expire+27/144]
mntput_no_expire+0x1b/0x90
Nov  9 05:25:46 asterisk kernel:  [link_path_walk+107/192] link_path_walk+0x6b/0xc0
Nov  9 05:25:46 asterisk kernel:  [xfs_trans_unlocked_item+56/96]
xfs_trans_unlocked_item+0x38/0x60
Nov  9 05:25:46 asterisk kernel:  [xfs_access+63/80] xfs_access+0x3f/0x50
Nov  9 05:25:46 asterisk kernel:  [xfs_vn_permission+0/32]
xfs_vn_permission+0x0/0x20
Nov  9 05:25:46 asterisk kernel:  [xfs_vn_permission+15/32]
xfs_vn_permission+0xf/0x20
Nov  9 05:25:46 asterisk kernel:  [permission+194/256] permission+0xc2/0x100
Nov  9 05:25:46 asterisk kernel:  [vfs_unlink+219/256] vfs_unlink+0xdb/0x100
Nov  9 05:25:46 asterisk kernel:  [do_unlinkat+149/272] do_unlinkat+0x95/0x110
Nov  9 05:25:46 asterisk kernel:  [sys_getdents64+179/192] sys_getdents64+0xb3/0xc0
Nov  9 05:25:46 asterisk kernel:  [syscall_call+7/11] syscall_call+0x7/0xb


SMART says the disks are OK. I have a RAID1 setup, which is fine (not
corrupted); only the underlying system gave me this.

Kernel is 2.6.18.1; I upgraded to this version yesterday from 2.6.17.6.

Debian stable, i386

# gcc -v
Reading specs from /usr/lib/gcc-lib/i486-linux/3.3.5/specs
Configured with: ../src/configure -v
--enable-languages=c,c++,java,f77,pascal,objc,ada,treelang --prefix=/usr
--mandir=/usr/share/man --infodir=/usr/share/info
--with-gxx-include-dir=/usr/include/c++/3.3 --enable-shared
--enable-__cxa_atexit --with-system-zlib --enable-nls --without-included-gettext
--enable-clocale=gnu --enable-debug --enable-java-gc=boehm
--enable-java-awt=xlib --enable-objc-gc i486-linux
Thread model: posix
gcc version 3.3.5 (Debian 1:3.3.5-13)
Comment 4 Barry Naujok 2006-11-09 16:30:42 CST
Try running xfs_repair from the latest xfsprogs (2.8.16) and report
the errors found. You may be seeing problems caused by a bug in the
2.6.17.1 through 2.6.17.6 kernels.
Comment 5 Eric Sandeen 2008-12-23 14:51:52 CST
Sorry this has been unresolved for so long; is anyone still seeing this
behavior?  Pavel, you may have been hitting a problem specific to ARM, which I
hope I fixed a few releases back.  Any idea if this is working for you now?
Comment 6 Christoph Hellwig 2009-06-22 06:09:09 CDT
Closed due to lack of feedback.  Please re-open if you have new information.