Hi,
Ever since I started testing 2.5.x (beginning with 2.5.59), I've been
hitting mysterious bugs that show up when accessing directories
containing many files (as ldconfig or ls -l /usr/bin do). The syslog
output from the latest version I tested (SGI XFS for Linux 2.5.68 with
debug enabled) looks like this:
#v+
0x0: fe ed ba be 00 00 01 7a 00 00 00 01 00 00 00 00
Filesystem "ide0(3,5)": XFS internal error xfs_da_do_buf(2) at line 2248 of file fs/xfs/xfs_da_btree.c. Caller 0xc01a4dcf
Call Trace:
[<c01a49eb>] xfs_da_do_buf+0x50c/0x83e
[<c01a4dcf>] xfs_da_read_buf+0x57/0x5b
last message repeated 2 times
[<c01a9491>] xfs_dir2_block_lookup_int+0x52/0x28a
[<c01a9491>] xfs_dir2_block_lookup_int+0x52/0x28a
[<c0191b40>] xfs_bmap_last_offset+0xc8/0x123
[<c01a93a5>] xfs_dir2_block_lookup+0x2b/0xc5
[<c01a76f9>] xfs_dir2_lookup+0x104/0x17b
[<c013679d>] do_page_cache_readahead+0x7e/0x14e
[<c01349c9>] __rmqueue+0xbb/0x10b
[<c01e7038>] xfs_dir_lookup_int+0x58/0x163
[<c01ed384>] xfs_lookup+0x88/0xd3
[<c01fbb62>] linvfs_lookup+0x6f/0xe3
[<c0153920>] real_lookup+0xc0/0xe2
[<c0153b8d>] do_lookup+0x9e/0xa9
[<c0153ec7>] link_path_walk+0x32f/0x616
[<c0154996>] open_namei+0x7e/0x3d7
[<c0147d7a>] filp_open+0x43/0x69
[<c014816f>] sys_open+0x5b/0x8b
[<c0109027>] syscall_call+0x7/0xb
0x0: fe ed ba be 00 00 01 7a 00 00 00 01 00 00 00 00
Filesystem "ide0(3,5)": XFS internal error xfs_da_do_buf(2) at line 2248 of file fs/xfs/xfs_da_btree.c. Caller 0xc01a4dcf
Call Trace:
[<c01a49eb>] xfs_da_do_buf+0x50c/0x83e
[<c01a4dcf>] xfs_da_read_buf+0x57/0x5b
[<c01a4dcf>] xfs_da_read_buf+0x57/0x5b
[<c01a76f9>] xfs_dir2_lookup+0x104/0x17b
[<c013185b>] find_get_page+0x1a/0x25
[<c01327ab>] filemap_nopage+0x1d3/0x2b6
[<c01a8323>] xfs_dir2_put_dirent64_direct+0x0/0x96
[<c01a4dcf>] xfs_da_read_buf+0x57/0x5b
[<c01a90c3>] xfs_dir2_block_getdents+0x77/0x25a
[<c01a90c3>] xfs_dir2_block_getdents+0x77/0x25a
[<c0191b40>] xfs_bmap_last_offset+0xc8/0x123
[<c01a8323>] xfs_dir2_put_dirent64_direct+0x0/0x96
[<c01a8218>] xfs_dir2_isblock+0x3a/0xd5
[<c01a8323>] xfs_dir2_put_dirent64_direct+0x0/0x96
[<c01a79b2>] xfs_dir2_getdents+0xd8/0x161
[<c01a8323>] xfs_dir2_put_dirent64_direct+0x0/0x96
[<c01efbd7>] xfs_readdir+0x6f/0xdd
[<c01f7bcd>] linvfs_readdir+0x10a/0x262
[<c0116d99>] do_page_fault+0x125/0x441
[<c0157ad8>] vfs_readdir+0x7c/0x7e
[<c0157c07>] filldir+0x0/0xdb
[<c0157d7a>] sys_getdents+0x98/0xec
[<c0157c07>] filldir+0x0/0xdb
[<c0109027>] syscall_call+0x7/0xb
0x0: fe ed ba be 00 00 01 7a 00 00 00 01 00 00 00 00
Filesystem "ide0(3,5)": XFS internal error xfs_da_do_buf(2) at line 2248 of file fs/xfs/xfs_da_btree.c. Caller 0xc01a4dcf
Call Trace:
[<c01a49eb>] xfs_da_do_buf+0x50c/0x83e
[<c01a4dcf>] xfs_da_read_buf+0x57/0x5b
[<c01a4dcf>] xfs_da_read_buf+0x57/0x5b
[<c0208244>] vsprintf+0x27/0x2b
[<c01a4dcf>] xfs_da_read_buf+0x57/0x5b
[<c01a9491>] xfs_dir2_block_lookup_int+0x52/0x28a
[<c01a9491>] xfs_dir2_block_lookup_int+0x52/0x28a
[<c0191b40>] xfs_bmap_last_offset+0xc8/0x123
[<c01a93a5>] xfs_dir2_block_lookup+0x2b/0xc5
[<c01a76f9>] xfs_dir2_lookup+0x104/0x17b
[<c01a76f9>] xfs_dir2_lookup+0x104/0x17b
[<c013185b>] find_get_page+0x1a/0x25
[<c01327ab>] filemap_nopage+0x1d3/0x2b6
[<c01a4dcf>] xfs_da_read_buf+0x57/0x5b
[<c01e7038>] xfs_dir_lookup_int+0x58/0x163
[<c01ed384>] xfs_lookup+0x88/0xd3
[<c01fbb62>] linvfs_lookup+0x6f/0xe3
[<c0153920>] real_lookup+0xc0/0xe2
[<c0153b8d>] do_lookup+0x9e/0xa9
[<c0153ec7>] link_path_walk+0x32f/0x616
[<c015459f>] __user_walk+0x49/0x5e
[<c01501a0>] vfs_stat+0x1f/0x5b
[<c01507ff>] sys_stat64+0x1b/0x39
[<c0109027>] syscall_call+0x7/0xb
#v-
These errors coincide with some files (or, particularly in the earlier
versions, whole directories) being inaccessible. This is a partial log; more
can be seen at http://hell.org.pl/~sziwan/xfs-errors.
The filesystem is perfectly stable with various 2.4.x XFS patches; so far,
no errors have been observed while using it in normal conditions under 2.4.
Furthermore, xfs_check and xfs_repair do not find any errors, and the errors
shown above do not produce filesystem corruption, either.
The errors are not deterministic, i.e. sometimes the filesystem is OK, yet
sometimes the system fails to boot properly. At times the whole /usr/bin
appears empty; at other times it is /usr/local.
Moreover, they appear both when the kernel is compiled with gcc-2.95 and
with gcc-3.2.2. The kernel configuration doesn't seem to influence them, either.
The following snippet from strace ls -l /usr/bin may be meaningful:
#v+
lstat64("/usr/bin", {st_mode=S_IFDIR|0755, st_size=40960, ...}) = 0
open("/dev/null", O_RDONLY|O_NONBLOCK|O_DIRECTORY) = -1 ENOTDIR (Not a directory)
open("/usr/bin", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = 3
fstat64(3, {st_mode=S_IFDIR|0755, st_size=40960, ...}) = 0
fcntl64(3, F_SETFD, FD_CLOEXEC) = 0
brk(0x8059000) = 0x8059000
getdents64(0x3, 0x80576a0, 0x1000, 0) = 4096
brk(0x805f000) = 0x805f000
getdents64(0x3, 0x80576a0, 0x1000, 0) = 4080
brk(0x806a000) = 0x806a000
getdents64(0x3, 0x80576a0, 0x1000, 0) = 4088
brk(0x807b000) = 0x807b000
getdents64(0x3, 0x80576a0, 0x1000, 0) = 4072
getdents64(0x3, 0x80576a0, 0x1000, 0) = 4064
getdents64(0x3, 0x80576a0, 0x1000, 0) = 4072
getdents64(0x3, 0x80576a0, 0x1000, 0) = -990
getdents(3, out of memory
) = -990
close(3) = 0
#v-
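Since the failure is intermittent, one way to catch it without watching ls by
hand is to list the same directory several times and compare the results. The
following is only a minimal sketch under that assumption; count_entries and
is_consistent are hypothetical helper names, not part of any XFS tooling, and
the sketch only observes the symptom from user space.

```python
import os
import tempfile


def count_entries(path, rounds=5):
    """List `path` `rounds` times; return (entry counts, errno values)."""
    counts, errors = [], []
    for _attempt in range(rounds):
        try:
            # os.scandir reads the directory in chunks, much like the
            # repeated getdents64 calls in the strace output.
            with os.scandir(path) as entries:
                counts.append(sum(1 for _entry in entries))
        except OSError as exc:
            # Record the errno instead of aborting, so that intermittent
            # failures show up next to the successful rounds.
            errors.append(exc.errno)
    return counts, errors


def is_consistent(counts, errors):
    """True if every round succeeded and saw the same number of entries."""
    return not errors and len(set(counts)) <= 1
```

For example, count_entries("/usr/bin", rounds=20) followed by is_consistent on
the result would flag both a failing listing and a directory that seems empty
in some rounds only.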
There is, additionally, another error, which might (or might not) be
related:
#v+
buffer layer error at fs/buffer.c:2702
Call Trace:
[<c014cd54>] drop_buffers+0xb3/0xb9
[<c014cd96>] try_to_free_buffers+0x3c/0x96
[<c01f742d>] linvfs_release_page+0x74/0x78
[<c014adc9>] try_to_release_page+0x5c/0x6c
[<c014aebc>] block_invalidatepage+0xe3/0xf6
[<c0138abb>] do_invalidatepage+0x27/0x2b
[<c0138b46>] truncate_complete_page+0x87/0x89
[<c0138cc3>] truncate_inode_pages+0xed/0x31d
[<c015de53>] generic_delete_inode+0xb8/0xba
[<c015dfcd>] iput+0x55/0x6f
[<c0155725>] sys_unlink+0x86/0x13c
[<c0109027>] syscall_call+0x7/0xb
#v-
This happens during file operations at the shutdown stage, e.g.
dd if=/dev/urandom of=/etc/random-seed
rm -f /var/lock/subsys/*
It probably happens around the final sync, too.
This second error (though, as I said, I'm not sure whether it is related) is
perfectly deterministic and reproducible.
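The trace above goes through sys_unlink, so the pattern of writing a file and
then removing it may be enough to exercise the same path outside of a
shutdown. This is an assumption, not something confirmed here. A minimal
sketch follows; write_then_unlink and the file names it uses are hypothetical.

```python
import os
import tempfile


def write_then_unlink(directory, size=4096, rounds=10):
    """Write `size` random bytes to a scratch file, then unlink it.

    Returns the number of write/unlink cycles completed.
    """
    done = 0
    for i in range(rounds):
        path = os.path.join(directory, "unlink-test-%d" % i)
        with open(path, "wb") as fh:
            # No fsync here: the data is likely still in the page cache
            # when the file is removed, which is the assumed condition.
            fh.write(os.urandom(size))
        os.unlink(path)
        done += 1
    return done
```

It would be run against a scratch directory on the affected filesystem while
watching the syslog for the buffer layer message.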
I'll be happy to provide any information that would help find the cause of
this problem.
Best regards,
--
Karol 'sziwan' Kozimor
sziwan@xxxxxxxxxxx