Background: I'm trying to evaluate the suitability of XFS -vs- ReiserFS for
an NFS server. I'm leaning towards XFS because we have a few IRIX machines,
and the admins are familiar with XFS (backup/restore, etc.).
The "test" server is a dual P-II 450 machine w/ 256MB RAM and a couple
of SCSI disks on an Adaptec 2940. One of these disks was formatted as XFS
and NFS-exported to 4-5 client machines. The base installation is
RedHat Wolverine; we upgraded the kernel to the latest XFS-patched kernel,
0.10-test2, using the kernel source RPM on the OSS FTP site.
Testing: We are beating on this exported filesystem from 4-5 clients. Every
client reads randomly chosen 3MB files in a tight loop; one of the clients
also writes files of random size (0-1MB), again in a tight loop. This
homespun methodology is meant to simulate the load the real server will
see.
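For anyone wanting to reproduce this, the load can be approximated with a small script like the one below. This is a minimal sketch, not the harness actually used here: the function names, the 64KB read chunk, and the choice to have the writer overwrite a rotating set of 100 file names (so the disk does not fill up) are all assumptions. Each function takes a directory, which in the real test would be the NFS mount point on a client.

```python
import os
import random

THREE_MB = 3 * 1024 * 1024
ONE_MB = 1024 * 1024


def make_read_set(directory, count, size=THREE_MB):
    """Create `count` files of `size` bytes for the readers to choose from."""
    paths = []
    for i in range(count):
        path = os.path.join(directory, f"read_{i:04d}.dat")
        with open(path, "wb") as f:
            f.write(os.urandom(size))
        paths.append(path)
    return paths


def reader_loop(paths, iterations, rng):
    """Read randomly chosen files end to end; return the total bytes read."""
    total = 0
    for _ in range(iterations):
        with open(rng.choice(paths), "rb") as f:
            while chunk := f.read(64 * 1024):
                total += len(chunk)
    return total


def writer_loop(directory, iterations, rng, max_size=ONE_MB):
    """Write files whose size is uniform in [0, max_size] bytes.

    Assumption: names rotate over 100 slots, so old files get overwritten.
    Returns the list of sizes written.
    """
    sizes = []
    for i in range(iterations):
        size = rng.randint(0, max_size)
        path = os.path.join(directory, f"write_{i % 100:03d}.dat")
        with open(path, "wb") as f:
            f.write(os.urandom(size))
        sizes.append(size)
    return sizes
```

On each client you would run `reader_loop` against the exported mount with a very large `iterations` count, and additionally `writer_loop` on one client; the bounded `iterations` parameter only exists so the sketch can be stopped and checked.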
Problem: After about 3-4 hours of continuous beating, the system locked up.
The console message is reproduced below, followed by a function call trace
that I decoded by hand from a disassembly of the kernel, my kernel hacking
skills being severely lacking :-)
Question: Is this a knfsd problem or an XFS problem? It looks like an XFS
problem, but I've been easily mistaken before...
Thanks for any help in this regard,
Ajay
Listing #1: Console message -----------------------------------------------
NMI watchdog detected LOCKUP on CPU0, registers:
CPU: 0
EIP: 0010:[<c02b12a8>]
EFLAGS: 00000086
eax: 00000000 ebx: c1453c00 ecx: cf025cd0 edx: c1453c00
esi: 0000007c edi: c1447d10 ebp: cf024000 esp: cf025be8
ds: 0018 es: 0018 ss: 0018
Process: nfsd (pid: 645, stackpage=cf025000)
Stack: c1447d10 00000286 00000003 c012a2a3 c1447d10 00000003 00000000 00000000
00000001 00001000 c1447d78 c0134e34 c1447d10 00000003 00000000 00000001
c0134f11 00000001 00000000 c12f5e1c 00000811 cf9d69e0 00000000 00000001
Call Trace: [<c012a2a3>] [<c0134e34>] [<c0134f11>] [<c01351e4>] [<c01895cb>]
[<c018926d>] [<c012d834>]
[<c0124a8d>] [<c01f1cd8>] [<c01f1b2c>] [<c012514b>] [<c01253e1>]
[<c0125807>] [<c0125750>] [<c01f1fcf>]
[<c01eeee7>] [<c01eee40>] [<c017316d>] [<c01eee40>] [<c0170335>]
[<c016fa33>] [<c02a9a78>] [<c016f7da>]
[<c0107503>]
Code: 7e f8 e9 6c 8e e7 ff 90 7e 18 00 f3 90 7e f8 e9 d3 8e e7 ff
Console shuts up...
Listing #2: Function call trace (done by hand) --------------------------------
trace     function
address   start     function name
--------  --------  ------------------------
c012a2a3  c012a258  kmem_cache_alloc
c0134e34  c0134df4  get_unused_buffer_head
c0134f11  c0134eec  create_buffers
c01351e4  c01351cc  create_empty_buffers
c01895cb  c0189568  hook_buffers_to_page
c018926d  c0189178  pagebuf_read_full_page
c012d834  c012d6f8  __alloc_pages
c0124a8d  c01249d4  add_to_page_cache_unique
c01f1cd8  c01f1cc4  linvfs_read_full_page
c01f1b2c  c01f1b2c  linvfs_pb_bmap
c012514b  c0124f28  generic_file_readahead
c01253e1  c01251c8  do_generic_file_read
c0125807  c01257a4  generic_file_read
c0125750  c0125750  file_read_actor
c01f1fcf  c01f1d90  xfs_read
c01eeee7  c01eee40  linvfs_read
c01eee40  c01eee40  linvfs_read
c017316d  c0172f40  nfsd_read
c01eee40  c01eee40  linvfs_read
c0170335  c0170214  nfsd_proc_read
c016fa33  c016f968  nfsd_dispatch
c02a9a78  c02a97cc  svc_process
c016f7da  c016f610  nfsd
c0107503  c01074e0  kernel_thread
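The lookup done by hand in Listing #2 (for each trace address, find the function with the largest start address not above it) is mechanical and can be scripted against the kernel's System.map. ksymoops is the usual tool for this job; the sketch below only illustrates the idea. It assumes the standard System.map line format ("c012a258 T kmem_cache_alloc"), 32-bit addresses, and a System.map that exactly matches the running kernel; `parse_system_map` and `decode_trace` are hypothetical names.

```python
import bisect
import re

# One "[<xxxxxxxx>]" entry of a Call Trace (32-bit i386 addresses assumed).
TRACE_ENTRY = re.compile(r"\[<([0-9a-fA-F]{8})>\]")


def parse_system_map(text):
    """Parse System.map lines of the form 'c012a258 T kmem_cache_alloc'.

    Only text (code) symbols, types 't' and 'T', are kept; the result is
    sorted by address so it can be binary-searched.
    """
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3 or parts[1] not in ("t", "T"):
            continue
        symbols.append((int(parts[0], 16), parts[2]))
    symbols.sort()
    return symbols


def decode_trace(symbols, trace_text):
    """Map each [<address>] in a Call Trace to (address, function, offset).

    The containing function is taken to be the symbol with the greatest
    start address <= the trace address. Addresses below the first symbol
    resolve to (address, None, None). Addresses past the end of the last
    function are not detected and resolve to the last symbol.
    """
    starts = [addr for addr, _ in symbols]
    decoded = []
    for match in TRACE_ENTRY.finditer(trace_text):
        address = int(match.group(1), 16)
        i = bisect.bisect_right(starts, address) - 1
        if i < 0:
            decoded.append((address, None, None))
        else:
            start, name = symbols[i]
            decoded.append((address, name, address - start))
    return decoded
```

Feeding it the System.map text and the "Call Trace:" lines from Listing #1, then printing each result as `f"{addr:08x} {name}+0x{off:x}"`, gives the same mapping as Listing #2, plus the offset into each function.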