Hi,
For some time we've been having problems with a server that acts
as the master/control node and NFS server for a computational cluster
(~180 client nodes). The server crashes after anywhere between
a few hours and 10 days of operation. We've tried various kernels and
XFS patch versions, from the 2.4.9 kernel with XFS patch-2.4.9-xfs-2001-08-17
up to and including the 2.4.16 kernel with the xfs-2.4.16-all-i386 patch.
If anything, the 2.4.9 kernel has proved the most reliable (it normally
lasts between 4 and 10 days; 2.4.16 lasted less than 24 hours).
I've just recovered and processed the following Oops from the most
recent crash (running the 2.4.9 kernel). The ksymoops output below
appears to point to a problem in the XFS kernel code, as called from the
nfsd daemon process.
The server is a dual 1GHz PIII machine based on a SuperMicro ServerWorks LE
motherboard with 1Gbyte RAM, a 40Gbyte Maxtor system disk and a QLogic
QLA2200 FC card connecting to an external HW (IDE) RAID array. It runs
a RedHat 6.2 based distro, but with the 2.4.x series kernel and XFS
patches. (We're just starting to run some controlled tests on a similar
server with a RH 7.2 distro and the 2.4.14 kernel/XFS 1.0.2 release and/or
2.4.17 and the latest XFS patch release.)
Does anyone have any ideas what is causing this, or better still, how to fix it?
Thanks
Ian
-
Unable to handle kernel NULL pointer dereference at virtual address 00000030
c0192c29
*pde = 00000000
Oops: 0000
CPU: 0
EIP: 0010:[<c0192c29>]
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010246
eax: 00000000 ebx: 0000000a ecx: 00000000 edx: 00000000
esi: 00000000 edi: 00002001 ebp: 00000000 esp: f527fb6c
ds: 0018 es: 0018 ss: 0018
Process nfsd (pid: 592, stackpage=f527f000)
Stack: d604fe4c 00000000 00000000 f527fc54 00025960 00000010 d7625912 f70e7964
d7625980 00000000 00000001 00027961 f527fbd0 f70e7800 00000286 109ae350
00000000 00000000 e935d010 00000000 00000085 f70e7800 e935d000 00000757
Call Trace: [<c01947d0>] [<c01905ee>] [<c01dd9ab>] [<c0191787>] [<c01ded2e>]
[<c01a3a08>] [<c01cb173>] [<c01e27c3>] [<c01e1cb0>] [<f8d6831a>] [<c0280f8c>]
[<c01ee286>] [<f8da0dda>] [<c01e1cb0>] [<c014d6f0>] [<f8da291f>] [<f8da7deb>]
[<f8db0320>] [<f8d9f5a3>] [<f8db0320>] [<f8d67c88>] [<f8db0280>] [<f8dafd78>]
[<f8d9f349>] [<c010576f>]
Code: 8b 52 30 89 54 24 58 51 55 8b 44 24 60 50 8b 54 24 78 52 e8
>>EIP; c0192c28 <xfs_alloc_lookup+148/394> <=====
Trace; c01947d0 <xfs_alloc_lookup_le+20/28>
Trace; c01905ee <xfs_free_ag_extent+56/4e0>
Trace; c01dd9aa <xfs_trans_commit+24e/27c>
Trace; c0191786 <xfs_free_extent+da/104>
Trace; c01ded2e <xfs_trans_get_efd+22/2c>
Trace; c01a3a08 <xfs_bmap_finish+fc/180>
Trace; c01cb172 <xfs_itruncate_finish+28a/3fc>
Trace; c01e27c2 <xfs_setattr+b12/fb8>
Trace; c01e1cb0 <xfs_setattr+0/fb8>
Trace; f8d6831a <[sunrpc]svc_udp_data_ready+5e/bc>
Trace; c0280f8c <udp_queue_rcv_skb+130/1b0>
Trace; c01ee286 <linvfs_notify_change+192/1bc>
Trace; f8da0dda <[nfsd]nfsd_iget+f6/110>
Trace; c01e1cb0 <xfs_setattr+0/fb8>
Trace; c014d6f0 <notify_change+90/130>
Trace; f8da291e <[nfsd]nfsd_setattr+426/564>
Trace; f8da7dea <[nfsd]nfsd3_proc_setattr+b6/c4>
Trace; f8db0320 <[nfsd]nfsd_procedures3+40/2c0>
Trace; f8d9f5a2 <[nfsd]nfsd_dispatch+ca/168>
Trace; f8db0320 <[nfsd]nfsd_procedures3+40/2c0>
Trace; f8d67c88 <[sunrpc]svc_process+2ac/544>
Trace; f8db0280 <[nfsd]nfsd_svcstats+0/40>
Trace; f8dafd78 <[nfsd]nfsd_version3+0/10>
Trace; f8d9f348 <[nfsd]nfsd+1b8/348>
Trace; c010576e <kernel_thread+22/30>
Code; c0192c28 <xfs_alloc_lookup+148/394>
00000000 <_EIP>:
Code; c0192c28 <xfs_alloc_lookup+148/394> <=====
0: 8b 52 30 mov 0x30(%edx),%edx <=====
Code; c0192c2a <xfs_alloc_lookup+14a/394>
3: 89 54 24 58 mov %edx,0x58(%esp,1)
Code; c0192c2e <xfs_alloc_lookup+14e/394>
7: 51 push %ecx
Code; c0192c30 <xfs_alloc_lookup+150/394>
8: 55 push %ebp
Code; c0192c30 <xfs_alloc_lookup+150/394>
9: 8b 44 24 60 mov 0x60(%esp,1),%eax
Code; c0192c34 <xfs_alloc_lookup+154/394>
d: 50 push %eax
Code; c0192c36 <xfs_alloc_lookup+156/394>
e: 8b 54 24 78 mov 0x78(%esp,1),%edx
Code; c0192c3a <xfs_alloc_lookup+15a/394>
12: 52 push %edx
Code; c0192c3a <xfs_alloc_lookup+15a/394>
13: e8 00 00 00 00 call 18 <_EIP+0x18> c0192c40 <xfs_alloc_lookup+160/394>
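
As a rough cross-check of the decode above, the faulting instruction can be matched against the register dump: `mov 0x30(%edx),%edx` reads from edx + 0x30, and edx is 0, which gives exactly the reported fault address 00000030. That suggests a NULL pointer held in edx is being dereferenced at offset 0x30 inside xfs_alloc_lookup. The short Python sketch below just redoes that arithmetic from the lines of the Oops; the helper names (`parse_registers`, `effective_address`) are made up for illustration and are not part of ksymoops, and the parser only handles the simple `disp(%reg)` operand form seen here.

```python
import re

# Lines copied verbatim from the Oops report.
REGS = """eax: 00000000 ebx: 0000000a ecx: 00000000 edx: 00000000
esi: 00000000 edi: 00002001 ebp: 00000000 esp: f527fb6c"""
FAULT_LINE = ("Unable to handle kernel NULL pointer dereference "
              "at virtual address 00000030")
FAULT_INSN = "mov 0x30(%edx),%edx"


def parse_registers(text):
    """Return {register name: value} from 'eax: 00000000 ...' style lines."""
    pairs = re.findall(r"\b(e[a-z]{2}):\s+([0-9a-f]{8})", text)
    return {name: int(value, 16) for name, value in pairs}


def effective_address(insn, regs):
    """Compute base + displacement for an AT&T 'disp(%reg)' memory operand.

    Hypothetical helper: index/scale operands are not handled.
    """
    m = re.search(r"(-?0x[0-9a-f]+)\(%(\w+)\)", insn)
    if m is None:
        raise ValueError("unsupported operand form: " + insn)
    disp = int(m.group(1), 16)
    base = regs[m.group(2)]
    return (base + disp) & 0xFFFFFFFF  # 32-bit address space on i386


def fault_address(line):
    """Extract the faulting virtual address from the first Oops line."""
    return int(line.rsplit(" ", 1)[1], 16)


if __name__ == "__main__":
    regs = parse_registers(REGS)
    ea = effective_address(FAULT_INSN, regs)
    print("edx = %#x, effective address = %#x, reported fault = %#x"
          % (regs["edx"], ea, fault_address(FAULT_LINE)))
```

If the two addresses agree, the disassembly and the register dump are at least consistent with each other; it says nothing about why the pointer was NULL in the first place.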
--
/////////////Technical Coordination, Research Services////////////////////
Ian Hardy Tel: 023 80 593577
Computing Services
Southampton University email: idh@xxxxxxxxxxx
Southampton S017 1BJ, UK. i.d.hardy@xxxxxxxxxxx
\\'BUGS: The notion of errors is ill-defined' (IRIX man page for netstat)\