XFS/NFS server oops ..... any ideas.

To: linux-xfs@xxxxxxxxxxx
Subject: XFS/NFS server oops ..... any ideas.
From: "Ian D. Hardy" <i.d.hardy@xxxxxxxxxxx>
Date: Tue, 15 Jan 2002 19:33:39 +0000
Organization: University of Southampton
Sender: owner-linux-xfs@xxxxxxxxxxx
Hi,

For some time we've been having problems with a server which is acting
as a master/control node and NFS server for a computational cluster
(~180 client nodes). The server will crash after anywhere between
a few hours and 10 days of operation. We've tried various kernels and
XFS patch versions, from the 2.4.9 kernel with XFS patch-2.4.9-xfs-2001-08-17
up to and including the 2.4.16 kernel with the xfs-2.4.16-all-i386 patch.
If anything, the 2.4.9 kernel has proved the most reliable (it normally
lasts between 4 and 10 days; 2.4.16 lasted less than 24hrs).

I've just recovered and processed the following Oops from the most
recent crash (running the 2.4.9 kernel). The ksymoops output below
appears to point to a problem in the XFS kernel code, as called from
the nfsd daemon process.

The server is a dual-processor (1GHz PIII) machine based on a SuperMicro
ServerWorks LE motherboard, with 1Gbyte RAM, a 40Gbyte Maxtor system disk
and a QLogic QLA2200 FC card connecting an external HW (IDE) RAID array.
It's got a RedHat 6.2 based distro, but with the 2.4.x series kernel and
XFS patches. (We're just starting to run some controlled tests with a
similar server with a RH 7.2 distro and the 2.4.14 kernel/XFS 1.0.2
release and/or 2.4.17 and the latest XFS patch release.)

Does anyone have any ideas what is causing this, or better still how to fix it?

Thanks

Ian

-

Unable to handle kernel NULL pointer dereference at virtual address 00000030
c0192c29
*pde = 00000000
Oops: 0000
CPU:    0
EIP:    0010:[<c0192c29>]
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010246
eax: 00000000   ebx: 0000000a   ecx: 00000000   edx: 00000000
esi: 00000000   edi: 00002001   ebp: 00000000   esp: f527fb6c
ds: 0018   es: 0018   ss: 0018
Process nfsd (pid: 592, stackpage=f527f000)
Stack: d604fe4c 00000000 00000000 f527fc54 00025960 00000010 d7625912 f70e7964 
       d7625980 00000000 00000001 00027961 f527fbd0 f70e7800 00000286 109ae350 
       00000000 00000000 e935d010 00000000 00000085 f70e7800 e935d000 00000757 
Call Trace: [<c01947d0>] [<c01905ee>] [<c01dd9ab>] [<c0191787>] [<c01ded2e>]
   [<c01a3a08>] [<c01cb173>] [<c01e27c3>] [<c01e1cb0>] [<f8d6831a>] [<c0280f8c>]
   [<c01ee286>] [<f8da0dda>] [<c01e1cb0>] [<c014d6f0>] [<f8da291f>] [<f8da7deb>]
   [<f8db0320>] [<f8d9f5a3>] [<f8db0320>] [<f8d67c88>] [<f8db0280>] [<f8dafd78>]
   [<f8d9f349>] [<c010576f>]
Code: 8b 52 30 89 54 24 58 51 55 8b 44 24 60 50 8b 54 24 78 52 e8 

>>EIP; c0192c28 <xfs_alloc_lookup+148/394>   <=====
Trace; c01947d0 <xfs_alloc_lookup_le+20/28>
Trace; c01905ee <xfs_free_ag_extent+56/4e0>
Trace; c01dd9aa <xfs_trans_commit+24e/27c>
Trace; c0191786 <xfs_free_extent+da/104>
Trace; c01ded2e <xfs_trans_get_efd+22/2c>
Trace; c01a3a08 <xfs_bmap_finish+fc/180>
Trace; c01cb172 <xfs_itruncate_finish+28a/3fc>
Trace; c01e27c2 <xfs_setattr+b12/fb8>
Trace; c01e1cb0 <xfs_setattr+0/fb8>
Trace; f8d6831a <[sunrpc]svc_udp_data_ready+5e/bc>
Trace; c0280f8c <udp_queue_rcv_skb+130/1b0>
Trace; c01ee286 <linvfs_notify_change+192/1bc>
Trace; f8da0dda <[nfsd]nfsd_iget+f6/110>
Trace; c01e1cb0 <xfs_setattr+0/fb8>
Trace; c014d6f0 <notify_change+90/130>
Trace; f8da291e <[nfsd]nfsd_setattr+426/564>
Trace; f8da7dea <[nfsd]nfsd3_proc_setattr+b6/c4>
Trace; f8db0320 <[nfsd]nfsd_procedures3+40/2c0>
Trace; f8d9f5a2 <[nfsd]nfsd_dispatch+ca/168>
Trace; f8db0320 <[nfsd]nfsd_procedures3+40/2c0>
Trace; f8d67c88 <[sunrpc]svc_process+2ac/544>
Trace; f8db0280 <[nfsd]nfsd_svcstats+0/40>
Trace; f8dafd78 <[nfsd]nfsd_version3+0/10>
Trace; f8d9f348 <[nfsd]nfsd+1b8/348>
Trace; c010576e <kernel_thread+22/30>
Code;  c0192c28 <xfs_alloc_lookup+148/394>
00000000 <_EIP>:
Code;  c0192c28 <xfs_alloc_lookup+148/394>   <=====
   0:   8b 52 30                  mov    0x30(%edx),%edx   <=====
Code;  c0192c2a <xfs_alloc_lookup+14a/394>
   3:   89 54 24 58               mov    %edx,0x58(%esp,1)
Code;  c0192c2e <xfs_alloc_lookup+14e/394>
   7:   51                        push   %ecx
Code;  c0192c30 <xfs_alloc_lookup+150/394>
   8:   55                        push   %ebp
Code;  c0192c30 <xfs_alloc_lookup+150/394>
   9:   8b 44 24 60               mov    0x60(%esp,1),%eax
Code;  c0192c34 <xfs_alloc_lookup+154/394>
   d:   50                        push   %eax
Code;  c0192c36 <xfs_alloc_lookup+156/394>
   e:   8b 54 24 78               mov    0x78(%esp,1),%edx
Code;  c0192c3a <xfs_alloc_lookup+15a/394>
  12:   52                        push   %edx
Code;  c0192c3a <xfs_alloc_lookup+15a/394>
  13:   e8 00 00 00 00            call   18 <_EIP+0x18> c0192c40 <xfs_alloc_lookup+160/394>
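
As a sanity check on the decoded output, the faulting instruction
"mov 0x30(%edx),%edx" reads from edx + 0x30, and the register dump shows
edx = 00000000, which matches the reported fault address 00000030. A
minimal Python sketch of that arithmetic follows; the helper name
effective_address is purely illustrative and is not part of ksymoops or
the kernel.

```python
# Illustrative check only: all values are copied from the oops above.
def effective_address(base: int, displacement: int) -> int:
    """Return base + displacement wrapped to a 32-bit address (i386)."""
    return (base + displacement) & 0xFFFFFFFF


edx = 0x00000000            # edx from the register dump
displacement = 0x30         # from "mov 0x30(%edx),%edx"
fault_address = 0x00000030  # from "NULL pointer dereference at virtual address"

computed = effective_address(edx, displacement)
print(f"{computed:08x}")    # prints 00000030
```

This suggests the code read a field at offset 0x30 through a NULL
structure pointer; which structure that is cannot be determined from the
oops alone.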


--

/////////////Technical Coordination, Research Services////////////////////
Ian Hardy                                   Tel: 023 80 593577
Computing Services                             
Southampton University                      email: idh@xxxxxxxxxxx
Southampton  S017 1BJ, UK.                         i.d.hardy@xxxxxxxxxxx
\\'BUGS: The notion of errors is ill-defined' (IRIX man page for netstat)\

