
Re: XFS/NFS server oops ..... any ideas.

To: "Ian D. Hardy" <i.d.hardy@xxxxxxxxxxx>
Subject: Re: XFS/NFS server oops ..... any ideas.
From: Craig Tierney <ctierney@xxxxxxxx>
Date: Tue, 15 Jan 2002 12:53:58 -0700
Cc: linux-xfs@xxxxxxxxxxx
In-reply-to: <3C448413.3FF98CB1@xxxxxxxxxxx>; from i.d.hardy@xxxxxxxxxxx on Tue, Jan 15, 2002 at 07:33:39PM +0000
References: <3C448413.3FF98CB1@xxxxxxxxxxx>
Sender: owner-linux-xfs@xxxxxxxxxxx
User-agent: Mutt/1.2.5i
Can you trigger the problem easily if you start up a dd process writing
to the disk over NFS from twenty different nodes simultaneously (different
files, same filesystem)?
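
A rough sketch of what such a test could look like on each client node,
in case it helps. The helper name parallel_dd, the file naming, and the
sizes are assumptions for illustration only, not something from the
original report:

```sh
#!/bin/sh
# Sketch only: parallel_dd is a made-up helper name, not an existing tool.
# It starts several dd writers at once, each to its own file in one
# directory, waits for them, and prints how many writers failed.
parallel_dd() {
    target_dir=$1    # directory on the NFS-mounted XFS filesystem
    writers=$2       # number of concurrent dd processes on this node
    mb_each=$3       # size of each file in MiB

    pids=""
    i=1
    while [ "$i" -le "$writers" ]; do
        # uname -n keeps file names distinct across client nodes
        dd if=/dev/zero of="$target_dir/ddtest.$(uname -n).$i" \
            bs=1048576 count="$mb_each" 2>/dev/null &
        pids="$pids $!"
        i=$((i + 1))
    done

    failed=0
    for pid in $pids; do
        wait "$pid" || failed=$((failed + 1))
    done
    echo "$failed"
}
```

Calling something like "parallel_dd /mnt/nfs/scratch 1 1024" at about the
same time on each of the twenty nodes would give twenty writers to
different files on the same filesystem (/mnt/nfs/scratch is a placeholder
for the real mount point).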

Craig


> Hi,
> 
> For some time we've been having problems with a server that acts
> as a master/control node and NFS server for a computational cluster 
> (~180 client nodes). The server crashes after anywhere between 
> a few hours and 10 days of operation. We've tried various kernels and
> XFS patch versions, from the 2.4.9 kernel with XFS patch-2.4.9-xfs-2001-08-17
> up to and including the 2.4.16 kernel with the xfs-2.4.16-all-i386 patch.
> If anything, the 2.4.9 kernel has proved the most reliable (it normally
> lasts between 4 and 10 days! - 2.4.16 lasted less than 24hrs).
> 
> I've just recovered and processed the following Oops from the most 
> recent crash (running the 2.4.9 kernel). The ksymoops output below would
> appear to point to a problem in the XFS kernel code as called from the
> nfsd daemon process.
> 
> The server is a dual 1GHz PIII machine based on a SuperMicro ServerWorks LE
> motherboard with 1Gbyte RAM, a 40Gbyte Maxtor system disk and a QLogic
> QLA2200 FC card connecting an external HW (IDE) RAID array. It's got
> a RedHat 6.2 based distro but with the 2.4.x series kernel and XFS
> patches. (We're just starting to run some controlled tests with a similar
> server with a RH 7.2 distro and the 2.4.14 kernel/XFS 1.0.2 release and/or
> 2.4.17 and the latest XFS patch release).
> 
> Does anyone have any idea what is causing this or, better still, how to fix it?
> 
> Thanks
> 
> Ian
> 
> -
> 
> Unable to handle kernel NULL pointer dereference at virtual address 00000030
> c0192c29
> *pde = 00000000
> Oops: 0000
> CPU:    0
> EIP:    0010:[<c0192c29>]
> Using defaults from ksymoops -t elf32-i386 -a i386
> EFLAGS: 00010246
> eax: 00000000   ebx: 0000000a   ecx: 00000000   edx: 00000000
> esi: 00000000   edi: 00002001   ebp: 00000000   esp: f527fb6c
> ds: 0018   es: 0018   ss: 0018
> Process nfsd (pid: 592, stackpage=f527f000)
> Stack: d604fe4c 00000000 00000000 f527fc54 00025960 00000010 d7625912 f70e7964
>        d7625980 00000000 00000001 00027961 f527fbd0 f70e7800 00000286 109ae350
>        00000000 00000000 e935d010 00000000 00000085 f70e7800 e935d000 00000757
> Call Trace: [<c01947d0>] [<c01905ee>] [<c01dd9ab>] [<c0191787>] [<c01ded2e>]
>    [<c01a3a08>] [<c01cb173>] [<c01e27c3>] [<c01e1cb0>] [<f8d6831a>] [<c0280f8c>]
>    [<c01ee286>] [<f8da0dda>] [<c01e1cb0>] [<c014d6f0>] [<f8da291f>] [<f8da7deb>]
>    [<f8db0320>] [<f8d9f5a3>] [<f8db0320>] [<f8d67c88>] [<f8db0280>] [<f8dafd78>]
>    [<f8d9f349>] [<c010576f>]
> Code: 8b 52 30 89 54 24 58 51 55 8b 44 24 60 50 8b 54 24 78 52 e8 
> 
> >>EIP; c0192c28 <xfs_alloc_lookup+148/394>   <=====
> Trace; c01947d0 <xfs_alloc_lookup_le+20/28>
> Trace; c01905ee <xfs_free_ag_extent+56/4e0>
> Trace; c01dd9aa <xfs_trans_commit+24e/27c>
> Trace; c0191786 <xfs_free_extent+da/104>
> Trace; c01ded2e <xfs_trans_get_efd+22/2c>
> Trace; c01a3a08 <xfs_bmap_finish+fc/180>
> Trace; c01cb172 <xfs_itruncate_finish+28a/3fc>
> Trace; c01e27c2 <xfs_setattr+b12/fb8>
> Trace; c01e1cb0 <xfs_setattr+0/fb8>
> Trace; f8d6831a <[sunrpc]svc_udp_data_ready+5e/bc>
> Trace; c0280f8c <udp_queue_rcv_skb+130/1b0>
> Trace; c01ee286 <linvfs_notify_change+192/1bc>
> Trace; f8da0dda <[nfsd]nfsd_iget+f6/110>
> Trace; c01e1cb0 <xfs_setattr+0/fb8>
> Trace; c014d6f0 <notify_change+90/130>
> Trace; f8da291e <[nfsd]nfsd_setattr+426/564>
> Trace; f8da7dea <[nfsd]nfsd3_proc_setattr+b6/c4>
> Trace; f8db0320 <[nfsd]nfsd_procedures3+40/2c0>
> Trace; f8d9f5a2 <[nfsd]nfsd_dispatch+ca/168>
> Trace; f8db0320 <[nfsd]nfsd_procedures3+40/2c0>
> Trace; f8d67c88 <[sunrpc]svc_process+2ac/544>
> Trace; f8db0280 <[nfsd]nfsd_svcstats+0/40>
> Trace; f8dafd78 <[nfsd]nfsd_version3+0/10>
> Trace; f8d9f348 <[nfsd]nfsd+1b8/348>
> Trace; c010576e <kernel_thread+22/30>
> Code;  c0192c28 <xfs_alloc_lookup+148/394>
> 00000000 <_EIP>:
> Code;  c0192c28 <xfs_alloc_lookup+148/394>   <=====
>    0:   8b 52 30                  mov    0x30(%edx),%edx   <=====
> Code;  c0192c2a <xfs_alloc_lookup+14a/394>
>    3:   89 54 24 58               mov    %edx,0x58(%esp,1)
> Code;  c0192c2e <xfs_alloc_lookup+14e/394>
>    7:   51                        push   %ecx
> Code;  c0192c30 <xfs_alloc_lookup+150/394>
>    8:   55                        push   %ebp
> Code;  c0192c30 <xfs_alloc_lookup+150/394>
>    9:   8b 44 24 60               mov    0x60(%esp,1),%eax
> Code;  c0192c34 <xfs_alloc_lookup+154/394>
>    d:   50                        push   %eax
> Code;  c0192c36 <xfs_alloc_lookup+156/394>
>    e:   8b 54 24 78               mov    0x78(%esp,1),%edx
> Code;  c0192c3a <xfs_alloc_lookup+15a/394>
>   12:   52                        push   %edx
> Code;  c0192c3a <xfs_alloc_lookup+15a/394>
>   13:   e8 00 00 00 00            call   18 <_EIP+0x18> c0192c40 <xfs_alloc_lookup+160/394>
> 
> 
> --
> 
> /////////////Technical Coordination, Research Services////////////////////
> Ian Hardy                                   Tel: 023 80 593577
> Computing Services                             
> Southampton University                      email: idh@xxxxxxxxxxx
> Southampton  S017 1BJ, UK.                         i.d.hardy@xxxxxxxxxxx
> \\'BUGS: The notion of errors is ill-defined' (IRIX man page for netstat)\
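
For what it's worth, the fault address in the Oops above is consistent
with the register dump: the faulting instruction is "mov 0x30(%edx),%edx"
and edx is 00000000, so the access lands at offset 0x30 from a NULL
pointer. A small arithmetic check (fault_addr is a made-up helper name,
used here only for illustration):

```sh
# Effective address of "mov 0x30(%edx),%edx" is edx + 0x30.
# fault_addr is a made-up helper name used only for this illustration.
fault_addr() {
    printf '%08x\n' $(( $1 + $2 ))
}

# edx from the register dump, displacement from the Code line
fault_addr 0x00000000 0x30    # prints 00000030
```

This only confirms that the NULL pointer was in edx at the time of the
fault. It does not say which structure inside xfs_alloc_lookup that
pointer was supposed to reference.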

-- 
Craig Tierney (ctierney@xxxxxxxx)

