
RE: Re-occurance of NFS server panics

To: <I.D.Hardy@xxxxxxxxxxx>, "'Steve Lord'" <lord@xxxxxxx>
Subject: RE: Re-occurance of NFS server panics
From: "Ian D. Hardy" <I.D.Hardy@xxxxxxxxxxx>
Date: Thu, 19 Sep 2002 18:17:01 +0100
Cc: <linux-xfs@xxxxxxxxxxx>, <O.G.Parchment@xxxxxxxxxxx>, "'Russell Cattelan'" <cattelan@xxxxxxxxxxx>
Importance: Normal
In-reply-to: <200209182049.VAA11727544@plum.sucs.soton.ac.uk>
Reply-to: <I.D.Hardy@xxxxxxxxxxx>
Sender: linux-xfs-bounce@xxxxxxxxxxx
Steve, +

I'm not sure if this provides any more info, but I've just had another
crash on this server, again caught by 'slab.c', this time from an 'nfsd'
process:

Reading Oops report from the terminal
 kernel BUG at slab.c:1218!
invalid operand: 0000
CPU:    0
EIP:    0010:[<c0132aeb>]    Tainted: P 
EFLAGS: 00010002
eax: c1c0e060   ebx: 00091858   ecx: 00001000   edx: 000005a6
esi: 5a5a5a5a   edi: c96c8bcc   ebp: ebdfe000   esp: f381fe8c
ds: 0018   es: 0018   ss: 0018
Process nfsd (pid: 856, stackpage=f381f000)
Stack: ebdfe000 c96c8bcc ebdff000 ebdfe000 c0132e33 c1c0e060 c96c8bcc ebdfe000
       0000001b f7ee5720 f7ee5694 c1c0e060 008c0f70 eeafd000 c013360a c1c0e060
       f7ee5714 0000001e 00000286 eeafd000 ed79ffff c1c0e4d0 c010cef8 000003a0
Call Trace:    [<c0132e33>] [<c013360a>] [<c010cef8>] [<c02558a2>] [<c02558bc>]
  [<c0255a36>] [<c02557eb>] [<c0255898>] [<c0255e11>] [<f89d0488>] [<f89d1820>]
  [<f89d14d8>] [<f8a252b4>] [<c0107296>] [<f8a251a0>]

Code: 0f 0b c2 04 a0 54 2b c0 89 d8 0f af c1 8d 04 30 39 c5 74 08 
 
Entering kdb (current=0xf381e000, pid 856) on processor 0 Oops: invalid operand
due to oops @ 0xc0132aeb
eax = 0xc1c0e060 ebx = 0x00091858 ecx = 0x00001000 edx = 0x000005a6 
esi = 0x5a5a5a5a edi = 0xc96c8bcc esp = 0xf381fe8c eip = 0xc0132aeb 
ebp = 0xebdfe000 xss = 0x00000018 xcs = 0x00000010 eflags = 0x00010002 
xds = 0xc0250018 xes = 0x00000018 origeax = 0xffffffff &regs = 0xf381fe58
Using defaults from ksymoops -t elf32-i386 -a i386

>>EIP; c0132aea <kmem_extra_free_checks+2a/70>   <=====
Trace; c0132e32 <free_block+162/210>
Trace; c013360a <kfree+14a/180>
Trace; c010cef8 <call_do_IRQ+6/e>
Trace; c02558a2 <skb_release_data+72/80>
Trace; c02558bc <kfree_skbmem+c/70>
Trace; c0255a36 <__kfree_skb+116/120>
Trace; c02557ea <skb_drop_fraglist+3a/50>
Trace; c0255898 <skb_release_data+68/80>
Trace; c0255e10 <skb_linearize+90/f0>
Trace; f89d0488 <[sunrpc]svc_udp_recvfrom+128/380>
Trace; f89d1820 <[sunrpc]svc_send+70/1a0>
Trace; f89d14d8 <[sunrpc]svc_recv+2c8/470>
Trace; f8a252b4 <[nfsd]nfsd+114/350>
Trace; c0107296 <kernel_thread+26/30>
Trace; f8a251a0 <[nfsd]nfsd+0/350>
Code;  c0132aea <kmem_extra_free_checks+2a/70>
00000000 <_EIP>:
Code;  c0132aea <kmem_extra_free_checks+2a/70>   <=====
   0:   0f 0b                     ud2a      <=====
Code;  c0132aec <kmem_extra_free_checks+2c/70>
   2:   c2 04 a0                  ret    $0xa004
Code;  c0132aee <kmem_extra_free_checks+2e/70>
   5:   54                        push   %esp
Code;  c0132af0 <kmem_extra_free_checks+30/70>
   6:   2b c0                     sub    %eax,%eax
Code;  c0132af2 <kmem_extra_free_checks+32/70>
   8:   89 d8                     mov    %ebx,%eax
Code;  c0132af4 <kmem_extra_free_checks+34/70>
   a:   0f af c1                  imul   %ecx,%eax
Code;  c0132af6 <kmem_extra_free_checks+36/70>
   d:   8d 04 30                  lea    (%eax,%esi,1),%eax
Code;  c0132afa <kmem_extra_free_checks+3a/70>
  10:   39 c5                     cmp    %eax,%ebp
Code;  c0132afc <kmem_extra_free_checks+3c/70>
  12:   74 08                     je     1c <_EIP+0x1c> c0132b06 <kmem_extra_free_checks+46/70>
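The Code: bytes can be cross-checked by hand. Assuming the usual 2.4 i386 definition of BUG() (a ud2 instruction followed by a 16-bit line number and a 32-bit pointer to the file name string), the bytes c2 04 read little-endian give 0x04c2 = 1218, the same line as "kernel BUG at slab.c:1218!", and a0 54 2b c0 would be the address 0xc02b54a0 of the file name. If that reading is right, the "ret $0xa004", "push %esp" and "sub %eax,%eax" lines in the decode are ksymoops disassembling that data, and the real instructions resume at "mov %ebx,%eax". The helper below is a hypothetical sketch (parse_bug_trailer is not part of the kernel or ksymoops) that splits the bytes this way:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result of parsing the bytes at a BUG() site. */
struct bug_site {
    uint16_t line;      /* __LINE__, stored as a 16-bit .word */
    uint32_t file_ptr;  /* address of the __FILE__ string, a 32-bit .long */
};

/* Parse "ud2 ; .word line ; .long file" from raw i386 code bytes.
 * Returns 1 on success, 0 if the bytes do not start with ud2 (0f 0b). */
static int parse_bug_trailer(const uint8_t *code, size_t len, struct bug_site *out)
{
    assert(code != NULL && out != NULL);
    if (len < 8 || code[0] != 0x0f || code[1] != 0x0b)
        return 0;
    /* i386 is little-endian: least significant byte comes first. */
    out->line = (uint16_t)(code[2] | (code[3] << 8));
    out->file_ptr = (uint32_t)code[4]
                  | ((uint32_t)code[5] << 8)
                  | ((uint32_t)code[6] << 16)
                  | ((uint32_t)code[7] << 24);
    return 1;
}
```

Fed the first ten bytes of the Code: line above (0f 0b c2 04 a0 54 2b c0 89 d8), it reports line 1218 and file pointer 0xc02b54a0.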



-----Original Message-----
From: I.D.Hardy@xxxxxxxxxxx [mailto:I.D.Hardy@xxxxxxxxxxx] 
Sent: 18 September 2002 21:49

Steve +,

> 
> On Wed, 2002-09-18 at 13:31, Ian D. Hardy wrote:
> > Steve,
> > 
> > >On Mon, 2002-09-16 at 11:56, Ian D. Hardy wrote:
> > >> Steve,
> > >> 
> > >> Thanks for the quick response. I don't always get an Oops output
> > >> (sometimes the server just hangs and requires a reboot). However, as
> > >> it happens, the server has just crashed again with the following Oops
> > >> (through 'ksymoops'):
> > >
> > >This one suggests heap corruption more than anything else.
> > >
> > >Steve
> > 
> > I upgraded the kernel to the current CVS (2.4.19-xfs) tree (as of
> > Monday 16th Sept.) today and got a very similar-looking Oops to the
> > one I reported on Monday, see below. I guess it is very difficult to
> > know what would have caused any heap corruption. As I understand it,
> > there's nothing in these latest panics to directly link them with
> > XFS? I need to do some more thinking on this. Any pointers would be
> > very welcome.
> > 
> > Regards Ian Hardy
> > 
> > 
> >  kernel BUG at slab.c:1439!
> 
> 
> OK, so you have slab debugging turned on, which I was going to ask you
> to do. It looks like someone walked off the end of an allocation here;
> that is progress.
> 
> So, the question is what did it; that's the really hard part! Is this
> machine just running XFS via NFS, or is it doing anything else? Also,
> which options do you have turned on in XFS? In fact, sending the whole
> kernel config might be an idea.
> 
> Steve
> 
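Steve's "walked off the end of an allocation" refers to the checks that slab debugging adds. As a rough, hypothetical model (toy_alloc and the other names below are illustrative, not the 2.4 slab API): each object gets guard words ("red zones") on either side that are verified when it is freed, and freed objects are filled with a poison byte, so a later write shows up as damaged poison and a later read shows up as a value like 0x5a5a5a5a. The poison value 0x5a mirrors the 2.4 slab poison byte; the guard word is an arbitrary choice for this sketch. In the Oops at the top of this message esi is 0x5a5a5a5a, which is consistent with (though not proof of) a pointer having been loaded from poisoned, already-freed memory.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TOY_GUARD  0xC0FFEE11u  /* arbitrary red-zone pattern for this sketch */
#define TOY_POISON 0x5a         /* same byte value the 2.4 slab uses for poison */

/* Layout in memory: [guard][size user bytes][guard].
 * The returned pointer is only 4-byte aligned; fine for a byte-level demo. */
static void *toy_alloc(size_t size)
{
    uint32_t g = TOY_GUARD;
    unsigned char *raw = malloc(size + 2 * sizeof g);
    if (raw == NULL)
        return NULL;
    memcpy(raw, &g, sizeof g);                   /* leading red zone */
    memcpy(raw + sizeof g + size, &g, sizeof g); /* trailing red zone */
    return raw + sizeof g;
}

/* 1 if both red zones still hold the guard pattern, 0 if either was overwritten. */
static int toy_redzones_ok(const void *p, size_t size)
{
    const unsigned char *raw = (const unsigned char *)p - sizeof(uint32_t);
    uint32_t head, tail;
    memcpy(&head, raw, sizeof head);
    memcpy(&tail, raw + sizeof head + size, sizeof tail);
    return head == TOY_GUARD && tail == TOY_GUARD;
}

/* Fill an object that is no longer in use with the poison byte. */
static void toy_poison(void *p, size_t size)
{
    memset(p, TOY_POISON, size);
}

/* 1 if every byte is still poison, i.e. nothing wrote to the object after it was poisoned. */
static int toy_poison_ok(const void *p, size_t size)
{
    const unsigned char *b = p;
    for (size_t i = 0; i < size; i++)
        if (b[i] != TOY_POISON)
            return 0;
    return 1;
}

/* Return the underlying block (including both red zones) to malloc. */
static void toy_release(void *p)
{
    assert(p != NULL);
    free((unsigned char *)p - sizeof(uint32_t));
}
```

A single byte written just past the requested size lands in the trailing red zone and makes toy_redzones_ok fail, which is the kind of overrun Steve describes.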

Thanks for your continued help/interest, it is much appreciated.

The server (dual 1GHz PIII, with 1Gbyte of memory) is a dedicated NFS
fileserver (no users have direct access to it). It serves ~260 NFS
clients (part of a computational/Beowulf system). The server has a
40Gbyte IDE system disk and 2 FC-IDE connected RAID units (each with
~500Gbytes of usable storage in a RAID 5 configuration). The RAID units are
connected via a Qlogic QLA2200 HBA and an FC switch. The 2 RAID units are
striped together (RAID 0) using the kernel 'md' RAID driver.
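For reference, with plain md RAID 0 over members of equal size, consecutive chunks of the array alternate between the members. The sketch below shows that mapping; the chunk size is not given in this message, so the 128-sector (64 KiB) value in the example is an assumption, as is the helper name raid0_map. md's real raid0 code also handles members of unequal size, which this sketch ignores.

```c
#include <assert.h>
#include <stdint.h>

/* Where a logical sector of a RAID 0 array lands on its members. */
struct stripe_loc {
    unsigned dev;        /* member device index, 0-based */
    uint64_t dev_sector; /* sector offset on that member */
};

/* Striping across ndev equal-sized members with a fixed chunk size
 * given in 512-byte sectors. */
static struct stripe_loc raid0_map(uint64_t sector, unsigned ndev, uint64_t chunk_sectors)
{
    assert(ndev > 0 && chunk_sectors > 0);
    uint64_t chunk = sector / chunk_sectors;  /* which chunk of the array */
    uint64_t offset = sector % chunk_sectors; /* offset inside that chunk */
    struct stripe_loc loc;
    loc.dev = (unsigned)(chunk % ndev);       /* chunks rotate across members */
    loc.dev_sector = (chunk / ndev) * chunk_sectors + offset;
    return loc;
}
```

With two members and 128-sector chunks, array sector 130 lands on member 1 at sector 2, and array sector 300 lands on member 0 at sector 172.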

Below is the kernel configuration file:


-- 

Regards and thanks

Ian

--
Ian Hardy                                   Tel: 023 80593577
Research Services                           Fax: 023 80593131
Information Systems Services                email: i.d.hardy@xxxxxxxxxxx

Southampton University                     
Southampton  SO17 1BJ, UK.


