Steve, +
I'm not sure if this provides any more info but I've just had another
crash on this server, again caught by 'slab.c', this time from an 'nfsd'
process:
Reading Oops report from the terminal
kernel BUG at slab.c:1218!
invalid operand: 0000
CPU: 0
EIP: 0010:[<c0132aeb>] Tainted: P
EFLAGS: 00010002
eax: c1c0e060 ebx: 00091858 ecx: 00001000 edx: 000005a6
esi: 5a5a5a5a edi: c96c8bcc ebp: ebdfe000 esp: f381fe8c
ds: 0018 es: 0018 ss: 0018
Process nfsd (pid: 856, stackpage=f381f000)
Stack: ebdfe000 c96c8bcc ebdff000 ebdfe000 c0132e33 c1c0e060 c96c8bcc ebdfe000
0000001b f7ee5720 f7ee5694 c1c0e060 008c0f70 eeafd000 c013360a c1c0e060
f7ee5714 0000001e 00000286 eeafd000 ed79ffff c1c0e4d0 c010cef8 000003a0
Call Trace: [<c0132e33>] [<c013360a>] [<c010cef8>] [<c02558a2>] [<c02558bc>]
[<c0255a36>] [<c02557eb>] [<c0255898>] [<c0255e11>] [<f89d0488>] [<f89d1820>]
[<f89d14d8>] [<f8a252b4>] [<c0107296>] [<f8a251a0>]
Code: 0f 0b c2 04 a0 54 2b c0 89 d8 0f af c1 8d 04 30 39 c5 74 08
Entering kdb (current=0xf381e000, pid 856) on processor 0 Oops: invalid operand
due to oops @ 0xc0132aeb
eax = 0xc1c0e060 ebx = 0x00091858 ecx = 0x00001000 edx = 0x000005a6
esi = 0x5a5a5a5a edi = 0xc96c8bcc esp = 0xf381fe8c eip = 0xc0132aeb
ebp = 0xebdfe000 xss = 0x00000018 xcs = 0x00000010 eflags = 0x00010002
xds = 0xc0250018 xes = 0x00000018 origeax = 0xffffffff &regs = 0xf381fe58
[0]kdb>
kernel BUG at slab.c:1218!
invalid operand: 0000
CPU: 0
EIP: 0010:[<c0132aeb>] Tainted: P
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010002
eax: c1c0e060 ebx: 00091858 ecx: 00001000 edx: 000005a6
esi: 5a5a5a5a edi: c96c8bcc ebp: ebdfe000 esp: f381fe8c
ds: 0018 es: 0018 ss: 0018
Process nfsd (pid: 856, stackpage=f381f000)
Stack: ebdfe000 c96c8bcc ebdff000 ebdfe000 c0132e33 c1c0e060 c96c8bcc ebdfe000
0000001b f7ee5720 f7ee5694 c1c0e060 008c0f70 eeafd000 c013360a c1c0e060
f7ee5714 0000001e 00000286 eeafd000 ed79ffff c1c0e4d0 c010cef8 000003a0
Call Trace: [<c0132e33>] [<c013360a>] [<c010cef8>] [<c02558a2>] [<c02558bc>]
[<c0255a36>] [<c02557eb>] [<c0255898>] [<c0255e11>] [<f89d0488>] [<f89d1820>]
[<f89d14d8>] [<f8a252b4>] [<c0107296>] [<f8a251a0>]
Code: 0f 0b c2 04 a0 54 2b c0 89 d8 0f af c1 8d 04 30 39 c5 74 08
>>EIP; c0132aea <kmem_extra_free_checks+2a/70> <=====
Trace; c0132e32 <free_block+162/210>
Trace; c013360a <kfree+14a/180>
Trace; c010cef8 <call_do_IRQ+6/e>
Trace; c02558a2 <skb_release_data+72/80>
Trace; c02558bc <kfree_skbmem+c/70>
Trace; c0255a36 <__kfree_skb+116/120>
Trace; c02557ea <skb_drop_fraglist+3a/50>
Trace; c0255898 <skb_release_data+68/80>
Trace; c0255e10 <skb_linearize+90/f0>
Trace; f89d0488 <[sunrpc]svc_udp_recvfrom+128/380>
Trace; f89d1820 <[sunrpc]svc_send+70/1a0>
Trace; f89d14d8 <[sunrpc]svc_recv+2c8/470>
Trace; f8a252b4 <[nfsd]nfsd+114/350>
Trace; c0107296 <kernel_thread+26/30>
Trace; f8a251a0 <[nfsd]nfsd+0/350>
Code; c0132aea <kmem_extra_free_checks+2a/70>
00000000 <_EIP>:
Code; c0132aea <kmem_extra_free_checks+2a/70> <=====
0: 0f 0b ud2a <=====
Code; c0132aec <kmem_extra_free_checks+2c/70>
2: c2 04 a0 ret $0xa004
Code; c0132aee <kmem_extra_free_checks+2e/70>
5: 54 push %esp
Code; c0132af0 <kmem_extra_free_checks+30/70>
6: 2b c0 sub %eax,%eax
Code; c0132af2 <kmem_extra_free_checks+32/70>
8: 89 d8 mov %ebx,%eax
Code; c0132af4 <kmem_extra_free_checks+34/70>
a: 0f af c1 imul %ecx,%eax
Code; c0132af6 <kmem_extra_free_checks+36/70>
d: 8d 04 30 lea (%eax,%esi,1),%eax
Code; c0132afa <kmem_extra_free_checks+3a/70>
10: 39 c5 cmp %eax,%ebp
Code; c0132afc <kmem_extra_free_checks+3c/70>
12: 74 08 je 1c <_EIP+0x1c> c0132b06 <kmem_extra_free_checks+46/70>
Entering kdb (current=0xf381e000, pid 856) on processor 0 Oops: invalid operand
eax = 0xc1c0e060 ebx = 0x00091858 ecx = 0x00001000 edx = 0x000005a6
esi = 0x5a5a5a5a edi = 0xc96c8bcc esp = 0xf381fe8c eip = 0xc0132aeb
ebp = 0xebdfe000 xss = 0x00000018 xcs = 0x00000010 eflags = 0x00010002
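
The esi value in the dump above (5a5a5a5a) is a single byte repeated four times, which looks like a fill pattern rather than a real pointer. As a rough sketch, and assuming that slab poisoning in this kernel build fills objects with the byte 0x5a (that is an assumption, nothing in this thread confirms it), a register dump can be scanned for such values like this. The function names are made up for illustration:

```python
import re

# Assumption: 0x5a is the fill byte used by slab poisoning in this kernel build.
POISON_BYTE = 0x5A

def parse_registers(oops_text):
    """Return {register: value} for 'eax: c1c0e060'-style pairs in an Oops."""
    pairs = re.findall(r"\b(e[a-z]{2}):\s*([0-9a-fA-F]{8})\b", oops_text)
    return {name: int(value, 16) for name, value in pairs}

def poisoned(registers, byte=POISON_BYTE):
    """Names of registers whose four bytes all equal the given fill byte."""
    pattern = int.from_bytes(bytes([byte]) * 4, "big")
    return sorted(name for name, value in registers.items() if value == pattern)

# Register lines copied from the Oops above.
sample = """eax: c1c0e060 ebx: 00091858 ecx: 00001000 edx: 000005a6
esi: 5a5a5a5a edi: c96c8bcc ebp: ebdfe000 esp: f381fe8c"""

regs = parse_registers(sample)
print(poisoned(regs))  # prints ['esi']
```

If the assumption holds, a poisoned value in a register suggests the code was reading from memory that was never initialised or had already been released, but the scan alone does not say who wrote it.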
-----Original Message-----
From: I.D.Hardy@xxxxxxxxxxx [mailto:I.D.Hardy@xxxxxxxxxxx]
Sent: 18 September 2002 21:49
Steve +,
>
> On Wed, 2002-09-18 at 13:31, Ian D. Hardy wrote:
> > Steve,
> >
> > >On Mon, 2002-09-16 at 11:56, Ian D. Hardy wrote:
> > >> Steve,
> > >>
> > >> Thanks for the quick response. I don't always get an Oops output
> > >> (sometimes the server just hangs and requires a reboot). However, as
> > >> it happens, the server has just crashed again with the following Oops
> > >> (through 'ksymoops'):
> > >
> > >This one suggests heap corruption more than anything else.
> > >
> > >Steve
> >
> > I upgraded the kernel to the current CVS (2.4.19-xfs) tree (as of
> > Monday 16th Sept.) today and got a very similar looking Oops to the
> > one I reported on Monday, see below. I guess it is very difficult to
> > know what would have caused any heap corruption. As I understand it
> > there's nothing in these latest panics to directly link them with
> > XFS? I need to do some more thinking on this. Any pointers would be
> > very welcome.
> >
> > Regards Ian Hardy
> >
> >
> > kernel BUG at slab.c:1439!
>
>
> OK, so you have slab debugging turned on, which I was going to ask you
> to do. Looks like someone walked off the end of an allocation here,
> that is progress.
>
> So, the question is what did it; that's the really hard part! Is this
> machine just running XFS via NFS, or is it doing anything else? Also,
> which options do you have turned on in XFS, in fact, sending the whole
> kernel config might be an idea.
>
> Steve
>
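
To illustrate what "walked off the end of an allocation" means, here is a minimal user-space sketch of a red-zone check: guard bytes are placed on either side of an object and verified when it is freed. This is not the code in slab.c; the guard value and the class name are hypothetical and chosen only for the example.

```python
# Hypothetical guard pattern for illustration, not the kernel's actual magic value.
RED_ZONE = b"\xa5" * 4

class GuardedBuffer:
    """An object of `size` bytes with a guard zone before and after it."""

    def __init__(self, size):
        self.size = size
        self.mem = bytearray(RED_ZONE + bytes(size) + RED_ZONE)

    def write(self, offset, data):
        # Deliberately unchecked, like a raw pointer write in C.
        start = len(RED_ZONE) + offset
        self.mem[start:start + len(data)] = data

    def check_on_free(self):
        """Return True if both guard zones are still intact."""
        head_ok = self.mem[:len(RED_ZONE)] == RED_ZONE
        tail_ok = self.mem[-len(RED_ZONE):] == RED_ZONE
        return head_ok and tail_ok

buf = GuardedBuffer(8)
buf.write(0, b"12345678")   # stays inside the object
print(buf.check_on_free())  # prints True

buf.write(4, b"x" * 6)      # runs 2 bytes past the end of the object
print(buf.check_on_free())  # prints False
```

The check only fires when the object is freed, so the process that trips it (here 'nfsd') is not necessarily the one that did the overwrite.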
Thanks for your continued help/interest, it is much appreciated.
The server (dual 1GHz PIII, with 1Gbyte memory) is a dedicated NFS
fileserver (no users have direct access to it). It is serving ~260 NFS
clients (part of a computational/Beowulf system). The server has a
40Gbyte IDE system disk and 2 FC-IDE connected RAID units (each with
~500Gbytes usable storage - RAID 5 configuration). The RAID units are
connected via a Qlogic QLA2200 HBA and a FC switch. The 2 RAID units are
striped together (RAID 0) using the kernel 'md' RAID driver.
Below is the kernel configuration file:
--
Regards and thanks
Ian
--
Ian Hardy Tel: 023 80593577
Research Services Fax: 023 80593131
Information Systems Services email: i.d.hardy@xxxxxxxxxxx
Southampton University
Southampton S017 1BJ, UK.