XFS NFS server Oops
Hi,
Does anyone have any ideas about the following Oops (processed with ksymoops
2.4.3)? It is from an NFS server (dual 1GHz Supermicro LE, 1Gbyte RAM, 40Gbyte
Maxtor IDE system disk, Zero-D/GForce RI Fibrechannel to IDE hardware RAID-5
500Gbyte disk unit). It is running the Linux 2.4.17-xfs kernel taken as a CVS
image on 27th January. The main area of disk it is serving is on the HW RAID
unit, which is the only XFS filesystem on the system. The system had been up
for just over 3 days when it crashed.
I reported a very similar failure a few weeks ago, at that time running a
2.4.9-based kernel. Steve Lord suggested that we try the latest CVS image,
as this had fixed some memory allocation problems.
The machine is essentially an NFS fileserver to a computational cluster. Of
possible interest is the 'save' process that was running on one of the
processors; this is the Legato Networker backup client process (which was
performing a full backup of the XFS filesystem at the time). I don't think
this is significant, as I was seeing these crashes (at ~4 to 12 day intervals)
with the 2.4.9 kernel whether or not a 'save' session was running.
Oops details:
c012fefb
*pde = 00000000
Oops: 0000
CPU: 0
EIP: 0010:[<c012fefb>] Not tainted
EFLAGS: 00010002
eax: 00000000 ebx: 00000001 ecx: c3969910 edx: c1000000
esi: f8e34000 edi: 00000286 ebp: 00000000 esp: f7ee3e2c
ds: 0018 es: 0018 ss: 0018
Process kswapd (pid: 5, stackpage=f7ee3000)
Stack: f8e34000 d7b574e8 c01beb99 c01bee54 d7b57574 d7b57534 d7b574e8 00000000
c01c1d49 f8e34000 00014010 d7b574e8 00000001 00000000 c01c1db7 d7b574e8
00000000 d7b574e8 c01dbdb5 d7b574e8 d7b574e8 c6b36f00 00000000 c01dbc72
Call Trace: [<c01beb99>] [<c01bee54>] [<c01c1d49>] [<c01c1db7>] [<c01dbdb5>]
[<c01dbc72>] [<c01eb431>] [<c01eb8b6>] [<c01eba03>] [<c01ea9da>] [<c014c125>]
[<c014b4ff>] [<c014c1b0>] [<c014c409>] [<c014c450>] [<c013138e>] [<c01313ec>]
[<c0131491>] [<c0131506>] [<c0131641>] [<c01315a0>] [<c0105000>] [<c0105826>]
[<c01315a0>]
Code: 8b 13 3b 53 04 73 0e 89 74 93 08 ff 03 eb 3c 8d b6 00 00 00
<1>Unable to handle kernel paging request at virtual address fcdff82f
printing eip:
c012a147
*pde = 00000000
Oops: 0000
CPU: 1
EIP: 0010:[<c012a147>] Not taintedUnable to handle kernel NULL pointer
dereference at virtual address 00000001
EFLAGS: 00010286
eax: fcdff827 ebx: c5d334d0 ecx: 00000012 c012fefb
*pde = 00000000
Oops: 0000
CPU: 0
EIP: 0010:[<c012fefb>] Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010002
eax: 00000000 ebx: 00000001 ecx: c3969910 edx: c1000000
esi: f8e34000 edi: 00000286 ebp: 00000000 esp: f7ee3e2c
ds: 0018 es: 0018 ss: 0018
edx: 0002ef8f
esi: 00005451 edi: 0000545e ebp: f7fbbe3c Process kswapd (pid: 5,
stackpage=f7ee3000)
Stack: f8e34000 d7b574e8 c01beb99 c01bee54 d7b57574 d7b57534 d7b574e8 00000000
esp: cd6b3e40
ds: 0018 es: 0018 ss: 0018
Process save (pid: c01c1d49 f8e34000 00014010 d7b574e8 00000001 00000000
c01c1db7 d7b574e8
00000000 d7b574e8 c01dbdb5 d7b574e8 d7b574e8 c6b36f00 00000000 c01dbc72
Call Trace: [<c01beb99>] [<c01bee54>] [<c01c1d49>] [<c01c1db7>] [<c01dbdb5>]
[<c01dbc72>] [<c01eb431>] [<c01eb8b6>] [<c01eba03>] [<c01ea9da>] [<c014c125>]
20987, stackpage=cd6b3000)
Stack: dffc54e0 0000000d 00005451 dff [<c014b4ff>] [<c014c1b0>] [<c014c409>]
[<c014c450>] [<c013138e>] [<c01313ec>]
[<c0131491>] [<c0131506>] [<c0131641>] [<c01315a0>] [<c0105000>] [<c0105826>]
[<c01315a0>]
Code: 8b 13 3b 53 04 73 0e 89 74 93 08 ff 03 eb 3c 8d b6 00 00 00
c54e0 00000020 c012a7c5 0000001f 0000828f
c5d33420 c1f38440 c5d334d0 00005432 c012aa37 00000001 dffc54e0 c5d33420
c1f38440 00001000 00000001 00000000 00000000 c5d33420 af244e98 ffffffff
Call Trace: [<c012a7c5>] [<c012aa37>] [<c012afdc>] [<c012ae80>] [<c01e6d31>]
[<c01e324e>] [<c0138496>] [<c010712b>]
Code: 39 58 08 75 f4 39 78 0c 75 ef c6 05 00 e6 32 c0 01 85 c0 75
>>EIP; c012fefa <kfree+3a/90> <=====
Trace; c01beb98 <xfs_ireclaim+18/70>
Trace; c01bee54 <xfs_ilock_ra+74/b0>
Trace; c01c1d48 <xfs_idestroy_fork+98/d0>
Trace; c01c1db6 <xfs_idestroy+36/90>
Trace; c01dbdb4 <xfs_finish_reclaim+114/120>
Trace; c01dbc72 <xfs_reclaim+1c2/1f0>
Trace; c01eb430 <vn_reclaim+20/60>
Trace; c01eb8b6 <vn_purge+a6/d0>
Trace; c01eba02 <vn_remove+42/60>
Trace; c01ea9da <linvfs_clear_inode+a/30>
Trace; c014c124 <clear_inode+b4/100>
Trace; c014b4fe <destroy_inode+1e/30>
Trace; c014c1b0 <dispose_list+40/60>
Trace; c014c408 <prune_icache+b8/e0>
Trace; c014c450 <shrink_icache_memory+20/40>
Trace; c013138e <shrink_caches+6e/90>
Trace; c01313ec <try_to_free_pages+3c/60>
Trace; c0131490 <kswapd_balance_pgdat+50/a0>
Trace; c0131506 <kswapd_balance+26/40>
Trace; c0131640 <kswapd+a0/c0>
Trace; c01315a0 <kswapd+0/c0>
Trace; c0105000 <_stext+0/0>
Trace; c0105826 <kernel_thread+26/30>
Trace; c01315a0 <kswapd+0/c0>
Code; c012fefa <kfree+3a/90>
00000000 <_EIP>:
Code; c012fefa <kfree+3a/90> <=====
0: 8b 13 mov (%ebx),%edx <=====
Code; c012fefc <kfree+3c/90>
2: 3b 53 04 cmp 0x4(%ebx),%edx
Code; c012fefe <kfree+3e/90>
5: 73 0e jae 15 <_EIP+0x15> c012ff0e <kfree+4e/90>
Code; c012ff00 <kfree+40/90>
7: 89 74 93 08 mov %esi,0x8(%ebx,%edx,4)
Code; c012ff04 <kfree+44/90>
b: ff 03 incl (%ebx)
Code; c012ff06 <kfree+46/90>
d: eb 3c jmp 4b <_EIP+0x4b> c012ff44 <kfree+84/90>
Code; c012ff08 <kfree+48/90>
f: 8d b6 00 00 00 00 lea 0x0(%esi),%esi
<1>Unable to handle kernel paging request at virtual address fcdff82f
c012a147
*pde = 00000000
Oops: 0000
CPU: 1
EIP: 0010:[<c012a147>] Not tainted
EFLAGS: 00010286
eax: fcdff827 ebx: c5d334d0 ecx: 00000012 edx: 0002ef8f
esi: 00005451 edi: 0000545e ebp: f7fbbe3c esp: cd6b3e40
ds: 0018 es: 0018 ss: 0018
Process save (pid: 20987, stackpage=cd6b3000)
Stack: dffc54e0 0000000d 00005451 dffc54e0 00000020 c012a7c5 0000001f 0000828f
c5d33420 c1f38440 c5d334d0 00005432 c012aa37 00000001 dffc54e0 c5d33420
c1f38440 00001000 00000001 00000000 00000000 c5d33420 af244e98 ffffffff
Call Trace: [<c012a7c5>] [<c012aa37>] [<c012afdc>] [<c012ae80>] [<c01e6d31>]
[<c01e324e>] [<c0138496>] [<c010712b>]
Code: 39 58 08 75 f4 39 78 0c 75 ef c6 05 00 e6 32 c0 01 85 c0 75
>>EIP; c012a146 <page_cache_read+56/c0> <=====
Trace; c012a7c4 <generic_file_readahead+f4/130>
Trace; c012aa36 <do_generic_file_read+1f6/460>
Trace; c012afdc <generic_file_read+7c/130>
Trace; c012ae80 <file_read_actor+0/e0>
Trace; c01e6d30 <xfs_read+190/1f0>
Trace; c01e324e <linvfs_read+7e/b0>
Trace; c0138496 <sys_read+96/d0>
Trace; c010712a <system_call+32/38>
Code; c012a146 <page_cache_read+56/c0>
00000000 <_EIP>:
Code; c012a146 <page_cache_read+56/c0> <=====
0: 39 58 08 cmp %ebx,0x8(%eax) <=====
Code; c012a148 <page_cache_read+58/c0>
3: 75 f4 jne fffffff9 <_EIP+0xfffffff9> c012a13e <page_cache_read+4e/c0>
Code; c012a14a <page_cache_read+5a/c0>
5: 39 78 0c cmp %edi,0xc(%eax)
Code; c012a14e <page_cache_read+5e/c0>
8: 75 ef jne fffffff9 <_EIP+0xfffffff9> c012a13e <page_cache_read+4e/c0>
Code; c012a150 <page_cache_read+60/c0>
a: c6 05 00 e6 32 c0 01 movb $0x1,0xc032e600
Code; c012a156 <page_cache_read+66/c0>
11: 85 c0 test %eax,%eax
Code; c012a158 <page_cache_read+68/c0>
13: 75 00 jne 15 <_EIP+0x15> c012a15a <page_cache_read+6a/c0>
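
In case it helps anyone comparing the two traces, here is a minimal Python
sketch that pulls the function names out of the ksymoops ">>EIP;" and "Trace;"
lines above and prints each call chain on one line, innermost frame first. The
names call_chain and SYMBOL_LINE are made up for illustration and are not part
of ksymoops; the pattern only assumes the line layout shown in this output.

```python
import re

# Matches decoded ksymoops lines such as
#   ">>EIP; c012fefa <kfree+3a/90> <====="
#   "Trace; c01beb98 <xfs_ireclaim+18/70>"
# (layout assumed from the output in this report).
SYMBOL_LINE = re.compile(
    r"^(?:>>EIP|Trace); ([0-9a-f]{8}) <([^+>]+)\+([0-9a-f]+)/([0-9a-f]+)>"
)

def call_chain(ksymoops_text):
    """Return (address, function, offset) tuples, innermost frame first."""
    chain = []
    for line in ksymoops_text.splitlines():
        m = SYMBOL_LINE.match(line.strip())
        if m:
            addr, func, off, _size = m.groups()
            chain.append((int(addr, 16), func, int(off, 16)))
    return chain

# First few decoded lines of the kswapd Oops, copied from above.
sample = """\
>>EIP; c012fefa <kfree+3a/90> <=====
Trace; c01beb98 <xfs_ireclaim+18/70>
Trace; c01bee54 <xfs_ilock_ra+74/b0>
Trace; c01c1d48 <xfs_idestroy_fork+98/d0>
"""

chain = call_chain(sample)
print(" <- ".join(func for _addr, func, _off in chain))
```

Feeding it each decoded Oops separately keeps the kswapd and 'save' chains
apart, since the sketch does not try to split them itself.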
--
Any ideas?
Thanks
Ian Hardy