
Kernel Oops RedHat 7.1 kernel-2.4.5 xfs-1.0.1



	As I mentioned in a previous post, I'm running RedHat 7.1 with the
manually patched 2.4.5 kernel and xfs-1.0.1 on a dual PII (400 MHz) with 1 GB
of RAM. The XFS filesystems live on a SAN RAID device accessed through a
QLogic 2100 Fibre Channel card (using QLogic's own qla2x00 module, version
4.25). This system acts as a "gateway": it mounts the disks from the SAN and
exports them via NFS to a mix of hosts (Solaris, IRIX, Linux, AIX, etc.). So
far, we have had a total of four oopses.


	The first oops was "unprovoked". After we rebooted, the system
generated the same oops ten minutes later; after that, everything was fine.
Two weeks later, we used xfs_growfs to grow one of the partitions, which
worked great (very cool). A few days later, the machine was accidentally
power cycled (human oops). During bootup, the system oopsed while trying to
mount the XFS partition we had grown. That disk remains unmountable and will
oops any of the three machines we have on the SAN when mounted. (We had a
backup of the data and enough spare capacity, so we just left this
"partition of death" in place for testing.) The fourth oops occurred today
when I tried to xfsdump a partition. I've included the ksymoops output for
each oops, in order.

Any ideas?

Many thanks,

-poul

	This was the "unprovoked" oops:

Unable to handle kernel paging request at virtual address f8b65960
c025fe90
*pde = 37ddf067
Oops: 0000
CPU:    0
EIP:    0010:[<c025fe90>]
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010206
eax: f8b65940   ebx: 00000003   ecx: f8b65940   edx: f7da5980
esi: 000001bc   edi: 0000002a   ebp: c03b68ec   esp: f5041ea4
ds: 0018   es: 0018   ss: 0018
Process insmod (pid: 1258, stackpage=f5041000)
Stack: c0349800 00000000 00000000 f8a89700 c023bd90 00000000 00000004 00000001
       00000001 00000000 c010ee5c 00000000 00000002 00000000 00000000 f7c60000
       f5040000 00000282 00034063 00000282 00000000 c0336e14 f8a42000 00000000
Call Trace: [<f8a89700>] [<c023bd90>] [<c010ee5c>] [<f8a42000>] [<f8a4cd65>] [<f8a89700>]
       [<f8a42000>] [<c01147f5>] [<f8a42060>] [<c0106e0b>]
Code: 8b 40 20 85 c0 74 08 39 d0 0f 85 f1 ff ff ff 85 c0 75 27 8d

>>EIP; c025fe90 <sd_finish+60/1e0>   <=====
Trace; f8a89700 <END_OF_CODE+18ee2/????>
Trace; c023bd90 <scsi_register_host+2b0/2e0>
Trace; c010ee5c <smp_apic_timer_interrupt+ec/100>
Trace; f8a42000 <[qla2x00]fw2200tp_code01+e78a/1397e>
Trace; f8a4cd65 <[qla2x00]fw2300tp_code01+5b67/143aa>
Trace; f8a89700 <END_OF_CODE+18ee2/????>
Trace; f8a42000 <[qla2x00]fw2200tp_code01+e78a/1397e>
Trace; c01147f5 <sys_init_module+545/630>
Trace; f8a42060 <[qla2x00]fw2200tp_code01+e7ea/1397e>
Trace; c0106e0b <system_call+33/38>
Code;  c025fe90 <sd_finish+60/1e0>
00000000 <_EIP>:
Code;  c025fe90 <sd_finish+60/1e0>   <=====
   0:   8b 40 20                  mov    0x20(%eax),%eax   <=====
Code;  c025fe93 <sd_finish+63/1e0>
   3:   85 c0                     test   %eax,%eax
Code;  c025fe95 <sd_finish+65/1e0>
   5:   74 08                     je     f <_EIP+0xf> c025fe9f <sd_finish+6f/1e0>
Code;  c025fe97 <sd_finish+67/1e0>
   7:   39 d0                     cmp    %edx,%eax
Code;  c025fe99 <sd_finish+69/1e0>
   9:   0f 85 f1 ff ff ff         jne    0 <_EIP>
Code;  c025fe9f <sd_finish+6f/1e0>
   f:   85 c0                     test   %eax,%eax
Code;  c025fea1 <sd_finish+71/1e0>
  11:   75 27                     jne    3a <_EIP+0x3a> c025feca <sd_finish+9a/1e0>
Code;  c025fea3 <sd_finish+73/1e0>
  13:   8d 00                     lea    (%eax),%eax
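
For what it's worth, the faulting address in this first oops can be reproduced from the register dump, assuming the fault comes from the marked instruction, mov 0x20(%eax),%eax (plain i386 base-plus-displacement addressing). The sketch below only does that arithmetic; effective_address is a hypothetical helper, not part of ksymoops. The loop around it (load the word at offset 0x20, test for zero, compare with %edx, jump back) looks like a linked-list walk following a pointer at offset 0x20, and eax sits in the same f8xxxxxx range where the qla2x00 module is loaded. Both of those readings are guesses from the disassembly, not something the trace proves.

```python
# Sketch: recompute the faulting address for the "unprovoked" oops.
# Assumption: the fault is the marked instruction mov 0x20(%eax),%eax.
MASK32 = 0xFFFFFFFF


def effective_address(base: int, disp: int) -> int:
    """i386 base+displacement addressing, wrapped to 32 bits (hypothetical helper)."""
    return (base + disp) & MASK32


eax = 0xF8B65940                 # value of eax from the register dump
fault = effective_address(eax, 0x20)
print(f"{fault:08x}")            # f8b65960, the reported virtual address
```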

	Here is the oops we received on bootup:

Unable to handle kernel NULL pointer dereference at virtual address 0000009c
c01f1e98
*pde = 00000000
Oops: 0002
CPU:    0
EIP:    0010:[<c01f1e98>]
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010002
eax: 00000074   ebx: 00000074   ecx: 00000004   edx: ffffffe8
esi: ffffffe8   edi: c01c88a9   ebp: f3f9a54c   esp: f4ae5aac
ds: 0018   es: 0018   ss: 0018
Process nfsd (pid: 682, stackpage=f4ae5000)
Stack: 00000004 ffffffe8 c01c88a9 c01c8d94 00000074 00000288 ffffffe8 f3f9a560
       c033bf20 c01c8de3 ffffffe8 00000004 c01c88a9 c01c88a9 ffffffe8 00000004
       00000000 00000000 10130323 f7c5d800 f3d14e40 00000004 c01ddb16 f7c5d800
Call Trace: [<c01c88a9>] [<c01c8d94>] [<c01c8de3>] [<c01c88a9>] [<c01c88a9>] [<c01ddb16>]
       [<c01de60c>] [<c018e811>] [<c01e317e>] [<c019c239>] [<c01eb692>] [<c019371d>] [<c019371d>] [<c027dd6e>]
       [<c02896c3>] [<c02a1d1e>] [<c028a35b>] [<c02a21b1>] [<c02a1cd0>] [<c01eb858>] [<c013e06d>] [<c016c740>]
       [<c016aa24>] [<c0171062>] [<c0168671>] [<c02b63c3>] [<c0168489>] [<c0105546>] [<c0168290>]
Code: f0 fe 4b 28 0f 88 1a f3 0c 00 8b 0b 85 c9 74 20 8d 7b 28 8d

>>EIP; c01f1e98 <mrupdatef+8/60>   <=====
Trace; c01c88a9 <xfs_iget+d9/130>
Trace; c01c8d94 <xfs_ilock_ra+74/b0>
Trace; c01c8de3 <xfs_ilock+13/20>
Trace; c01c88a9 <xfs_iget+d9/130>
Trace; c01c88a9 <xfs_iget+d9/130>
Trace; c01ddb16 <xfs_trans_iget+a6/120>
Trace; c01de60c <xfs_dir_ialloc+8c/290>
Trace; c018e811 <xfs_trans_reserve_quota_bydquots+71/a0>
Trace; c01e317e <xfs_create+3fe/aa0>
Trace; c019c239 <xfs_attr_fetch+69/e0>
Trace; c01eb692 <linvfs_common_cr+f2/2a0>
Trace; c019371d <xfs_acl_iaccess+2d/90>
Trace; c019371d <xfs_acl_iaccess+2d/90>
Trace; c027dd6e <dev_queue_xmit+11e/280>
Trace; c02896c3 <ip_output+c3/110>
Trace; c02a1d1e <udp_getfrag+4e/d0>
Trace; c028a35b <ip_build_xmit+2db/370>
Trace; c02a21b1 <udp_sendmsg+3b1/450>
Trace; c02a1cd0 <udp_getfrag+0/d0>
Trace; c01eb858 <linvfs_create+18/20>
Trace; c013e06d <vfs_create+dd/140>
Trace; c016c740 <nfsd_create_v3+2c0/430>
Trace; c016aa24 <fh_verify+484/4d0>
Trace; c0171062 <nfsd3_proc_create+132/140>
Trace; c0168671 <nfsd_dispatch+c1/160>
Trace; c02b63c3 <svc_process+353/4e0>
Trace; c0168489 <nfsd+1f9/320>
Trace; c0105546 <kernel_thread+26/30>
Trace; c0168290 <nfsd+0/320>
Code;  c01f1e98 <mrupdatef+8/60>
00000000 <_EIP>:
Code;  c01f1e98 <mrupdatef+8/60>   <=====
   0:   f0 fe 4b 28               lock decb 0x28(%ebx)   <=====
Code;  c01f1e9c <mrupdatef+c/60>
   4:   0f 88 1a f3 0c 00         js     cf324 <_EIP+0xcf324> c02c11bc <stext_lock+45dc/7862>
Code;  c01f1ea2 <mrupdatef+12/60>
   a:   8b 0b                     mov    (%ebx),%ecx
Code;  c01f1ea4 <mrupdatef+14/60>
   c:   85 c9                     test   %ecx,%ecx
Code;  c01f1ea6 <mrupdatef+16/60>
   e:   74 20                     je     30 <_EIP+0x30> c01f1ec8 <mrupdatef+38/60>
Code;  c01f1ea8 <mrupdatef+18/60>
  10:   8d 7b 28                  lea    0x28(%ebx),%edi
Code;  c01f1eab <mrupdatef+1b/60>
  13:   8d 00                     lea    (%eax),%eax
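
The same check works for the bootup oops, again assuming the fault is the marked instruction, lock decb 0x28(%ebx): ebx is 0x74, and 0x74 + 0x28 = 0x9c, which is exactly the reported address. That suggests mrupdatef was handed a structure pointer (presumably the one in ebx) that was already nearly NULL. Also worth noting is the ffffffe8 in esi and edx: read as a signed 32-bit value it is -0x18 (-24). Where that value comes from is not visible in the dump; the helpers below are hypothetical and only do the arithmetic.

```python
# Sketch: check the bootup (nfsd / mrupdatef) oops against its register dump.
# Assumption: the fault is the marked instruction lock decb 0x28(%ebx).
MASK32 = 0xFFFFFFFF


def effective_address(base: int, disp: int) -> int:
    """i386 base+displacement addressing, wrapped to 32 bits (hypothetical helper)."""
    return (base + disp) & MASK32


def as_signed32(value: int) -> int:
    """Reinterpret a 32-bit register value as a two's-complement integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


ebx = 0x00000074
print(f"{effective_address(ebx, 0x28):08x}")  # 0000009c, the reported address
print(as_signed32(0xFFFFFFE8))                # -24, i.e. NULL minus 0x18
```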

	This is the oops we get whenever we try to mount the partition we
used xfs_growfs on:

Unable to handle kernel NULL pointer dereference at virtual address 00000152
c01c88ab
*pde = 00000000
Oops: 0000
CPU:    0
EIP:    0010:[xfs_iget+219/304]
EIP:    0010:[<c01c88ab>]
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010246
eax: 00000000   ebx: ffffffe8   ecx: 00000000   edx: c033bf20
esi: ea284980   edi: c033bf20   ebp: ea28496c   esp: ea2e7aa8
ds: 0018   es: 0018   ss: 0018
Process mount (pid: 976, stackpage=ea2e7000)
Stack: 00000000 00000000 00000000 18000000 00000000 f7f2c000 c01d4fbf f7f2c000
       00000000 18000000 00000000 00000000 ea2e7b00 00000000 00000000 18000000
       00000000 00000000 00000000 00000000 00000001 00000018 c01cf529 eb6700c0
Call Trace: [xlog_recover_process_iunlinks+319/608] [xfs_log_force+57/96]
       [xlog_recover_finish+51/128] [xfs_log_mount_finish+30/48]
       [xfs_mountfs+3445/3744] [xfs_readsb+143/224] [pagebuf_rele+49/128]
Call Trace: [<c01d4fbf>] [<c01cf529>] [<c01d5953>] [<c01cf70e>] [<c01d70a5>] [<c01d5f8f>]
       [<c01d5fc2>] [<c01df22b>] [<c01df442>] [<c01df473>] [<c01f02be>] [<c01575dc>] [<c0121700>] [<c0157b72>]
       [<c0146890>] [<c01481d7>] [<c0139e57>] [<c0137923>] [<c0137b20>] [<c01386d6>] [<c01384ec>] [<c01388f4>]
       [<c0106e0b>]
Code: 66 83 bb 6a 01 00 00 00 75 1a 0f b7 83 50 01 00 00 25 f7 ff

>>EIP; c01c88ab <xfs_iget+db/130>   <=====
Trace; c01d4fbf <xlog_recover_process_iunlinks+13f/260>
Trace; c01cf529 <xfs_log_force+39/60>
Trace; c01d5953 <xlog_recover_finish+33/80>
Trace; c01cf70e <xfs_log_mount_finish+1e/30>
Trace; c01d70a5 <xfs_mountfs+d75/ea0>
Trace; c01d5f8f <xfs_readsb+8f/e0>
Trace; c01d5fc2 <xfs_readsb+c2/e0>
Trace; c01df22b <xfs_cmountfs+4eb/590>
Trace; c01df442 <xfs_mount+92/a0>
Trace; c01df473 <xfs_vfsmount+23/40>
Trace; c01f02be <linvfs_read_super+19e/290>
Trace; c01575dc <ext2_get_block+2c/550>
Trace; c0121700 <do_no_page+a0/d0>
Trace; c0157b72 <ext2_getblk+72/120>
Trace; c0146890 <destroy_inode+30/40>
Trace; c01481d7 <iput+167/170>
Trace; c0139e57 <blkdev_get+d7/100>
Trace; c0137923 <read_super+63/b0>
Trace; c0137b20 <get_sb_bdev+140/1a0>
Trace; c01386d6 <do_mount+196/310>
Trace; c01384ec <copy_mount_options+4c/a0>
Trace; c01388f4 <sys_mount+a4/110>
Trace; c0106e0b <system_call+33/38>
Code;  c01c88ab <xfs_iget+db/130>
00000000 <_EIP>:
Code;  c01c88ab <xfs_iget+db/130>   <=====
   0:   66 83 bb 6a 01 00 00      cmpw   $0x0,0x16a(%ebx)   <=====
Code;  c01c88b2 <xfs_iget+e2/130>
   7:   00 
Code;  c01c88b3 <xfs_iget+e3/130>
   8:   75 1a                     jne    24 <_EIP+0x24> c01c88cf <xfs_iget+ff/130>
Code;  c01c88b5 <xfs_iget+e5/130>
   a:   0f b7 83 50 01 00 00      movzwl 0x150(%ebx),%eax
Code;  c01c88bc <xfs_iget+ec/130>
  11:   25 f7 ff 00 00            and    $0xfff7,%eax
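
In the mount oops, ebx holds ffffffe8, the same value that shows up in esi and edx of the bootup oops. Adding the 0x16a displacement from the marked instruction, cmpw $0x0,0x16a(%ebx), wraps around to 0x152, the reported address. One plausible reading (a guess, not checked against the XFS source) is that xfs_iget ended up with a NULL pointer with 0x18 subtracted from it, which is what happens when code converts a NULL pointer to an embedded member back into its containing structure. A sketch of the arithmetic, with a hypothetical helper:

```python
# Sketch: ebx = 0xffffffe8 plus the 0x16a displacement in
# "cmpw $0x0,0x16a(%ebx)" should wrap to the reported address 0x152.
MASK32 = 0xFFFFFFFF


def effective_address(base: int, disp: int) -> int:
    """i386 base+displacement addressing, wrapped to 32 bits (hypothetical helper)."""
    return (base + disp) & MASK32


ebx = 0xFFFFFFE8                      # from the register dump (NULL - 0x18)
fault = effective_address(ebx, 0x16A)
assert fault == 0x152                 # "virtual address 00000152"
print(f"{fault:08x}")                 # 00000152
# Equivalently: a NULL base plus (0x16a - 0x18).
print(hex(0x16A - 0x18))              # 0x152
```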

	And this is the oops we got with xfsdump:

Unable to handle kernel NULL pointer dereference at virtual address 00000000
00000000
*pde = 00000000
Oops: 0000
CPU:    1
EIP:    0010:[<00000000>]
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010286
eax: c0396360   ebx: e8b9dbe0   ecx: cd838240   edx: c0339a20
esi: e8b9dc60   edi: e8b9dbe0   ebp: e8b9dbe0   esp: ead03ed0
ds: 0018   es: 0018   ss: 0018
Process nfsd (pid: 688, stackpage=ead03000)
Stack: c0169fe4 cd838240 e8b9dc60 200074c3 00000000 c016a456 e8b9dbe0 f787a740
       ffffff8c 00000000 200074c3 00000000 ead15604 ed857000 c016a831 e6e1ee00
       200074c3 00000000 00000000 00000001 00000286 ead15614 11270000 ead15604
Call Trace: [<c0169fe4>] [<c016a456>] [<c016a831>] [<c0170957>] [<c0168671>] [<c02b63c3>] [<c0168489>]
       [<c0105546>] [<c0168290>]
Code:  Bad EIP value.

>>EIP; 00000000 Before first symbol
Trace; c0169fe4 <nfsd_findparent+34/100>
Trace; c016a456 <find_fh_dentry+246/390>
Trace; c016a831 <fh_verify+291/4d0>
Trace; c0170957 <nfsd3_proc_getattr+97/b0>
Trace; c0168671 <nfsd_dispatch+c1/160>
Trace; c02b63c3 <svc_process+353/4e0>
Trace; c0168489 <nfsd+1f9/320>
Trace; c0105546 <kernel_thread+26/30>
Trace; c0168290 <nfsd+0/320>
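
With EIP at 0 there is no code to disassemble, but the first word on the stack, c0169fe4, resolves to nfsd_findparent+34. If that word is the return address of the call that jumped to 0 (an inference from the stack contents, not something the trace states), this oops looks like a call through a NULL function pointer from nfsd_findparent. The sketch below roughly mimics how ksymoops prints an address as <name+offset/size>. Its symbol table is hypothetical: the start addresses are back-computed by subtracting the printed offsets in this oops's trace lines, not read from System.map.

```python
import bisect

# Hypothetical mini symbol table: (start, size, name). Starts are derived from
# this oops's trace entries, e.g. c0169fe4 - 0x34 = c0169fb0 for nfsd_findparent.
SYMBOLS = sorted([
    (0xC0169FB0, 0x100, "nfsd_findparent"),
    (0xC016A210, 0x390, "find_fh_dentry"),
    (0xC016A5A0, 0x4D0, "fh_verify"),
])
STARTS = [start for start, _, _ in SYMBOLS]


def resolve(addr: int) -> str:
    """Format addr as <name+offset/size> in lowercase hex, ksymoops style."""
    i = bisect.bisect_right(STARTS, addr) - 1
    if i < 0:
        return "Before first symbol"
    start, size, name = SYMBOLS[i]
    if addr < start + size:
        return f"<{name}+{addr - start:x}/{size:x}>"
    return "<unknown>"


print(resolve(0xC0169FE4))   # <nfsd_findparent+34/100>, the first stack word
print(resolve(0x00000000))   # Before first symbol, as for EIP here
```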