As I mentioned in a previous post, I'm running Red Hat 7.1 with the
manually patched 2.4.5 kernel and xfs-1.0.1 on a dual PII (400) with 1 GB
of RAM. The XFS filesystems are located on a SAN RAID device accessed
through a QLogic 2100 Fibre Channel card (using the qla2x00 module provided
by QLogic, ver 4.25). This system acts as a "gateway" by mounting the disks
from the SAN and then exporting them to an array of hosts (Solaris, IRIX,
Linux, AIX, etc.) via NFS. So far, we have had a total of four system oopses.
The first oops was "unprovoked". When we rebooted, the system
generated the same oops ten minutes later. Then everything was OK. Two weeks
later, we used xfs_growfs to grow one of the partitions, which worked great
(very cool). A few days later, the machine was accidentally power cycled
(human oops). During bootup, the system oopsed when trying to mount the XFS
partition we had grown. That disk remains unmountable, and will oops any of
the three machines we have on the SAN when mounted (we had a backup of the
data and sufficient spare capacity, so we just left this partition of death
for testing). The fourth oops just occurred today when I tried to xfsdump a
partition. I've included the ksymoops output for each of the oopses in
order.
Any ideas?
Many thanks,
-poul
This was the "unprovoked" oops:
Unable to handle kernel paging request at virtual address f8b65960
c025fe90
*pde = 37ddf067
Oops: 0000
CPU: 0
EIP: 0010:[<c025fe90>]
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010206
eax: f8b65940 ebx: 00000003 ecx: f8b65940 edx: f7da5980
esi: 000001bc edi: 0000002a ebp: c03b68ec esp: f5041ea4
ds: 0018 es: 0018 ss: 0018
Process insmod (pid: 1258, stackpage=f5041000)
Stack: c0349800 00000000 00000000 f8a89700 c023bd90 00000000 00000004 00000001
       00000001 00000000 c010ee5c 00000000 00000002 00000000 00000000 f7c60000
       f5040000 00000282 00034063 00000282 00000000 c0336e14 f8a42000 00000000
Call Trace: [<f8a89700>] [<c023bd90>] [<c010ee5c>] [<f8a42000>] [<f8a4cd65>] [<f8a89700>]
       [<f8a42000>] [<c01147f5>] [<f8a42060>] [<c0106e0b>]
Code: 8b 40 20 85 c0 74 08 39 d0 0f 85 f1 ff ff ff 85 c0 75 27 8d
>>EIP; c025fe90 <sd_finish+60/1e0> <=====
Trace; f8a89700 <END_OF_CODE+18ee2/????>
Trace; c023bd90 <scsi_register_host+2b0/2e0>
Trace; c010ee5c <smp_apic_timer_interrupt+ec/100>
Trace; f8a42000 <[qla2x00]fw2200tp_code01+e78a/1397e>
Trace; f8a4cd65 <[qla2x00]fw2300tp_code01+5b67/143aa>
Trace; f8a89700 <END_OF_CODE+18ee2/????>
Trace; f8a42000 <[qla2x00]fw2200tp_code01+e78a/1397e>
Trace; c01147f5 <sys_init_module+545/630>
Trace; f8a42060 <[qla2x00]fw2200tp_code01+e7ea/1397e>
Trace; c0106e0b <system_call+33/38>
Code; c025fe90 <sd_finish+60/1e0>
00000000 <_EIP>:
Code; c025fe90 <sd_finish+60/1e0> <=====
0: 8b 40 20 mov 0x20(%eax),%eax <=====
Code; c025fe93 <sd_finish+63/1e0>
3: 85 c0 test %eax,%eax
Code; c025fe95 <sd_finish+65/1e0>
5: 74 08 je f <_EIP+0xf> c025fe9f <sd_finish+6f/1e0>
Code; c025fe97 <sd_finish+67/1e0>
7: 39 d0 cmp %edx,%eax
Code; c025fe99 <sd_finish+69/1e0>
9: 0f 85 f1 ff ff ff jne 0 <_EIP>
Code; c025fe9f <sd_finish+6f/1e0>
f: 85 c0 test %eax,%eax
Code; c025fea1 <sd_finish+71/1e0>
11: 75 27 jne 3a <_EIP+0x3a> c025feca <sd_finish+9a/1e0>
Code; c025fea3 <sd_finish+73/1e0>
13: 8d 00 lea (%eax),%eax
Here is the oops we received on bootup:
Unable to handle kernel NULL pointer dereference at virtual address 0000009c
c01f1e98
*pde = 00000000
Oops: 0002
CPU: 0
EIP: 0010:[<c01f1e98>]
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010002
eax: 00000074 ebx: 00000074 ecx: 00000004 edx: ffffffe8
esi: ffffffe8 edi: c01c88a9 ebp: f3f9a54c esp: f4ae5aac
ds: 0018 es: 0018 ss: 0018
Process nfsd (pid: 682, stackpage=f4ae5000)
Stack: 00000004 ffffffe8 c01c88a9 c01c8d94 00000074 00000288 ffffffe8 f3f9a560
       c033bf20 c01c8de3 ffffffe8 00000004 c01c88a9 c01c88a9 ffffffe8 00000004
       00000000 00000000 10130323 f7c5d800 f3d14e40 00000004 c01ddb16 f7c5d800
Call Trace: [<c01c88a9>] [<c01c8d94>] [<c01c8de3>] [<c01c88a9>] [<c01c88a9>] [<c01ddb16>]
       [<c01de60c>] [<c018e811>] [<c01e317e>] [<c019c239>] [<c01eb692>] [<c019371d>] [<c019371d>] [<c027dd6e>]
       [<c02896c3>] [<c02a1d1e>] [<c028a35b>] [<c02a21b1>] [<c02a1cd0>] [<c01eb858>] [<c013e06d>] [<c016c740>]
       [<c016aa24>] [<c0171062>] [<c0168671>] [<c02b63c3>] [<c0168489>] [<c0105546>] [<c0168290>]
Code: f0 fe 4b 28 0f 88 1a f3 0c 00 8b 0b 85 c9 74 20 8d 7b 28 8d
>>EIP; c01f1e98 <mrupdatef+8/60> <=====
Trace; c01c88a9 <xfs_iget+d9/130>
Trace; c01c8d94 <xfs_ilock_ra+74/b0>
Trace; c01c8de3 <xfs_ilock+13/20>
Trace; c01c88a9 <xfs_iget+d9/130>
Trace; c01c88a9 <xfs_iget+d9/130>
Trace; c01ddb16 <xfs_trans_iget+a6/120>
Trace; c01de60c <xfs_dir_ialloc+8c/290>
Trace; c018e811 <xfs_trans_reserve_quota_bydquots+71/a0>
Trace; c01e317e <xfs_create+3fe/aa0>
Trace; c019c239 <xfs_attr_fetch+69/e0>
Trace; c01eb692 <linvfs_common_cr+f2/2a0>
Trace; c019371d <xfs_acl_iaccess+2d/90>
Trace; c019371d <xfs_acl_iaccess+2d/90>
Trace; c027dd6e <dev_queue_xmit+11e/280>
Trace; c02896c3 <ip_output+c3/110>
Trace; c02a1d1e <udp_getfrag+4e/d0>
Trace; c028a35b <ip_build_xmit+2db/370>
Trace; c02a21b1 <udp_sendmsg+3b1/450>
Trace; c02a1cd0 <udp_getfrag+0/d0>
Trace; c01eb858 <linvfs_create+18/20>
Trace; c013e06d <vfs_create+dd/140>
Trace; c016c740 <nfsd_create_v3+2c0/430>
Trace; c016aa24 <fh_verify+484/4d0>
Trace; c0171062 <nfsd3_proc_create+132/140>
Trace; c0168671 <nfsd_dispatch+c1/160>
Trace; c02b63c3 <svc_process+353/4e0>
Trace; c0168489 <nfsd+1f9/320>
Trace; c0105546 <kernel_thread+26/30>
Trace; c0168290 <nfsd+0/320>
Code; c01f1e98 <mrupdatef+8/60>
00000000 <_EIP>:
Code; c01f1e98 <mrupdatef+8/60> <=====
0: f0 fe 4b 28 lock decb 0x28(%ebx) <=====
Code; c01f1e9c <mrupdatef+c/60>
4: 0f 88 1a f3 0c 00 js cf324 <_EIP+0xcf324> c02c11bc <stext_lock+45dc/7862>
Code; c01f1ea2 <mrupdatef+12/60>
a: 8b 0b mov (%ebx),%ecx
Code; c01f1ea4 <mrupdatef+14/60>
c: 85 c9 test %ecx,%ecx
Code; c01f1ea6 <mrupdatef+16/60>
e: 74 20 je 30 <_EIP+0x30> c01f1ec8 <mrupdatef+38/60>
Code; c01f1ea8 <mrupdatef+18/60>
10: 8d 7b 28 lea 0x28(%ebx),%edi
Code; c01f1eab <mrupdatef+1b/60>
13: 8d 00 lea (%eax),%eax
This is the oops we get whenever we try to mount the partition we
used xfs_growfs on:
Unable to handle kernel NULL pointer dereference at virtual address 00000152
c01c88ab
*pde = 00000000
Oops: 0000
CPU: 0
EIP: 0010:[xfs_iget+219/304]
EIP: 0010:[<c01c88ab>]
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010246
eax: 00000000 ebx: ffffffe8 ecx: 00000000 edx: c033bf20
esi: ea284980 edi: c033bf20 ebp: ea28496c esp: ea2e7aa8
ds: 0018 es: 0018 ss: 0018
Process mount (pid: 976, stackpage=ea2e7000)
Stack: 00000000 00000000 00000000 18000000 00000000 f7f2c000 c01d4fbf f7f2c000
       00000000 18000000 00000000 00000000 ea2e7b00 00000000 00000000 18000000
       00000000 00000000 00000000 00000000 00000001 00000018 c01cf529 eb6700c0
Call Trace: [xlog_recover_process_iunlinks+319/608] [xfs_log_force+57/96]
[xlog_recover_finish+51/128] [xfs_log_mount_finish+30/48]
[xfs_mountfs+3445/3744] [xfs_readsb+143/224] [pagebuf_rele+49/128]
Call Trace: [<c01d4fbf>] [<c01cf529>] [<c01d5953>] [<c01cf70e>] [<c01d70a5>] [<c01d5f8f>]
       [<c01d5fc2>] [<c01df22b>] [<c01df442>] [<c01df473>] [<c01f02be>] [<c01575dc>] [<c0121700>] [<c0157b72>]
       [<c0146890>] [<c01481d7>] [<c0139e57>] [<c0137923>] [<c0137b20>] [<c01386d6>] [<c01384ec>] [<c01388f4>]
       [<c0106e0b>]
Code: 66 83 bb 6a 01 00 00 00 75 1a 0f b7 83 50 01 00 00 25 f7 ff
>>EIP; c01c88ab <xfs_iget+db/130> <=====
Trace; c01d4fbf <xlog_recover_process_iunlinks+13f/260>
Trace; c01cf529 <xfs_log_force+39/60>
Trace; c01d5953 <xlog_recover_finish+33/80>
Trace; c01cf70e <xfs_log_mount_finish+1e/30>
Trace; c01d70a5 <xfs_mountfs+d75/ea0>
Trace; c01d5f8f <xfs_readsb+8f/e0>
Trace; c01d5fc2 <xfs_readsb+c2/e0>
Trace; c01df22b <xfs_cmountfs+4eb/590>
Trace; c01df442 <xfs_mount+92/a0>
Trace; c01df473 <xfs_vfsmount+23/40>
Trace; c01f02be <linvfs_read_super+19e/290>
Trace; c01575dc <ext2_get_block+2c/550>
Trace; c0121700 <do_no_page+a0/d0>
Trace; c0157b72 <ext2_getblk+72/120>
Trace; c0146890 <destroy_inode+30/40>
Trace; c01481d7 <iput+167/170>
Trace; c0139e57 <blkdev_get+d7/100>
Trace; c0137923 <read_super+63/b0>
Trace; c0137b20 <get_sb_bdev+140/1a0>
Trace; c01386d6 <do_mount+196/310>
Trace; c01384ec <copy_mount_options+4c/a0>
Trace; c01388f4 <sys_mount+a4/110>
Trace; c0106e0b <system_call+33/38>
Code; c01c88ab <xfs_iget+db/130>
00000000 <_EIP>:
Code; c01c88ab <xfs_iget+db/130> <=====
0: 66 83 bb 6a 01 00 00 cmpw $0x0,0x16a(%ebx) <=====
Code; c01c88b2 <xfs_iget+e2/130>
7: 00
Code; c01c88b3 <xfs_iget+e3/130>
8: 75 1a jne 24 <_EIP+0x24> c01c88cf <xfs_iget+ff/130>
Code; c01c88b5 <xfs_iget+e5/130>
a: 0f b7 83 50 01 00 00 movzwl 0x150(%ebx),%eax
Code; c01c88bc <xfs_iget+ec/130>
11: 25 f7 ff 00 00 and $0xfff7,%eax
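As a sanity check on the three dumps above, the reported fault address in each case should equal the base register from the register dump plus the displacement in the faulting instruction, wrapped to 32 bits. The following is only a small sketch of that arithmetic; the helper names (`effective_address`, `as_signed32`) are made up for illustration, and the values are copied from the dumps above.

```python
MASK32 = 0xFFFFFFFF


def effective_address(base, displacement):
    """Return base + displacement wrapped to 32 bits, as on i386."""
    return (base + displacement) & MASK32


def as_signed32(value):
    """Interpret a 32-bit register value as a signed integer."""
    return value - (1 << 32) if value & 0x80000000 else value


# (faulting instruction, base register value, displacement, reported address)
OOPSES = [
    ("sd_finish: mov 0x20(%eax),%eax", 0xF8B65940, 0x20, 0xF8B65960),
    ("mrupdatef: lock decb 0x28(%ebx)", 0x00000074, 0x28, 0x0000009C),
    ("xfs_iget: cmpw $0x0,0x16a(%ebx)", 0xFFFFFFE8, 0x16A, 0x00000152),
]

for label, base, disp, reported in OOPSES:
    computed = effective_address(base, disp)
    status = "matches" if computed == reported else "DIFFERS"
    print(f"{label}: computed {computed:08x}, reported {reported:08x} ({status})")

# ebx in the mount oops, read as a signed 32-bit number
print("ebx in xfs_iget oops as signed:", as_signed32(0xFFFFFFE8))
```

All three computed addresses agree with the reported ones. In the mount oops, ebx (ffffffe8) reads as -24 when treated as signed, so it looks more like a small negative value than a valid pointer, though that is only an observation from the register dump and not a diagnosis.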
And this is the oops we got with xfsdump:
Unable to handle kernel NULL pointer dereference at virtual address 00000000
00000000
*pde = 00000000
Oops: 0000
CPU: 1
EIP: 0010:[<00000000>]
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010286
eax: c0396360 ebx: e8b9dbe0 ecx: cd838240 edx: c0339a20
esi: e8b9dc60 edi: e8b9dbe0 ebp: e8b9dbe0 esp: ead03ed0
ds: 0018 es: 0018 ss: 0018
Process nfsd (pid: 688, stackpage=ead03000)
Stack: c0169fe4 cd838240 e8b9dc60 200074c3 00000000 c016a456 e8b9dbe0 f787a740
       ffffff8c 00000000 200074c3 00000000 ead15604 ed857000 c016a831 e6e1ee00
       200074c3 00000000 00000000 00000001 00000286 ead15614 11270000 ead15604
Call Trace: [<c0169fe4>] [<c016a456>] [<c016a831>] [<c0170957>] [<c0168671>] [<c02b63c3>] [<c0168489>]
       [<c0105546>] [<c0168290>]
Code: Bad EIP value.
>>EIP; 00000000 Before first symbol
Trace; c0169fe4 <nfsd_findparent+34/100>
Trace; c016a456 <find_fh_dentry+246/390>
Trace; c016a831 <fh_verify+291/4d0>
Trace; c0170957 <nfsd3_proc_getattr+97/b0>
Trace; c0168671 <nfsd_dispatch+c1/160>
Trace; c02b63c3 <svc_process+353/4e0>
Trace; c0168489 <nfsd+1f9/320>
Trace; c0105546 <kernel_thread+26/30>
Trace; c0168290 <nfsd+0/320>
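For reference when reading the "Oops:" lines in the four dumps, the number is the i386 page-fault error code: bit 0 distinguishes a protection fault from a missing page, bit 1 a write from a read, and bit 2 user mode from kernel mode. A minimal sketch of that decoding follows; `decode_oops_code` is a hypothetical helper, and the labels are just shorthand for the four oopses in this post.

```python
def decode_oops_code(text):
    """Decode the hex string after 'Oops:' into its three low flag bits."""
    code = int(text, 16)
    return {
        "cause": "protection violation" if code & 0x1 else "page not present",
        "access": "write" if code & 0x2 else "read",
        "mode": "user" if code & 0x4 else "kernel",
    }


# (shorthand label, value from the "Oops:" line), in the order posted
REPORTED = [
    ("insmod / sd_finish", "0000"),
    ("nfsd / mrupdatef", "0002"),
    ("mount / xfs_iget", "0000"),
    ("nfsd / EIP 00000000", "0000"),
]

for label, text in REPORTED:
    flags = decode_oops_code(text)
    print(f"{label}: {flags['mode']}-mode {flags['access']}, {flags['cause']}")
```

The only write fault is the mrupdatef one, which is consistent with its faulting instruction (`lock decb`) modifying memory. The last oops has EIP 00000000 and "Bad EIP value", which usually suggests a call through a NULL function pointer, but the trace alone does not show which one.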