
Random Kernel Panic on Boot

To: linux-xfs@xxxxxxxxxxx
Subject: Random Kernel Panic on Boot
From: Mike Baptiste <mike.baptiste@xxxxxxxx>
Date: Thu, 27 Dec 2001 20:52:22 -0500
Organization: Duke University, Pratt School of Engineering
Sender: owner-linux-xfs@xxxxxxxxxxx
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:0.9.7) Gecko/20011221
I'm working to put a new system into production and have one nagging problem left to resolve. Perhaps one of you has seen this or can point me in the right direction. I know this may not be an XFS problem, but since I'm using the XFS installer I figured I'd start here.

I've got a dual Athlon server (APPRO 1124 - Tyan Thunder K7, firmware 2.08) with a Mylex AcceleRAID 170 (32MB) and four Seagate Cheetahs: two 18GB drives and two 36GB drives, each pair in a RAID1 config.

DAC960: ***** DAC960 RAID Driver Version 2.4.10 of 23 July 2001 *****
DAC960: Copyright 1998-2001 by Leonard N. Zubkoff <lnz@xxxxxxxxxxxxx>
DAC960#0: Configuring Mylex AcceleRAID 170 PCI RAID Controller
DAC960#0: Firmware Version: 6.00-13, Channels: 1, Memory Size: 32MB
DAC960#0: PCI Bus: 0, Device: 8, Function: 1, I/O Address: Unassigned
DAC960#0: PCI Address: 0xF4004000 mapped at 0xF8832000, IRQ Channel: 10
DAC960#0: Controller Queue Depth: 512, Maximum Blocks per Command: 2048
DAC960#0: Driver Queue Depth: 511, Scatter/Gather Limit: 128 of 257 Segments
DAC960#0: Physical Devices:
DAC960#0: 0:0 Vendor: SEAGATE Model: ST318406LC Revision: 0108
DAC960#0: Wide Synchronous at 160 MB/sec
DAC960#0: Serial Number: 3FE00G5L00002216GSGY
DAC960#0: Disk Status: Online, 35807232 blocks
DAC960#0: 0:1 Vendor: SEAGATE Model: ST318406LC Revision: 0108
DAC960#0: Wide Synchronous at 160 MB/sec
DAC960#0: Serial Number: 3FE00KDJ00007216A142
DAC960#0: Disk Status: Online, 35807232 blocks
DAC960#0: 0:2 Vendor: SEAGATE Model: ST336706LC Revision: 0108
DAC960#0: Wide Synchronous at 160 MB/sec
DAC960#0: Serial Number: 3FD08JH200007216E1CD
DAC960#0: Disk Status: Online, 71651328 blocks
DAC960#0: 0:3 Vendor: SEAGATE Model: ST336706LC Revision: 0108
DAC960#0: Wide Synchronous at 160 MB/sec
DAC960#0: Serial Number: 3FD08H4000002217FADN
DAC960#0: Disk Status: Online, 71651328 blocks
DAC960#0: 0:7 Vendor: MYLEX Model: AcceleRAID 170 Revision: 0600
DAC960#0: Wide Synchronous at 160 MB/sec
DAC960#0: Serial Number:
DAC960#0: 0:9 Vendor: QLogic Model: GEM359 Revision: 1.07
DAC960#0: Asynchronous
DAC960#0: Serial Number: 1
DAC960#0: Logical Drives:
DAC960#0: /dev/rd/c0d0: RAID-1, Online, 35807232 blocks
DAC960#0: Logical Device Initialized, BIOS Geometry: 255/63
DAC960#0: Stripe Size: 64KB, Segment Size: 8KB
DAC960#0: Read Cache Disabled, Write Cache Disabled
DAC960#0: /dev/rd/c0d1: RAID-1, Online, 71651328 blocks
DAC960#0: Logical Device Initialized, BIOS Geometry: 255/63
DAC960#0: Stripe Size: 64KB, Segment Size: 8KB
DAC960#0: Read Cache Disabled, Write Cache Disabled
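
As a quick cross-check of the controller output above, the logical drive block counts can be converted to capacities. This is only a sketch: it assumes the driver reports 512-byte blocks, which the log itself does not state, and the helper name blocks_to_gb is made up for illustration.

```python
# Assumption: DAC960 block counts are in 512-byte blocks (not stated in the log).
BLOCK_SIZE = 512


def blocks_to_gb(blocks, block_size=BLOCK_SIZE):
    """Convert a block count to decimal gigabytes (10**9 bytes)."""
    return blocks * block_size / 1e9


# Block counts copied from the "Logical Drives" section of the log.
logical_drives = {
    "/dev/rd/c0d0": 35807232,
    "/dev/rd/c0d1": 71651328,
}

for dev, blocks in logical_drives.items():
    print(f"{dev}: {blocks_to_gb(blocks):.1f} GB")
```

Under that assumption the two RAID-1 volumes come out at roughly 18.3 GB and 36.7 GB, which lines up with the 18GB and 36GB drive pairs.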



I installed RH 7.2 using the SGI XFS Installer with no problems. I configured a fairly normal setup for partitions (all XFS):


Filesystem           1k-blocks      Used Available Use% Mounted on
/dev/rd/c0d0p2         4188164   1296152   2892012  31% /
/dev/rd/c0d0p1           59428      8476     50952  15% /boot
none                    513540         0    513540   0% /dev/shm
/dev/vg00/homelv       9432384       180   9432204   1% /home
/dev/vg00/varlv        9432384     42548   9389836   1% /var
/dev/rd/c0d0p5          260240     53040    207200  21% /tmp

Note that the LVM volumes were added later (/var and /home used to be contained in c0d0p2), and this problem showed up before that, so disregard LVM in this case.

The problem is this: when booting either the smp or enterprise kernel (stock RH 2.4.9-XFS from the XFS installer CD), I get random kernel panics. On maybe 3 out of 4 boots, at varying points in the interactive boot (from LVM activation forward), I'll get a kernel panic from the DAC960 driver:

DAC960#0: SegmentNumber != SegmentCount

These panics never happen in the same place, but it's always after the root filesystem has been mounted, while the init.d scripts are running (though a couple of times I've seen it happen during LVM activation in rc.sysinit). I've never seen it happen when booting the uniprocessor kernel.
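
One way to pin down where the panics land and how often they occur would be to capture each boot's console output (for example over a serial console) and scan the captures. The following is a rough sketch only: it assumes each boot has been saved as a list of text lines, and the function names panic_context and panic_rate are hypothetical helpers, not part of any existing tool.

```python
# The marker text is the panic message quoted above; everything else is illustrative.
PANIC_MARKER = "SegmentNumber != SegmentCount"


def panic_context(log_lines, marker=PANIC_MARKER, before=3):
    """Return up to `before` lines preceding the first line that contains
    the marker, or None if the marker never appears in this boot log."""
    for i, line in enumerate(log_lines):
        if marker in line:
            start = max(0, i - before)
            return [l.rstrip("\n") for l in log_lines[start:i]]
    return None


def panic_rate(logs):
    """Fraction of boot logs (each a list of lines) that contain the marker."""
    if not logs:
        return 0.0
    hits = sum(1 for log in logs if panic_context(log) is not None)
    return hits / len(logs)
```

Calling panic_context on each capture shows the last few messages printed before the panic, and panic_rate over a batch of captures gives a firmer number than "maybe 3 out of 4".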

I've got APIC disabled due to the AMD Interrupt Errata (#22) with the 760MP chipset.

The RAID Segment size is 8KB.

If the system boots without a panic, it's been rock solid: no super heavy load, but lots of compiling and package installation without a single hiccup. But rebooting is always a shot in the dark, since the majority of the time it'll panic. If I hard reset once or twice, it'll boot normally and run fine.

Any ideas? Could this be XFS related? I searched all over with Google and MARC and couldn't find any mention of a problem like this. I know Mylex 170s are fairly common in Linux configs, so if this were a common non-XFS problem I figure I'd have come across something, but then again, maybe not :)

Mike
--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Mike Baptiste            202 Hudson Hall, Box 90271, Durham, NC 27708
Director of Information Technology             mike.baptiste@xxxxxxxx
Pratt School of Engineering @ Duke University      Phone:919-660-5404

