I have a new system that I'm working to put into production, and one
nagging problem that I'm trying to resolve. Perhaps one of you has
seen this or can point me in the right direction. I know this may not
be an XFS problem, but since I'm using the XFS installer I figured I'd
start here.
I've got a dual Athlon server (APPRO 1124 - Tyan Thunder K7, firmware
2.08) with a Mylex AcceleRAID 170 (32MB) and two 18GB plus two 36GB
Seagate Cheetahs, each pair in a RAID1 config.
DAC960: ***** DAC960 RAID Driver Version 2.4.10 of 23 July 2001 *****
DAC960: Copyright 1998-2001 by Leonard N. Zubkoff <lnz@xxxxxxxxxxxxx>
DAC960#0: Configuring Mylex AcceleRAID 170 PCI RAID Controller
DAC960#0: Firmware Version: 6.00-13, Channels: 1, Memory Size: 32MB
DAC960#0: PCI Bus: 0, Device: 8, Function: 1, I/O Address: Unassigned
DAC960#0: PCI Address: 0xF4004000 mapped at 0xF8832000, IRQ Channel: 10
DAC960#0: Controller Queue Depth: 512, Maximum Blocks per Command: 2048
DAC960#0: Driver Queue Depth: 511, Scatter/Gather Limit: 128 of 257 Segments
DAC960#0: Physical Devices:
DAC960#0: 0:0 Vendor: SEAGATE Model: ST318406LC Revision: 0108
DAC960#0: Wide Synchronous at 160 MB/sec
DAC960#0: Serial Number: 3FE00G5L00002216GSGY
DAC960#0: Disk Status: Online, 35807232 blocks
DAC960#0: 0:1 Vendor: SEAGATE Model: ST318406LC Revision: 0108
DAC960#0: Wide Synchronous at 160 MB/sec
DAC960#0: Serial Number: 3FE00KDJ00007216A142
DAC960#0: Disk Status: Online, 35807232 blocks
DAC960#0: 0:2 Vendor: SEAGATE Model: ST336706LC Revision: 0108
DAC960#0: Wide Synchronous at 160 MB/sec
DAC960#0: Serial Number: 3FD08JH200007216E1CD
DAC960#0: Disk Status: Online, 71651328 blocks
DAC960#0: 0:3 Vendor: SEAGATE Model: ST336706LC Revision: 0108
DAC960#0: Wide Synchronous at 160 MB/sec
DAC960#0: Serial Number: 3FD08H4000002217FADN
DAC960#0: Disk Status: Online, 71651328 blocks
DAC960#0: 0:7 Vendor: MYLEX Model: AcceleRAID 170 Revision: 0600
DAC960#0: Wide Synchronous at 160 MB/sec
DAC960#0: Serial Number:
DAC960#0: 0:9 Vendor: QLogic Model: GEM359 Revision: 1.07
DAC960#0: Asynchronous
DAC960#0: Serial Number: 1
DAC960#0: Logical Drives:
DAC960#0: /dev/rd/c0d0: RAID-1, Online, 35807232 blocks
DAC960#0: Logical Device Initialized, BIOS Geometry: 255/63
DAC960#0: Stripe Size: 64KB, Segment Size: 8KB
DAC960#0: Read Cache Disabled, Write Cache Disabled
DAC960#0: /dev/rd/c0d1: RAID-1, Online, 71651328 blocks
DAC960#0: Logical Device Initialized, BIOS Geometry: 255/63
DAC960#0: Stripe Size: 64KB, Segment Size: 8KB
DAC960#0: Read Cache Disabled, Write Cache Disabled
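As a quick sanity check that the logical drives match the physical disks, the block counts above can be converted to capacities. This is only a sketch: it assumes the driver reports 512-byte blocks, and the helper name blocks_to_gb is made up for illustration.

```python
# Assumption: the DAC960 driver counts 512-byte blocks.
BLOCK_SIZE = 512


def blocks_to_gb(blocks, block_size=BLOCK_SIZE):
    """Convert a block count to decimal gigabytes (10**9 bytes)."""
    return blocks * block_size / 10**9


# Block counts copied from the "Logical Drives" section of the log above.
logical_drives = {
    "/dev/rd/c0d0": 35807232,
    "/dev/rd/c0d1": 71651328,
}

for dev, blocks in logical_drives.items():
    print(f"{dev}: {blocks_to_gb(blocks):.1f} GB")
```

Under that assumption the two logical drives come out at roughly 18.3 GB and 36.7 GB, which lines up with the 18GB and 36GB Cheetah pairs.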
I installed RH 7.2 using the SGI XFS Installer with no problems. I
configured a fairly normal setup for partitions (all XFS):
Filesystem           1k-blocks      Used Available Use% Mounted on
/dev/rd/c0d0p2         4188164   1296152   2892012  31% /
/dev/rd/c0d0p1           59428      8476     50952  15% /boot
none                    513540         0    513540   0% /dev/shm
/dev/vg00/homelv       9432384       180   9432204   1% /home
/dev/vg00/varlv        9432384     42548   9389836   1% /var
/dev/rd/c0d0p5          260240     53040    207200  21% /tmp
Note that the LVM volumes were added later (/var and /home used to live
on c0d0p2), and the problem showed up before that, so LVM can be
disregarded here.
The problem is this: when booting either the smp or enterprise kernel
(stock RH 2.4.9-XFS from the XFS installer CD), I get random kernel
panics. In maybe 3 out of 4 boots, at varying points in an interactive
boot (from LVM activation onward), I get a kernel panic from the DAC960
driver:
DAC960#0: SegmentNumber != SegmentCount
These panics never happen in the same place, but it's always after the
root filesystem has been mounted, while the init.d scripts are running
(though a couple of times I've seen it happen during LVM activation in
rc.sysinit). I've never seen it happen when booting the uniprocessor
kernel.
I've got APIC disabled due to the AMD Interrupt Errata (#22) with the
760MP chipset.
The RAID Segment size is 8KB.
If the system boots without a panic, it's been rock solid: no super
heavy load, but lots of compiling and package installation without a
single hiccup. Rebooting is always a shot in the dark, though, since
the majority of the time it will panic. If I hard reset once or twice,
it boots normally and runs fine.
Any ideas? Could this be XFS related? I searched all over with Google
and MARC and couldn't find any mention of a problem like this. I know
Mylex 170s are fairly common in Linux configs, so if this were a common
non-XFS problem I figure I'd have come across something, but then again,
maybe not :)
Mike
--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Mike Baptiste 202 Hudson Hall, Box 90271, Durham, NC 27708
Director of Information Technology mike.baptiste@xxxxxxxx
Pratt School of Engineering @ Duke University Phone:919-660-5404