I've been running a similar server for almost a year now: RH 7.1 + XFS
on a Promise Ultra100TX2 with four Quantum 15G drives in software RAID5.
A second server runs RH 7.2 + XFS on a Promise Ultra100TX2 with four
IBM 60G drives, also in software RAID5.
No problems so far, apart from 3 of 8 IBM drives dying, but that's
another story.
Could you try a current RH-based kernel with XFS? For me, at least, it
has always worked very well.
ftp://oss.sgi.com/projects/xfs/download/testing/xfs-1.1/kernel_rpms/2.4.9-31-RH/
This is from the small server:
[root@xxl pub]# cat /proc/ide/pdc202xx
PDC20268 TX2 Chipset.
------------------------------- General Status ---------------------------------
Burst Mode : enabled
Host Mode : Tri-Stated
Bus Clocking : 100 External
IO pad select : 10 mA
Status Polling Period : 15
Interrupt Check Status Polling Delay : 15
--------------- Primary Channel ---------------- Secondary Channel -------------
enabled enabled
66 Clocking enabled enabled
Mode MASTER Mode MASTER
--------------- drive0 --------- drive1 -------- drive0 ---------- drive1 ------
DMA enabled: yes yes yes yes
--------------- Cannot Decode HOST ---------------
[root@xxl pub]# lspci
00:00.0 Host bridge: VIA Technologies, Inc. VT82C598 [Apollo MVP3] (rev 04)
00:01.0 PCI bridge: VIA Technologies, Inc. VT82C598/694x [Apollo MVP3/Pro133x AGP]
00:07.0 ISA bridge: VIA Technologies, Inc. VT82C586/A/B PCI-to-ISA [Apollo VP] (rev 47)
00:07.1 IDE interface: VIA Technologies, Inc. Bus Master IDE (rev 06)
00:07.3 Host bridge: VIA Technologies, Inc. VT82C586B ACPI (rev 10)
00:08.0 Ethernet controller: 3Com Corporation 3c905B 100BaseTX [Cyclone] (rev 64)
00:09.0 Ethernet controller: Intel Corporation 82557 [Ethernet Pro 100] (rev 08)
00:0a.0 Unknown mass storage controller: Promise Technology, Inc.: Unknown device 4d68 (rev 01)
00:0b.0 SCSI storage controller: Adaptec AIC-7881U
01:00.0 VGA compatible controller: S3 Inc. Savage 4 (rev 03)
[root@xxl pub]# cat /proc/mdstat
Personalities : [raid1] [raid5]
read_ahead 1024 sectors
md8 : active raid5 hde10[0] hdf7[2] hdg10[1] hdh7[3]
31406976 blocks level 5, 128k chunk, algorithm 0 [4/4] [UUUU]
md7 : active raid1 hdf6[0] hdh6[1]
1024000 blocks [2/2] [UU]
md5 : active raid1 hdf5[0] hdh5[1]
3072256 blocks [2/2] [UU]
md0 : active raid1 hde1[0] hdf1[1] hdg1[2] hdh1[3]
102720 blocks [4/4] [UUUU]
md6 : active raid1 hde9[0] hdg9[1]
1024000 blocks [2/2] [UU]
md4 : active raid1 hde8[0] hdg8[1]
511936 blocks [2/2] [UU]
md3 : active raid1 hde7[0] hdg7[1]
511936 blocks [2/2] [UU]
md2 : active raid1 hde6[0] hdg6[1]
1755328 blocks [2/2] [UU]
md1 : active raid1 hde5[0] hdg5[1]
292672 blocks [2/2] [UU]
unused devices: <none>
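
In case it helps when comparing the two boxes: a minimal Python sketch
that scans /proc/mdstat text for degraded arrays. It assumes the 2.4-era
layout shown above, where each array's status line ends in something
like "[4/4] [UUUU]" and a "_" marks a missing member. The function name
degraded_arrays and the SAMPLE text are only illustrative, not part of
any existing tool.

```python
import re

# Sample text in the same layout as the /proc/mdstat output above.
SAMPLE = """Personalities : [raid1] [raid5]
read_ahead 1024 sectors
md8 : active raid5 hde10[0] hdf7[2] hdg10[1] hdh7[3]
31406976 blocks level 5, 128k chunk, algorithm 0 [4/4] [UUUU]
md7 : active raid1 hdf6[0] hdh6[1]
1024000 blocks [2/2] [UU]
unused devices: <none>
"""

def degraded_arrays(mdstat_text):
    """Return the names of md arrays whose status shows a missing member."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        # An array header looks like "md8 : active raid5 ...".
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        # The status line ends with "[total/active] [UU_U]".
        status = re.search(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]", line)
        if status and current:
            total = int(status.group(1))
            active = int(status.group(2))
            slots = status.group(3)
            if active < total or "_" in slots:
                degraded.append(current)
            current = None
    return degraded

print(degraded_arrays(SAMPLE))
```

On a live system the text would come from open("/proc/mdstat").read()
instead of SAMPLE; an empty list means every array reports all members up.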
Daryl Herzmann wrote:
>
> Hi,
> I have a box that had been running RH 7.1 + XFS for a year without
> a single problem. Recently, I put 3 120G Maxtor 4G120J6 drives on the
> onboard Promise Controller (pdc202xx) and did software RAID 5. And wow,
> have the problems started! I went from running a 2.4.3 something kernel
> to a custom compiled 2.4.18 w/ SGI patch dated March 3. Still no luck.
> It seems that under heavy NFS load these problems start occurring.
>
> Any thoughts? I have been trying to follow the ongoing NFS +
> XFS + RAID 5 discussions, but I am not sure where I should be at
> regarding kernel + patches.
>
> My ksymoops is below.
>
> Thanks! Daryl
>
> Other info that may be helpful.
>
> # lspci
> 00:00.0 Host bridge: VIA Technologies, Inc. VT8363/8365 [KT133/KM133] (rev 03)
> 00:01.0 PCI bridge: VIA Technologies, Inc. VT8363/8365 [KT133/KM133 AGP]
> 00:07.0 ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo Super South] (rev 40)
> 00:07.1 IDE interface: VIA Technologies, Inc. Bus Master IDE (rev 06)
> 00:07.4 Host bridge: VIA Technologies, Inc. VT82C686 [Apollo Super ACPI] (rev 40)
> 00:08.0 VGA compatible controller: Trident Microsystems Blade 3D PCI/AGP (rev 3a)
> 00:0c.0 Ethernet controller: 3Com Corporation 3c905B 100BaseTX [Cyclone] (rev 30)
> 00:0f.0 RAID bus controller: Promise Technology, Inc. 20265 (rev 02)
>
> # free
> total used free shared buffers cached
> Mem: 900644 897436 3208 0 0 842932
> -/+ buffers/cache: 54504 846140
> Swap: 1028152 0 1028152
>
> # cat /proc/ide/pdc202xx
>
> PDC20265 Chipset.
> ------------------------------- General Status ---------------------------------
> Burst Mode : enabled
> Host Mode : Normal
> Bus Clocking : 33 PCI Internal
> IO pad select : 10 mA
> Status Polling Period : 0
> Interrupt Check Status Polling Delay : 2
> --------------- Primary Channel ---------------- Secondary Channel -------------
> enabled enabled
> 66 Clocking enabled enabled
> Mode MASTER Mode MASTER
> FIFO Empty ????????????
> --------------- drive0 --------- drive1 -------- drive0 ---------- drive1 ------
> DMA enabled: no yes yes yes
> DMA Mode: NOTSET UDMA 4 UDMA 4 UDMA 4
> PIO Mode: NOTSET PIO ? PIO ? PIO ?
>
> Here is my ksymoops from this morning's crash.
>
> >>EIP; c013f736 <iput+26/1a0> <=====
> Trace; c013d70c <prune_dcache+cc/130>
> Trace; c0126e78 <kmem_find_general_cachep+cf8/1760>
> Trace; c013da00 <shrink_dcache_parent+50/60>
> Trace; c0127037 <kmem_find_general_cachep+eb7/1760>
> Trace; c012708c <kmem_find_general_cachep+f0c/1760>
> Trace; c0127141 <kmem_find_general_cachep+fc1/1760>
> Trace; c01271b6 <kmem_find_general_cachep+1036/1760>
> Trace; c0127311 <kmem_find_general_cachep+1191/1760>
> Trace; c0127270 <kmem_find_general_cachep+10f0/1760>
> Trace; c0105000 <gdt+4dcc/4f4c>
> Trace; c0105536 <kernel_thread+26/1d0>
> Trace; c0127270 <kmem_find_general_cachep+10f0/1760>
> Code; c013f736 <iput+26/1a0>
> 00000000 <_EIP>:
> Code; c013f736 <iput+26/1a0> <=====
> 0: 8b 46 20 mov 0x20(%esi),%eax <=====
> Code; c013f739 <iput+29/1a0>
> 3: 85 c0 test %eax,%eax
> Code; c013f73b <iput+2b/1a0>
> 5: 0f 45 f8 cmovne %eax,%edi
> Code; c013f73e <iput+2e/1a0>
> 8: 85 ff test %edi,%edi
> Code; c013f740 <iput+30/1a0>
> a: 74 0b je 17 <_EIP+0x17> c013f74d <iput+3d/1a0>
> Code; c013f742 <iput+32/1a0>
> c: 8b 47 10 mov 0x10(%edi),%eax
> Code; c013f745 <iput+35/1a0>
> f: 85 c0 test %eax,%eax
> Code; c013f747 <iput+37/1a0>
> 11: 74 04 je 17 <_EIP+0x17> c013f74d <iput+3d/1a0>
> Code; c013f749 <iput+39/1a0>
> 13: 53 push %ebx
>
> and then the next immediate oops
>
> >>EIP; c013f736 <iput+26/1a0> <=====
> Trace; c013d70c <prune_dcache+cc/130>
> Trace; c0126e78 <kmem_find_general_cachep+cf8/1760>
> Trace; c013da00 <shrink_dcache_parent+50/60>
> Trace; c0127037 <kmem_find_general_cachep+eb7/1760>
> Trace; c012708c <kmem_find_general_cachep+f0c/1760>
> Trace; c0127951 <_alloc_pages+71/1c0>
> Trace; c0127bbb <__alloc_pages+11b/180>
> Trace; c0127c30 <__get_free_pages+10/20>
> Trace; c0139d73 <__pollwait+33/1040>
> Trace; c025137e <ip_cmsg_recv+199e/182c0>
> Trace; c023abdf <sock_recvmsg+3df/640>
> Trace; c0139fcb <__pollwait+28b/1040>
> Trace; c013a43c <__pollwait+6fc/1040>
> Trace; c0106cfb <__up_wakeup+110f/23e4>
> Code; c013f736 <iput+26/1a0>
> 00000000 <_EIP>:
> Code; c013f736 <iput+26/1a0> <=====
> 0: 8b 46 20 mov 0x20(%esi),%eax <=====
> Code; c013f739 <iput+29/1a0>
> 3: 85 c0 test %eax,%eax
> Code; c013f73b <iput+2b/1a0>
> 5: 0f 45 f8 cmovne %eax,%edi
> Code; c013f73e <iput+2e/1a0>
> 8: 85 ff test %edi,%edi
> Code; c013f740 <iput+30/1a0>
> a: 74 0b je 17 <_EIP+0x17> c013f74d <iput+3d/1a0>
> Code; c013f742 <iput+32/1a0>
> c: 8b 47 10 mov 0x10(%edi),%eax
> Code; c013f745 <iput+35/1a0>
> f: 85 c0 test %eax,%eax
> Code; c013f747 <iput+37/1a0>
> 11: 74 04 je 17 <_EIP+0x17> c013f74d <iput+3d/1a0>
> Code; c013f749 <iput+39/1a0>
> 13: 53 push %ebx
>
> --
> /**
> * Daryl Herzmann (akrherz@xxxxxxxxxxx)
> * Program Assistant -- Iowa Environmental Mesonet
> * http://mesonet.agron.iastate.edu
> */