Daryl Herzmann wrote:
>
> Hi!
> Thanks to those that responded to my original email. I installed
> the 2.4.9-31SGI_XFS_1.1_PR4 RPM as suggested below and my machine just
> locked and I had to hard reset it. Again, it was under heavy NFS load.
Is it only NFS load that makes the machine lock up? If it is hardware related
in any way, it smells like a problem between the disk and network
controllers. Shared IRQ handling comes to mind. Sometimes even moving PCI
cards to different slots can help here. It shouldn't happen anyway.
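
To check the shared IRQ theory, something like the following rough sketch
could list the IRQs that more than one device sits on. It assumes the usual
/proc/interrupts layout (a CPU header row, then one row per IRQ with the
device names comma-separated at the end); the function name shared_irqs and
the SAMPLE text are made up for illustration and are not output from either
of our machines.

```python
import re

def shared_irqs(interrupts_text):
    """Return {irq: [device, ...]} for IRQ rows that list more than one device."""
    lines = interrupts_text.splitlines()
    if not lines:
        return {}
    ncpus = len(lines[0].split())  # header row: CPU0 [CPU1 ...]
    shared = {}
    for line in lines[1:]:
        m = re.match(r"\s*(\d+):\s*(.*)", line)
        if not m:
            continue  # skip non-numeric rows such as NMI: or ERR:
        fields = m.group(2).split(None, ncpus)
        if len(fields) <= ncpus:
            continue  # no controller/device text after the per-CPU counts
        tail = fields[ncpus]  # controller type followed by device names
        parts = [p.strip() for p in tail.split(",")]
        first = parts[0].split()
        if not first:
            continue
        # The first part still carries the controller type; keep its last word.
        devices = [first[-1]] + parts[1:]
        if len(devices) > 1:
            shared[int(m.group(1))] = devices
    return shared

# Hypothetical sample, NOT taken from a real machine in this thread.
SAMPLE = """           CPU0
  0:    1234567          XT-PIC  timer
 10:     123456          XT-PIC  ide2, ide3, eth0
 14:       4321          XT-PIC  ide0
NMI:          0
"""

print(shared_irqs(SAMPLE))
```

On a live box you would pass in open("/proc/interrupts").read() instead of
SAMPLE. If the Promise channels (ide2/ide3) and the NIC turn up on the same
IRQ, that would be a reason to try another PCI slot, though it proves
nothing by itself.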
> It was also suggested to me to get a 3ware controller, so I may try that.
> I would be more inclined to move the data off and then reformat ext3 once
> and see what happens. This all makes me very nervous, since I now have a
> TB fileserver in RAID 5 running XFS that has been stable thus far, but it
> has not been subjected to NFS load (it will be soon) :(
>
> I would send along a ksymoops, but nothing was sent to syslog. I
> did have an oops on the screen, but I could not scroll up or anything to
> get it all. My 2.4.18 crashes did not kill the system as badly as this one
> did.
>
> Daryl
>
> On Wed, 10 Apr 2002, Simon Matter wrote:
>
> >I've been running a similar server for almost a year now.
> >It's RH 7.1 + XFS, Promise Ultra100TX2 with 4 Quantum 15G drives,
> >software RAID5. I have a second server, RH 7.2 + XFS, Promise
> >Ultra100TX2 with 4 IBM 60G drives, software RAID5.
> >
> >No problems so far, only 3 of 8 IBM drives dead, but that's another
> >story.
> >Can you try a current RH based kernel with XFS? At least for me it has
> >always worked very well.
> >
> >ftp://oss.sgi.com/projects/xfs/download/testing/xfs-1.1/kernel_rpms/2.4.9-31-RH/
> >
> >This is from the small server:
> >
> >[root@xxl pub]# cat /proc/ide/pdc202xx
> >
> > PDC20268 TX2 Chipset.
> >------------------------------- General Status ---------------------------------
> >Burst Mode : enabled
> >Host Mode : Tri-Stated
> >Bus Clocking : 100 External
> >IO pad select : 10 mA
> >Status Polling Period : 15
> >Interrupt Check Status Polling Delay : 15
> >--------------- Primary Channel ---------------- Secondary Channel -------------
> > enabled enabled
> >66 Clocking enabled enabled
> > Mode MASTER Mode MASTER
> >--------------- drive0 --------- drive1 -------- drive0 ---------- drive1 ------
> >DMA enabled: yes yes yes yes
> >--------------- Cannot Decode HOST ---------------
> >
> >[root@xxl pub]# lspci
> >00:00.0 Host bridge: VIA Technologies, Inc. VT82C598 [Apollo MVP3] (rev 04)
> >00:01.0 PCI bridge: VIA Technologies, Inc. VT82C598/694x [Apollo MVP3/Pro133x AGP]
> >00:07.0 ISA bridge: VIA Technologies, Inc. VT82C586/A/B PCI-to-ISA [Apollo VP] (rev 47)
> >00:07.1 IDE interface: VIA Technologies, Inc. Bus Master IDE (rev 06)
> >00:07.3 Host bridge: VIA Technologies, Inc. VT82C586B ACPI (rev 10)
> >00:08.0 Ethernet controller: 3Com Corporation 3c905B 100BaseTX [Cyclone] (rev 64)
> >00:09.0 Ethernet controller: Intel Corporation 82557 [Ethernet Pro 100] (rev 08)
> >00:0a.0 Unknown mass storage controller: Promise Technology, Inc.: Unknown device 4d68 (rev 01)
> >00:0b.0 SCSI storage controller: Adaptec AIC-7881U
> >01:00.0 VGA compatible controller: S3 Inc. Savage 4 (rev 03)
> >
> >[root@xxl pub]# cat /proc/mdstat
> >Personalities : [raid1] [raid5]
> >read_ahead 1024 sectors
> >md8 : active raid5 hde10[0] hdf7[2] hdg10[1] hdh7[3]
> > 31406976 blocks level 5, 128k chunk, algorithm 0 [4/4] [UUUU]
> >
> >md7 : active raid1 hdf6[0] hdh6[1]
> > 1024000 blocks [2/2] [UU]
> >
> >md5 : active raid1 hdf5[0] hdh5[1]
> > 3072256 blocks [2/2] [UU]
> >
> >md0 : active raid1 hde1[0] hdf1[1] hdg1[2] hdh1[3]
> > 102720 blocks [4/4] [UUUU]
> >
> >md6 : active raid1 hde9[0] hdg9[1]
> > 1024000 blocks [2/2] [UU]
> >
> >md4 : active raid1 hde8[0] hdg8[1]
> > 511936 blocks [2/2] [UU]
> >
> >md3 : active raid1 hde7[0] hdg7[1]
> > 511936 blocks [2/2] [UU]
> >
> >md2 : active raid1 hde6[0] hdg6[1]
> > 1755328 blocks [2/2] [UU]
> >
> >md1 : active raid1 hde5[0] hdg5[1]
> > 292672 blocks [2/2] [UU]
> >
> >unused devices: <none>
> >
> >
> >
> >Daryl Herzmann wrote:
> >>
> >> Hi,
> >> I have a box that had been running RH 7.1 + XFS for a year without
> >> a single problem. Recently, I put 3 120G Maxtor 4G120J6 drives on the
> >> onboard Promise Controller (pdc202xx) and did software RAID 5. And wow,
> >> have the problems started! I went from running a 2.4.3 something kernel
> >> to a custom compiled 2.4.18 w/ SGI patch dated March 3. Still no luck.
> >> It seems that under heavy NFS load these problems start occurring.
> >>
> >> Any thoughts? I have been trying to follow the ongoing NFS +
> >> XFS + RAID 5 discussions, but I am not sure where I should be at
> >> regarding kernel + patches.
> >>
> >> My ksymoops is below.
> >>
> >> Thanks! Daryl
> >>
> >> Other info that may be helpful.
> >>
> >> # lspci
> >> 00:00.0 Host bridge: VIA Technologies, Inc. VT8363/8365 [KT133/KM133] (rev 03)
> >> 00:01.0 PCI bridge: VIA Technologies, Inc. VT8363/8365 [KT133/KM133 AGP]
> >> 00:07.0 ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo Super South] (rev 40)
> >> 00:07.1 IDE interface: VIA Technologies, Inc. Bus Master IDE (rev 06)
> >> 00:07.4 Host bridge: VIA Technologies, Inc. VT82C686 [Apollo Super ACPI] (rev 40)
> >> 00:08.0 VGA compatible controller: Trident Microsystems Blade 3D PCI/AGP (rev 3a)
> >> 00:0c.0 Ethernet controller: 3Com Corporation 3c905B 100BaseTX [Cyclone] (rev 30)
> >> 00:0f.0 RAID bus controller: Promise Technology, Inc. 20265 (rev 02)
> >>
> >> # free
> >> total used free shared buffers cached
> >> Mem: 900644 897436 3208 0 0 842932
> >> -/+ buffers/cache: 54504 846140
> >> Swap: 1028152 0 1028152
> >>
> >> # cat /proc/ide/pdc202xx
> >>
> >> PDC20265 Chipset.
> >> ------------------------------- General Status ---------------------------------
> >> Burst Mode : enabled
> >> Host Mode : Normal
> >> Bus Clocking : 33 PCI Internal
> >> IO pad select : 10 mA
> >> Status Polling Period : 0
> >> Interrupt Check Status Polling Delay : 2
> >> --------------- Primary Channel ---------------- Secondary Channel -------------
> >> enabled enabled
> >> 66 Clocking enabled enabled
> >> Mode MASTER Mode MASTER
> >> FIFO Empty ????????????
> >> --------------- drive0 --------- drive1 -------- drive0 ---------- drive1 ------
> >> DMA enabled: no yes yes yes
> >> DMA Mode: NOTSET UDMA 4 UDMA 4 UDMA 4
> >> PIO Mode: NOTSET PIO ? PIO ? PIO ?
> >>
> >> Here is my ksymoops from this morning's crash.
> >>
> >> >>EIP; c013f736 <iput+26/1a0> <=====
> >> Trace; c013d70c <prune_dcache+cc/130>
> >> Trace; c0126e78 <kmem_find_general_cachep+cf8/1760>
> >> Trace; c013da00 <shrink_dcache_parent+50/60>
> >> Trace; c0127037 <kmem_find_general_cachep+eb7/1760>
> >> Trace; c012708c <kmem_find_general_cachep+f0c/1760>
> >> Trace; c0127141 <kmem_find_general_cachep+fc1/1760>
> >> Trace; c01271b6 <kmem_find_general_cachep+1036/1760>
> >> Trace; c0127311 <kmem_find_general_cachep+1191/1760>
> >> Trace; c0127270 <kmem_find_general_cachep+10f0/1760>
> >> Trace; c0105000 <gdt+4dcc/4f4c>
> >> Trace; c0105536 <kernel_thread+26/1d0>
> >> Trace; c0127270 <kmem_find_general_cachep+10f0/1760>
> >> Code; c013f736 <iput+26/1a0>
> >> 00000000 <_EIP>:
> >> Code; c013f736 <iput+26/1a0> <=====
> >> 0: 8b 46 20 mov 0x20(%esi),%eax <=====
> >> Code; c013f739 <iput+29/1a0>
> >> 3: 85 c0 test %eax,%eax
> >> Code; c013f73b <iput+2b/1a0>
> >> 5: 0f 45 f8 cmovne %eax,%edi
> >> Code; c013f73e <iput+2e/1a0>
> >> 8: 85 ff test %edi,%edi
> >> Code; c013f740 <iput+30/1a0>
> >> a: 74 0b je 17 <_EIP+0x17> c013f74d <iput+3d/1a0>
> >> Code; c013f742 <iput+32/1a0>
> >> c: 8b 47 10 mov 0x10(%edi),%eax
> >> Code; c013f745 <iput+35/1a0>
> >> f: 85 c0 test %eax,%eax
> >> Code; c013f747 <iput+37/1a0>
> >> 11: 74 04 je 17 <_EIP+0x17> c013f74d <iput+3d/1a0>
> >> Code; c013f749 <iput+39/1a0>
> >> 13: 53 push %ebx
> >>
> >> and then the next immediate oops
> >>
> >> >>EIP; c013f736 <iput+26/1a0> <=====
> >> Trace; c013d70c <prune_dcache+cc/130>
> >> Trace; c0126e78 <kmem_find_general_cachep+cf8/1760>
> >> Trace; c013da00 <shrink_dcache_parent+50/60>
> >> Trace; c0127037 <kmem_find_general_cachep+eb7/1760>
> >> Trace; c012708c <kmem_find_general_cachep+f0c/1760>
> >> Trace; c0127951 <_alloc_pages+71/1c0>
> >> Trace; c0127bbb <__alloc_pages+11b/180>
> >> Trace; c0127c30 <__get_free_pages+10/20>
> >> Trace; c0139d73 <__pollwait+33/1040>
> >> Trace; c025137e <ip_cmsg_recv+199e/182c0>
> >> Trace; c023abdf <sock_recvmsg+3df/640>
> >> Trace; c0139fcb <__pollwait+28b/1040>
> >> Trace; c013a43c <__pollwait+6fc/1040>
> >> Trace; c0106cfb <__up_wakeup+110f/23e4>
> >> Code; c013f736 <iput+26/1a0>
> >> 00000000 <_EIP>:
> >> Code; c013f736 <iput+26/1a0> <=====
> >> 0: 8b 46 20 mov 0x20(%esi),%eax <=====
> >> Code; c013f739 <iput+29/1a0>
> >> 3: 85 c0 test %eax,%eax
> >> Code; c013f73b <iput+2b/1a0>
> >> 5: 0f 45 f8 cmovne %eax,%edi
> >> Code; c013f73e <iput+2e/1a0>
> >> 8: 85 ff test %edi,%edi
> >> Code; c013f740 <iput+30/1a0>
> >> a: 74 0b je 17 <_EIP+0x17> c013f74d <iput+3d/1a0>
> >> Code; c013f742 <iput+32/1a0>
> >> c: 8b 47 10 mov 0x10(%edi),%eax
> >> Code; c013f745 <iput+35/1a0>
> >> f: 85 c0 test %eax,%eax
> >> Code; c013f747 <iput+37/1a0>
> >> 11: 74 04 je 17 <_EIP+0x17> c013f74d <iput+3d/1a0>
> >> Code; c013f749 <iput+39/1a0>
> >> 13: 53 push %ebx
> >>
> >> --
> >> /**
> >> * Daryl Herzmann (akrherz@xxxxxxxxxxx)
> >> * Program Assistant -- Iowa Environmental Mesonet
> >> * http://mesonet.agron.iastate.edu
> >> */
> >
> >
> >
>
> --
> /**
> * Daryl Herzmann (akrherz@xxxxxxxxxxx)
> * Program Assistant -- Iowa Environmental Mesonet
> * http://mesonet.agron.iastate.edu
> */