
2.4.18-14SGI_XFS_1.2a1 oops && raid5 troubles

To: linux-xfs@xxxxxxxxxxx
Subject: 2.4.18-14SGI_XFS_1.2a1 oops && raid5 troubles
From: Daryl Herzmann <akrherz@xxxxxxxxxxx>
Date: Mon, 7 Oct 2002 09:22:39 -0500 (CDT)
Sender: linux-xfs-bounce@xxxxxxxxxxx
Hi!
    You all have been great help in the past.  Hopefully you can help me 
save my 800+ GB data partition!

It all started after a successful upgrade to RH 8.0.  I swapped video
cards to play with a Matrox G450, and everything locked hard after starting
X once: the screen went dark, no ethernet response.  So I hard reset <sigh>

So once the machine rebooted, my raid5 array (8x120, no spares) started a 
reconstruction.  After about 20 minutes, I got this error:

Oct  4 21:05:28 pircsds0 kernel: hdf: dma_intr: status=0x51 { DriveReady 
SeekComplete Error } 
Oct  4 21:05:28 pircsds0 kernel: hdf: dma_intr: error=0x01 { 
AddrMarkNotFound }, LBAsect=43686424, high=2, low=10131992, 
sector=43686315 
Oct  4 21:05:30 pircsds0 kernel: hdf: dma_intr: status=0x51 { DriveReady 
SeekComplete Error } 
Oct  4 21:05:30 pircsds0 kernel: hdf: dma_intr: error=0x01 { 
AddrMarkNotFound }, LBAsect=43686424, high=2, low=10131992, 
sector=43686315 
Oct  4 21:05:31 pircsds0 kernel: hdf: dma_intr: status=0x51 { DriveReady 
SeekComplete Error } 
Oct  4 21:05:31 pircsds0 kernel: hdf: dma_intr: error=0x40 { 
UncorrectableError }, LBAsect=43686424, high=2, low=10131992, 
sector=43686315 
Oct  4 21:05:31 pircsds0 kernel: end_request: I/O error, dev 21:41 (hdf), 
sector 43686315 


From the syslogs, the raid array went into degraded mode and then the 
machine locked up.  Again, no eth0 or video.  So I hard reset again <sigh>

Seeing those DMA errors, I decided to disable DMA for the drives and then 
let it reconstruct that way.  Well, the estimate was about 15 days for 
the raid reconstruction, so I disabled DMA for hdf only.  After about 90 
minutes, hdl produced the same DMA errors.  Sooo, I stopped everything, 
marked hdf1 as a failed disk, and started raid5 in degraded mode.
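In case it helps anyone spot a mistake, the commands involved were roughly 
the following (a sketch from memory; I'm assuming the raidtools that ship 
with RH 8.0, and the exact invocations may have differed):

```shell
# Disable DMA on the misbehaving drive only (-d0 turns DMA off)
hdparm -d0 /dev/hdf

# Stop the array, mark hdf1 as failed, and restart it degraded
# (raidtools-era commands; with mdadm this would be something like
#  `mdadm --manage /dev/md0 --fail /dev/hdf1 --remove /dev/hdf1`)
raidstop /dev/md0
raidsetfaulty /dev/md0 /dev/hdf1
raidhotremove /dev/md0 /dev/hdf1
raidstart /dev/md0
```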

At this point, I am getting desperate :)  So I ran xfs_repair on /dev/md0 
and did not get any errors, so I mounted the device, again with no errors.  
I then tried a simple 'ls -l' in the top-level directory and immediately 
got an oops (attached as error.txt).  I ran ksymoops on it, and that 
output is attached as well (ksymoops.txt).
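For anyone following along, a safer ordering would probably have been to 
use xfs_repair's no-modify mode before letting it write anything, and to 
mount read-only first (a sketch; device and mount point are from my setup):

```shell
# Dry-run check: -n reports problems without modifying the filesystem
xfs_repair -n /dev/md0

# Only then run the real repair, and mount read-only to poke around safely
xfs_repair /dev/md0
mount -t xfs -o ro /dev/md0 /mnt
```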

Does anybody have any ideas about how to proceed?  Some other bits of 
information:

  1.  I am using 36-inch, 80-pin cables
  2.  The eight drives are on two Promise PDC20269 TX2 Ultra-133 
      controllers
  3.  This array has been functional for over 10 months, but it had never
      experienced a crash/hard reset before this.
  4.  This is not the same system for which I have reported raid5/XFS
      troubles before.

Thanks,
  Daryl

-- 
/**
 * Daryl Herzmann (akrherz@xxxxxxxxxxx)
 * Program Assistant -- Iowa Environmental Mesonet
 * http://mesonet.agron.iastate.edu
 */

Attachment: error.txt
Description: Text document

Attachment: ksymoops.txt
Description: Text document
