Hi!
You all have been a great help in the past. Hopefully you can help me
save my 800+ GB data partition!
It all started after a successful upgrade to RH 8.0. I swapped video
cards to play with a Matrox G450 and everything locked up hard after
starting X once: the screen went dark and there was no ethernet response.
So I did a hard reset <sigh>
Once the machine rebooted, my raid5 array (8x120 GB, no spares) started a
reconstruction. After about 20 minutes, I got this error:
Oct 4 21:05:28 pircsds0 kernel: hdf: dma_intr: status=0x51 { DriveReady
SeekComplete Error }
Oct 4 21:05:28 pircsds0 kernel: hdf: dma_intr: error=0x01 {
AddrMarkNotFound }, LBAsect=43686424, high=2, low=10131992,
sector=43686315
Oct 4 21:05:30 pircsds0 kernel: hdf: dma_intr: status=0x51 { DriveReady
SeekComplete Error }
Oct 4 21:05:30 pircsds0 kernel: hdf: dma_intr: error=0x01 {
AddrMarkNotFound }, LBAsect=43686424, high=2, low=10131992,
sector=43686315
Oct 4 21:05:31 pircsds0 kernel: hdf: dma_intr: status=0x51 { DriveReady
SeekComplete Error }
Oct 4 21:05:31 pircsds0 kernel: hdf: dma_intr: error=0x40 {
UncorrectableError }, LBAsect=43686424, high=2, low=10131992,
sector=43686315
Oct 4 21:05:31 pircsds0 kernel: end_request: I/O error, dev 21:41 (hdf),
sector 43686315
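For anyone triaging a longer syslog with the same kind of entries, here is
a minimal sketch (not a tool used in this report) that tallies the
dma_intr error lines per drive and sector. The function name
summarize_dma_errors and the regular expression are illustrative
assumptions based only on the excerpt above; other kernel versions may
format these lines differently.

```python
import re
from collections import Counter

# Matches the "error=" lines from the excerpt. \s+ and \s* tolerate the
# line wrapping introduced by the mail client.
ERROR_RE = re.compile(
    r"kernel:\s+(?P<dev>hd[a-z]):\s+dma_intr:\s+error=(?P<code>0x[0-9a-fA-F]+)\s+"
    r"\{\s*(?P<name>\w+)\s*\},\s+LBAsect=(?P<lba>\d+)"
)


def summarize_dma_errors(log_text):
    """Count (device, error name, LBA sector) triples in a syslog excerpt."""
    counts = Counter()
    for m in ERROR_RE.finditer(log_text):
        counts[(m.group("dev"), m.group("name"), int(m.group("lba")))] += 1
    return counts


sample = """\
Oct 4 21:05:28 pircsds0 kernel: hdf: dma_intr: error=0x01 {
AddrMarkNotFound }, LBAsect=43686424, high=2, low=10131992,
sector=43686315
Oct 4 21:05:31 pircsds0 kernel: hdf: dma_intr: error=0x40 {
UncorrectableError }, LBAsect=43686424, high=2, low=10131992,
sector=43686315
"""

for key, n in summarize_dma_errors(sample).items():
    print(key, n)
```

Repeated hits on the same drive and sector, as in the excerpt, would
suggest a localized media problem rather than random bus noise, though
that is only a heuristic.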
From the syslogs, the raid array went into degraded mode and then the
machine locked up. Again no eth0 or video. So I did a hard reset again
<sigh>
Seeing those DMA errors, I decided to disable DMA for the drives and then
let it reconstruct that way. Well, the estimate was about 15 days for
raid reconstruction, so I disabled DMA for just hdf instead. After about
90 minutes, hdl produced the same DMA errors. Sooo, I stopped everything,
marked hdf1 as a failed disk and started raid5 in degraded mode.
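To put the numbers in context, a rough back-of-the-envelope sketch
follows. It assumes each member is a 120 GB (decimal) drive and that a
reconstruction has to cover one member's full capacity; the function
names are hypothetical and the figures are estimates, not measurements
from this system.

```python
def raid5_usable_gb(n_disks, disk_gb):
    """Usable RAID5 capacity: one disk's worth of space holds parity."""
    return (n_disks - 1) * disk_gb


def implied_rebuild_rate_bytes_per_s(disk_gb, days):
    """Average rate needed to rewrite one full member in the given time."""
    return disk_gb * 1e9 / (days * 86400)


usable = raid5_usable_gb(8, 120)
rate = implied_rebuild_rate_bytes_per_s(120, 15)
print(f"usable capacity: {usable} GB")
print(f"implied rebuild rate: {rate / 1000:.1f} kB/s")
```

Under those assumptions the usable capacity comes to 840 GB, which matches
the "800+ GB" figure, and a 15-day estimate corresponds to well under
100 kB/s per member.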
At this point, I am getting desperate :) So I ran xfs_repair on /dev/md0
and did not get any errors, so I mounted the device, again with no
errors. I then tried doing a simple 'ls -l' in the top level directory
and immediately got an oops (attached as error.txt). I ran ksymoops on it
and that is attached as well (ksymoops.txt).
Does anybody have any ideas about how to proceed? Some other bits of
information:
1. I am using 36-inch 80-pin cables
2. The eight drives are on two Promise PDC20269 TX2 Ultra-133
controllers
3. This array has been functional for over 10 months, and until now it
had never experienced a crash/hard reset.
4. This is not the same system that I reported raid5/XFS troubles with
before.
Thanks,
Daryl
--
/**
* Daryl Herzmann (akrherz@xxxxxxxxxxx)
* Program Assistant -- Iowa Environmental Mesonet
* http://mesonet.agron.iastate.edu
*/
error.txt
Description: Text document
ksymoops.txt
Description: Text document