Hi all,
I have 3 disks in an LVM volume with XFS on it. After a recent power failure it
no longer came up. At first it would try to do a recovery and get a lot of:
hdg: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hdg: dma_intr: error=0x40 { UncorrectableError }, LBAsect=134840697, sector=134840696
end_request: I/O error, dev 22:01 (hdg), sector 134840696
As I go through the log now, however, I see some new errors:
hdg: read_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
hdg: read_intr: error=0x40 { UncorrectableError }, LBAsect=134840697, sector=134840696
end_request: I/O error, dev 22:01 (hdg), sector 134840696
I take it that this means things have gotten worse (read_intr errors instead
of the dma_intr errors, which I have seen are quite common). This is on an LVM
volume with 220G of data.
So I have a few questions:
Is there any way of getting the data on the other disks back? From what I've
seen of the logs it's hdg that's bad.
Is there any way of getting warned about this before it happens? I did get a
lot of dma_intr errors first, but it seemed to me then that a lot of other
people were getting them and safely (?) ignoring them. (From the kernel and
LVM lists.)
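
A minimal sketch of one way to watch for this, assuming the kernel messages
keep the format quoted above. The pattern and the function name
tally_uncorrectable are made up for illustration and are modelled only on the
lines in this mail, so other drivers or kernel versions may word things
differently:

```python
import re
from collections import Counter

# Hypothetical pattern, modelled only on the messages quoted in this mail, e.g.
#   hdg: dma_intr: error=0x40 { UncorrectableError }, LBAsect=134840697,
ERROR_LINE = re.compile(
    r"\b(?P<dev>hd[a-z]): (?P<kind>\w+): error=0x[0-9a-fA-F]+ \{(?P<flags>[^}]*)\}"
)


def tally_uncorrectable(lines):
    """Count UncorrectableError reports per (device, handler) pair."""
    counts = Counter()
    for line in lines:
        match = ERROR_LINE.search(line)
        if match and "UncorrectableError" in match.group("flags").split():
            counts[(match.group("dev"), match.group("kind"))] += 1
    return counts


# The messages from this mail, used as sample input.
sample = [
    "hdg: dma_intr: status=0x51 { DriveReady SeekComplete Error }",
    "hdg: dma_intr: error=0x40 { UncorrectableError }, LBAsect=134840697,",
    "end_request: I/O error, dev 22:01 (hdg), sector 134840696",
    "hdg: read_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }",
    "hdg: read_intr: error=0x40 { UncorrectableError }, LBAsect=134840697,",
]

for (dev, kind), n in sorted(tally_uncorrectable(sample).items()):
    print(dev, kind, n)
```

Fed the system log on a regular basis, a non-zero count for a drive could be
taken as a hint to back it up and replace it. Whether that warning would come
early enough is exactly what I don't know.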
Is there any way I can be "proactive" in avoiding this? By storing metadata
redundantly, for instance? (I assume that in this particular case it's the
parts of the drive holding the metadata that have failed, which is why I'm
left with an unmountable and unrecoverable system.)
Would a check with, for instance, Bonnie catch a problem like this before it
gets bad?
I've seen this in a couple of places now; perhaps it would be a good idea to
put it in the FAQ or some other documentation?
Marcus Hast, Lund, Sweden, Earth.
Living long and prosperous.