
Re: Harddrive error and XFS corruption

To: "Marcus Hast" <hast@xxxxxxxxx>, linux-xfs@xxxxxxxxxxx
Subject: Re: Harddrive error and XFS corruption
From: Seth Mos <knuffie@xxxxxxxxx>
Date: Tue, 13 Nov 2001 15:48:27 +0100
In-reply-to: <20011113143336109.AAA296.51@xxxxxxxxxxxxxx>
Sender: owner-linux-xfs@xxxxxxxxxxx
At 15:34 13-11-2001 +0100, Marcus Hast wrote:
Hi all,
I have 3 disks in an LVM volume with XFS on it. After a recent power failure it
no longer came up. At first it would try to do a recovery and get a lot of:

<snip>

hdg: read_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
hdg: read_intr: error=0x40 { UncorrectableError }, LBAsect=134840697, sector=134840696
end_request: I/O error, dev 22:01 (hdg), sector 134840696

I take it that this means it has gotten worse. (A read_intr error instead of
dma_intr, which I have seen is quite common.)

Yes, bad cables.

This is on an LVM volume with 220G of data.

Ouch.

So I have a few questions:
Is there any way of getting the data on the other disks back? From what I've
seen of the logs it's hdg that's bad.

Yes, the disk is broken. You could try running xfs_repair on the device and pray that it can restore something. Run xfs_repair -n first to see what it wants to change.
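The "look before you repair" step can be sketched in Python as below. This is a minimal sketch, not part of the original advice: it assumes the exit-status convention documented in current xfs_repair man pages (in -n mode, 1 means corruption was detected, 0 means none), which older versions may not follow. The filesystem must be unmounted, and the runner parameter exists only so the sketch can be tested without touching a disk.

```python
import subprocess
from typing import Callable


def repair_command(device: str, dry_run: bool = True) -> list[str]:
    """Build the xfs_repair argument list; -n selects no-modify (report-only) mode."""
    cmd = ["xfs_repair"]
    if dry_run:
        cmd.append("-n")
    cmd.append(device)
    return cmd


def check_filesystem(device: str,
                     runner: Callable[..., subprocess.CompletedProcess] = subprocess.run) -> bool:
    """Run 'xfs_repair -n' on an unmounted device and print what it would change.

    Returns True if xfs_repair exited non-zero, i.e. it reported corruption
    (assumed exit status 1 in -n mode) or the check itself failed.
    """
    result = runner(repair_command(device, dry_run=True),
                    capture_output=True, text=True)
    print(result.stdout, end="")
    return result.returncode != 0
```

Only after reading the no-modify report would one rerun with repair_command(device, dry_run=False), since a real repair on a failing disk can discard whatever it cannot read.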

Is there any way of getting warned about this before it happens?

I would like to have a magic ball in which I can see when a disk will fail :-)
It's called fortune telling. Most IDE disks support something called SMART, but it is not smart enough to warn you most of the time, and it is even disabled by default in most BIOSes.
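On Linux, SMART can be queried with smartctl from smartmontools, a tool not mentioned in this thread (and newer than it). The sketch below assumes smartctl's documented exit-status bitmask; check smartctl(8) for your version. SMART may first need to be enabled on the drive with smartctl -s on. The device name /dev/hdg is taken from this thread purely as an example.

```python
import subprocess

# Meaning of each bit in smartctl's exit status, paraphrased from smartctl(8).
SMARTCTL_EXIT_BITS = {
    0: "command line did not parse",
    1: "device open failed",
    2: "a SMART or ATA command failed",
    3: "SMART status check returned DISK FAILING",
    4: "pre-failure attribute at or below threshold",
    5: "some attribute was at or below threshold in the past",
    6: "device error log contains errors",
    7: "self-test log contains errors",
}


def decode_smartctl_status(code: int) -> list[str]:
    """Return a description for every bit set in a smartctl exit status."""
    return [msg for bit, msg in sorted(SMARTCTL_EXIT_BITS.items())
            if code & (1 << bit)]


def smart_health(device: str) -> list[str]:
    """Run 'smartctl -H' on a device (e.g. /dev/hdg) and decode its exit status."""
    result = subprocess.run(["smartctl", "-H", device],
                            capture_output=True, text=True)
    return decode_smartctl_status(result.returncode)
```

An empty list only means SMART raised no flags; as said above, that is no guarantee the disk will not fail.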

I did get a lot of dma_intr errors first, but it seemed to me then that a lot
of other people were getting them and safely (?) ignoring them. (From the
kernel and LVM lists.)

dma_intr errors indicate problems communicating with the IDE disk, and you should investigate them. If the cable is bad you will see CRC errors. However, this one says "{ UncorrectableError }, LBAsect=134840697", which means the drive is unable to read the disk at that sector.
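To tell a cable problem from a failing disk in kernel messages like the ones quoted above, one could decode the error-register byte printed as error=0x... This Python sketch assumes the ATA error-register bit layout and the 2.4-era IDE driver's message format; the bit labels are illustrative, since exact wording varied between kernel versions. It also counts error lines per drive, so a steadily growing count of dma_intr errors stands out instead of being ignored.

```python
import re
from collections import Counter

# ATA error-register bits with the labels the IDE driver printed in "{ ... }".
# 0x80 shows up as BadCRC on cable errors (typically error=0x84).
ERROR_BITS = {
    0x80: "BadCRC/BadSector",
    0x40: "UncorrectableError",
    0x10: "SectorIdNotFound",
    0x04: "DriveStatusError",
    0x02: "TrackZeroNotFound",
    0x01: "AddrMarkNotFound",
}

# Matches e.g. "hdg: read_intr: error=0x40 { UncorrectableError }, LBAsect=134840697, ..."
ERROR_LINE = re.compile(
    r"^(?P<dev>hd[a-z]): (?P<op>\w+): error=0x(?P<err>[0-9a-fA-F]{2})"
    r"(?:.*?LBAsect=(?P<lba>\d+))?"
)


def decode_error(err: int) -> list[str]:
    """Names of all bits set in an error-register value."""
    return [name for bit, name in ERROR_BITS.items() if err & bit]


def classify(err: int) -> str:
    """Rough diagnosis: CRC errors point at the cable, UNC at the disk surface."""
    if (err & 0x84) == 0x84:   # DriveStatusError + BadCRC
        return "interface CRC error (check cable)"
    if err & 0x40:             # UncorrectableError: sector could not be read
        return "unreadable sector (media failure)"
    return "other drive error"


def scan_log(lines):
    """Count error lines per drive and record (drive, op, LBA, diagnosis)."""
    counts = Counter()
    findings = []
    for line in lines:
        m = ERROR_LINE.match(line.strip())
        if m is None:
            continue  # status= lines and unrelated messages are skipped
        err = int(m["err"], 16)
        counts[m["dev"]] += 1
        lba = int(m["lba"]) if m["lba"] else None
        findings.append((m["dev"], m["op"], lba, classify(err)))
    return counts, findings
```

Fed the hdg lines from this thread, it reports an unreadable sector at LBA 134840697. A line such as "hde: dma_intr: error=0x84 { DriveStatusError BadCRC }" (a made-up example of the typical cable-error message, not from this thread) would instead be classified as a CRC error.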

Is there any way I can be "proactive" in avoiding this? By storing metadata
redundantly, for instance? (I assume that in this particular case it's those
parts of the drive that have gone, which is why I'm left with an unmountable
and unrecoverable system.)

Use a RAID level such as RAID5 or RAID1 (mirroring). You can do this with md, the Linux software RAID driver.
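The capacity trade-off between the two levels, and the command that would create such an array, can be sketched as below. This assumes the mdadm tool (the reply only says "md"; raidtools with /etc/raidtab was also common at the time), and the partition names are hypothetical. The LVM volume and XFS would then sit on /dev/md0 rather than on the raw disks.

```python
# Minimum member counts for the two RAID levels discussed.
MIN_DEVICES = {1: 2, 5: 3}


def _check_members(level: int, count: int) -> None:
    """Reject unsupported levels and arrays with too few members."""
    if level not in MIN_DEVICES:
        raise ValueError(f"unsupported RAID level {level}")
    if count < MIN_DEVICES[level]:
        raise ValueError(f"RAID{level} needs at least {MIN_DEVICES[level]} devices")


def usable_capacity(level: int, sizes_gb: list[float]) -> float:
    """Usable space of an md array built from members of the given sizes.

    RAID1 mirrors every member, so capacity is the smallest member.
    RAID5 spends one member's worth of space on parity: (n - 1) * smallest.
    """
    _check_members(level, len(sizes_gb))
    smallest = min(sizes_gb)
    return smallest if level == 1 else (len(sizes_gb) - 1) * smallest


def mdadm_create_command(md_device: str, level: int, members: list[str]) -> list[str]:
    """Build an 'mdadm --create' command line (not executed here)."""
    _check_members(level, len(members))
    return ["mdadm", "--create", md_device,
            f"--level={level}", f"--raid-devices={len(members)}", *members]
```

With three hypothetical disks of 100, 100 and 120 GB, RAID5 yields 200 GB usable and survives the loss of any one disk; a two-disk RAID1 of 100 and 120 GB yields 100 GB.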

Would a check with, for instance, Bonnie catch a problem like this before it
gets bad?

No, your disk just broke down. It happens all the time.

I've seen this in a couple of places now, perhaps it would be a good idea to
put it in the FAQ or some documents?

There is something about this in the FAQ, in the xfs_shutdown section, but you only see that message when it happens while the system is running. It might be that the disk broke because of the power failure (not uncommon).

Cheers

--
Seth
Every program has two purposes: one for which
it was written and another for which it wasn't.
I use the last kind.

