
Re: Recover a XFS on raid -1 (linear) when one disk is broken

To: Jan Banan <b@xxxxxxxxxxxx>
Subject: Re: Recover a XFS on raid -1 (linear) when one disk is broken
From: Chris Wedgwood <cw@xxxxxxxx>
Date: Sat, 17 Jul 2004 16:32:36 -0700
Cc: linux-xfs@xxxxxxxxxxx
In-reply-to: <40F9321C.7060403@grabbarna.nu>
References: <40F6DBC1.6050909@grabbarna.nu> <20040715205910.GA9948@taniwha.stupidest.org> <40F9321C.7060403@grabbarna.nu>
Sender: linux-xfs-bounce@xxxxxxxxxxx
On Sat, Jul 17, 2004 at 04:05:16PM +0200, Jan Banan wrote:

> I suppose the best strategy is to get a new disk of the same size
> and then try to copy the whole damaged disk with "dd" to the new
> disk and then try to startup the raid again and after that run
> xfs_repair.

that sounds like a good solution if most of the damaged disk is
readable (i assumed it was completely dead)

> What arguments to "dd" would fit best in this case? I think I've
> read that "dd" will normally abort when it can't read from a damaged
> disk and the disk is quite big, 250 GB (Maxtor).

'conv=noerror' i guess, see the dd man page
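A minimal sketch of what `conv=noerror` is for, written in Python rather than dd. One assumption worth stating: with GNU dd you would normally pair it with `sync` (`conv=noerror,sync`), which pads every short or failed read with zeros so that everything after a bad spot stays at the same offset on the new disk. Without `sync`, the copy can end up shifted. The function below mimics that combination and also records which blocks could not be read. `rescue_copy` and its `reader` hook are hypothetical names, not part of any tool.

```python
import os

def rescue_copy(src_path, dst_path, block_size=512, reader=os.pread):
    """Copy src_path to dst_path block by block, in the spirit of
    `dd conv=noerror,sync`: unreadable blocks become zeros so that later
    data keeps its offset. Returns the indices of blocks that failed.
    `reader` (hypothetical hook) exists only so failures can be simulated.
    """
    bad_blocks = []
    src = os.open(src_path, os.O_RDONLY)
    try:
        size = os.lseek(src, 0, os.SEEK_END)  # works for files and block devices
        dst = os.open(dst_path, os.O_WRONLY | os.O_CREAT, 0o644)
        try:
            for index in range((size + block_size - 1) // block_size):
                offset = index * block_size
                length = min(block_size, size - offset)
                try:
                    data = reader(src, length, offset)
                except OSError:
                    bad_blocks.append(index)  # e.g. EIO from a media error
                    data = b""
                # pad failed or short reads with zeros to keep offsets aligned
                os.pwrite(dst, data.ljust(length, b"\0"), offset)
        finally:
            os.close(dst)
    finally:
        os.close(src)
    return bad_blocks
```

Usage would be something like `bad = rescue_copy("/dev/hdh", "/dev/hdX")`, where `/dev/hdX` stands in for the new disk (hypothetical name). 512-byte blocks make the copy slow but lose the least data around each bad sector. The roughly equivalent dd command is `dd if=/dev/hdh of=/dev/hdX bs=512 conv=noerror,sync`.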

> Since it is a 4 disk linear raid I hope most of the files are not
> spread over blocks on different disks since I suppose XFS (1.2.0)
> tries to store the files on blocks close to each other(?).

the file-blocks will *usually* be close together and usually within
the same ag (allocation group)

various access patterns can change this though (like writing with a
very full fs)
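To check whether two pieces of a file share an AG, it helps to know that XFS lays its allocation groups out back to back, each `agsize` filesystem blocks long (the last one may be shorter). The sketch below assumes you take `bsize` and `agsize` from `xfs_info` and measure offsets from the start of the filesystem (the md device here). It turns a byte offset into a block number and an AG. The function names are illustrative.

```python
def locate_in_fs(byte_offset, block_size, ag_blocks):
    """Map a byte offset (from the start of the XFS device) to
    (filesystem block, AG number, block within that AG).

    block_size and ag_blocks are the bsize and agsize values from xfs_info.
    """
    fs_block = byte_offset // block_size
    # AGs are consecutive runs of ag_blocks blocks
    ag_number, ag_block = divmod(fs_block, ag_blocks)
    return fs_block, ag_number, ag_block


def same_ag(offset_a, offset_b, block_size, ag_blocks):
    """True if two byte offsets land in the same allocation group."""
    return (locate_in_fs(offset_a, block_size, ag_blocks)[1]
            == locate_in_fs(offset_b, block_size, ag_blocks)[1])
```

On a linear array, AG boundaries need not line up with the boundaries between member disks, so a single AG can still span two disks.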

> Does anyone know what has normally happened to a disk when you suddenly
> can't read from some parts of it? I get these kinds of
> errors:

> Jul 15 21:18:58 d kernel: hdh: dma_intr: error=0x40 { UncorrectableError }, LBAsect=243818407, high=14, low=8937383, sector=243818336
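The numbers in that message agree with each other, which makes it easy to pin down the failing spot. In 28-bit LBA addressing, `high` appears to carry bits 24 to 27 and `low` bits 0 to 23, and indeed (14 << 24) | 8937383 = 243818407, which is `LBAsect`. The sketch below extracts and cross-checks the fields. It assumes 512-byte sectors and that `sector=` is the start of the failed request (so the bad sector would be 71 sectors into it). The LBA is absolute on the whole disk, not relative to a partition. `decode_ide_error` is a hypothetical helper.

```python
import re

SECTOR_SIZE = 512  # assumption: 512-byte sectors, usual for ATA disks of this era

def decode_ide_error(message):
    """Extract and cross-check the LBA fields of an IDE error message."""
    fields = {name: int(value) for name, value in
              re.findall(r"\b(LBAsect|high|low|sector)=(\d+)", message)}
    lba = fields["LBAsect"]
    # 28-bit LBA: 'high' holds bits 24-27, 'low' holds bits 0-23
    if (fields["high"] << 24) | fields["low"] != lba:
        raise ValueError("high/low do not add up to LBAsect")
    return {
        "lba": lba,                          # absolute sector on the whole disk
        "byte_offset": lba * SECTOR_SIZE,
        # assumption: 'sector' is where the failed request started
        "sectors_into_request": lba - fields["sector"],
    }

msg = ("hdh: dma_intr: error=0x40 { UncorrectableError }, "
       "LBAsect=243818407, high=14, low=8937383, sector=243818336")
print(decode_ide_error(msg))
# prints {'lba': 243818407, 'byte_offset': 124835024384, 'sectors_into_request': 71}
```

So this particular error sits roughly 125 GB into the 250 GB disk.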

that's a disk media error. if there are only a few of these (and there
aren't many relocated sectors already) i would stomp over them, i.e.
write to those sectors, in the hope that the disk will remap them. i've
done this myself with good results and helped various other people do
the same
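"Stomping over" a sector just means writing to it: a drive that cannot read a sector is expected to reallocate it to a spare on the next write. A minimal sketch follows. It is destructive for that sector and Linux-only when using `O_DIRECT`. It assumes 512-byte sectors and opens the device with `O_DIRECT` because a buffered write smaller than the kernel's block size for the device may first have to read the surrounding block, which is exactly the unreadable part. `overwrite_sector` is a hypothetical helper. The disk should not be part of a running array while you do this.

```python
import mmap
import os

def overwrite_sector(device_path, lba, sector_size=512, direct=True):
    """Overwrite one sector with zeros so the drive gets a chance to remap it.

    DESTRUCTIVE: the previous contents of that sector are lost.
    Linux-only when direct=True (os.O_DIRECT).
    """
    flags = os.O_WRONLY | os.O_SYNC
    if direct:
        # bypass the page cache: a buffered 512-byte write may first
        # read the surrounding block, i.e. the unreadable sector itself
        flags |= os.O_DIRECT
    # an anonymous mmap is page-aligned and zero-filled, as O_DIRECT requires
    buf = mmap.mmap(-1, sector_size)
    try:
        fd = os.open(device_path, flags)
        try:
            written = os.pwrite(fd, buf, lba * sector_size)
        finally:
            os.close(fd)
    finally:
        buf.close()
    if written != sector_size:
        raise OSError(f"short write: {written} of {sector_size} bytes")
```

For the sector from the log above this would be `overwrite_sector("/dev/hdh", 243818407)`. `direct=False` is only meant for trying it out on an ordinary file. Afterwards, `smartctl -a` should show whether the reallocated sector count went up.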

> Can I do something to make it better? The disk is only one year old
> but maybe the temperature has been a little bit too high in the
> computer box.

smartctl -a /dev/<disk>

will tell you how many relocated sectors there are, along with various
other details. like i said, if the relocated sector count is low and
you don't have *that* many bad sectors on the disk (badblocks will tell
you this), i would write over the bad blocks (keeping a record of which
blocks were bad), hope the disk relocates those sectors sanely, and
then run xfs_repair to see how well that does. if you know which
sectors (well, blocks) were bad you can work out which files (well,
parts of files) were damaged
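Working out which files were hit could be sketched like this. `xfs_bmap` prints each extent as `[startoffset..endoffset]: startblock..endblock` in 512-byte units counted from the start of the filesystem device, so a bad 512-byte address can be matched against those ranges. Two assumptions apply. First, bad addresses found on the disk (from the kernel log, or from `badblocks`, which reports 1024-byte blocks unless `-b` is given, so multiply by 2) must be translated to addresses on the md device. Second, that translation (`disk_lba_to_fs_daddr` with its `partition_start` and `member_offset` parameters) is hypothetical. It assumes the member's data starts at sector 0 of its partition, as with a 0.90 superblock, and you have to look up where this disk sits in the linear array yourself.

```python
import re

# default xfs_bmap extent line, e.g. "\t0: [0..7]: 96..103" (512-byte units);
# holes print "hole" instead of a block range and are skipped
EXTENT_RE = re.compile(r"^\s*\d+: \[(\d+)\.\.(\d+)\]: (\d+)\.\.(\d+)")


def disk_lba_to_fs_daddr(lba, partition_start, member_offset):
    """Hypothetical translation from a sector on one member disk to a
    512-byte address on the md linear device (= the XFS device).

    Assumes the member's data begins at sector 0 of its partition (0.90
    superblock at the end); partition_start is the partition's first sector
    on the disk, member_offset the number of array sectors that precede
    this member. Both must be looked up by hand.
    """
    return lba - partition_start + member_offset


def damaged_ranges(bmap_output, bad_daddrs):
    """Return (file offset, fs address) pairs, both in 512-byte units,
    for every bad address that falls inside one of the file's extents."""
    hits = []
    for line in bmap_output.splitlines():
        m = EXTENT_RE.match(line)
        if not m:
            continue  # file name header or hole
        file_start, _file_end, disk_start, disk_end = map(int, m.groups())
        for daddr in bad_daddrs:
            if disk_start <= daddr <= disk_end:  # ranges are inclusive
                hits.append((file_start + daddr - disk_start, daddr))
    return hits
```

In practice you would run `xfs_bmap` on each candidate file after xfs_repair and pass its output, together with the translated bad addresses, to `damaged_ranges`. A non-empty result means that part of the file sits on a sector that was overwritten with zeros.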

maybe i should write something up on this?


  --cw

