Re: Oops - XFS mount after replacing wrong RAID5 drive

To: linux-xfs@xxxxxxxxxxx, linux-raid@xxxxxxxxxxxxxxx
Subject: Re: Oops - XFS mount after replacing wrong RAID5 drive
From: Andrew Klaassen <ak@xxxxxxx>
Date: Mon, 5 Nov 2001 13:01:15 -0500
In-reply-to: <1004980862.10860.54.camel@jen.americas.sgi.com>
Mail-followup-to: linux-xfs@xxxxxxxxxxx, linux-raid@xxxxxxxxxxxxxxx
References: <20011105103521.A3864@dkp.com> <1004974636.7318.5.camel@jen.americas.sgi.com> <20011105112231.B3864@dkp.com> <1004980862.10860.54.camel@jen.americas.sgi.com>
Sender: owner-linux-xfs@xxxxxxxxxxx
User-agent: Mutt/1.3.23i

On Mon, Nov 05, 2001 at 11:21:02AM -0600,
Steve Lord wrote:

> What you should actually do at this point sort of depends on
> what the raid folks think your chances are. If all you did was
> mount the fs and run recovery with the bad drive still in
> there then things may not be so bad. I am not sure you can
> flip out one drive and then do another in the middle of raid
> rebuild, that would almost certainly toast the volume. You
> might need to let raid rebuild complete on the current set of
> drives, and then replace the real bad drive with a good one.

Unfortunately, there was an unclean unmount the second time,
too.  Here's the full sordid sequence of events:

 - hdp giving errors (SectorIdNotFound).
 - hdn fails (dma_status=0x00, or something like that); the
   array goes into degraded mode.
 - The system hangs on shutdown, and has to be taken down hard.
 - I assume that hdp is actually the problem; I replace hdp. 
   (IDE is just that way sometimes...)
 - When the box comes back up, the array isn't recognized.
 - I mark hdp as a failed-disk and run mkraid -f.  Now the
   array is recognized.
 - I mount the filesystem read-write.  The data appears to be
 - I raidhotadd hdp to the array.  Reconstruction begins, but
   stalls almost immediately.  (/proc/mdstat reports 0K done and
   a long, long time to finish.)
 - I attempt to unmount the filesystem.  It stalls.  I attempt
   to reboot; again, it stalls.  I wait for a couple of minutes
   before taking the box down hard.

And now, it looks like the probably-good drive may be heading
for a failure itself.  :( I'm attempting to clone it before I
try putting it back in.

> Wait for the raid experts to respond on this point, do not take
> my word for it!

I think I might take a look at the raid code myself before I go
too far, just for the fun of it.  Might be a useless exercise,
but worth a shot.  Anyone know of any design docs other than the
code itself, to ease me into it?

> > > > How do I mount the filesystem without writing anything at
> > > > all to the array?

> > > mount -o ro,norecovery
> > > 
> > > Even a readonly mount without the norecovery will attempt to run
> > > recovery.

> > So there's no way at all to mount the filesystem without some
> > writing occurring?

> No, if you use the combination of the two options then there should be
> no disk I/O at all.

Sorry; I misread your first reply.
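
For anyone finding this in the archives, here is a rough sketch of
the combination Steve describes.  The device /dev/md0 and the mount
point /mnt/recovery are placeholders I made up for illustration, not
names from this thread, and the -t xfs is only there to be explicit.
The script prints the command instead of running it, so nothing
touches the array until the line has been checked by hand.

```shell
#!/bin/sh
# Hypothetical names: /dev/md0 and /mnt/recovery are placeholders,
# not taken from this thread.  Override via DEV and MNT.
DEV="${DEV:-/dev/md0}"
MNT="${MNT:-/mnt/recovery}"

# Build the mount command line.  "ro" alone still replays the XFS
# log, so "norecovery" is added to skip log recovery as well.
ro_mount_cmd() {
    echo "mount -t xfs -o ro,norecovery $1 $2"
}

# Dry run: print the command rather than executing it.
ro_mount_cmd "$DEV" "$MNT"
```

Once the printed line looks right, it can be run as root.  With log
recovery skipped, the filesystem may look inconsistent wherever the
log held changes that were never replayed.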

Andrew Klaassen
