Re: xfs_repair breaks with assertion

To: Victor K <kvic45@xxxxxxxxx>
Subject: Re: xfs_repair breaks with assertion
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Thu, 11 Apr 2013 17:02:01 +1000
Cc: xfs@xxxxxxxxxxx
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <CAPaMSRCq0f+GqTbRRCXBFUDdtmpBx=VjBaOLpdDytXunL9dfmQ@xxxxxxxxxxxxxx>
References: <CAPaMSRCGSyhmnjrXpFFkEpmKrjsHqLn0kJ1xLGyf-WZosV7mmQ@xxxxxxxxxxxxxx> <20130411062515.GH10481@dastard> <CAPaMSRCq0f+GqTbRRCXBFUDdtmpBx=VjBaOLpdDytXunL9dfmQ@xxxxxxxxxxxxxx>
User-agent: Mutt/1.5.21 (2010-09-15)
On Thu, Apr 11, 2013 at 02:34:32PM +0800, Victor K wrote:
> > > Running xfs_repair /dev/md1 the first time resulted in suggestion to
> > > mount/unmount to replay log, but mounting would not work. After running
> > > xfs_repair -v -L -P /dev/md1 this happens:
> > > (lots of output on stderr, moving to Phase 3, then more output - not sure
> > > if it is relevant, the log file is ~170Mb in size), then stops and prints
> > > the only line on stdout:
> >
> > Oh dear. A log file that big indicates that something *bad* has
> > happened to the array. i.e that it has most likely been put back
> > together wrong.
> >
> > Before going any further with xfs_repair, please verify that the
> > array has been put back together correctly....
> >
> >
> The raid array did not suffer, at least, not according to mdadm; it is now
> happily recovering the one disk that officially failed, but the whole thing
> assembled without a problem

Yeah, we see this often enough that all I can say is this: don't
trust what mdadm is telling you. Validate it by hand.  Massive
corruption does not occur when everything is put back together
correctly.
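
As a rough illustration of a first-pass check, the sketch below
compares the per-member superblock fields that `mdadm --examine`
prints. It assumes v1.x metadata, where the output carries
"Array UUID", "Events" and "Device Role" lines; the helper names and
the way the text is collected are hypothetical. Consistent superblocks
are necessary but not sufficient: device order, chunk size and data
offset still need comparing against what the array looked like before
the failure, and a member that is still rebuilding may legitimately
report different values.

```python
def parse_examine(text):
    """Collect 'Key : Value' lines from one member's `mdadm --examine` text."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" : ")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def check_members(reports):
    """reports maps a device name to its `mdadm --examine` output text.

    Returns a list of human-readable inconsistencies (empty if none found).
    """
    parsed = {dev: parse_examine(txt) for dev, txt in reports.items()}
    problems = []

    # Every member should carry the fields we are about to compare.
    for field in ("Array UUID", "Events", "Device Role"):
        missing = sorted(d for d, p in parsed.items() if field not in p)
        if missing:
            problems.append(f"{field} missing on: {', '.join(missing)}")

    # All members of one array should agree on the UUID and event count.
    for field in ("Array UUID", "Events"):
        values = {d: p[field] for d, p in parsed.items() if field in p}
        if len(set(values.values())) > 1:
            problems.append(f"{field} differs: {values}")

    # No two members should claim the same active slot.
    seen = {}
    for dev, p in sorted(parsed.items()):
        role = p.get("Device Role", "")
        if role.startswith("Active device"):
            if role in seen:
                problems.append(f"{role} claimed by {seen[role]} and {dev}")
            else:
                seen[role] = dev
    return problems
```

Feed it the saved output of `mdadm --examine` for each member device,
and treat an empty result only as "nothing obviously wrong in the
metadata", not as proof that the array was reassembled in the right
order.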

> There was a similar crash several weeks ago on this same array, but had
> ext4 system back then.
> I was able to save some of the latest stuff, and decided to move to xfs as
> something more reliable.

If the storage below the filesystem is unreliable, then changing
filesystems won't magically fix the problem.

> I suspect now I should also had replaced the disk controller then.

Well, that depends on whether it is the problem or not. If you are
not using hardware raid, then disk controller problems rarely result
in massive corruption of filesystems. A busted block here or there,
but they generally do not cause entire disks to suddenly become
corrupted.

I'd still be looking at a RAID reassembly problem rather than a
filesystem or a storage hardware issue...

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
