xfs
[Top] [All Lists]

Re: raid5: I lost a XFS file system due to a minor IDE cable problem

To: Alberto Alonso <alberto@xxxxxxxxx>
Subject: Re: raid5: I lost a XFS file system due to a minor IDE cable problem
From: David Chinner <dgc@xxxxxxx>
Date: Fri, 25 May 2007 18:36:50 +1000
Cc: David Chinner <dgc@xxxxxxx>, Pallai Roland <dap@xxxxxxxxxxxxx>, Linux-Raid <linux-raid@xxxxxxxxxxxxxxx>, xfs@xxxxxxxxxxx
In-reply-to: <1180071831.21028.125.camel@w100>
References: <200705241318.30711.dap@mail.index.hu> <Pine.LNX.4.64.0705240720040.16751@p34.internal.lan> <20070525000547.GH85884050@sgi.com> <1180056948.6183.10.camel@daptopfc.localdomain> <20070525045500.GF86004887@sgi.com> <1180071831.21028.125.camel@w100>
Sender: xfs-bounce@xxxxxxxxxxx
User-agent: Mutt/1.4.2.1i
On Fri, May 25, 2007 at 12:43:51AM -0500, Alberto Alonso wrote:
> > >  The difference between ext3 and XFS is that ext3 will remount to
> > > read-only on the first write error but the XFS won't, XFS only fails
> > > only the current operation, IMHO. The method of ext3 isn't perfect, but
> > > in practice, it's working well.
> > 
> > XFS will shutdown the filesystem if metadata corruption will occur
> > due to a failed write. We don't immediately fail the filesystem on
> > data write errors because on large systems you can get *transient*
> > I/O errors (e.g. FC path failover) and so retrying failed data
> > writes is useful for preventing unnecessary shutdowns of the
> > filesystem.
> > 
> > Different design criteria, different solutions...
> 
> I think his point was that going into a read only mode causes a
> less catastrophic situation (ie. a web server can still serve
> pages).

Sure - but once you've detected one corruption or had metadata
I/O errors, can you trust the rest of the filesystem?

> I think that is a valid point, rather than shutting down
> the file system completely, an automatic switch to where the least
> disruption of service can occur is always desired.

I consider the possibility of serving out bad data (i.e after
a remount to readonly) to be the worst possible disruption of
service that can happen ;)

> Maybe the automatic failure mode could be something that is 
> configurable via the mount options.

If only it were that simple. Have you looked to see how many
hooks there are in XFS to shutdown without causing further
damage?

% grep FORCED_SHUTDOWN fs/xfs/*.[ch] fs/xfs/*/*.[ch] | wc -l
116

Changing the way we handle shutdowns would take a lot of time,
effort and testing. When can I expect a patch? ;)

> I personally have found the XFS file system to be great for
> my needs (except issues with NFS interaction, where the bug report
> never got answered), but that doesn't mean it can not be improved.

Got a pointer?

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


<Prev in Thread] Current Thread [Next in Thread>