xfs
[Top] [All Lists]

Re: [PATCH 0/7] Configurable error behavior [V2]

To: xfs@xxxxxxxxxxx
Subject: Re: [PATCH 0/7] Configurable error behavior [V2]
From: Carlos Maiolino <cmaiolino@xxxxxxxxxx>
Date: Wed, 4 May 2016 11:53:28 +0200
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <20160503225948.GU26977@dastard>
Mail-followup-to: xfs@xxxxxxxxxxx
References: <1462298140-12411-1-git-send-email-cmaiolino@xxxxxxxxxx> <20160503225948.GU26977@dastard>
User-agent: Mutt/1.5.24 (2015-08-30)
On Wed, May 04, 2016 at 08:59:48AM +1000, Dave Chinner wrote:
> On Tue, May 03, 2016 at 07:55:33PM +0200, Carlos Maiolino wrote:
> > Hi folks,
> > 
> > I spoke with Dave and to offload him a bit, I took over his patchset for 
> > enable
> > configurable error handlers.
> > 
> > Since he did most of the design, I kept his signed-off to the patches.
> > 
> > Here are core the changes I did from his last patchset to this one:
> > 
> >  - Removed fail_speed configuration
> >     According with what has been discussed, there is no real need to have a
> > fail_speed configuration, but instead, use the "max_retries" configuration 
> > to
> > get for how long the filesystem should retrying, with the only special case 
> > for
> >     "-1", which will make the filesystem to retry forever.
> > 
> >  - Fail at unmount is no longer a config option
> >     Having a filesystem stuck forever during unmount due errors is not a
> >     good thing, so just enforce it if we are trying to unmount a failed
> >     filesystem.
> 
> I think this is wrong - the option should still remain to let the
> filesystem retry for a long while if desired. We have lazy unmounts
> to deal with this situation (i.e. it won't block an unmount command)
> and there are cases where leaving the unmount retrying in the
> background is useful.
> 
> If you are particularly worried about not being able to shut down a
> filesystem that has been unmounted lazily and hence terminate the
> retry forever loop, then we should be looking at ensuring the fs is
> still visible in /sys/fs/xfs/<dev> when it is in this state and
> providing a new shutdown hook through that interface....
> 

Right, I will keep this option, although, I don't think it's useful to have this
option configurable with a per-error granularity, a single fail_at_unmount might
suffice.

> > I reduced by now, the amount of patches into this patchset, once, our 
> > priority
> > here IMHO is to enable the possibility to shutdown the filesystem when we 
> > have
> > metadata errors, and I can work on disabling specific error configs and add
> > memory errors later, I hope that with a reduced amount of patches it can be
> > easier for people to review the core of the error handlers configuration and
> > speed up the inclusion of this patchset.
> 
> We'll see - the issue here is that once we settle on a sysfs
> interface, we can't change it easily (it's part of the user facing
> ABI). Hence if we don't consider all the different types of errors
> we want to handle from the start, we may miss something that we
> can't easily fix in future. Hence we at least have to consider the
> different constraints for the different error types now to determine
> if the abstractions and presentation will handle everything we think
> we might need...
> 

Agreed, the interface for now looks good IMO, and adding the possibility to hide
some specific error handlers from userspace can be done later without a real
impact in the ABI, I just believe that implementing the interface for errors
that will be visible is more important for now, specifically for EIO and ENOSPC

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@xxxxxxxxxxxxx

-- 
Carlos

<Prev in Thread] Current Thread [Next in Thread>