
Re: [PATCH 6/9] xfs: add "fail at unmount" error handling configuration

To: xfs@xxxxxxxxxxx
Subject: Re: [PATCH 6/9] xfs: add "fail at unmount" error handling configuration
From: Eric Sandeen <sandeen@xxxxxxxxxxx>
Date: Tue, 16 Feb 2016 11:09:22 -0600
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <20160216164451.GC39655@xxxxxxxxxxxxxxx>
References: <1454635407-22276-1-git-send-email-david@xxxxxxxxxxxxx> <1454635407-22276-7-git-send-email-david@xxxxxxxxxxxxx> <20160216164451.GC39655@xxxxxxxxxxxxxxx>
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:38.0) Gecko/20100101 Thunderbird/38.5.1

On 2/16/16 10:44 AM, Brian Foster wrote:
> On Fri, Feb 05, 2016 at 12:23:24PM +1100, Dave Chinner wrote:
>> > From: Dave Chinner <dchinner@xxxxxxxxxx>
>> > 
>> > If we take "retry forever" literally on metadata IO errors, we can
>> > hang an unmount that retries those writes forever. This is the default
>> > behaviour, unfortunately. Add an error configuration option for this
>> > behaviour and default it to "fail" so that an unmount will trigger
>> > actual errors, a shutdown and allow the unmount to succeed. It will
>> > be noisy, though, as it will log the errors and shutdown that
>> > occurs.
>> > 
>> > To do this, we need to mark the filesystem as being in the process
>> > of unmounting. Do this with a mount flag that is added at the
>> > appropriate time (i.e. before the blocking AIL sync). We also need
>> > to add this flag if mount fails after the initial phase of log
>> > recovery has been run.
>> > 
>> > The config is done by a separate boolean sysfs option rather than a
>> > new fail_speed enum, as fail_at_unmount is relevant to both
>> > XFS_ERR_FAIL_NEVER and XFS_ERR_FAIL_SLOW options.
>> > 
>> > Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
>> > ---
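The decision described in the commit message above can be sketched as a small userspace model. It is a minimal sketch, not the XFS implementation. The struct and function names (`error_cfg`, `mount_state`, `should_fail_io`) are hypothetical. `ERR_FAIL_SLOW` is assumed to mean "retry up to a configured limit", and the `unmounting` flag stands in for the mount flag the patch sets before the blocking AIL sync.

```c
#include <stdbool.h>

/* Hypothetical model of the per-error configuration. The enum values
 * echo XFS_ERR_FAIL_NEVER / XFS_ERR_FAIL_SLOW from the patch
 * description, but these types and fields are illustrative only. */
enum fail_speed {
	ERR_FAIL_NEVER,	/* retry forever */
	ERR_FAIL_SLOW,	/* assumed: retry up to max_retries times */
};

struct error_cfg {
	enum fail_speed speed;
	int max_retries;	/* only consulted for ERR_FAIL_SLOW */
	bool fail_at_unmount;	/* the separate boolean option; the patch defaults it to "fail" */
};

struct mount_state {
	/* Hypothetical stand-in for the mount flag set before the
	 * blocking AIL sync (and when mount fails after log recovery). */
	bool unmounting;
};

/* Return true if a failed metadata write should become a permanent
 * error (leading to shutdown) instead of being retried again. */
bool should_fail_io(const struct mount_state *mp,
		    const struct error_cfg *cfg, int retries)
{
	/* Unmount in progress: stop retrying so the unmount can finish.
	 * This check applies to both fail speeds, which is why it is a
	 * separate boolean rather than another enum value. */
	if (mp->unmounting && cfg->fail_at_unmount)
		return true;

	switch (cfg->speed) {
	case ERR_FAIL_NEVER:
		return false;
	case ERR_FAIL_SLOW:
		return retries >= cfg->max_retries;
	}
	return true;	/* unknown configuration: fail rather than hang */
}
```

With `{ ERR_FAIL_NEVER, 0, true }`, a write is retried indefinitely while mounted but fails as soon as `unmounting` is set. Moving `fail_at_unmount` from `error_cfg` into `mount_state` would model the per-mount option suggested in the review below.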
> Similar question of scope/granularity here... why would one want to set
> this option for a particular error and not any others? In other words,
> it seems more useful as a global (or per-mount) option.

I guess my question here is higher-level than that.  Why make this
(fail_at_unmount) configurable at all?  When would one *want* unmount
blocked by pending failure retries?

I guess I could imagine it as sort of a safety net, "I told it to retry
for a day, and the day's not up yet, so we shouldn't stop trying
just because I said unmount!" - but that seems a bit contrived to me.

-Eric
