Re: [PATCH] xfs: Introduce permanent async buffer write IO failures

To: Eric Sandeen <sandeen@xxxxxxxxxxx>, xfs@xxxxxxxxxxx
Subject: Re: [PATCH] xfs: Introduce permanent async buffer write IO failures
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Fri, 20 Feb 2015 08:18:52 +1100
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <20150219190419.GA8862@xxxxxxxxxxxxxxxxxx>
References: <1424298740-25821-1-git-send-email-david@xxxxxxxxxxxxx> <54E51CC7.8040709@xxxxxxxxxxx> <20150218235220.GQ4251@dastard> <20150219190419.GA8862@xxxxxxxxxxxxxxxxxx>
User-agent: Mutt/1.5.21 (2010-09-15)
On Thu, Feb 19, 2015 at 05:04:19PM -0200, Carlos Maiolino wrote:
> > 
> > Well, the switch is simple characterisation. What we do with that
> > error type can be much more complex, and that's why I haven't tried
> > to address those issues here. When we've sorted out what we need
> > and how we are going to configure the error handling, then we can
> > add it.
> > 
> > e.g. if we need configurable error handling, it needs to be
> > configurable for different error types, and it needs to be
> > configurable on a per-mount basis. And it needs to be configurable
> > at runtime, not just at mount time. That kind of leads to using
> > sysfs for this. e.g. for each error type we need to handle different
> > behaviour for:
> > 
> > $ cat /sys/fs/xfs/vda/meta_write_errors/enospc/type
> > [transient] permanent
> > $ cat /sys/fs/xfs/vda/meta_write_errors/enospc/perm_timeout_seconds
> > 300
> > $ cat /sys/fs/xfs/vda/meta_write_errors/enospc/perm_max_retry_attempts
> > 50
> > $ cat /sys/fs/xfs/vda/meta_write_errors/enospc/transient_fail_at_umount
> > 1
> > 
> > And then have generic infrastructure to set it up and handle the
> > buffer errors according to the config?
> > 
> > > (I think that's accurately summing up irc-and-side-channel discussions) ;)
> > 
> > Pretty much.
> > 
> 
> Talking about possible configurable error handlers, what about leaving
> this choice of failure to the sysadmin? Instead of a time- or type-based
> configuration, what about something where the administrator could just
> say "next IO to device X should fail permanently"?

How is this different to just shutting down the filesystem
immediately via 'xfs_io -x -c shutdown /path/to/mnt/pt' ?

Regardless of this, leave failures as transient. Then, when an
error condition occurs (say thinp device ENOSPC), running the
following will error out on the next IO that is retried:

# echo permanent > /sys/fs/xfs/vda/meta_write_errors/enospc/type
# echo 0 > /sys/fs/xfs/vda/meta_write_errors/enospc/perm_max_retry_attempts

This will make the next device ENOSPC IO error shut the filesystem down.
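
For illustration only, the knobs discussed above could drive a decision
function along the following lines. This is a userspace sketch under
stated assumptions, not kernel code and not the patch under review: the
struct, enum and function names are hypothetical, and the exact semantics
(retry count and timeout are both limits, either one being hit shuts the
filesystem down) are assumed, since the thread has not settled them.

```c
#include <stdbool.h>

/* Hypothetical mirror of the proposed sysfs "type" attribute. */
enum err_class { ERR_TRANSIENT, ERR_PERMANENT };

/* What to do with a failed async metadata buffer write. */
enum err_action { ACT_RETRY, ACT_SHUTDOWN };

/* Hypothetical per-mount, per-error-type config (e.g. for ENOSPC). */
struct err_cfg {
	enum err_class type;                  /* .../enospc/type */
	unsigned int perm_timeout_seconds;    /* .../enospc/perm_timeout_seconds */
	unsigned int perm_max_retry_attempts; /* .../enospc/perm_max_retry_attempts */
	bool transient_fail_at_umount;        /* .../enospc/transient_fail_at_umount */
};

/*
 * Assumed policy:
 *  - transient errors are retried indefinitely, unless we are unmounting
 *    and transient_fail_at_umount is set;
 *  - permanent errors are retried until either the retry limit or the
 *    timeout is reached, then the filesystem is shut down.
 */
static enum err_action
decide_write_error(const struct err_cfg *cfg, unsigned int retries_so_far,
		   unsigned int seconds_since_first_error, bool unmounting)
{
	if (cfg->type == ERR_TRANSIENT) {
		if (unmounting && cfg->transient_fail_at_umount)
			return ACT_SHUTDOWN;
		return ACT_RETRY;
	}

	/* ERR_PERMANENT: a retry limit of 0 means fail on the next error. */
	if (retries_so_far >= cfg->perm_max_retry_attempts)
		return ACT_SHUTDOWN;
	if (seconds_since_first_error >= cfg->perm_timeout_seconds)
		return ACT_SHUTDOWN;
	return ACT_RETRY;
}
```

Under these assumptions, setting type to permanent and
perm_max_retry_attempts to 0 makes the very next ENOSPC write error
return ACT_SHUTDOWN, which matches the behaviour described for the two
echo commands above.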

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
