[PATCH] xfs: Document error handlers behavior
Eric Sandeen
sandeen at sandeen.net
Thu Sep 8 09:29:18 CDT 2016
On 9/8/16 4:23 AM, Carlos Maiolino wrote:
> Document the implementation of error handlers into sysfs.
>
> Changelog:
>
> V2:
> - Add a description of the precedence order of each option, focusing on
> the behavior of "fail_at_unmount" which was not well explained in V1
>
> V3:
> - Fix English spelling mistakes suggested by Eric
Please put the patch version changelog after the "---" so it doesn't become
part of the permanent commit log; it's for current patch reviewers, not for
future code archaeologists.
> Signed-off-by: Carlos Maiolino <cmaiolino at redhat.com>
> ---
> Documentation/filesystems/xfs.txt | 70 +++++++++++++++++++++++++++++++++++++++
> 1 file changed, 70 insertions(+)
>
> diff --git a/Documentation/filesystems/xfs.txt b/Documentation/filesystems/xfs.txt
> index 8146e9f..8b6c861 100644
> --- a/Documentation/filesystems/xfs.txt
> +++ b/Documentation/filesystems/xfs.txt
> @@ -348,3 +348,73 @@ Removed Sysctls
> ---- -------
> fs.xfs.xfsbufd_centisec v4.0
> fs.xfs.age_buffer_centisecs v4.0
> +
> +Error handling
> +==============
> +
> +XFS can act differently according to the type of error found
> +during its operation. The implementation introduces the following
> +concepts to the error handler:
> +
> + -failure speed:
> + Defines how fast XFS should shut down when of a specific error is found
s/when of a specific error is found/when a specific error is found/
> + during the filesystem operation. It can shut down immediately, after a
> + defined number of retries, after a set time period, or simply retry
> + forever. The old "retry forever" behavior is still the default, except
> + during unmount, where any IOs retrying due to errors will be cancelled
> + and unmount will be allowed to proceed.
> +
> + -error classes:
> + Specifies the subsystem/location where the error handlers, such as
s/location where the error handlers/location of the error handlers/
> + metadata or memory allocation. Only metadata IO errors are handled
> + at this time.
> +
> + -error handlers:
> + Defines the behavior for a specific error.
> +
> +The filesystem behavior during an error can be set via sysfs files, where the
> +errors are organized with the structure below. Each configuration option works
> +independently, the first condition met for a specific configuration will cause
> +the filesystem to shut down:
> +
> + /sys/fs/xfs/<dev>/error/<class>/<error>/
The above line kind of hangs there oddly, because the first thing you do below
is describe a file which isn't in the above hierarchy.
Maybe we should show something like:
+ /sys/fs/xfs/<dev>/error/fail_at_unmount
+ /sys/fs/xfs/<dev>/error/<class>/<error>/<configuration>
to show everything that might be under it? Not sure if that's better.
> +
> +Each directory contains:
> +
> + /sys/fs/xfs/<dev>/error/
> +
> + fail_at_unmount (Min: 0 Default: 1 Max: 1)
> + Defines the global error behavior at unmount time. If set to the
> + default value of 1, XFS will cancel any pending IO retries, shut
> + down, and unmount. If set to 0, pending IO retries may prevent
> + the filesystem from unmounting.
> +
> + <class> subdirectories
> + Contains specific error handlers configuration
> + (Ex: /sys/fs/xfs/<dev>/error/metadata, see below).
> +
> + /sys/fs/xfs/<dev>/error/<class>/
> +
> + Directory containing configuration for a specific error <class>;
> + currently only the "metadata" <class> is implemented.
> + The contents of this directory are <class> specific, since each <class>
> + might need to handle different types of errors.
> +
> + /sys/fs/xfs/<dev>/error/<class>/<error>/
> +
> + Contains the failure speed configuration files for specific errors in
> + this <class, as well as a "default" behavior. Each <error> directory
<class>
> + contains the following configuration files:
> +
> + max_retries (Min: -1 Default: -1 Max: INTMAX)
> + Defines the allowed number of retries of a specific error before
> + the filesystem will shut down. The default value of "-1" will
> + cause XFS to retry forever for this specific error. Setting it
> + to "0" will cause XFS to fail immediately when the specific
> + error is found, and setting it to "N," where N is greater than 0,
> + will make XFS retry "N" times before shutting down.
> +
> + retry_timeout_seconds (Min: 0 Default: 0 Max: INTMAX)
> + Define the amount of time (in seconds) that the filesystem is
> + allowed to retry its operations when the specific error is
> + found. The default value of "0" will cause XFS to retry forever.
The default for ENODEV is different ... tricky to document that. Good luck. ;)
The maximum for retry_timeout_seconds is 86400 (1 day), not INTMAX:
retry_timeout_seconds_store()
{
    ...
    /* 1 day timeout maximum */
    if (val < 0 || val > 86400)
        return -EINVAL;
    ...
}
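For readers who want to try the bound in isolation, here is a minimal userspace sketch of the same range check. The function name and the macro are hypothetical and are not part of the kernel source; only the 0..86400 range and the -EINVAL return come from the snippet above.

```c
#include <errno.h>

/* Hypothetical macro name; the value is the "1 day timeout maximum" above. */
#define RETRY_TIMEOUT_MAX_SECONDS 86400

/*
 * Validate a candidate retry_timeout_seconds value.
 * Returns 0 when the value is in range, -EINVAL otherwise.
 * (Hypothetical helper for illustration, not the kernel's store function.)
 */
int check_retry_timeout_seconds(long val)
{
	if (val < 0 || val > RETRY_TIMEOUT_MAX_SECONDS)
		return -EINVAL;
	return 0;
}
```

Under these assumptions, 0 and 86400 are accepted, while -1 and 86401 are rejected with -EINVAL, so the documented "Max: INTMAX" would not match what a write to the file actually allows.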
The default of -1 vs. 0 might change with the other patch I sent, but we can
fix this up if it's accepted.
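The precedence the patch text describes (each option works independently, and the first limit reached triggers the shutdown) can be sketched as a small decision function. This is only a model of the semantics as documented in this version of the patch, with "-1" meaning unlimited retries and "0" meaning no time limit; the struct, the function name, and the way retries and elapsed time are counted are assumptions for illustration, not the kernel implementation.

```c
#include <stdbool.h>

/* Hypothetical struct; field names follow the sysfs files in the patch. */
struct retry_policy {
	int  max_retries;           /* -1: retry forever, 0: fail at once, N: N retries */
	long retry_timeout_seconds; /* 0: no time limit, per this patch version */
};

/*
 * Decide whether to stop retrying a failed operation.
 * retries_done is the number of retries already attempted for this error;
 * elapsed_seconds is the time since the error was first seen.
 * The two limits are checked independently; whichever is hit first wins.
 */
bool should_give_up(const struct retry_policy *p, int retries_done,
		    long elapsed_seconds)
{
	if (p->max_retries >= 0 && retries_done >= p->max_retries)
		return true;
	if (p->retry_timeout_seconds > 0 &&
	    elapsed_seconds >= p->retry_timeout_seconds)
		return true;
	return false;
}
```

With max_retries = 3 and retry_timeout_seconds = 60 in this model, the function gives up either after the third retry or once 60 seconds have passed, whichever happens first. fail_at_unmount and the different ENODEV default are deliberately left out of the sketch.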
-Eric