On Wed, May 04, 2016 at 05:43:18PM +0200, Carlos Maiolino wrote:
> On reception of an error, we can fail immediately, perform some
> bound amount of retries or retry indefinitely. The current behaviour
> we have is to retry forever.
>
> However, we'd like the ability to choose how long the filesystem should try
> after an error, it can either fail immediately, retry a few times, or retry
> forever. This is implemented by using max_retries sysfs attribute, to hold the
> amount of times we allow the filesystem to retry after an error. Being -1 a
> special case where the filesystem will retry indefinitely.
>
> Add both a maximum retry count and a retry timeout so that we can bound by
> time and/or physical IO attempts.
>
> Finally, plumb these into xfs_buf_iodone error processing so that
> the error behaviour follows the selected configuration.
>
> Changelog:
>
> V3:
> - In xfs_buf_iodone_callback_error, use max_retries to decide how long
> the filesystem should retry on errors instead of XFS_ERR_FAIL enums
> and fail_speed
>
> - Remove all code implementing fail_speed attribute from the original
> patch
> -- failure_speed_show/store attributes function implementation
> -- max_retries_store() now accepts values from -1 up to INT_MAX
>
> - retry_timeout_seconds_show() print fixed:
> -- jiffies_to_msecs() should be divided by MSEC_PER_SEC
> -- trailing whitespace removed
Where's XFS_ERR_RETRY_FOREVER?
> @@ -1095,8 +1098,12 @@ xfs_buf_iodone_callback_error(
> * Repeated failure on an async write. Take action according to the
> * error configuration we have been set up to use.
> */
> - if (!cfg->max_retries)
> - goto permanent_error;
> + if ((cfg->max_retries >= 0) &&
> + (++bp->b_retries > cfg->max_retries))
> + goto permanent_error;
I suggested:
if (cfg->max_retries != XFS_ERR_RETRY_FOREVER &&
++bp->b_retries > cfg->max_retries)
goto permanent_error;
so that we document that there is a "retry forever" case being
handled here. I really don't like magic "-1", ">= 0" or other
implicit comparisions that don't document that it is valid to retry
forever in these cases.
> + if (cfg->retry_timeout &&
> + time_after(jiffies, cfg->retry_timeout + bp->b_first_retry_time))
> + goto permanent_error;
>
> /* still a transient error, higher layers will retry */
> xfs_buf_ioerror(bp, 0);
> @@ -1139,6 +1146,7 @@ xfs_buf_iodone_callbacks(
> * retry state here in preparation for the next error that may occur.
> */
> bp->b_last_error = 0;
> + bp->b_retries = 0;
>
> xfs_buf_do_callbacks(bp);
> bp->b_fspriv = NULL;
> diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
> index 0c5a976..0382140 100644
> --- a/fs/xfs/xfs_mount.h
> +++ b/fs/xfs/xfs_mount.h
> @@ -54,7 +54,8 @@ enum {
>
> struct xfs_error_cfg {
> struct xfs_kobj kobj;
> - int max_retries;
> + int max_retries; /* -1 = retry forever */
as per my last review, remove the comment, add XFS_ERR_RETRY_FOREVER
to document that "-1 = retry forever" and use that in the code so
it's explicit that the code is intended to handle this case.
Cheers,
Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx
|