Re: [PATCH 5/7] xfs: add configuration of error failure speed

To: Carlos Maiolino <cmaiolino@xxxxxxxxxx>
Subject: Re: [PATCH 5/7] xfs: add configuration of error failure speed
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Fri, 6 May 2016 10:04:33 +1000
Cc: xfs@xxxxxxxxxxx
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <1462376600-8617-6-git-send-email-cmaiolino@xxxxxxxxxx>
References: <1462376600-8617-1-git-send-email-cmaiolino@xxxxxxxxxx> <1462376600-8617-6-git-send-email-cmaiolino@xxxxxxxxxx>
User-agent: Mutt/1.5.21 (2010-09-15)
On Wed, May 04, 2016 at 05:43:18PM +0200, Carlos Maiolino wrote:
> On reception of an error, we can fail immediately, perform some
> bound amount of retries or retry indefinitely. The current behaviour
> we have is to retry forever.
> 
> However, we'd like the ability to choose how long the filesystem should try
> after an error, it can either fail immediately, retry a few times, or retry
> forever. This is implemented by using max_retries sysfs attribute, to hold the
> amount of times we allow the filesystem to retry after an error. Being -1 a
> special case where the filesystem will retry indefinitely.
> 
> Add both a maximum retry count and a retry timeout so that we can bound by
> time and/or physical IO attempts.
> 
> Finally, plumb these into xfs_buf_iodone error processing so that
> the error behaviour follows the selected configuration.
> 
> Changelog:
> 
> V3:
>       - In xfs_buf_iodone_callback_error, use max_retries to decide how long
>         the filesystem should retry on errors instead of XFS_ERR_FAIL enums
>         and fail_speed
> 
>       - Remove all code implementing fail_speed attribute from the original
>         patch
>               -- failure_speed_show/store attributes function implementation
>               -- max_retries_store() now accepts values from -1 up to INT_MAX
> 
>       - retry_timeout_seconds_show() print fixed:
>               -- jiffies_to_msecs() should be divided by MSEC_PER_SEC
>               -- trailing whitespace removed

Where's XFS_ERR_RETRY_FOREVER?

> @@ -1095,8 +1098,12 @@ xfs_buf_iodone_callback_error(
>        * Repeated failure on an async write. Take action according to the
>        * error configuration we have been set up to use.
>        */
> -     if (!cfg->max_retries)
> -             goto permanent_error;
> +     if ((cfg->max_retries >= 0) &&
> +         (++bp->b_retries > cfg->max_retries))
> +                     goto permanent_error;

I suggested:

        if (cfg->max_retries != XFS_ERR_RETRY_FOREVER &&
            ++bp->b_retries > cfg->max_retries)
                goto permanent_error;

so that we document that there is a "retry forever" case being
handled here. I really don't like magic "-1", ">= 0" or other
implicit comparisons that don't document that it is valid to retry
forever in these cases.
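
To make the intent concrete, here is a minimal userspace sketch of that check. It assumes XFS_ERR_RETRY_FOREVER is defined as -1; struct err_cfg, struct buf and retries_exhausted() are hypothetical, cut-down stand-ins for illustration, not the kernel definitions.

```c
#include <stdbool.h>

/* Assumed definition of the named sentinel: -1 means "retry forever". */
#define XFS_ERR_RETRY_FOREVER	(-1)

/* Hypothetical stand-ins for struct xfs_error_cfg and struct xfs_buf. */
struct err_cfg {
	int	max_retries;
};

struct buf {
	int	b_retries;
};

/*
 * Return true when the failure should be treated as permanent.
 * Because && short-circuits, b_retries is only incremented when a
 * finite retry limit is configured.
 */
static bool retries_exhausted(const struct err_cfg *cfg, struct buf *bp)
{
	return cfg->max_retries != XFS_ERR_RETRY_FOREVER &&
	       ++bp->b_retries > cfg->max_retries;
}
```

With max_retries = 0 the first failure is already permanent, with max_retries = 2 the third failure is, and with XFS_ERR_RETRY_FOREVER the function never reports exhaustion.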

> +     if (cfg->retry_timeout &&
> +         time_after(jiffies, cfg->retry_timeout + bp->b_first_retry_time))
> +                     goto permanent_error;
>  
>       /* still a transient error, higher layers will retry */
>       xfs_buf_ioerror(bp, 0);
> @@ -1139,6 +1146,7 @@ xfs_buf_iodone_callbacks(
>        * retry state here in preparation for the next error that may occur.
>        */
>       bp->b_last_error = 0;
> +     bp->b_retries = 0;
>  
>       xfs_buf_do_callbacks(bp);
>       bp->b_fspriv = NULL;
> diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
> index 0c5a976..0382140 100644
> --- a/fs/xfs/xfs_mount.h
> +++ b/fs/xfs/xfs_mount.h
> @@ -54,7 +54,8 @@ enum {
>  
>  struct xfs_error_cfg {
>       struct xfs_kobj kobj;
> -     int             max_retries;
> +     int             max_retries;    /* -1 = retry forever */

As per my last review: remove the comment, add XFS_ERR_RETRY_FOREVER
to document that "-1 = retry forever" and use that in the code so
it's explicit that the code is intended to handle this case.
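
The same named constant can also document the accepted range on the sysfs store side, where the changelog says values from -1 up to INT_MAX are valid. A rough sketch, assuming the define lives next to the struct; max_retries_valid() is a hypothetical helper for illustration and is not part of the patch.

```c
#include <limits.h>
#include <stdbool.h>

/* Assumed definition: -1 means "retry forever". */
#define XFS_ERR_RETRY_FOREVER	(-1)

/*
 * Hypothetical range check for a value written to max_retries:
 * either the explicit "retry forever" sentinel or a finite count
 * in the range 0..INT_MAX.
 */
static bool max_retries_valid(long val)
{
	if (val == XFS_ERR_RETRY_FOREVER)
		return true;
	return val >= 0 && val <= INT_MAX;
}
```

Spelling out the sentinel as its own case keeps "retry forever" visible at the point where user input is accepted, rather than hiding it in a "val >= -1" comparison.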

Cheers,

Dave.

-- 
Dave Chinner
david@xxxxxxxxxxxxx
