xfs
[Top] [All Lists]

Re: [PATCH 5/7] xfs: add configuration of error failure speed

To: xfs@xxxxxxxxxxx
Subject: Re: [PATCH 5/7] xfs: add configuration of error failure speed
From: Carlos Maiolino <cmaiolino@xxxxxxxxxx>
Date: Fri, 6 May 2016 12:59:44 +0200
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <20160506000433.GG26977@dastard>
Mail-followup-to: xfs@xxxxxxxxxxx
References: <1462376600-8617-1-git-send-email-cmaiolino@xxxxxxxxxx> <1462376600-8617-6-git-send-email-cmaiolino@xxxxxxxxxx> <20160506000433.GG26977@dastard>
User-agent: Mutt/1.5.24 (2015-08-30)
On Fri, May 06, 2016 at 10:04:33AM +1000, Dave Chinner wrote:
> On Wed, May 04, 2016 at 05:43:18PM +0200, Carlos Maiolino wrote:
> > On reception of an error, we can fail immediately, perform some
> > bound amount of retries or retry indefinitely. The current behaviour
> > we have is to retry forever.
> > 
> > However, we'd like the ability to choose how long the filesystem should try
> > after an error, it can either fail immediately, retry a few times, or retry
> > forever. This is implemented by using max_retries sysfs attribute, to hold 
> > the
> > amount of times we allow the filesystem to retry after an error. Being -1 a
> > special case where the filesystem will retry indefinitely.
> > 
> > Add both a maximum retry count and a retry timeout so that we can bound by
> > time and/or physical IO attempts.
> > 
> > Finally, plumb these into xfs_buf_iodone error processing so that
> > the error behaviour follows the selected configuration.
> > 
> > Changelog:
> > 
> > V3:
> >     - In xfs_buf_iodone_callback_error, use max_retries to decide how long
> >       the filesystem should retry on errors instead of XFS_ERR_FAIL enums
> >       and fail_speed
> > 
> >     - Remove all code implementing fail_speed attribute from the original
> >       patch
> >             -- failure_speed_show/store attributes function implementation
> >             -- max_retries_store() now accepts values from -1 up to INT_MAX
> > 
> >     - retry_timeout_seconds_show() print fixed:
> >             -- jiffies_to_msecs() should be divided by MSEC_PER_SEC
> >             -- trailing whitespace removed
> 
> Where's XFS_ERR_RETRY_FOREVER?

Hi, I didn't understand you literally meant to have XFS_ERR_RETRY_FOREVER macro
implemented, sorry about that, I'll implement it, add fail_at_unmount to
<dev>/error and re-send the patchset

> 
> > @@ -1095,8 +1098,12 @@ xfs_buf_iodone_callback_error(
> >      * Repeated failure on an async write. Take action according to the
> >      * error configuration we have been set up to use.
> >      */
> > -   if (!cfg->max_retries)
> > -           goto permanent_error;
> > +   if ((cfg->max_retries >= 0) &&
> > +       (++bp->b_retries > cfg->max_retries))
> > +                   goto permanent_error;
> 
> I suggested:
> 
>         if (cfg->max_retries != XFS_ERR_RETRY_FOREVER &&
>             ++bp->b_retries > cfg->max_retries)
>                 goto permanent_error;
> 
> so that we document that there is a "retry forever" case being
> handled here. I really don't like magic "-1", ">= 0" or other
> implicit comparisions that don't document that it is valid to retry
> forever in these cases.
> 
> > +   if (cfg->retry_timeout &&
> > +       time_after(jiffies, cfg->retry_timeout + bp->b_first_retry_time))
> > +                   goto permanent_error;
> >  
> >     /* still a transient error, higher layers will retry */
> >     xfs_buf_ioerror(bp, 0);
> > @@ -1139,6 +1146,7 @@ xfs_buf_iodone_callbacks(
> >      * retry state here in preparation for the next error that may occur.
> >      */
> >     bp->b_last_error = 0;
> > +   bp->b_retries = 0;
> >  
> >     xfs_buf_do_callbacks(bp);
> >     bp->b_fspriv = NULL;
> > diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
> > index 0c5a976..0382140 100644
> > --- a/fs/xfs/xfs_mount.h
> > +++ b/fs/xfs/xfs_mount.h
> > @@ -54,7 +54,8 @@ enum {
> >  
> >  struct xfs_error_cfg {
> >     struct xfs_kobj kobj;
> > -   int             max_retries;
> > +   int             max_retries;    /* -1 = retry forever */
> 
> as per my last review, remove the comment, add XFS_ERR_RETRY_FOREVER
> to document that "-1 = retry forever" and use that in the code so
> it's explicit that the code is intended to handle this case.
> 
> Cheers,
> 
> Dave.
> 
> -- 
> Dave Chinner
> david@xxxxxxxxxxxxx
> 
> _______________________________________________
> xfs mailing list
> xfs@xxxxxxxxxxx
> http://oss.sgi.com/mailman/listinfo/xfs

-- 
Carlos

<Prev in Thread] Current Thread [Next in Thread>