XFS handling of synchronous buffers in case of EIO error

Ajeet Yadav ajeet.yadav.77 at gmail.com
Wed Jan 5 02:26:31 CST 2011


Thanks, I think its better to end this mail by rerering to your patch.

http://oss.sgi.com/archives/xfs/2011-01/msg00020.html



On Tue, Jan 4, 2011 at 2:19 PM, Dave Chinner <david at fromorbit.com> wrote:

>  On Fri, Dec 31, 2010 at 12:17:12PM +0530, Ajeet Yadav wrote:
> > Dear Dave,
> >
> > Our Kernel is 2.6.30.9 but XFS is backported from 2.6.34.
> > But I have seen similar behaviour in another post related to process ls
> hang
> > in 2.6.35.9
> > *
> >
> > http://oss.sgi.com/pipermail/xfs/2010-December/048691.html
> >
> > *I have always seen the hang problem comes only if comes when b_relse !=
> > NULL, and b_hold > 2
> >
> > I have made below workaround it solved the problem in our case because
> when
> > USB is removed we know we get EIO error.
> >
> > But I think we need to review xfs_buf_error_relse() and xfs_buf_relse()
> > considering  XBF_LOCK flow path.
> >
> > @@ -1047,9 +1047,19 @@ xfs_buf_iodone_callbacks(
> >                         /* We actually overwrite the existing b-relse
> >                            function at times, but we're gonna be shutting
> > down
> >                            anyway. */
> > -                       XFS_BUF_SET_BRELSE_FUNC(bp,xfs_buf_error_relse);
> > -                       XFS_BUF_DONE(bp);
> > -                       XFS_BUF_FINISH_IOWAIT(bp);
> > +                       if (XFS_BUF_GETERROR(bp) == EIO){
> > +                               ASSERT(XFS_BUF_TARGET(bp) ==
> > mp->m_ddev_targp);
> > +                               XFS_BUF_SUPER_STALE(bp);
> > +                               trace_xfs_buf_item_iodone(bp, _RET_IP_);
> > +                               xfs_buf_do_callbacks(bp, lip);
> > +                               XFS_BUF_SET_FSPRIVATE(bp, NULL);
> > +                               XFS_BUF_CLR_IODONE_FUNC(bp);
> > +                               xfs_biodone(bp);
> > +                       } else {
> > +
> > XFS_BUF_SET_BRELSE_FUNC(bp,xfs_buf_error_relse);
> > +                               XFS_BUF_DONE(bp);
> > +                               XFS_BUF_FINISH_IOWAIT(bp);
> > +                       }
> >                 }
> >                 return;
> >         }
>
> This won't work reliably because it only handles one specific type
> of error. We can get more than just EIO back from the lower layers,
> and so if the superblock write gets a different error then we'll
> still get the same hang.
>
> Effectively what you are doing here is running the
> xfs_buf_error_relse() callback directly in line. This will result in
> the buffer being unlocked before the error is pulled off the buffer
> after xfs_buf_iowait() completes. Essentially that means that some
> other thread can reuse the buffer and clear the error before the
> waiter has received the error.
>
> I think the correct fix is to call the bp->b_relse function when the
> waiter is woken to clear the error and unlock the buffer. I've just
> posted a patch to do this for 2.6.38, but it won't trivially backport
> to 2.6.34 or 2.6.30 as the synchronous write interfaces into the
> buffer cache have been cleaned up and simplified recently. It should
> still be relatively easy to handle, though.
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david at fromorbit.com
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://oss.sgi.com/pipermail/xfs/attachments/20110105/2b14316b/attachment.htm>


More information about the xfs mailing list