XFS handling of synchronous buffers in case of EIO error

Ajeet Yadav ajeet.yadav.77 at gmail.com
Fri Dec 31 00:47:12 CST 2010


Dear Dave,

Our kernel is 2.6.30.9, but XFS is backported from 2.6.34.
I have also seen similar behaviour reported in another post, about an ls
process hanging on 2.6.35.9:

http://oss.sgi.com/pipermail/xfs/2010-December/048691.html

I have always seen the hang occur only when b_relse != NULL and
b_hold > 2.

I have made the workaround below. It solved the problem in our case, because
when the USB device is removed we know we will get an EIO error.

But I think we need to review xfs_buf_error_relse() and xfs_buf_relse(),
considering the XBF_LOCK flow path.

@@ -1047,9 +1047,19 @@ xfs_buf_iodone_callbacks(
                        /* We actually overwrite the existing b-relse
                           function at times, but we're gonna be shutting down
                           anyway. */
-                       XFS_BUF_SET_BRELSE_FUNC(bp,xfs_buf_error_relse);
-                       XFS_BUF_DONE(bp);
-                       XFS_BUF_FINISH_IOWAIT(bp);
+                       if (XFS_BUF_GETERROR(bp) == EIO){
+                               ASSERT(XFS_BUF_TARGET(bp) == mp->m_ddev_targp);
+                               XFS_BUF_SUPER_STALE(bp);
+                               trace_xfs_buf_item_iodone(bp, _RET_IP_);
+                               xfs_buf_do_callbacks(bp, lip);
+                               XFS_BUF_SET_FSPRIVATE(bp, NULL);
+                               XFS_BUF_CLR_IODONE_FUNC(bp);
+                               xfs_biodone(bp);
+                       } else {
+                               XFS_BUF_SET_BRELSE_FUNC(bp,xfs_buf_error_relse);
+                               XFS_BUF_DONE(bp);
+                               XFS_BUF_FINISH_IOWAIT(bp);
+                       }
                }
                return;
        }



On Fri, Dec 31, 2010 at 4:43 AM, Dave Chinner <david at fromorbit.com> wrote:

> On Thu, Dec 30, 2010 at 05:58:36PM +0530, Ajeet Yadav wrote:
> > Kernel: 2.6.30.9
> >
> > I am troubleshooting a hang in XFS during umount.
> > Test scenario: copy a large number of files using the script below, and
> > remove the USB device after 3-5 seconds.
>
> FWIW, in future can you please report what kernel you are testing on?
>
> >
> > index=0
> > while [ "$?" == 0 ]
> > do
> >         index=$((index+1))
> >         sync
> >         cp $1/1KB.txt $2/"$index".test
> > done
> >
> > In rare cases during USB unplug, the umount process hangs at
> > xfs_buf_lock. The log below shows the hung process.
> >
> > We have added printk calls to the buffer handling functions
> > xfs_buf_iodone_callbacks(), xfs_buf_error_relse(), xfs_buf_relse() and
> > xfs_buf_rele().
> >
> > We always observed that the hang only occurs when bp->b_relse =
> > xfs_buf_error_relse(), i.e. when xfs_buf_iodone_callbacks() executes
> > XFS_BUF_SET_BRELSE_FUNC(bp,xfs_buf_error_relse);
> > XFS_BUF_DONE(bp);
> > XFS_BUF_FINISH_IOWAIT(bp);
> >
> > but it is never called by xfs_buf_relse() because b_hold = 3.
> >
> > Also we have seen that this problem always occurs when
> > bp->b_relse != NULL && bp->b_hold > 1.
>
> This appears to be the same problem as reported here:
>
> http://oss.sgi.com/archives/xfs/2010-12/msg00380.html
>
>
> > I do not know whether the prints below will help you, but I have added
> > printk calls for superblock buffer tracing:
> > S-functionname ( Start of function)
> > E-functionname (End of function)
>
> If you have a recent enough kernel, you can get all this information
> from the tracing built into XFS.
>
> As it is, the cause of the problem is that setting bp->b_relse
> changes the behaviour of xfs_buf_relse() - if bp->b_relse is set, it
> doesn't unlock the buffer. This is normally just fine, because
> xfs_buf_rele() has a special case to handle buffers with
> bp->b_relse(), which adds a hold count and calls the release function
> when the hold count drops to zero. The b_relse function is supposed
> to unlock the buffer by calling xfs_buf_relse() again.
>
> Unfortunately, the superblock buffer is special - the hold count on
> it never drops to zero until very late in the unmount process because
> it is managed by the filesystem.  Hence the bp->b_relse function is
> never called, and hence the buffer is never unlocked in this case.
> Hence future attempts to access it hang.
>
> I'll need to think about this one for a bit...
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david at fromorbit.com
>