<div>Thanks, I think it's better to end this mail by referring to your patch.</div>
<div> </div>
<div><a href="http://oss.sgi.com/archives/xfs/2011-01/msg00020.html">http://oss.sgi.com/archives/xfs/2011-01/msg00020.html</a></div>
<div><br><br> </div>
<div class="gmail_quote">On Tue, Jan 4, 2011 at 2:19 PM, Dave Chinner <span dir="ltr">&lt;<a href="mailto:david@fromorbit.com">david@fromorbit.com</a>&gt;</span> wrote:<br>
<blockquote style="BORDER-LEFT: #ccc 1px solid; MARGIN: 0px 0px 0px 0.8ex; PADDING-LEFT: 1ex" class="gmail_quote">
<div>
<div></div>
<div class="h5">On Fri, Dec 31, 2010 at 12:17:12PM +0530, Ajeet Yadav wrote:<br>&gt; Dear Dave,<br>&gt;<br>&gt; Our Kernel is 2.6.30.9 but XFS is backported from 2.6.34.<br>&gt; But I have seen similar behaviour in another post related to process ls hang<br>
&gt; in 2.6.35.9<br>&gt;<br>&gt; <a href="http://oss.sgi.com/pipermail/xfs/2010-December/048691.html" target="_blank">http://oss.sgi.com/pipermail/xfs/2010-December/048691.html</a><br>&gt;<br>&gt; I have always seen that the hang comes only when b_relse !=<br>
&gt; NULL and b_hold &gt; 2.<br>&gt;<br>&gt; I have made the workaround below; it solved the problem in our case because when<br>&gt; USB is removed we know we get an EIO error.<br>&gt;<br>&gt; But I think we need to review xfs_buf_error_relse() and xfs_buf_relse()<br>
&gt; considering the XBF_LOCK flow path.<br>&gt;<br>&gt; @@ -1047,9 +1047,19 @@ xfs_buf_iodone_callbacks(<br>&gt;                         /* We actually overwrite the existing b-relse<br>&gt;                            function at times, but we&#39;re gonna be shutting<br>
&gt; down<br>&gt;                            anyway. */<br>&gt; -                       XFS_BUF_SET_BRELSE_FUNC(bp,xfs_buf_error_relse);<br>&gt; -                       XFS_BUF_DONE(bp);<br>&gt; -                       XFS_BUF_FINISH_IOWAIT(bp);<br>
&gt; +                       if (XFS_BUF_GETERROR(bp) == EIO){<br>&gt; +                               ASSERT(XFS_BUF_TARGET(bp) ==<br>&gt; mp-&gt;m_ddev_targp);<br>&gt; +                               XFS_BUF_SUPER_STALE(bp);<br>
&gt; +                               trace_xfs_buf_item_iodone(bp, _RET_IP_);<br>&gt; +                               xfs_buf_do_callbacks(bp, lip);<br>&gt; +                               XFS_BUF_SET_FSPRIVATE(bp, NULL);<br>
&gt; +                               XFS_BUF_CLR_IODONE_FUNC(bp);<br>&gt; +                               xfs_biodone(bp);<br>&gt; +                       } else {<br>&gt; +<br>&gt; XFS_BUF_SET_BRELSE_FUNC(bp,xfs_buf_error_relse);<br>
&gt; +                               XFS_BUF_DONE(bp);<br>&gt; +                               XFS_BUF_FINISH_IOWAIT(bp);<br>&gt; +                       }<br>&gt;                 }<br>&gt;                 return;<br>&gt;         }<br>
<br></div></div>This won&#39;t work reliably because it only handles one specific type<br>of error. We can get more than just EIO back from the lower layers,<br>and so if the superblock write gets a different error then we&#39;ll<br>
still get the same hang.<br><br>Effectively what you are doing here is running the<br>xfs_buf_error_relse() callback directly in line. This will result in<br>the buffer being unlocked before the error is pulled off the buffer<br>
after xfs_buf_iowait() completes. Essentially that means that some<br>other thread can reuse the buffer and clear the error before the<br>waiter has received the error.<br><br>I think the correct fix is to call the bp-&gt;b_relse function when the<br>
waiter is woken to clear the error and unlock the buffer. I&#39;ve just<br>posted a patch to do this for 2.6.38, but it won&#39;t trivially backport<br>to 2.6.34 or 2.6.30 as the synchronous write interfaces into the<br>buffer cache have been cleaned up and simplified recently. It should<br>
still be relatively easy to handle, though.<br>
<div>
<div></div>
<div class="h5"><br>Cheers,<br><br>Dave.<br>--<br>Dave Chinner<br><a href="mailto:david@fromorbit.com">david@fromorbit.com</a><br></div></div></blockquote></div><br>