<div>Dear Dave,</div>
<div> </div>
<div>Our kernel is 2.6.30.9, but XFS is backported from 2.6.34.</div>
<div>However, I have seen similar behaviour reported in another post, about an ls process hanging on 2.6.35.9:</div>
<div><u><font color="#0000ff" size="2"><font color="#0000ff" size="2">
<p><a href="http://oss.sgi.com/pipermail/xfs/2010-December/048691.html">http://oss.sgi.com/pipermail/xfs/2010-December/048691.html</a></p>
<p></p></font></font></u>I have always seen the hang occur only when b_relse != NULL and b_hold > 2.
<p>I have made the workaround below. It solved the problem in our case, because we know that we get an EIO error when the USB device is removed.</p>
<p>But I think we need to review xfs_buf_error_relse() and xfs_buf_relse() with the XBF_LOCK flow path in mind.</p>
<p>@@ -1047,9 +1047,19 @@ xfs_buf_iodone_callbacks(<br> /* We actually overwrite the existing b-relse<br> function at times, but we're gonna be shutting down<br> anyway. */<br>
- XFS_BUF_SET_BRELSE_FUNC(bp,xfs_buf_error_relse);<br>- XFS_BUF_DONE(bp);<br>- XFS_BUF_FINISH_IOWAIT(bp);<br>+ if (XFS_BUF_GETERROR(bp) == EIO){<br>
+ ASSERT(XFS_BUF_TARGET(bp) == mp->m_ddev_targp);<br>+ XFS_BUF_SUPER_STALE(bp);<br>+ trace_xfs_buf_item_iodone(bp, _RET_IP_);<br>
+ xfs_buf_do_callbacks(bp, lip);<br>+ XFS_BUF_SET_FSPRIVATE(bp, NULL);<br>+ XFS_BUF_CLR_IODONE_FUNC(bp);<br>+ xfs_biodone(bp);<br>
+ } else {<br>+ XFS_BUF_SET_BRELSE_FUNC(bp,xfs_buf_error_relse);<br>+ XFS_BUF_DONE(bp);<br>+ XFS_BUF_FINISH_IOWAIT(bp);<br>
+ }<br> }<br> return;<br> }</p>
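<p>To make the flow concrete, here is a minimal user-space sketch of the hold count and b_relse interaction, following the description in the quoted mail below. It is not kernel code: struct model_buf and every model_* function are hypothetical stand-ins for the XFS buffer and its helpers, and it assumes that the error path ends with one ordinary xfs_buf_relse()-style release of the buffer.</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct xfs_buf; NOT the kernel structure. */
struct model_buf {
    int hold;                             /* models b_hold */
    bool locked;                          /* models the buffer lock */
    void (*relse)(struct model_buf *bp);  /* models b_relse */
};

void model_buf_relse(struct model_buf *bp);

/* Models xfs_buf_rele(): drop one hold; run b_relse only at zero. */
void model_buf_rele(struct model_buf *bp)
{
    assert(bp->hold > 0);
    if (--bp->hold == 0 && bp->relse != NULL) {
        bp->hold++;        /* extra hold taken for the callback */
        bp->relse(bp);
    }
}

/* Models xfs_buf_relse(): unlock only when no b_relse is set. */
void model_buf_relse(struct model_buf *bp)
{
    if (bp->relse == NULL)
        bp->locked = false;
    model_buf_rele(bp);
}

/* Models xfs_buf_error_relse(): clear b_relse, then release again,
 * which this time unlocks the buffer. */
void model_error_relse(struct model_buf *bp)
{
    bp->relse = NULL;
    model_buf_relse(bp);
}

/*
 * Simulate a failed write on a locked buffer that starts with
 * initial_hold references, then report whether it is still locked.
 * workaround == false models the original path, which installs the
 * error b_relse; workaround == true models the EIO branch of the
 * patch, which leaves b_relse unset (an assumption of this sketch).
 */
bool model_locked_after_error(int initial_hold, bool workaround)
{
    struct model_buf bp = {
        .hold = initial_hold, .locked = true, .relse = NULL
    };

    if (!workaround)
        bp.relse = model_error_relse;
    model_buf_relse(&bp);  /* the I/O path drops its reference */
    return bp.locked;
}
```

<p>In this model, model_locked_after_error(1, false) returns false, because the hold count reaches zero and the release callback unlocks the buffer. model_locked_after_error(3, false) returns true: the hold count only drops to 2, the callback never runs, and the buffer stays locked, which matches the hang we see on the superblock buffer. model_locked_after_error(3, true) returns false, because b_relse is never installed and the ordinary release unlocks the buffer.</p>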
<p> </p>
<p>On Dec 31, 2010 at 4:43 AM, Dave Chinner <span dir="ltr">&lt;<a href="mailto:david@fromorbit.com" target="_blank">david@fromorbit.com</a>&gt;</span> wrote:<br></p></div>
<div class="gmail_quote">
<blockquote style="BORDER-LEFT: #ccc 1px solid; MARGIN: 0px 0px 0px 0.8ex; PADDING-LEFT: 1ex" class="gmail_quote">
<div>On Thu, Dec 30, 2010 at 05:58:36PM +0530, Ajeet Yadav wrote:<br>> Kernel: 2.6.30.9<br>><br>> I am troubleshooting a hang in XFS during umount.<br>> Test scenario: Copy a large number of files using the script below, and<br>
> remove the USB after 3-5 seconds<br><br></div>FWIW, in future can you please report what kernel you are testing on?<br>
<div><br>><br>> index=0<br>> while [ "$?" == 0 ]<br>> do<br>> index=$((index+1))<br>> sync<br>> cp $1/1KB.txt $2/"$index".test<br>> done<br>><br>> In rare scenarios during USB unplug, the umount process hangs at xfs_buf_lock.<br>
> Below log shows the hung process<br>><br>> We have put printk to buffer handling functions xfs_buf_iodone_callbacks(),<br>> xfs_buf_error_relse(), xfs_buf_relse() and xfs_buf_rele()<br>><br>> We always observed the hang only comes when bp->b_relse =<br>
> xfs_buf_error_relse(), i.e. when xfs_buf_iodone_callbacks() executes<br>> XFS_BUF_SET_BRELSE_FUNC(bp,xfs_buf_error_relse);<br>> XFS_BUF_DONE(bp);<br>> XFS_BUF_FINISH_IOWAIT(bp);<br>><br>> but it is never called by xfs_buf_relse() because b_hold = 3.<br>
><br>> Also we have seen that this problem always comes when bp->relse != NULL &&<br>> bp->hold > 1.<br><br></div>This appears to be the same problem as reported here:<br><br><a href="http://oss.sgi.com/archives/xfs/2010-12/msg00380.html" target="_blank">http://oss.sgi.com/archives/xfs/2010-12/msg00380.html</a><br>
<div><br><br>> I do not know whether below prints will help you, but I have taken printk<br>> for super block buffer tracing<br>> S-functionname ( Start of function)<br>> E-functionname (End of function)<br><br>
</div>If you have a recent enough kernel, you can get all this information<br>from the tracing built into XFS.<br><br>As it is, the cause of the problem is that setting bp->b_relse<br>changes the behaviour of xfs_buf_relse() - if bp->b_relse is set, it<br>
doesn't unlock the buffer. This is normally just fine, because<br>xfs_buf_rele() has a special case to handle buffers with<br>bp->b_relse(), which adds a hold count and calls the release function<br>when the hold count drops to zero. The b_relse function is supposed<br>
to unlock the buffer by calling xfs_buf_relse() again.<br><br>Unfortunately, the superblock buffer is special - the hold count on<br>it never drops to zero until very late in the unmount process because<br>it is managed by the filesystem. Hence the bp->b_relse function is<br>
never called, and hence the buffer is never unlocked in this case.<br>Hence future attempts to access it hang.<br><br>I'll need to think about this one for a bit...<br><br>Cheers,<br><br>Dave.<br><font color="#888888">--<br>
Dave Chinner<br><a href="mailto:david@fromorbit.com" target="_blank">david@fromorbit.com</a><br></font></blockquote></div><br>