
Re: XFS handling of synchronous buffers in case of EIO error

To: Dave Chinner <david@xxxxxxxxxxxxx>
Subject: Re: XFS handling of synchronous buffers in case of EIO error
From: Ajeet Yadav <ajeet.yadav.77@xxxxxxxxx>
Date: Fri, 31 Dec 2010 12:17:12 +0530
Cc: xfs@xxxxxxxxxxx
In-reply-to: <20101230231353.GC15179@dastard>
References: <AANLkTi=Tmh9m_Rwy-bUZQEzcZ3M+6X9tZxFMO-J2Rvec@xxxxxxxxxxxxxx> <20101230231353.GC15179@dastard>
Dear Dave,
 
Our kernel is 2.6.30.9, but the XFS code is backported from 2.6.34.
I have also seen similar behaviour in another post, about an ls process hanging on 2.6.35.9:

http://oss.sgi.com/pipermail/xfs/2010-December/048691.html

I have always seen the hang occur only when b_relse != NULL and b_hold > 2.

I have made the workaround below. It solved the problem in our case, because we know we get an EIO error when the USB device is removed.

But I think we need to review xfs_buf_error_relse() and xfs_buf_relse() with respect to the XBF_LOCK flow path.

@@ -1047,9 +1047,19 @@ xfs_buf_iodone_callbacks(
                        /* We actually overwrite the existing b-relse
                           function at times, but we're gonna be shutting down
                           anyway. */
-                       XFS_BUF_SET_BRELSE_FUNC(bp,xfs_buf_error_relse);
-                       XFS_BUF_DONE(bp);
-                       XFS_BUF_FINISH_IOWAIT(bp);
+                       if (XFS_BUF_GETERROR(bp) == EIO){
+                               ASSERT(XFS_BUF_TARGET(bp) == mp->m_ddev_targp);
+                               XFS_BUF_SUPER_STALE(bp);
+                               trace_xfs_buf_item_iodone(bp, _RET_IP_);
+                               xfs_buf_do_callbacks(bp, lip);
+                               XFS_BUF_SET_FSPRIVATE(bp, NULL);
+                               XFS_BUF_CLR_IODONE_FUNC(bp);
+                               xfs_biodone(bp);
+                       } else {
+                               XFS_BUF_SET_BRELSE_FUNC(bp,xfs_buf_error_relse);
+                               XFS_BUF_DONE(bp);
+                               XFS_BUF_FINISH_IOWAIT(bp);
+                       }
                }
                return;
        }

 

On Fri, Dec 31, 2010 at 4:43 AM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:

On Thu, Dec 30, 2010 at 05:58:36PM +0530, Ajeet Yadav wrote:
> Kernel: 2.6.30.9
>
> I am troubleshooting a hang in XFS during umount.
> Test scenario: copy a large number of files using the script below, and
> remove the USB device after 3-5 seconds.

FWIW, in future can you please report what kernel you are testing on?

>
> index=0
> while [ "$?" == 0 ]
> do
>         index=$((index+1))
>         sync
>         cp $1/1KB.txt $2/"$index".test
> done
>
> In rare cases the umount process hangs at xfs_buf_lock during USB unplug.
> The log below shows the hung process.
>
> We added printk calls to the buffer handling functions xfs_buf_iodone_callbacks(),
> xfs_buf_error_relse(), xfs_buf_relse() and xfs_buf_rele().
>
> We observed that the hang only occurs when bp->b_relse =
> xfs_buf_error_relse(), i.e. when xfs_buf_iodone_callbacks() executes
> XFS_BUF_SET_BRELSE_FUNC(bp,xfs_buf_error_relse);
> XFS_BUF_DONE(bp);
> XFS_BUF_FINISH_IOWAIT(bp);
>
> but it is never called by xfs_buf_relse() because b_hold = 3.
>
> We have also seen that this problem always occurs when bp->b_relse != NULL &&
> bp->b_hold > 1.

This appears to be the same problem as reported here:

http://oss.sgi.com/archives/xfs/2010-12/msg00380.html


> I do not know whether the output below will help you, but I added printk
> tracing for the superblock buffer:
> S-functionname ( Start of function)
> E-functionname (End of function)

If you have a recent enough kernel, you can get all this information
from the tracing built into XFS.

As it is, the cause of the problem is that setting bp->b_relse
changes the behaviour of xfs_buf_relse() - if bp->b_relse is set, it
doesn't unlock the buffer. This is normally just fine, because
xfs_buf_rele() has a special case to handle buffers with
bp->b_relse set, which adds a hold count and calls the release function
when the hold count drops to zero. The b_relse function is supposed
to unlock the buffer by calling xfs_buf_relse() again.

Unfortunately, the superblock buffer is special - the hold count on
it never drops to zero until very late in the unmount process because
it is managed by the filesystem.  Hence the bp->b_relse function is
never called, and hence the buffer is never unlocked in this case.
Hence future attempts to access it hang.
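
The interaction described above can be sketched as a small user-space
model. This is only an illustration, not the kernel code: struct model_buf,
the model_* functions and still_locked_after_relse() are hypothetical names,
the lock is reduced to a boolean, and hashing, flags and atomics are left
out. It assumes the release path behaves as described in this mail: the
relse call skips the unlock when a release function is set, and the release
function only runs once the hold count reaches zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for an XFS buffer. */
struct model_buf {
    int  hold;                              /* models b_hold */
    bool locked;                            /* models the buffer lock */
    bool freed;                             /* set when the last hold goes away */
    void (*relse)(struct model_buf *bp);    /* models b_relse */
};

static void model_buf_rele(struct model_buf *bp);

/* Models xfs_buf_relse(): unlock only if no release function is set. */
static void model_buf_relse(struct model_buf *bp)
{
    if (bp->relse == NULL)
        bp->locked = false;
    model_buf_rele(bp);
}

/* Models xfs_buf_rele(): the release function runs only at hold == 0. */
static void model_buf_rele(struct model_buf *bp)
{
    assert(bp->hold > 0);
    if (--bp->hold > 0)
        return;                 /* someone else still holds the buffer */
    if (bp->relse != NULL) {
        bp->hold++;             /* extra hold for the release function */
        bp->relse(bp);
    } else {
        bp->freed = true;
    }
}

/* Models xfs_buf_error_relse(): clear b_relse, then release again. */
static void model_error_relse(struct model_buf *bp)
{
    bp->relse = NULL;
    model_buf_relse(bp);        /* this second call does the unlock */
}

/* Returns true if the buffer is still locked after one release call. */
static bool still_locked_after_relse(int initial_hold)
{
    struct model_buf bp = { initial_hold, true, false, model_error_relse };

    model_buf_relse(&bp);
    return bp.locked;
}
```

In this model still_locked_after_relse(1) returns false, because the hold
count reaches zero and the release function runs and unlocks the buffer.
still_locked_after_relse(3), matching the b_hold = 3 case reported for the
superblock buffer, returns true: the release function is never reached, so
the buffer stays locked.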

I'll need to think about this one for a bit...

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx
