xfs
[Top] [All Lists]

[PATCH 4/4] xfs: fix deadlock in xfs_rtfree_extent with kernel v2.6.37

To: xfs@xxxxxxxxxxx
Subject: [PATCH 4/4] xfs: fix deadlock in xfs_rtfree_extent with kernel v2.6.37
From: kdasu <kdasu.kdev@xxxxxxxxx>
Date: Fri, 17 Feb 2012 15:00:09 -0800 (PST)
In-reply-to: <33346043.post@xxxxxxxxxxxxxxx>
References: <33345988.post@xxxxxxxxxxxxxxx> <33346009.post@xxxxxxxxxxxxxxx> <33346035.post@xxxxxxxxxxxxxxx> <33346043.post@xxxxxxxxxxxxxxx>

On the 2.6.37 kernel, xfs_fs_evict_inode() leads to a deadlock when
freeing multiple realtime extents. On further debugging the root
cause it was determined to be recursive locking of the RT bitmap
inode during evict operation within the same task context.
The same vfs evict sequence is replayed by the xfs log recovery on
mounts on a reboot after the problem happens first time.
This problem exists on kernel v2.6.39 as well.

Call stack:
xfs_ilock   <- simple task deadlock in the xfs_ilock(ip,  XFS_ILOCK_EXCL)
               re-acquired on second iteration when the inode is cached
xfs_iget_cache_hit
xfs_iget
xfs_trans_iget
xfs_rtfree_extent <- Call to xfs_trans_iget()
xfs_bmap_del_extent
xfs_bunmapi  <- while loop based on number of extents to free
xfs_itruncate_finish
xfs_inactive
evict

The deadlock fix has two parts :
1) check if the inode is already locked in xfs_iget.c in the
   xfs_iget_cache_hit() function. Do not acquire the inode lock again
   if ip is already locked with the  XFS_ILOCK_EXCL subclass.
   We use the active transaction structure to detect if the inode is
   already lokced.
2) In addition in xfs_trans_inode.c:xfs_trans_iget() prevent joining
   already active transaction.

The above changes are also needed along with the backport of following
2.6.39 kernel patches to 2.6.37 kernel:

xfs: only lock the rt bitmap inode once per allocation
xfs: fix xfs_get_extsz_hint for a zero extent size hint
xfs: add lockdep annotations for the rt inodes

Signed-off-by: Kamal Dasu <kdasu.kdev@xxxxxxxxx>
---
 fs/xfs/xfs_iget.c        |   12 +++++++++++-
 fs/xfs/xfs_trans_inode.c |    2 +-
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/xfs_iget.c b/fs/xfs/xfs_iget.c
index 0cdd269..f05bdc2 100644
--- a/fs/xfs/xfs_iget.c
+++ b/fs/xfs/xfs_iget.c
@@ -143,6 +143,7 @@ xfs_inode_free(
 static int
 xfs_iget_cache_hit(
        struct xfs_perag        *pag,
+       xfs_trans_t             *tp,
        struct xfs_inode        *ip,
        int                     flags,
        int                     lock_flags) __releases(pag->pag_ici_lock)
@@ -234,6 +235,15 @@ xfs_iget_cache_hit(
                trace_xfs_iget_hit(ip);
        }
 
+       /* check inode already locked  */
+       spin_lock(&ip->i_flags_lock);
+       if (tp &&  ip->i_transp == tp) {
+               if ((ip->i_itemp->ili_lock_flags & lock_flags) &
+                       (XFS_ILOCK_EXCL))
+                       lock_flags = 0;
+       }
+       spin_unlock(&ip->i_flags_lock);
+
        if (lock_flags != 0)
                xfs_ilock(ip, lock_flags);
 
@@ -379,7 +389,7 @@ again:
        ip = radix_tree_lookup(&pag->pag_ici_root, agino);
 
        if (ip) {
-               error = xfs_iget_cache_hit(pag, ip, flags, lock_flags);
+               error = xfs_iget_cache_hit(pag, tp, ip, flags, lock_flags);
                if (error)
                        goto out_error_or_again;
        } else {
diff --git a/fs/xfs/xfs_trans_inode.c b/fs/xfs/xfs_trans_inode.c
index ccb3453..6f8db93 100644
--- a/fs/xfs/xfs_trans_inode.c
+++ b/fs/xfs/xfs_trans_inode.c
@@ -58,7 +58,7 @@ xfs_trans_iget(
        int                     error;
 
        error = xfs_iget(mp, tp, ino, flags, lock_flags, ipp);
-       if (!error && tp) {
+       if (!error && tp && !((*ipp)->i_transp)) {
                xfs_trans_ijoin(tp, *ipp);
                (*ipp)->i_itemp->ili_lock_flags = lock_flags;
        }
-- 
1.7.5.4

-- 
View this message in context: 
http://old.nabble.com/-PATCH-0-4--xfs%3A-resurrect-realtime-subvolume-support-on-kernel-2.6.37-tp33345988p33346051.html
Sent from the Xfs - General mailing list archive at Nabble.com.

<Prev in Thread] Current Thread [Next in Thread>