next-20090220: XFS: inconsistent lock state

To: Eric Sandeen <sandeen@xxxxxxxxxxx>
Subject: Re: next-20090220: XFS: inconsistent lock state
From: Felix Blyakher <felixb@xxxxxxx>
Date: Tue, 3 Mar 2009 10:57:07 -0600
Cc: Christoph Hellwig <hch@xxxxxxxxxxxxx>, Alexander Beregalov <a.beregalov@xxxxxxxxx>, "linux-next@xxxxxxxxxxxxxxx" <linux-next@xxxxxxxxxxxxxxx>, LKML <linux-kernel@xxxxxxxxxxxxxxx>, xfs@xxxxxxxxxxx
In-reply-to: <49AD5401.30803@xxxxxxxxxxx>
References: <a4423d670902200952v5dc2fd91w3b54ab1db51a7fe2@xxxxxxxxxxxxxx> <20090224200740.GA9266@xxxxxxxxxxxxx> <49AD5401.30803@xxxxxxxxxxx>

On Mar 3, 2009, at 10:00 AM, Eric Sandeen wrote:

Christoph Hellwig wrote:
On Fri, Feb 20, 2009 at 08:52:59PM +0300, Alexander Beregalov wrote:

[ INFO: inconsistent lock state ]
2.6.29-rc5-next-20090220 #2
inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-R} usage.
kswapd0/324 [HC0[0]:SC0[0]:HE1:SE1] takes:
(&(&ip->i_lock)->mr_lock){+++++?}, at: [<ffffffff803ca60a>]
{RECLAIM_FS-ON-W} state was registered at:

That's a false positive.  While the ilock can be taken in reclaim the
allocation here is done before the inode is added to the inode cache.
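
The reasoning above (an object that has not been published yet cannot have its lock contended) can be sketched in userspace C. This is only an illustration under stated assumptions: `demo_inode`, `demo_cache`, `demo_trylock` and `demo_cache_miss` are hypothetical names, the lock is a toy `atomic_flag` rather than the XFS ilock, and the "cache" is a plain single-threaded array rather than a radix tree.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for an inode; not the real struct xfs_inode. */
struct demo_inode {
    atomic_flag locked;   /* toy try-lock standing in for the ilock */
    unsigned long ino;
};

#define DEMO_SLOTS 16
/* Toy "inode cache"; a real one would need its own locking. */
static struct demo_inode *demo_cache[DEMO_SLOTS];

/* Non-blocking lock attempt: returns true if the caller now holds the lock. */
static bool demo_trylock(struct demo_inode *ip)
{
    return !atomic_flag_test_and_set(&ip->locked);
}

/* Allocate, lock while still private, then publish. Returns NULL on failure. */
static struct demo_inode *demo_cache_miss(unsigned long ino)
{
    struct demo_inode *ip = malloc(sizeof(*ip));
    if (!ip)
        return NULL;
    atomic_flag_clear(&ip->locked);
    ip->ino = ino;

    /* Nobody else can reach ip yet, so this attempt is expected to succeed. */
    if (!demo_trylock(ip)) {
        free(ip);
        return NULL;
    }

    /* Publish: only from this point could another thread find the object. */
    demo_cache[ino % DEMO_SLOTS] = ip;
    return ip;
}
```

Calling `demo_cache_miss(42)` returns an object that is already locked and visible in the toy cache; a second `demo_trylock()` on it fails because the lock is held.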

The patch below should help avoid the warning:

Seems ok to me. I hate to see the BUG() added but I guess in this case
something truly bizarre would have to happen for the ilock to fail on
this inode.

On IRC you suggested ASSERT(0); instead of BUG();

That would mean that instead of bombing out here, we do it
in xfs debug kernels only, which is a good thing. However,
do we just silently ignore it in non-debug kernels, and
later try to unlock without locking first?
Maybe the following would be better:

        if (lock_flags) {
                if (!xfs_ilock_nowait(ip, lock_flags)) {
                        error = EAGAIN;
                        goto out_destroy;
                }
        }

Or just keep the BUG(), as it shouldn't happen (we hope).
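
The concern about unlocking without locking first can be illustrated with a small userspace sketch of the error-return variant. Everything here is an assumption made for illustration: `toy_lock`, `toy_trylock`, `toy_unlock` and `toy_setup` are hypothetical names, and the lock is a toy `atomic_flag`, not the XFS ilock. The point is only that a failed non-blocking attempt is reported to the caller, so no later unlock is issued for a lock that was never taken.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical toy lock; not a kernel primitive. */
struct toy_lock {
    atomic_flag held;
};

/* Non-blocking attempt: returns true if the caller now holds the lock. */
static bool toy_trylock(struct toy_lock *l)
{
    return !atomic_flag_test_and_set(&l->held);
}

static void toy_unlock(struct toy_lock *l)
{
    atomic_flag_clear(&l->held);
}

/*
 * Returns 0 on success (with *locked telling the caller whether the lock
 * is held), or EAGAIN if the lock was requested but could not be taken.
 * On EAGAIN the lock state is left untouched and *locked is false.
 */
static int toy_setup(struct toy_lock *l, bool want_lock, bool *locked)
{
    *locked = false;
    if (want_lock) {
        if (!toy_trylock(l))
            return EAGAIN;  /* caller unwinds instead of continuing unlocked */
        *locked = true;
    }
    return 0;
}
```

A caller would check the return value and call `toy_unlock()` only when `*locked` is true, which is the property a debug-only assertion cannot guarantee in builds where the assertion is compiled out.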

Reviewed-by: Felix Blyakher <felixb@xxxxxxx>

I might prefer that
but either way:

Reviewed-by: Eric Sandeen <sandeen@xxxxxxxxxxx>

Index: xfs/fs/xfs/xfs_iget.c
--- xfs.orig/fs/xfs/xfs_iget.c  2009-02-24 20:56:00.716027739 +0100
+++ xfs/fs/xfs/xfs_iget.c       2009-02-24 20:56:46.089031360 +0100
@@ -246,9 +246,6 @@ xfs_iget_cache_miss(
                goto out_destroy;

-       if (lock_flags)
-               xfs_ilock(ip, lock_flags);
-
        /*
         * Preload the radix tree so we can insert safely under the
         * write spinlock. Note that we cannot sleep inside the preload
@@ -259,6 +256,15 @@ xfs_iget_cache_miss(
                goto out_unlock;

+       /*
+        * Because the inode hasn't been added to the radix-tree yet it can't
+        * be found by another thread, so we can do the non-sleeping lock here.
+        */
+       if (lock_flags) {
+               if (!xfs_ilock_nowait(ip, lock_flags))
+                       BUG();
+       }
+
        mask = ~(((XFS_INODE_CLUSTER_SIZE(mp) >> mp->m_sb.sb_inodelog)) - 1);
        first_index = agino & mask;

