Re: xfs readdir hang on for-next (3.15.0-rc1)

To: xfs@xxxxxxxxxxx
Subject: Re: xfs readdir hang on for-next (3.15.0-rc1)
From: "Michael L. Semon" <mlsemon35@xxxxxxxxx>
Date: Mon, 14 Apr 2014 17:47:30 -0400 (EDT)
Cc: Brian Foster <bfoster@xxxxxxxxxx>
In-reply-to: <20140414205737.GI26782@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx>
References: <20140414164313.GA62307@xxxxxxxxxxxxxxx> <20140414190834.GB62307@xxxxxxxxxxxxxxx> <20140414205737.GI26782@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx>
User-agent: Alpine 2.11 (LNX 23 2013-08-11)
On Mon, 14 Apr 2014, Peter Zijlstra wrote:

> On Mon, Apr 14, 2014 at 03:08:36PM -0400, Brian Foster wrote:
> > On Mon, Apr 14, 2014 at 12:43:14PM -0400, Brian Foster wrote:
> > > Hi all,
> > > 
> > > This is a heads up that I'm seeing a blatant readdir hang on the current
> > > for-next with selinux enabled. To reproduce, I format a clean fs, mount
> > > and attempt an ls.
> > > 
> > > The problem does not occur with selinux disabled, if I back out the
> > > following commit:
> > > 
> > > 40194ecc6d78 xfs: reinstate the ilock in xfs_readdir
> > > 
> > > ... or if I remove the locking around xfs_attr_get(), so I suspect this
> > > is another instance of a recursive deadlock. I'm getting no output
> > > whatsoever in order to confirm this and it also leads to a complete
> > > system lockup. It's also interesting that this hasn't been observed
> > > until now, given the above commit was introduced in 3.14. So the above
> > > commit doesn't appear to be the most recent change that triggers this.
> > > 
> > > I reproduced on the latest linus tree and do not reproduce on 3.14, so
> > > I'm trying to do a bisect to find out what else might have changed to
> > > trigger this.
> > > 
> > 
> > This bisected down to:
> > 
> > commit 6f008e72cd111a119b5d8de8c5438d892aae99eb
> > Author: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> > Date:   Wed Mar 12 13:24:42 2014 +0100
> > 
> >     locking/mutex: Fix debug checks
> > ...
> > 
> > ... which suggests something down in the mutex debug code. Indeed, the
> > problem no longer occurs if I disable kernel debug in my .config. What
> > is also interesting is that it didn't return when I reenable
> > DEBUG_KERNEL and DEBUG_MUTEXES alone. It does return when I start to
> > enable some of the other lock debugging options. FWIW, I also cleared
> > out my tree and rebuilt from scratch just to be sure that I didn't have
> > anything stale/broken lying around.
> > 
> > Peter,
> > 
> > Any insight on this? 
> 
> http://lkml.kernel.org/r/tip-a227960fe0cafcc229a8d6bb8b454a3a0b33719d@xxxxxxxxxxxxxx
> 
> That will make the kernel continue after the lockdep splat. I too see it
> on some of my XFS using machines.

It can happen on JFS, too, but my trusty "untar a system backup
until a splat happens" test barely worked on the merge-window
kernel.  Instead, I used xfstests generic/113 on XFS (kernel +
xfs-oss/for-next) to trigger it.  With the patch above applied, I ran
xfstests on both v4- and v5-superblock XFS, and it resolved the new
lockdep issues I was seeing here on x86.

Sorry for not reporting it here earlier, which cost you time on a
bisect.  The second lockdep splat I got after kernel 3.14 wasn't in
XFS, so I treated it as a non-XFS issue.

Good luck!

Michael
