xfs
[Top] [All Lists]

Re: [PATCH 0/6 v2] xfs: xfs_iflush_cluster vs xfs_reclaim_inode

To: Dave Chinner <david@xxxxxxxxxxxxx>
Subject: Re: [PATCH 0/6 v2] xfs: xfs_iflush_cluster vs xfs_reclaim_inode
From: Brian Foster <bfoster@xxxxxxxxxx>
Date: Mon, 11 Apr 2016 09:37:18 -0400
Cc: xfs@xxxxxxxxxxx
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <20160408221706.GB567@dastard>
References: <1460072271-23923-1-git-send-email-david@xxxxxxxxxxxxx> <20160408171843.GC30614@xxxxxxxxxxxxxxx> <20160408221706.GB567@dastard>
User-agent: Mutt/1.5.24 (2015-08-30)
On Sat, Apr 09, 2016 at 08:17:06AM +1000, Dave Chinner wrote:
> On Fri, Apr 08, 2016 at 01:18:44PM -0400, Brian Foster wrote:
> > On Fri, Apr 08, 2016 at 09:37:45AM +1000, Dave Chinner wrote:
> > > Hi folks,
> > > 
> > > This is the second version of this patch set, first posted and
> > > described here:
> > > 
> > > http://oss.sgi.com/archives/xfs/2016-04/msg00069.html
> > > 
> > > The only change from the first version is splitting up the first
> > > patch into two as Christoph requested - one for the bug fix, the
> > > other for the variable renaming.
> > > 
> > 
> > Did your xfstests testing for this series include generic/233? I'm
> > seeing a consistently reproducible test hang. The test is hanging on a
> > "xfs_quota -x -c off -ug /mnt/scratch" command. The stack is as follows:
> > 
> > [<ffffffffa0772306>] xfs_qm_dquot_walk.isra.8+0x196/0x1b0 [xfs]
> > [<ffffffffa0774a98>] xfs_qm_dqpurge_all+0x78/0x80 [xfs]
> > [<ffffffffa07713e8>] xfs_qm_scall_quotaoff+0x148/0x640 [xfs]
> > [<ffffffffa077733d>] xfs_quota_disable+0x3d/0x50 [xfs]
> > [<ffffffff812c27e3>] SyS_quotactl+0x3b3/0x8c0
> > [<ffffffff81003e17>] do_syscall_64+0x67/0x190
> > [<ffffffff81763f7f>] return_from_SYSCALL_64+0x0/0x7a
> > [<ffffffffffffffff>] 0xffffffffffffffff
> > 
> > ... and it looks like the kernel is spinning somehow or another between
> > inode reclaim and xfsaild:
> > 
> >     ...
> >      kworker/1:2-210   [001] ...1   895.750591: xfs_perag_get_tag: dev 
> > 253:3 agno 1 refcount 1 caller xfs_reclaim_inodes_ag [xfs]
> >      kworker/1:2-210   [001] ...1   895.750609: xfs_perag_put: dev 253:3 
> > agno 1 refcount 0 caller xfs_reclaim_inodes_ag [xfs]
> >      kworker/1:2-210   [001] ...1   895.750609: xfs_perag_get_tag: dev 
> > 253:3 agno 2 refcount 5 caller xfs_reclaim_inodes_ag [xfs]
> >      kworker/1:2-210   [001] ...1   895.750611: xfs_perag_put: dev 253:3 
> > agno 2 refcount 4 caller xfs_reclaim_inodes_ag [xfs]
> >      kworker/1:2-210   [001] ...1   895.750612: xfs_perag_get_tag: dev 
> > 253:3 agno 3 refcount 1 caller xfs_reclaim_inodes_ag [xfs]
> >      kworker/1:2-210   [001] ...1   895.750613: xfs_perag_put: dev 253:3 
> > agno 3 refcount 0 caller xfs_reclaim_inodes_ag [xfs]
> >     xfsaild/dm-3-12406 [003] ...2   895.760588: xfs_ail_locked: dev 253:3 
> > lip 0xffff8801f8e65d80 lsn 2/5709 type XFS_LI_QUOTAOFF flags IN_AIL
> >     xfsaild/dm-3-12406 [003] ...2   895.810595: xfs_ail_locked: dev 253:3 
> > lip 0xffff8801f8e65d80 lsn 2/5709 type XFS_LI_QUOTAOFF flags IN_AIL
> >     xfsaild/dm-3-12406 [003] ...2   895.860586: xfs_ail_locked: dev 253:3 
> > lip 0xffff8801f8e65d80 lsn 2/5709 type XFS_LI_QUOTAOFF flags IN_AIL
> >     xfsaild/dm-3-12406 [003] ...2   895.910596: xfs_ail_locked: dev 253:3 
> > lip 0xffff8801f8e65d80 lsn 2/5709 type XFS_LI_QUOTAOFF flags IN_AIL
> >     ...
> 
> No deadlock involving the AIL - it doesn't remove the
> XFS_LI_QUOTAOFF from the AIL - the quota code committing the
> quotaoff-end transactions is what removes that. IOWs, the dquot walk
> has not completed, so quotaoff has not completed, so the
> XFS_LI_QUOTAOFF is still in the AIL.
> 
> IOWs, this looks like xfs_qm_dquot_walk() is skipping dquots because
> xfs_qm_dqpurge is hitting this:
> 
>         xfs_dqlock(dqp);
>         if ((dqp->dq_flags & XFS_DQ_FREEING) || dqp->q_nrefs != 0) {
>                 xfs_dqunlock(dqp);
>                 return -EAGAIN;
>         }
> 
> So that means we've got an inode that probably hasn't been
> reclaimed, because the last thing that happens during reclaim is the
> dquots are detatched from the inode and hence the reference counts
> are dropped.
> 
> > FWIW, this only occurs with patch 6 applied. The test and scratch
> > devices are both 10GB lvm volumes formatted with mkfs defaults (v5).
> 
> I can't see how patch 6 would prevent an inode from being reclaimed,
> as all the changes occur *after* the reclaim decision has been made.
> More investigation needed, I guess...
> 

The attached diff addresses the problem for me. Feel free to fold it
into the original patch.

The regression test I had running failed with an OOM over the weekend.
I hadn't seen that before, but then again I haven't seen this test run
to completion on this system either due to the original problem. I'll
restart it today with this hunk included.

Brian

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@xxxxxxxxxxxxx
> 
> _______________________________________________
> xfs mailing list
> xfs@xxxxxxxxxxx
> http://oss.sgi.com/mailman/listinfo/xfs

Attachment: __xfs_inode_clear_reclaim.diff
Description: Text document

<Prev in Thread] Current Thread [Next in Thread>