
Re: [PATCH 3/4] xfstests: _check_quota_usage needs to unmount to get XFS

To: xfs@xxxxxxxxxxx
Subject: Re: [PATCH 3/4] xfstests: _check_quota_usage needs to unmount to get XFS quotacheck
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Fri, 27 Jul 2012 08:55:04 +1000
In-reply-to: <1343291706-14882-4-git-send-email-david@xxxxxxxxxxxxx>
References: <1343291706-14882-1-git-send-email-david@xxxxxxxxxxxxx> <1343291706-14882-4-git-send-email-david@xxxxxxxxxxxxx>
User-agent: Mutt/1.5.21 (2010-09-15)
On Thu, Jul 26, 2012 at 06:35:05PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@xxxxxxxxxx>
> 
> Remount won't run a quota check - it's only done during mount. Hence
> all quota tests using this check function are not actually
> validating XFS filesystems right now.
> 
> Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>

FWIW, this change is exposing some problems in the new dquot code:

> ---
>  common.quota |   10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/common.quota b/common.quota
> index 9736306..2fa784b 100644
> --- a/common.quota
> +++ b/common.quota
> @@ -236,6 +236,11 @@ _check_quota_usage()
>  {
>       # Sync to get delalloc to disk
>       sync
> +
> +     # kill caches to guarantee removal speculative delalloc
> +     # XXX: really need an ioctl instead of this big hammer
> +     echo 3 > /proc/sys/vm/drop_caches
> +

Some kind of locking issue is present:

[ 1871.738970] XFS (vdb): Quotacheck: Done.
[ 1877.795774] ------------[ cut here ]------------
[ 1877.797347] WARNING: at kernel/mutex-debug.c:78 debug_mutex_unlock+0xda/0xe0()
[ 1877.799416] Hardware name: Bochs
[ 1877.799416] Modules linked in:
[ 1877.799416] Pid: 2261, comm: 232 Not tainted 3.5.0-rc5-dgc+ #313
[ 1877.799416] Call Trace:
[ 1877.799416]  [<ffffffff8107a83f>] warn_slowpath_common+0x7f/0xc0
[ 1877.799416]  [<ffffffff8107a89a>] warn_slowpath_null+0x1a/0x20
[ 1877.799416]  [<ffffffff810d022a>] debug_mutex_unlock+0xda/0xe0
[ 1877.799416]  [<ffffffff81b4c97c>] __mutex_unlock_slowpath+0x7c/0x130
[ 1877.799416]  [<ffffffff81b4ca3e>] mutex_unlock+0xe/0x10
[ 1877.799416]  [<ffffffff814b12d8>] xfs_qm_dqreclaim_one+0x178/0x3d0
[ 1877.799416]  [<ffffffff814b1620>] xfs_qm_shake+0xf0/0x170
[ 1877.799416]  [<ffffffff81137789>] shrink_slab+0x169/0x350
[ 1877.799416]  [<ffffffff81709b04>] ? do_raw_spin_lock+0x54/0x120
[ 1877.799416]  [<ffffffff8118a488>] ? iput+0x48/0x210
[ 1877.799416]  [<ffffffff8119b433>] drop_caches_sysctl_handler+0x73/0xa0
[ 1877.799416]  [<ffffffff811de863>] proc_sys_call_handler.isra.11+0xb3/0xd0
[ 1877.799416]  [<ffffffff811de898>] proc_sys_write+0x18/0x20
[ 1877.799416]  [<ffffffff81170298>] vfs_write+0xa8/0x160
[ 1877.799416]  [<ffffffff8117058a>] sys_write+0x4a/0x90
[ 1877.799416]  [<ffffffff81b57269>] system_call_fastpath+0x16/0x1b
[ 1877.799416] ---[ end trace 4f2a89b2cbd5e64f ]---

which is:

        DEBUG_LOCKS_WARN_ON(lock->owner != current);

so something other than the task that locked the mutex unlocked it,
or we are unlocking an unlocked dquot...
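
To make the two failure cases concrete, here is a toy user-space model of
an owner-tracking mutex. It is only a sketch: struct task, struct toy_mutex
and the toy_* functions are made-up names for illustration, not the kernel's
mutex implementation, and there is no contention handling at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a task; not the kernel's task_struct. */
struct task {
	int pid;
};

/* Toy mutex that only records who holds it (NULL when unlocked). */
struct toy_mutex {
	const struct task *owner;
};

void toy_mutex_lock(struct toy_mutex *m, const struct task *cur)
{
	/* Toy model: the lock must be free, we never block. */
	assert(m->owner == NULL);
	m->owner = cur;
}

/*
 * Returns true if the unlock was legitimate, false in the cases where
 * an owner check of the form (lock->owner != current) would fire:
 * either a different task is unlocking, or the mutex was not locked.
 */
bool toy_mutex_unlock(struct toy_mutex *m, const struct task *cur)
{
	bool ok = (m->owner == cur);

	m->owner = NULL;
	return ok;
}
```

In this model a second unlock by the same task fails the check because the
owner is already NULL, and an unlock by any other task fails it because the
owner does not match.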

>       VFS_QUOTA=0
>       case $FSTYP in
>       ext2|ext3|ext4|ext4dev|reiserfs)
> @@ -253,8 +258,9 @@ _check_quota_usage()
>               quotacheck -u -g $SCRATCH_MNT 2>/dev/null
>       else
>               # use XFS method to force quotacheck
> -             mount -o remount,noquota $SCRATCH_DEV
> -             mount -o remount,usrquota,grpquota $SCRATCH_DEV
> +             xfs_quota -x -c "off -ug" $SCRATCH_MNT

And this is hanging with what appears to be a reference counting bug
when purging dquots in generic/233:

# echo w > /proc/sysrq-trigger 
[53710.206100] SysRq : Show Blocked State
[53710.207213]   task                        PC stack   pid father
[53710.208749] xfs_quota       D ffff88003fc12880  3896 18147  17936 0x00000000
[53710.209738]  ffff88000f3afc18 0000000000000086 ffff88001cb160c0 ffff88000f3affd8
[53710.209738]  ffff88000f3affd8 ffff88000f3affd8 ffffffff81f9b420 ffff88001cb160c0
[53710.209738]  ffff88000f3afc08 ffffffff821ece80 ffff88000f3afc50 0000000100cbbe68
[53710.209738] Call Trace:
[53710.209738]  [<ffffffff81b4dea9>] schedule+0x29/0x70
[53710.209738]  [<ffffffff81b4bcad>] schedule_timeout+0x13d/0x2c0
[53710.209738]  [<ffffffff81089f90>] ? usleep_range+0x50/0x50
[53710.209738]  [<ffffffff814aea90>] ? xfs_qm_need_dqattach+0x70/0x70
[53710.209738]  [<ffffffff81b4be4e>] schedule_timeout_uninterruptible+0x1e/0x20
[53710.209738]  [<ffffffff814aeef3>] xfs_qm_dquot_walk+0x153/0x170
[53710.209738]  [<ffffffff816fb81b>] ? radix_tree_lookup+0xb/0x10
[53710.209738]  [<ffffffff8149772a>] ? xfs_perag_get+0x3a/0x120
[53710.209738]  [<ffffffff814ace60>] ? xfs_trans_free_dqinfo+0x40/0x40
[53710.209738]  [<ffffffff81448aef>] ? xfs_inode_ag_iterator+0x8f/0xa0
[53710.209738]  [<ffffffff814aef93>] xfs_qm_dqpurge_all+0x83/0x90
[53710.209738]  [<ffffffff814ae4b9>] xfs_qm_scall_quotaoff+0x139/0x350
[53710.209738]  [<ffffffff814b2780>] xfs_fs_set_xstate+0xd0/0xf0
[53710.209738]  [<ffffffff811d1088>] sys_quotactl+0x1f8/0x740
[53710.209738]  [<ffffffff81174d7a>] ? sys_newstat+0x2a/0x40
[53710.209738]  [<ffffffff81b52635>] ? do_async_page_fault+0x35/0x90
[53710.209738]  [<ffffffff81b57269>] system_call_fastpath+0x16/0x1b

It's hitting a dquot that either has the FREEING flag set or an
elevated reference count, so it is skipping it. It gets stuck in the
loop forever retrying. That's probably related to the above lock
issue.
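
As a rough illustration of why that never terminates, here is a toy
user-space model of a purge walk that skips busy entries and retries. It is
a sketch under assumptions, not the real xfs_qm_dquot_walk() code: struct
toy_dquot, toy_purge_one() and toy_purge_all() are hypothetical names, and
the max_passes bound exists only so the example can return; the behaviour
described above is an unbounded retry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified dquot: just the state that matters here. */
struct toy_dquot {
	int  nrefs;	/* outstanding references */
	bool freeing;	/* someone else is already tearing it down */
	bool purged;	/* we have reclaimed it */
};

/* Returns true if the dquot is gone, false if it must be retried. */
bool toy_purge_one(struct toy_dquot *dq)
{
	if (dq->purged)
		return true;
	if (dq->freeing || dq->nrefs > 0)
		return false;	/* busy: skip it for now */
	dq->purged = true;
	return true;
}

/*
 * Walk all dquots, retrying while any were skipped. Returns the number
 * of passes needed, or -1 if max_passes ran out (an illustration-only
 * bound; without it a leaked reference means looping forever).
 */
int toy_purge_all(struct toy_dquot *dqs, size_t n, int max_passes)
{
	assert(max_passes > 0);

	for (int pass = 1; pass <= max_passes; pass++) {
		size_t skipped = 0;

		for (size_t i = 0; i < n; i++)
			if (!toy_purge_one(&dqs[i]))
				skipped++;
		if (skipped == 0)
			return pass;
		/* a real walker would sleep here before the next pass */
	}
	return -1;
}
```

If nothing ever drops the extra reference or clears the flag, every pass
skips the same entry, which matches the task sitting in
schedule_timeout_uninterruptible() under xfs_qm_dquot_walk() in the trace
above.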

And generic/231 fails with a significant accounting difference:

generic/231      [failed, exit status 1] - output mismatch (see tests/generic/231.out.bad)
--- tests/generic/231.out       2012-07-26 18:42:30.000000000 +1000
+++ results/generic/231.out.bad 2012-07-27 08:24:22.000000000 +1000
@@ -2,15 +2,7 @@
 === FSX Standard Mode, Memory Mapping, 1 Tasks ===
 All operations completed A-OK!
 Comparing user usage
-Comparing group usage
-=== FSX Standard Mode, Memory Mapping, 4 Tasks ===
-All operations completed A-OK!
-All operations completed A-OK!
-All operations completed A-OK!
-All operations completed A-OK!
-Comparing user usage
-Comparing group usage
-=== FSX Standard Mode, Memory Mapping, 1 Tasks ===
-All operations completed A-OK!
-Comparing user usage
-Comparing group usage
+4c4
+< #1001     --     524       0       0              3     0     0
+---
+> #1001     --     316       0       0              3     0     0

generic/270 and generic/233 give a similar mismatch when they don't
hang.

So, yeah, we haven't been verifying the quota accounting code as
well as we should have been for some time now....

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
