
Rambling noise #1: generic/230 can trigger kernel debug lock detector

To: "xfs@xxxxxxxxxxx" <xfs@xxxxxxxxxxx>
Subject: Rambling noise #1: generic/230 can trigger kernel debug lock detector
From: "Michael L. Semon" <mlsemon35@xxxxxxxxx>
Date: Wed, 08 May 2013 22:24:25 -0400
Hi! I'm trying to come up with a series of ramblings that may or may not be useful in a mailing-list context, with the idea that one bug report might be good, while the next might be me thinking aloud with data in hand because I know something's wrong but can't put my finger on it. An ex-girlfriend saw the movie "Rain Man" years ago, pointed to the screen, and said, "Do you see that guy? That's you!" If only I could be so smart...or act as well as Dustin Hoffman. The noisy thinking is there, just not the brilliant insights...

This report is to pass on a kernel lock detector message that might be reproducible under a certain family of tests. generic/230 may not be at fault; it's just where the detector went off.
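
For context, the message reproduced further down is headed "possible recursive locking detected", which the lock validator prints when a task takes a lock of a class it already holds and the acquisition carries no nesting annotation. The Python sketch below is only a toy model of that idea, not the kernel's implementation. `Task`, `RecursiveLockWarning`, and the `nested` flag are names invented for illustration; the lock-class names are taken from the report.

```python
# Toy model of a class-based recursive-lock check (NOT the kernel's code).
# Task, RecursiveLockWarning and the "nested" flag are hypothetical names.

class RecursiveLockWarning(Exception):
    """Raised when a task re-acquires a lock class it already holds."""


class Task:
    def __init__(self, name):
        self.name = name
        self.held = []  # lock classes currently held, in acquisition order

    def acquire(self, lock_class, nested=False):
        # Flag a second acquisition of an already-held class unless the
        # caller marks it as intentionally nested.
        if lock_class in self.held and not nested:
            raise RecursiveLockWarning(
                f"{self.name}: possible recursive locking on {lock_class}")
        self.held.append(lock_class)

    def release(self, lock_class):
        self.held.remove(lock_class)  # drops the first matching entry


# Replay the lock order listed under "3 locks held by setquota".
task = Task("setquota")
for cls in ("s_umount_key", "sb_internal", "qi_quotaofflock"):
    task.acquire(cls)

try:
    task.acquire("sb_internal")  # second acquisition of the same class
    warned = False
except RecursiveLockWarning as exc:
    warned = True
    message = str(exc)
```

In this model the fourth acquisition raises the warning because `sb_internal` is already in the held list. Whether the real report points at a genuine deadlock or only at a missing nesting annotation cannot be decided from a model like this.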

It seems that the few times the detector has gone off lately, it has done so at the same instant that I was doing some very boring operation on a different partition, such as reloading a file in vi or piping something to less to read it. Some folks have been working on tty stuff lately, for the 3.8 kernels at least, making great improvements overall, but there seem to be no tty hints in this message.

The kernel is, AFAIK, a git Linux at v3.9.0 plus this weekend's xfs-oss checked out, with the following patches applied:

[PATCH v2] xfs: fix assertion failure in xfs_vm_write_failed()
[PATCH] xfs: fix s_max_bytes to MAX_LFS_FILESIZE if needed
[PATCH] xfs: don't return 0 if generic_segment_checks() find nothing

[PATCH 1/2] xfs: fix sub-page blocksize data integrity writes
[PATCH 2/2] xfs: fix rounding in xfs_free_file_space
[PATCH v3 1/2] xfs: Remove XFS_MOUNT_RETERR
[PATCH v3 2/2] xfs: Don't keep silent if sunit/swidth can not be changed via mount

There shouldn't be a need to apply these patches right away. I'm just providing context.

The computer is a Pentium 733 with memory lowered to 160 MB for low-memory testing. It uses the standard VGA console, which can contribute to such issues, but not as much as a DRM framebuffer console would.

Thanks!

Michael

[Earlier tests are shown only to provide sequence.]

FSTYP         -- xfs (debug)
PLATFORM      -- Linux/i686 oldsvrhw 3.9.0+
MKFS_OPTIONS  -- -f -llogdev=/dev/sda7 -bsize=4096 /dev/sdb6
MOUNT_OPTIONS -- -ologdev=/dev/sda7 /dev/sdb6 /mnt/xfstests-scratch

xfs/168  [not run] Assuming DMAPI modules are not loaded
generic/053      10s
xfs/043  [not run] No dump tape specified
generic/099      [not run] not suitable for this OS: Linux
xfs/170  47s
xfs/116  3s
generic/020      29s
xfs/175  [not run] Assuming DMAPI modules are not loaded
xfs/066  8s
xfs/037  [not run] No dump tape specified
xfs/292  - output mismatch (see /var/lib/xfstests/results/xfs/292.out.bad)
    --- tests/xfs/292.out       2013-05-08 12:40:14.635752692 -0400
    +++ /var/lib/xfstests/results/xfs/292.out.bad 2013-05-08 16:35:33.894218930 -0400
    @@ -1,5 +1,5 @@
     QA output created by 292
     mkfs.xfs without geometry
    -meta-data=FILENAME   isize=256    agcount=4, agsize=16777216 blks
    +meta-data=FILENAME isize=256    agcount=4, agsize=16777216 blks
     mkfs.xfs with cmdline geometry
    -meta-data=FILENAME   isize=256    agcount=16, agsize=4194304 blks
    +meta-data=FILENAME isize=256    agcount=16, agsize=4194304 blks
     ...
(Run 'diff -u tests/xfs/292.out /var/lib/xfstests/results/xfs/292.out.bad' to see the entire diff)
xfs/086  195s
xfs/293  16s
generic/308      2s
xfs/095  [not run] not suitable for this OS: Linux
xfs/096  28s
xfs/022  [not run] No dump tape specified
generic/260      [not run] FITRIM not supported on /dev/sdb6
generic/247      101s
generic/235 - output mismatch (see /var/lib/xfstests/results/generic/235.out.bad)
    --- tests/generic/235.out   2013-05-08 12:39:55.017626952 -0400
    +++ /var/lib/xfstests/results/generic/235.out.bad 2013-05-08 16:42:10.527639188 -0400
    @@ -15,7 +15,7 @@
     fsgqa     --       0       0       0              1     0     0


    -touch: cannot touch `SCRATCH_MNT/failed': Read-only file system
    +touch: cannot touch 'SCRATCH_MNT/failed': Read-only file system
     *** Report for user quotas on device SCRATCH_DEV
     Block grace time: 7days; Inode grace time: 7days
     ...
(Run 'diff -u tests/generic/235.out /var/lib/xfstests/results/generic/235.out.bad' to see the entire diff)
xfs/072  7s
xfs/180  441s
xfs/283  25s
xfs/048  1s
generic/076      8s
generic/236      3s
generic/230
=============================================
[ INFO: possible recursive locking detected ]
3.9.0+ #3 Not tainted
---------------------------------------------
setquota/28368 is trying to acquire lock:
 (sb_internal){++++.?}, at: [<c11e8846>] xfs_trans_alloc+0x26/0x50

but task is already holding lock:
 (sb_internal){++++.?}, at: [<c11e8846>] xfs_trans_alloc+0x26/0x50

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(sb_internal);
  lock(sb_internal);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

3 locks held by setquota/28368:
 #0:  (&type->s_umount_key#20){++++.+}, at: [<c10c660a>] get_super+0x7a/0xc0
 #1:  (sb_internal){++++.?}, at: [<c11e8846>] xfs_trans_alloc+0x26/0x50
 #2:  (&qinf->qi_quotaofflock){+.+...}, at: [<c11fa44a>] xfs_qm_scall_setqlim+0x9a/0x690

stack backtrace:
CPU: 0 PID: 28368 Comm: setquota Not tainted 3.9.0+ #3
Hardware name: Dell Computer Corporation L733r /CA810E , BIOS A14 09/05/2001
 c6456ca0 c6456ca0 c8f83cc8 c13fe5bd c8f83d40 c1060ee0 c14d241d c6456ad4
 00006ed0 000003eb c196a618 c6456cf0 00000004 00000000 0001f60c c177c801
 c19b033d 00000000 f089e33c 00000000 c6456930 4596f1d4 000003eb 00000000
Call Trace:
 [<c13fe5bd>] dump_stack+0x16/0x18
 [<c1060ee0>] __lock_acquire+0x17b0/0x17f0
 [<c105dfae>] ? trace_hardirqs_off_caller+0x1e/0xc0
 [<c104f795>] ? sched_clock_cpu+0xa5/0x100
 [<c1061580>] lock_acquire+0x80/0x100
 [<c11e8846>] ? xfs_trans_alloc+0x26/0x50
 [<c10c737d>] __sb_start_write+0xad/0x1b0
 [<c11e8846>] ? xfs_trans_alloc+0x26/0x50
 [<c11e8846>] ? xfs_trans_alloc+0x26/0x50
 [<c105df8b>] ? trace_hardirqs_on+0xb/0x10
 [<c11e8846>] xfs_trans_alloc+0x26/0x50
 [<c11f75ad>] xfs_qm_dqread+0xcd/0x360
 [<c11f7b82>] xfs_qm_dqget+0x342/0x520
 [<c11fa469>] xfs_qm_scall_setqlim+0xb9/0x690
 [<c10b45ea>] ? might_fault+0x4a/0xa0
 [<c10b4634>] ? might_fault+0x94/0xa0
 [<c11ff8b4>] xfs_fs_set_dqblk+0x54/0xa0
 [<c110fbf6>] quota_setxquota+0x76/0xc0
 [<c1110233>] SyS_quotactl+0x513/0x5a0
 [<c10c8834>] ? SyS_stat64+0x34/0x40
 [<c1403df2>] ? sysenter_exit+0xf/0x1d
 [<c105deb4>] ? trace_hardirqs_on_caller+0xf4/0x1c0
 [<c1403dbf>] sysenter_do_call+0x12/0x36
XFS (sdb6): Mounting Filesystem
XFS (sdb6): Ending clean mount
XFS (sdb6): Mounting Filesystem
XFS (sdb6): Ending clean mount
XFS (sdb6): Quotacheck needed: Please wait.
XFS (sdb6): Quotacheck: Done.
 - output mismatch (see /var/lib/xfstests/results/generic/230.out.bad)
    --- tests/generic/230.out   2013-05-08 12:39:54.827612822 -0400
    +++ /var/lib/xfstests/results/generic/230.out.bad 2013-05-08 16:51:08.063301955 -0400
    @@ -12,9 +12,9 @@
     pwrite64: Disk quota exceeded
     Touch 3+4
     Touch 5+6
    -touch: cannot touch `SCRATCH_MNT/file6': Disk quota exceeded
    +touch: cannot touch 'SCRATCH_MNT/file6': Disk quota exceeded
     Touch 5
    -touch: cannot touch `SCRATCH_MNT/file5': Disk quota exceeded
     ...
(Run 'diff -u tests/generic/230.out /var/lib/xfstests/results/generic/230.out.bad' to see the entire diff)
XFS (sdb5): Mounting Filesystem
XFS (sdb5): Ending clean mount
xfs/155
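
A report like the one above can be triaged mechanically by pulling out the lock being acquired and the locks already held, then checking whether the same lock class appears in both. The Python sketch below does that for an excerpt of the report. The regular expression and the `parse_locks` helper are my own assumptions about the line layout seen here, not a general lockdep parser.

```python
import re

# Excerpt of the report above (lock lines only).
REPORT = """\
setquota/28368 is trying to acquire lock:
 (sb_internal){++++.?}, at: [<c11e8846>] xfs_trans_alloc+0x26/0x50

3 locks held by setquota/28368:
 #0:  (&type->s_umount_key#20){++++.+}, at: [<c10c660a>] get_super+0x7a/0xc0
 #1:  (sb_internal){++++.?}, at: [<c11e8846>] xfs_trans_alloc+0x26/0x50
 #2:  (&qinf->qi_quotaofflock){+.+...}, at: [<c11fa44a>] xfs_qm_scall_setqlim+0x9a/0x690
"""

# Matches "(name){state}, at: [<addr>] symbol+off/len" as laid out above.
LOCK_RE = re.compile(
    r"\((?P<name>[^)]+)\)\{[^}]*\}, at: \[<[0-9a-f]+>\] (?P<site>\S+)")


def parse_locks(report):
    """Return (wanted, held): the lock being acquired and the held locks."""
    wanted = None
    held = []
    previous = ""
    for line in report.splitlines():
        match = LOCK_RE.search(line)
        if match:
            entry = (match.group("name"), match.group("site"))
            if "is trying to acquire lock" in previous:
                wanted = entry
            elif re.match(r"\s*#\d+:", line):
                held.append(entry)
        previous = line
    return wanted, held


wanted, held = parse_locks(REPORT)
# True when the wanted lock class is already in the held list.
recursive = wanted is not None and wanted[0] in [name for name, _ in held]
print(wanted, recursive)
```

For this excerpt, both the wanted lock and held lock #1 are `sb_internal`, taken at the same `xfs_trans_alloc` call site, which matches the nested `xfs_trans_alloc` visible in the call trace under `xfs_qm_dqread`.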
