Rambling noise #1: generic/230 can trigger kernel debug lock detector
Michael L. Semon
mlsemon35 at gmail.com
Thu May 9 21:00:10 CDT 2013
On 05/09/2013 03:20 AM, Dave Chinner wrote:
> On Thu, May 09, 2013 at 01:16:46PM +1000, Dave Chinner wrote:
>> On Wed, May 08, 2013 at 10:24:25PM -0400, Michael L. Semon wrote:
>>> Hi! I'm trying to come up with a series of ramblings that may or
>>> may not be useful in a mailing-list context, with the idea that one
>>> bug report might be good, the next might be me thinking aloud with
>>> data in hand because I know something's wrong but can't put my
>>> finger on it. An ex-girlfriend saw the movie "Rain Man" years ago
>>> pointed to the screen and said, "Do you see that guy? That's you!"
>>> If only I could be so smart...or act as well as Dustin Hoffman. The
>>> noisy thinking is there, just not the brilliant insights...
>>>
>>> This report is to pass on a kernel lock detector message that might
>>> be reproducible under a certain family of tests. generic/230 may
>>> not be at fault, it's just where the detector went off.
>>
>> No, there's definitely a bug there. Thanks for the report, Michael.
>> Try the patch below.
>
> Actaully, there's a bug in the error handling in that version - it
> fails to unlock the quotaoff lock properly on failure. The version
> below fixes that problem.
>
> Cheers,
>
> Dave.
OK, I'll try this version as well. The first version seemed to work
just fine.
The rest of the letter has been shortened from a more detailed review to
a casual one with a scripted run of xfstests at the end. If nothing
else, it shows a) a dir3 function in a stack trace on a non-CRC
filesystem, b) loggen might need to be installed in the `make install`
version of xfstests, c) your previous-run-times feature has been
restored, and d) a pwrite64 "file too large" may or may not be there
because J. Liu's bitness patches are doing their jobs.
Good luck!
Michael
Original review (reduced):
As far as logical testing goes, your fix works well. Without your
patch, I got the original lockdep message, and the situation was
replicated by issuing a `du -sx /` while xfstests xfs/054 was running.
With your patch, all seems well for things that mention quotas.
The lock detector went off once more, but that was after attempting to
merge a 2013-05-08 kernel git with xfs-oss/master of the same vintage,
then apply my normal patches and your original quota lockdep fix.
Hardware change: Because `git commit` caused XFS to emit the message
"XFS: possible memory allocation deadlock in xfs_buf_allocate_memory
(mode:0x250)" a few times, RAM was increased to 256 MB. The merge was
reworked by restoring the git from a backup tarball and doing the
merge-and-patch session again.
Script started on Thu May 9 16:00:28 2013
cu -t -l /dev/ttyu0 -s 115200
Connected
# Note: lm256 is a local group
root at oldsvrhw:/var/lib/xfstests# ./check -xfs -g lm256
FSTYP -- xfs (debug)
PLATFORM -- Linux/i686 oldsvrhw 3.9.0+
MKFS_OPTIONS -- -f -llogdev=/dev/sda7 -bsize=4096 /dev/sdb6
MOUNT_OPTIONS -- -ologdev=/dev/sda7 /dev/sdb6 /mnt/xfstests-scratch
xfs/003 1s ... 2s
xfs/004 8s ... 8s
xfs/008 5s ... 5s
xfs/009 8s ... 7s
xfs/012 16s ... 13s
xfs/016 76s ... 62s
xfs/017 [failed, exit status 1] - output mismatch (see
/var/lib/xfstests/results/xfs/017.out.bad)
--- tests/xfs/017.out 2013-05-09 05:24:30.996641679 -0400
+++ /var/lib/xfstests/results/xfs/017.out.bad 2013-05-09
16:03:25.185045262 -0400
@@ -3,8 +3,6 @@
*** test
*** test 0
*** test 1
- *** test 2
- *** test 3
- *** test 4
-*** done
...
(Run 'diff -u tests/xfs/017.out
/var/lib/xfstests/results/xfs/017.out.bad' to see the entire diff)
xfs/019 8s
xfs/021 8s ... 7s
xfs/022 [not run] No dump tape specified
xfs/023 [not run] No dump tape specified
xfs/024 [not run] No dump tape specified
xfs/025 [not run] No dump tape specified
xfs/026 22s
xfs/027 22s ... 22s
xfs/028 31s
xfs/029 11s
xfs/030 65s
xfs/031 102s ... 99s
xfs/033 49s
xfs/034 9s ... 9s
xfs/035 [not run] No dump tape specified
xfs/036 [not run] No dump tape specified
xfs/037 [not run] No dump tape specified
xfs/038 [not run] No dump tape specified
xfs/039 [not run] No dump tape specified
xfs/040 [not run] Can't run srcdiff without KWORKAREA set
xfs/041 121s
xfs/042 163s
xfs/043 [not run] No dump tape specified
xfs/044 [failed, exit status 1] - output mismatch (see
/var/lib/xfstests/results/xfs/044.out.bad)
--- tests/xfs/044.out 2013-05-09 05:24:32.493415709 -0400
+++ /var/lib/xfstests/results/xfs/044.out.bad 2013-05-09
16:16:33.963173264 -0400
@@ -16,51 +16,18 @@
*** mount (expect success)
*** umount
*** zero log
-*** loggen
- *** zero block (1BB) x 100
+./tests/xfs/044: line 138: /var/lib/xfstests/src/loggen: No such
file or directory
*** mount (expect success)
...
(Run 'diff -u tests/xfs/044.out
/var/lib/xfstests/results/xfs/044.out.bad' to see the entire diff)
xfs/045 11s
xfs/046 20s ... 19s
xfs/047 31s
xfs/048 2s ... 1s
xfs/049 [not run] External device testing in progress, skipped this test
xfs/050 49s
xfs/052 7s ... 8s
xfs/054 31s ... 28s
xfs/055 [not run] No dump tape specified
xfs/056 19s
xfs/057 [not run] Place holder for IRIX test 057
xfs/058 [not run] Place holder for IRIX test 058
xfs/059 32s ... 31s
xfs/060 31s ... 30s
xfs/061 20s
xfs/063 21s ... 21s
xfs/064 77s ... 75s
xfs/065 53s ... 51s
xfs/066 8s
xfs/067 12s
xfs/071 - output mismatch (see /var/lib/xfstests/results/xfs/071.out.bad)
--- tests/xfs/071.out 2013-05-09 05:24:33.373478835 -0400
+++ /var/lib/xfstests/results/xfs/071.out.bad 2013-05-09
16:25:07.474096862 -0400
@@ -40,14 +40,14 @@
Writing 512 bytes, offset is +0 (direct=false)
-wrote 512/512 bytes at offset <OFFSET>
+pwrite64: File too large
Reading 512 bytes (direct=false)
-read 512/512 bytes at offset <OFFSET>
...
(Run 'diff -u tests/xfs/071.out
/var/lib/xfstests/results/xfs/071.out.bad' to see the entire diff)
xfs/072 8s ... 7s
xfs/073 [not run] Cannot xfs_copy with external devices
xfs/078 ^CRan: xfs/003 xfs/004 xfs/008 xfs/009 xfs/012 xfs/016 xfs/017
xfs/019 xfs/021 xfs/026 xfs/027 xfs/028 xfs/029 xfs/030 xfs/031 xfs/033
xfs/034 xfs/041 xfs/042 xfs/044 xfs/045 xfs/046 xfs/047 xfs/048 xfs/050
xfs/052 xfs/054 xfs/056 xfs/059 xfs/060 xfs/061 xfs/063 xfs/064 xfs/065
xfs/066 xfs/067 xfs/071 xfs/072
Not run: xfs/022 xfs/023 xfs/024 xfs/025 xfs/035 xfs/036 xfs/037 xfs/038
xfs/039 xfs/040 xfs/043 xfs/049 xfs/055 xfs/057 xfs/058 xfs/073
Failures: xfs/017 xfs/044 xfs/071
Failed 3 of 38 tests
root at oldsvrhw:/var/lib/xfstests# ./check -xfs -g lm256
FSTYP -- xfs (debug)
PLATFORM -- Linux/i686 oldsvrhw 3.9.0+
MKFS_OPTIONS -- -f -llogdev=/dev/sda7 -bsize=4096 /dev/sdb6
MOUNT_OPTIONS -- -ologdev=/dev/sda7 /dev/sdb6 /mnt/xfstests-scratch
xfs/003 2s ... 1s
xfs/004 8s ... 8s
xfs/008 5s ... 5s
xfs/009 7s ... 8s
xfs/012 13s ...[ 1851.323902]
[ 1851.325479] =================================
[ 1851.326551] [ INFO: inconsistent lock state ]
[ 1851.326551] 3.9.0+ #1 Not tainted
[ 1851.326551] ---------------------------------
[ 1851.326551] inconsistent {RECLAIM_FS-ON-R} -> {IN-RECLAIM_FS-W} usage.
[ 1851.326551] kswapd0/18 [HC0[0]:SC0[0]:HE1:SE1] takes:
[ 1851.326551] (&(&ip->i_lock)->mr_lock){++++-+}, at: [<c11dcabf>]
xfs_ilock+0x10f/0x190
[ 1851.326551] {RECLAIM_FS-ON-R} state was registered at:
[ 1851.326551] [<c105e10a>] mark_held_locks+0x8a/0xf0
[ 1851.326551] [<c105e69c>] lockdep_trace_alloc+0x5c/0xa0
[ 1851.326551] [<c109c52c>] __alloc_pages_nodemask+0x7c/0x670
[ 1851.326551] [<c10bfd8e>] new_slab+0x6e/0x2a0
[ 1851.326551] [<c14083a9>] __slab_alloc.isra.59.constprop.67+0x1d3/0x40a
[ 1851.326551] [<c10c12cd>] __kmalloc+0x10d/0x180
[ 1851.326551] [<c1199b56>] kmem_alloc+0x56/0xd0
[ 1851.326551] [<c1199be1>] kmem_zalloc+0x11/0xd0
[ 1851.326551] [<c11c666e>] xfs_dabuf_map.isra.2.constprop.5+0x22e/0x520
[ 1851.326551] [<c11c7064>] xfs_da_read_buf+0x54/0x280
[ 1851.326551] [<c11cf2e3>] xfs_dir3_leaf_read.constprop.6+0x43/0x90
[ 1851.326551] [<c11d1d75>] xfs_dir2_leaf_lookup_int+0x45/0x350
[ 1851.326551] [<c11d20b0>] xfs_dir2_leaf_lookup+0x30/0x150
[ 1851.326551] [<c11ca75f>] xfs_dir_lookup+0x13f/0x160
[ 1851.326551] [<c1196d20>] xfs_lookup+0x50/0x120
[ 1851.326551] [<c118fcc8>] xfs_vn_lookup+0x48/0x80
[ 1851.326551] [<c10cc92e>] lookup_real+0x1e/0x50
[ 1851.326551] [<c10d00fa>] do_last.isra.46+0x74a/0xb90
[ 1851.326551] [<c10d05db>] path_openat.isra.47+0x9b/0x3e0
[ 1851.326551] [<c10d0951>] do_filp_open+0x31/0x80
[ 1851.326551] [<c10c9665>] open_exec+0x45/0x110
[ 1851.326551] [<c10ca96a>] do_execve_common+0x1da/0x550
[ 1851.326551] [<c10caddf>] sys_execve+0x2f/0x50
[ 1851.326551] [<c140e1ff>] sysenter_do_call+0x12/0x36
[ 1851.326551] irq event stamp: 789831
[ 1851.326551] hardirqs last enabled at (789831): [<c109e4fa>]
clear_page_dirty_for_io+0xaa/0xe0
[ 1851.326551] hardirqs last disabled at (789830): [<c109e4d4>]
clear_page_dirty_for_io+0x84/0xe0
[ 1851.326551] softirqs last enabled at (789776): [<c102a90e>]
__do_softirq+0x12e/0x1f0
[ 1851.326551] softirqs last disabled at (789769): [<c102ab1d>]
irq_exit+0x6d/0x80
[ 1851.326551]
[ 1851.326551] other info that might help us debug this:
[ 1851.326551] Possible unsafe locking scenario:
[ 1851.326551]
[ 1851.326551] CPU0
[ 1851.326551] ----
[ 1851.326551] lock(&(&ip->i_lock)->mr_lock);
[ 1851.326551] <Interrupt>
[ 1851.326551] lock(&(&ip->i_lock)->mr_lock);
[ 1851.326551]
[ 1851.326551] *** DEADLOCK ***
[ 1851.326551]
[ 1851.326551] 1 lock held by kswapd0/18:
[ 1851.326551] #0: (sb_internal){.+.+.?}, at: [<c11f1aa6>]
xfs_trans_alloc+0x26/0x50
[ 1851.326551]
[ 1851.326551] stack backtrace:
[ 1851.326551] Pid: 18, comm: kswapd0 Not tainted 3.9.0+ #1
[ 1851.326551] Call Trace:
[ 1851.326551] [<c1407048>] print_usage_bug+0x1dc/0x1e6
[ 1851.326551] [<c105d5d0>] ? print_shortest_lock_dependencies+0x1b0/0x1b0
[ 1851.326551] [<c105e05b>] mark_lock+0x26b/0x290
[ 1851.326551] [<c105ff3f>] __lock_acquire+0x64f/0x1690
[ 1851.326551] [<c1199e66>] ? kmem_zone_alloc+0x56/0xd0
[ 1851.326551] [<c1190321>] ? xfs_vn_mknod+0x151/0x170
[ 1851.326551] [<c11f3a3d>] ? xlog_space_left+0x2d/0xf0
[ 1851.326551] [<c11f3a3d>] ? xlog_space_left+0x2d/0xf0
[ 1851.326551] [<c1061580>] lock_acquire+0x80/0x100
[ 1851.326551] [<c11dcabf>] ? xfs_ilock+0x10f/0x190
[ 1851.326551] [<c1047184>] down_write_nested+0x54/0xa0
[ 1851.326551] [<c11dcabf>] ? xfs_ilock+0x10f/0x190
[ 1851.326551] [<c11dcabf>] xfs_ilock+0x10f/0x190
[ 1851.326551] [<c118f212>] xfs_iomap_write_allocate+0x232/0x400
[ 1851.326551] [<c105cce8>] ? lock_release_holdtime.part.27+0x88/0xe0
[ 1851.326551] [<c11dcc4b>] ? xfs_ilock_nowait+0xbb/0x1e0
[ 1851.326551] [<c1046f8b>] ? up_read+0x1b/0x30
[ 1851.326551] [<c11dce1b>] ? xfs_iunlock+0xab/0x170
[ 1851.326551] [<c117c948>] xfs_map_blocks+0x388/0x420
[ 1851.326551] [<c117da45>] xfs_vm_writepage+0x355/0x6e0
[ 1851.326551] [<c1050655>] ? sched_clock_cpu+0xa5/0x100
[ 1851.326551] [<c105e30e>] ? trace_hardirqs_off_caller+0x1e/0xa0
[ 1851.326551] [<c10a4567>] shrink_page_list+0x4c7/0x680
[ 1851.326551] [<c10a4ba1>] shrink_inactive_list+0x181/0x2d0
[ 1851.326551] [<c10a4e6a>] shrink_lruvec+0x17a/0x3b0
[ 1851.326551] [<c1046f8b>] ? up_read+0x1b/0x30
[ 1851.326551] [<c10a3901>] ? shrink_slab+0x1e1/0x2f0
[ 1851.326551] [<c10a5a2e>] kswapd+0x30e/0x680
[ 1851.326551] [<c10a5720>] ? try_to_free_pages+0x680/0x680
[ 1851.326551] [<c10424d5>] kthread+0xa5/0xb0
[ 1851.326551] [<c105cd1c>] ? lock_release_holdtime.part.27+0xbc/0xe0
[ 1851.326551] [<c140e177>] ret_from_kernel_thread+0x1b/0x28
[ 1851.326551] [<c1042430>] ? __kthread_parkme+0x70/0x70
14s
[ 1859.123845] XFS (sdb5): Mounting Filesystem
[ 1859.439419] XFS (sdb5): Ending clean mount
xfs/016 62s ...[ 1862.733835] XFS (sdb6): Mounting Filesystem
[ 1863.049345] XFS (sdb6): Ending clean mount
[ 1866.823293] XFS (sdb6): Mounting Filesystem
More information about the xfs
mailing list