Rambling noise #1: generic/230 can trigger kernel debug lock detector

Michael L. Semon mlsemon35 at gmail.com
Thu May 9 21:00:10 CDT 2013


On 05/09/2013 03:20 AM, Dave Chinner wrote:
> On Thu, May 09, 2013 at 01:16:46PM +1000, Dave Chinner wrote:
>> On Wed, May 08, 2013 at 10:24:25PM -0400, Michael L. Semon wrote:
>>> Hi!  I'm trying to come up with a series of ramblings that may or
>>> may not be useful in a mailing-list context, with the idea that one
>>> bug report might be good, the next might be me thinking aloud with
>>> data in hand because I know something's wrong but can't put my
>>> finger on it.  An ex-girlfriend saw the movie "Rain Man" years ago
>>> pointed to the screen and said, "Do you see that guy?  That's you!"
>>> If only I could be so smart...or act as well as Dustin Hoffman.  The
>>> noisy thinking is there, just not the brilliant insights...
>>>
>>> This report is to pass on a kernel lock detector message that might
>>> be reproducible under a certain family of tests.  generic/230 may
>>> not be at fault, it's just where the detector went off.
>>
>> No, there's definitely a bug there. Thanks for the report, Michael.
>> Try the patch below.
>
> Actaully, there's a bug in the error handling in that version - it
> fails to unlock the quotaoff lock properly on failure. The version
> below fixes that problem.
>
> Cheers,
>
> Dave.

OK, I'll try this version as well.  The first version seemed to work 
just fine.

The rest of the letter has been shortened from a more detailed review to 
a casual one with a scripted run of xfstests at the end.  If nothing 
else, it shows a) a dir3 function in a stack trace on a non-CRC 
filesystem, b) loggen might need to be installed in the `make install` 
version of xfstests, c) your previous-run-times feature has been 
restored, and d) a pwrite64 "file too large" may or may not be there 
because J. Liu's bitness patches are doing their jobs.

Good luck!

Michael

Original review (reduced):

As far as logical testing goes, your fix works well.  Without your 
patch, I got the original lockdep message, and the situation was 
replicated by issuing a `du -sx /` while xfstests xfs/054 was running. 
With your patch, all seems well for things that mention quotas.

The lock detector went off once more, but that was after attempting to 
merge a 2013-05-08 kernel git with xfs-oss/master of the same vintage, 
then apply my normal patches and your original quota lockdep fix.

Hardware change:  Because `git commit` caused XFS to emit the message 
"XFS: possible memory allocation deadlock in xfs_buf_allocate_memory 
(mode:0x250)" a few times, RAM was increased to 256 MB.  The merge was 
reworked by restoring the git from a backup tarball and doing the 
merge-and-patch session again.

Script started on Thu May  9 16:00:28 2013
cu -t -l /dev/ttyu0 -s 115200
Connected
# Note: lm256 is a local group
root at oldsvrhw:/var/lib/xfstests# ./check -xfs -g lm256
FSTYP         -- xfs (debug)
PLATFORM      -- Linux/i686 oldsvrhw 3.9.0+
MKFS_OPTIONS  -- -f -llogdev=/dev/sda7 -bsize=4096 /dev/sdb6
MOUNT_OPTIONS -- -ologdev=/dev/sda7 /dev/sdb6 /mnt/xfstests-scratch

xfs/003 1s ... 2s
xfs/004 8s ... 8s
xfs/008 5s ... 5s
xfs/009 8s ... 7s
xfs/012 16s ... 13s
xfs/016 76s ... 62s
xfs/017	 [failed, exit status 1] - output mismatch (see 
/var/lib/xfstests/results/xfs/017.out.bad)
     --- tests/xfs/017.out	2013-05-09 05:24:30.996641679 -0400
     +++ /var/lib/xfstests/results/xfs/017.out.bad	2013-05-09 
16:03:25.185045262 -0400
     @@ -3,8 +3,6 @@
      *** test
          *** test 0
          *** test 1
     -    *** test 2
     -    *** test 3
     -    *** test 4
     -*** done
      ...
      (Run 'diff -u tests/xfs/017.out 
/var/lib/xfstests/results/xfs/017.out.bad' to see the entire diff)
xfs/019	 8s
xfs/021 8s ... 7s
xfs/022	 [not run] No dump tape specified
xfs/023	 [not run] No dump tape specified
xfs/024	 [not run] No dump tape specified
xfs/025	 [not run] No dump tape specified
xfs/026	 22s
xfs/027 22s ... 22s
xfs/028	 31s
xfs/029	 11s
xfs/030	 65s
xfs/031 102s ... 99s
xfs/033	 49s
xfs/034 9s ... 9s
xfs/035	 [not run] No dump tape specified
xfs/036	 [not run] No dump tape specified
xfs/037	 [not run] No dump tape specified
xfs/038	 [not run] No dump tape specified
xfs/039	 [not run] No dump tape specified
xfs/040	 [not run] Can't run srcdiff without KWORKAREA set
xfs/041	 121s
xfs/042	 163s
xfs/043	 [not run] No dump tape specified
xfs/044	 [failed, exit status 1] - output mismatch (see 
/var/lib/xfstests/results/xfs/044.out.bad)
     --- tests/xfs/044.out	2013-05-09 05:24:32.493415709 -0400
     +++ /var/lib/xfstests/results/xfs/044.out.bad	2013-05-09 
16:16:33.963173264 -0400
     @@ -16,51 +16,18 @@
          *** mount (expect success)
          *** umount
      *** zero log
     -*** loggen
     -   *** zero block (1BB) x 100
     +./tests/xfs/044: line 138: /var/lib/xfstests/src/loggen: No such 
file or directory
          *** mount (expect success)
      ...
      (Run 'diff -u tests/xfs/044.out 
/var/lib/xfstests/results/xfs/044.out.bad' to see the entire diff)
xfs/045	 11s
xfs/046 20s ... 19s
xfs/047	 31s
xfs/048 2s ... 1s
xfs/049	 [not run] External device testing in progress, skipped this test
xfs/050	 49s
xfs/052 7s ... 8s
xfs/054 31s ... 28s
xfs/055	 [not run] No dump tape specified
xfs/056	 19s
xfs/057	 [not run] Place holder for IRIX test 057
xfs/058	 [not run] Place holder for IRIX test 058
xfs/059 32s ... 31s
xfs/060 31s ... 30s
xfs/061	 20s
xfs/063 21s ... 21s
xfs/064 77s ... 75s
xfs/065 53s ... 51s
xfs/066	 8s
xfs/067	 12s
xfs/071	 - output mismatch (see /var/lib/xfstests/results/xfs/071.out.bad)
     --- tests/xfs/071.out	2013-05-09 05:24:33.373478835 -0400
     +++ /var/lib/xfstests/results/xfs/071.out.bad	2013-05-09 
16:25:07.474096862 -0400
     @@ -40,14 +40,14 @@


      Writing 512 bytes, offset is +0 (direct=false)
     -wrote 512/512 bytes at offset <OFFSET>
     +pwrite64: File too large
      Reading 512 bytes (direct=false)
     -read 512/512 bytes at offset <OFFSET>
      ...
      (Run 'diff -u tests/xfs/071.out 
/var/lib/xfstests/results/xfs/071.out.bad' to see the entire diff)
xfs/072 8s ... 7s
xfs/073	 [not run] Cannot xfs_copy with external devices
xfs/078	^CRan: xfs/003 xfs/004 xfs/008 xfs/009 xfs/012 xfs/016 xfs/017 
xfs/019 xfs/021 xfs/026 xfs/027 xfs/028 xfs/029 xfs/030 xfs/031 xfs/033 
xfs/034 xfs/041 xfs/042 xfs/044 xfs/045 xfs/046 xfs/047 xfs/048 xfs/050 
xfs/052 xfs/054 xfs/056 xfs/059 xfs/060 xfs/061 xfs/063 xfs/064 xfs/065 
xfs/066 xfs/067 xfs/071 xfs/072
Not run: xfs/022 xfs/023 xfs/024 xfs/025 xfs/035 xfs/036 xfs/037 xfs/038 
xfs/039 xfs/040 xfs/043 xfs/049 xfs/055 xfs/057 xfs/058 xfs/073
Failures: xfs/017 xfs/044 xfs/071
Failed 3 of 38 tests
root at oldsvrhw:/var/lib/xfstests# ./check -xfs -g lm256
FSTYP         -- xfs (debug)
PLATFORM      -- Linux/i686 oldsvrhw 3.9.0+
MKFS_OPTIONS  -- -f -llogdev=/dev/sda7 -bsize=4096 /dev/sdb6
MOUNT_OPTIONS -- -ologdev=/dev/sda7 /dev/sdb6 /mnt/xfstests-scratch

xfs/003 2s ... 1s
xfs/004 8s ... 8s
xfs/008 5s ... 5s
xfs/009 7s ... 8s
xfs/012 13s ...[ 1851.323902]
[ 1851.325479] =================================
[ 1851.326551] [ INFO: inconsistent lock state ]
[ 1851.326551] 3.9.0+ #1 Not tainted
[ 1851.326551] ---------------------------------
[ 1851.326551] inconsistent {RECLAIM_FS-ON-R} -> {IN-RECLAIM_FS-W} usage.
[ 1851.326551] kswapd0/18 [HC0[0]:SC0[0]:HE1:SE1] takes:
[ 1851.326551]  (&(&ip->i_lock)->mr_lock){++++-+}, at: [<c11dcabf>] 
xfs_ilock+0x10f/0x190
[ 1851.326551] {RECLAIM_FS-ON-R} state was registered at:
[ 1851.326551]   [<c105e10a>] mark_held_locks+0x8a/0xf0
[ 1851.326551]   [<c105e69c>] lockdep_trace_alloc+0x5c/0xa0
[ 1851.326551]   [<c109c52c>] __alloc_pages_nodemask+0x7c/0x670
[ 1851.326551]   [<c10bfd8e>] new_slab+0x6e/0x2a0
[ 1851.326551]   [<c14083a9>] __slab_alloc.isra.59.constprop.67+0x1d3/0x40a
[ 1851.326551]   [<c10c12cd>] __kmalloc+0x10d/0x180
[ 1851.326551]   [<c1199b56>] kmem_alloc+0x56/0xd0
[ 1851.326551]   [<c1199be1>] kmem_zalloc+0x11/0xd0
[ 1851.326551]   [<c11c666e>] xfs_dabuf_map.isra.2.constprop.5+0x22e/0x520
[ 1851.326551]   [<c11c7064>] xfs_da_read_buf+0x54/0x280
[ 1851.326551]   [<c11cf2e3>] xfs_dir3_leaf_read.constprop.6+0x43/0x90
[ 1851.326551]   [<c11d1d75>] xfs_dir2_leaf_lookup_int+0x45/0x350
[ 1851.326551]   [<c11d20b0>] xfs_dir2_leaf_lookup+0x30/0x150
[ 1851.326551]   [<c11ca75f>] xfs_dir_lookup+0x13f/0x160
[ 1851.326551]   [<c1196d20>] xfs_lookup+0x50/0x120
[ 1851.326551]   [<c118fcc8>] xfs_vn_lookup+0x48/0x80
[ 1851.326551]   [<c10cc92e>] lookup_real+0x1e/0x50
[ 1851.326551]   [<c10d00fa>] do_last.isra.46+0x74a/0xb90
[ 1851.326551]   [<c10d05db>] path_openat.isra.47+0x9b/0x3e0
[ 1851.326551]   [<c10d0951>] do_filp_open+0x31/0x80
[ 1851.326551]   [<c10c9665>] open_exec+0x45/0x110
[ 1851.326551]   [<c10ca96a>] do_execve_common+0x1da/0x550
[ 1851.326551]   [<c10caddf>] sys_execve+0x2f/0x50
[ 1851.326551]   [<c140e1ff>] sysenter_do_call+0x12/0x36
[ 1851.326551] irq event stamp: 789831
[ 1851.326551] hardirqs last  enabled at (789831): [<c109e4fa>] 
clear_page_dirty_for_io+0xaa/0xe0
[ 1851.326551] hardirqs last disabled at (789830): [<c109e4d4>] 
clear_page_dirty_for_io+0x84/0xe0
[ 1851.326551] softirqs last  enabled at (789776): [<c102a90e>] 
__do_softirq+0x12e/0x1f0
[ 1851.326551] softirqs last disabled at (789769): [<c102ab1d>] 
irq_exit+0x6d/0x80
[ 1851.326551]
[ 1851.326551] other info that might help us debug this:
[ 1851.326551]  Possible unsafe locking scenario:
[ 1851.326551]
[ 1851.326551]        CPU0
[ 1851.326551]        ----
[ 1851.326551]   lock(&(&ip->i_lock)->mr_lock);
[ 1851.326551]   <Interrupt>
[ 1851.326551]     lock(&(&ip->i_lock)->mr_lock);
[ 1851.326551]
[ 1851.326551]  *** DEADLOCK ***
[ 1851.326551]
[ 1851.326551] 1 lock held by kswapd0/18:
[ 1851.326551]  #0:  (sb_internal){.+.+.?}, at: [<c11f1aa6>] 
xfs_trans_alloc+0x26/0x50
[ 1851.326551]
[ 1851.326551] stack backtrace:
[ 1851.326551] Pid: 18, comm: kswapd0 Not tainted 3.9.0+ #1
[ 1851.326551] Call Trace:
[ 1851.326551]  [<c1407048>] print_usage_bug+0x1dc/0x1e6
[ 1851.326551]  [<c105d5d0>] ? print_shortest_lock_dependencies+0x1b0/0x1b0
[ 1851.326551]  [<c105e05b>] mark_lock+0x26b/0x290
[ 1851.326551]  [<c105ff3f>] __lock_acquire+0x64f/0x1690
[ 1851.326551]  [<c1199e66>] ? kmem_zone_alloc+0x56/0xd0
[ 1851.326551]  [<c1190321>] ? xfs_vn_mknod+0x151/0x170
[ 1851.326551]  [<c11f3a3d>] ? xlog_space_left+0x2d/0xf0
[ 1851.326551]  [<c11f3a3d>] ? xlog_space_left+0x2d/0xf0
[ 1851.326551]  [<c1061580>] lock_acquire+0x80/0x100
[ 1851.326551]  [<c11dcabf>] ? xfs_ilock+0x10f/0x190
[ 1851.326551]  [<c1047184>] down_write_nested+0x54/0xa0
[ 1851.326551]  [<c11dcabf>] ? xfs_ilock+0x10f/0x190
[ 1851.326551]  [<c11dcabf>] xfs_ilock+0x10f/0x190
[ 1851.326551]  [<c118f212>] xfs_iomap_write_allocate+0x232/0x400
[ 1851.326551]  [<c105cce8>] ? lock_release_holdtime.part.27+0x88/0xe0
[ 1851.326551]  [<c11dcc4b>] ? xfs_ilock_nowait+0xbb/0x1e0
[ 1851.326551]  [<c1046f8b>] ? up_read+0x1b/0x30
[ 1851.326551]  [<c11dce1b>] ? xfs_iunlock+0xab/0x170
[ 1851.326551]  [<c117c948>] xfs_map_blocks+0x388/0x420
[ 1851.326551]  [<c117da45>] xfs_vm_writepage+0x355/0x6e0
[ 1851.326551]  [<c1050655>] ? sched_clock_cpu+0xa5/0x100
[ 1851.326551]  [<c105e30e>] ? trace_hardirqs_off_caller+0x1e/0xa0
[ 1851.326551]  [<c10a4567>] shrink_page_list+0x4c7/0x680
[ 1851.326551]  [<c10a4ba1>] shrink_inactive_list+0x181/0x2d0
[ 1851.326551]  [<c10a4e6a>] shrink_lruvec+0x17a/0x3b0
[ 1851.326551]  [<c1046f8b>] ? up_read+0x1b/0x30
[ 1851.326551]  [<c10a3901>] ? shrink_slab+0x1e1/0x2f0
[ 1851.326551]  [<c10a5a2e>] kswapd+0x30e/0x680
[ 1851.326551]  [<c10a5720>] ? try_to_free_pages+0x680/0x680
[ 1851.326551]  [<c10424d5>] kthread+0xa5/0xb0
[ 1851.326551]  [<c105cd1c>] ? lock_release_holdtime.part.27+0xbc/0xe0
[ 1851.326551]  [<c140e177>] ret_from_kernel_thread+0x1b/0x28
[ 1851.326551]  [<c1042430>] ? __kthread_parkme+0x70/0x70
  14s
[ 1859.123845] XFS (sdb5): Mounting Filesystem
[ 1859.439419] XFS (sdb5): Ending clean mount
xfs/016 62s ...[ 1862.733835] XFS (sdb6): Mounting Filesystem
[ 1863.049345] XFS (sdb6): Ending clean mount
[ 1866.823293] XFS (sdb6): Mounting Filesystem



More information about the xfs mailing list