xfs
[Top] [All Lists]

Re: Rambling noise #1: generic/230 can trigger kernel debug lock detecto

To: Dave Chinner <david@xxxxxxxxxxxxx>
Subject: Re: Rambling noise #1: generic/230 can trigger kernel debug lock detector
From: "Michael L. Semon" <mlsemon35@xxxxxxxxx>
Date: Thu, 09 May 2013 22:00:10 -0400
Cc: "xfs@xxxxxxxxxxx" <xfs@xxxxxxxxxxx>
Delivered-to: xfs@xxxxxxxxxxx
Dkim-signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=x-received:message-id:date:from:user-agent:mime-version:to:cc :subject:references:in-reply-to:content-type :content-transfer-encoding; bh=RI6d5Db0P7WKJmEmHXtY+b46sU4xCsC/is/X7cRhwvM=; b=eoHCjqNK+s6YY0eIT7ZM1F71tvO6XP9S4NVgrwXissb9Qmd7cKp/EkPait/AG08fFD A8U8+7kaHXLBdranQlh4gjJXDrDFuOtV90q9RngDN6Tg1aIYSFjLt5QYQfK9/C3zrI6v aWvLlQavilby8rck7/hCYdDd1ZSEBUZW3D6/I7aWTIv7fOHjP54jCJHWBFEtGvaSFfgL AmY5L72CJrWgUPR/+9+hgUtG88koyTOoZWE5q4ujFgVWVoNW0Ik3j3MN22Sc+119DEAT e144mjxzVH/NZsBSHNHYP7NYE2wCmvh3gY4U82IFLowgPln3nxnny8bKf1hVOQUNV0VA sLrA==
In-reply-to: <20130509072045.GO24635@dastard>
References: <518B08D9.1060906@xxxxxxxxx> <20130509031646.GN24635@dastard> <20130509072045.GO24635@dastard>
User-agent: Mozilla/5.0 (X11; Linux i686; rv:17.0) Gecko/20130328 Thunderbird/17.0.5
On 05/09/2013 03:20 AM, Dave Chinner wrote:
On Thu, May 09, 2013 at 01:16:46PM +1000, Dave Chinner wrote:
On Wed, May 08, 2013 at 10:24:25PM -0400, Michael L. Semon wrote:
Hi!  I'm trying to come up with a series of ramblings that may or
may not be useful in a mailing-list context, with the idea that one
bug report might be good, the next might be me thinking aloud with
data in hand because I know something's wrong but can't put my
finger on it.  An ex-girlfriend saw the movie "Rain Man" years ago
pointed to the screen and said, "Do you see that guy?  That's you!"
If only I could be so smart...or act as well as Dustin Hoffman.  The
noisy thinking is there, just not the brilliant insights...

This report is to pass on a kernel lock detector message that might
be reproducible under a certain family of tests.  generic/230 may
not be at fault, it's just where the detector went off.

No, there's definitely a bug there. Thanks for the report, Michael.
Try the patch below.

Actaully, there's a bug in the error handling in that version - it
fails to unlock the quotaoff lock properly on failure. The version
below fixes that problem.

Cheers,

Dave.

OK, I'll try this version as well. The first version seemed to work just fine.

The rest of the letter has been shortened from a more detailed review to a casual one with a scripted run of xfstests at the end. If nothing else, it shows a) a dir3 function in a stack trace on a non-CRC filesystem, b) loggen might need to be installed in the `make install` version of xfstests, c) your previous-run-times feature has been restored, and d) a pwrite64 "file too large" may or may not be there because J. Liu's bitness patches are doing their jobs.

Good luck!

Michael

Original review (reduced):

As far as logical testing goes, your fix works well. Without your patch, I got the original lockdep message, and the situation was replicated by issuing a `du -sx /` while xfstests xfs/054 was running. With your patch, all seems well for things that mention quotas.

The lock detector went off once more, but that was after attempting to merge a 2013-05-08 kernel git with xfs-oss/master of the same vintage, then apply my normal patches and your original quota lockdep fix.

Hardware change: Because `git commit` caused XFS to emit the message "XFS: possible memory allocation deadlock in xfs_buf_allocate_memory (mode:0x250)" a few times, RAM was increased to 256 MB. The merge was reworked by restoring the git from a backup tarball and doing the merge-and-patch session again.

Script started on Thu May  9 16:00:28 2013
cu -t -l /dev/ttyu0 -s 115200
Connected
# Note: lm256 is a local group
root@oldsvrhw:/var/lib/xfstests# ./check -xfs -g lm256
FSTYP         -- xfs (debug)
PLATFORM      -- Linux/i686 oldsvrhw 3.9.0+
MKFS_OPTIONS  -- -f -llogdev=/dev/sda7 -bsize=4096 /dev/sdb6
MOUNT_OPTIONS -- -ologdev=/dev/sda7 /dev/sdb6 /mnt/xfstests-scratch

xfs/003 1s ... 2s
xfs/004 8s ... 8s
xfs/008 5s ... 5s
xfs/009 8s ... 7s
xfs/012 16s ... 13s
xfs/016 76s ... 62s
xfs/017 [failed, exit status 1] - output mismatch (see /var/lib/xfstests/results/xfs/017.out.bad)
    --- tests/xfs/017.out       2013-05-09 05:24:30.996641679 -0400
+++ /var/lib/xfstests/results/xfs/017.out.bad 2013-05-09 16:03:25.185045262 -0400
    @@ -3,8 +3,6 @@
     *** test
         *** test 0
         *** test 1
    -    *** test 2
    -    *** test 3
    -    *** test 4
    -*** done
     ...
(Run 'diff -u tests/xfs/017.out /var/lib/xfstests/results/xfs/017.out.bad' to see the entire diff)
xfs/019  8s
xfs/021 8s ... 7s
xfs/022  [not run] No dump tape specified
xfs/023  [not run] No dump tape specified
xfs/024  [not run] No dump tape specified
xfs/025  [not run] No dump tape specified
xfs/026  22s
xfs/027 22s ... 22s
xfs/028  31s
xfs/029  11s
xfs/030  65s
xfs/031 102s ... 99s
xfs/033  49s
xfs/034 9s ... 9s
xfs/035  [not run] No dump tape specified
xfs/036  [not run] No dump tape specified
xfs/037  [not run] No dump tape specified
xfs/038  [not run] No dump tape specified
xfs/039  [not run] No dump tape specified
xfs/040  [not run] Can't run srcdiff without KWORKAREA set
xfs/041  121s
xfs/042  163s
xfs/043  [not run] No dump tape specified
xfs/044 [failed, exit status 1] - output mismatch (see /var/lib/xfstests/results/xfs/044.out.bad)
    --- tests/xfs/044.out       2013-05-09 05:24:32.493415709 -0400
+++ /var/lib/xfstests/results/xfs/044.out.bad 2013-05-09 16:16:33.963173264 -0400
    @@ -16,51 +16,18 @@
         *** mount (expect success)
         *** umount
     *** zero log
    -*** loggen
    -   *** zero block (1BB) x 100
+./tests/xfs/044: line 138: /var/lib/xfstests/src/loggen: No such file or directory
         *** mount (expect success)
     ...
(Run 'diff -u tests/xfs/044.out /var/lib/xfstests/results/xfs/044.out.bad' to see the entire diff)
xfs/045  11s
xfs/046 20s ... 19s
xfs/047  31s
xfs/048 2s ... 1s
xfs/049  [not run] External device testing in progress, skipped this test
xfs/050  49s
xfs/052 7s ... 8s
xfs/054 31s ... 28s
xfs/055  [not run] No dump tape specified
xfs/056  19s
xfs/057  [not run] Place holder for IRIX test 057
xfs/058  [not run] Place holder for IRIX test 058
xfs/059 32s ... 31s
xfs/060 31s ... 30s
xfs/061  20s
xfs/063 21s ... 21s
xfs/064 77s ... 75s
xfs/065 53s ... 51s
xfs/066  8s
xfs/067  12s
xfs/071  - output mismatch (see /var/lib/xfstests/results/xfs/071.out.bad)
    --- tests/xfs/071.out       2013-05-09 05:24:33.373478835 -0400
+++ /var/lib/xfstests/results/xfs/071.out.bad 2013-05-09 16:25:07.474096862 -0400
    @@ -40,14 +40,14 @@


     Writing 512 bytes, offset is +0 (direct=false)
    -wrote 512/512 bytes at offset <OFFSET>
    +pwrite64: File too large
     Reading 512 bytes (direct=false)
    -read 512/512 bytes at offset <OFFSET>
     ...
(Run 'diff -u tests/xfs/071.out /var/lib/xfstests/results/xfs/071.out.bad' to see the entire diff)
xfs/072 8s ... 7s
xfs/073  [not run] Cannot xfs_copy with external devices
xfs/078 ^CRan: xfs/003 xfs/004 xfs/008 xfs/009 xfs/012 xfs/016 xfs/017 xfs/019 xfs/021 xfs/026 xfs/027 xfs/028 xfs/029 xfs/030 xfs/031 xfs/033 xfs/034 xfs/041 xfs/042 xfs/044 xfs/045 xfs/046 xfs/047 xfs/048 xfs/050 xfs/052 xfs/054 xfs/056 xfs/059 xfs/060 xfs/061 xfs/063 xfs/064 xfs/065 xfs/066 xfs/067 xfs/071 xfs/072 Not run: xfs/022 xfs/023 xfs/024 xfs/025 xfs/035 xfs/036 xfs/037 xfs/038 xfs/039 xfs/040 xfs/043 xfs/049 xfs/055 xfs/057 xfs/058 xfs/073
Failures: xfs/017 xfs/044 xfs/071
Failed 3 of 38 tests
root@oldsvrhw:/var/lib/xfstests# ./check -xfs -g lm256
FSTYP         -- xfs (debug)
PLATFORM      -- Linux/i686 oldsvrhw 3.9.0+
MKFS_OPTIONS  -- -f -llogdev=/dev/sda7 -bsize=4096 /dev/sdb6
MOUNT_OPTIONS -- -ologdev=/dev/sda7 /dev/sdb6 /mnt/xfstests-scratch

xfs/003 2s ... 1s
xfs/004 8s ... 8s
xfs/008 5s ... 5s
xfs/009 7s ... 8s
xfs/012 13s ...[ 1851.323902]
[ 1851.325479] =================================
[ 1851.326551] [ INFO: inconsistent lock state ]
[ 1851.326551] 3.9.0+ #1 Not tainted
[ 1851.326551] ---------------------------------
[ 1851.326551] inconsistent {RECLAIM_FS-ON-R} -> {IN-RECLAIM_FS-W} usage.
[ 1851.326551] kswapd0/18 [HC0[0]:SC0[0]:HE1:SE1] takes:
[ 1851.326551] (&(&ip->i_lock)->mr_lock){++++-+}, at: [<c11dcabf>] xfs_ilock+0x10f/0x190
[ 1851.326551] {RECLAIM_FS-ON-R} state was registered at:
[ 1851.326551]   [<c105e10a>] mark_held_locks+0x8a/0xf0
[ 1851.326551]   [<c105e69c>] lockdep_trace_alloc+0x5c/0xa0
[ 1851.326551]   [<c109c52c>] __alloc_pages_nodemask+0x7c/0x670
[ 1851.326551]   [<c10bfd8e>] new_slab+0x6e/0x2a0
[ 1851.326551]   [<c14083a9>] __slab_alloc.isra.59.constprop.67+0x1d3/0x40a
[ 1851.326551]   [<c10c12cd>] __kmalloc+0x10d/0x180
[ 1851.326551]   [<c1199b56>] kmem_alloc+0x56/0xd0
[ 1851.326551]   [<c1199be1>] kmem_zalloc+0x11/0xd0
[ 1851.326551]   [<c11c666e>] xfs_dabuf_map.isra.2.constprop.5+0x22e/0x520
[ 1851.326551]   [<c11c7064>] xfs_da_read_buf+0x54/0x280
[ 1851.326551]   [<c11cf2e3>] xfs_dir3_leaf_read.constprop.6+0x43/0x90
[ 1851.326551]   [<c11d1d75>] xfs_dir2_leaf_lookup_int+0x45/0x350
[ 1851.326551]   [<c11d20b0>] xfs_dir2_leaf_lookup+0x30/0x150
[ 1851.326551]   [<c11ca75f>] xfs_dir_lookup+0x13f/0x160
[ 1851.326551]   [<c1196d20>] xfs_lookup+0x50/0x120
[ 1851.326551]   [<c118fcc8>] xfs_vn_lookup+0x48/0x80
[ 1851.326551]   [<c10cc92e>] lookup_real+0x1e/0x50
[ 1851.326551]   [<c10d00fa>] do_last.isra.46+0x74a/0xb90
[ 1851.326551]   [<c10d05db>] path_openat.isra.47+0x9b/0x3e0
[ 1851.326551]   [<c10d0951>] do_filp_open+0x31/0x80
[ 1851.326551]   [<c10c9665>] open_exec+0x45/0x110
[ 1851.326551]   [<c10ca96a>] do_execve_common+0x1da/0x550
[ 1851.326551]   [<c10caddf>] sys_execve+0x2f/0x50
[ 1851.326551]   [<c140e1ff>] sysenter_do_call+0x12/0x36
[ 1851.326551] irq event stamp: 789831
[ 1851.326551] hardirqs last enabled at (789831): [<c109e4fa>] clear_page_dirty_for_io+0xaa/0xe0 [ 1851.326551] hardirqs last disabled at (789830): [<c109e4d4>] clear_page_dirty_for_io+0x84/0xe0 [ 1851.326551] softirqs last enabled at (789776): [<c102a90e>] __do_softirq+0x12e/0x1f0 [ 1851.326551] softirqs last disabled at (789769): [<c102ab1d>] irq_exit+0x6d/0x80
[ 1851.326551]
[ 1851.326551] other info that might help us debug this:
[ 1851.326551]  Possible unsafe locking scenario:
[ 1851.326551]
[ 1851.326551]        CPU0
[ 1851.326551]        ----
[ 1851.326551]   lock(&(&ip->i_lock)->mr_lock);
[ 1851.326551]   <Interrupt>
[ 1851.326551]     lock(&(&ip->i_lock)->mr_lock);
[ 1851.326551]
[ 1851.326551]  *** DEADLOCK ***
[ 1851.326551]
[ 1851.326551] 1 lock held by kswapd0/18:
[ 1851.326551] #0: (sb_internal){.+.+.?}, at: [<c11f1aa6>] xfs_trans_alloc+0x26/0x50
[ 1851.326551]
[ 1851.326551] stack backtrace:
[ 1851.326551] Pid: 18, comm: kswapd0 Not tainted 3.9.0+ #1
[ 1851.326551] Call Trace:
[ 1851.326551]  [<c1407048>] print_usage_bug+0x1dc/0x1e6
[ 1851.326551]  [<c105d5d0>] ? print_shortest_lock_dependencies+0x1b0/0x1b0
[ 1851.326551]  [<c105e05b>] mark_lock+0x26b/0x290
[ 1851.326551]  [<c105ff3f>] __lock_acquire+0x64f/0x1690
[ 1851.326551]  [<c1199e66>] ? kmem_zone_alloc+0x56/0xd0
[ 1851.326551]  [<c1190321>] ? xfs_vn_mknod+0x151/0x170
[ 1851.326551]  [<c11f3a3d>] ? xlog_space_left+0x2d/0xf0
[ 1851.326551]  [<c11f3a3d>] ? xlog_space_left+0x2d/0xf0
[ 1851.326551]  [<c1061580>] lock_acquire+0x80/0x100
[ 1851.326551]  [<c11dcabf>] ? xfs_ilock+0x10f/0x190
[ 1851.326551]  [<c1047184>] down_write_nested+0x54/0xa0
[ 1851.326551]  [<c11dcabf>] ? xfs_ilock+0x10f/0x190
[ 1851.326551]  [<c11dcabf>] xfs_ilock+0x10f/0x190
[ 1851.326551]  [<c118f212>] xfs_iomap_write_allocate+0x232/0x400
[ 1851.326551]  [<c105cce8>] ? lock_release_holdtime.part.27+0x88/0xe0
[ 1851.326551]  [<c11dcc4b>] ? xfs_ilock_nowait+0xbb/0x1e0
[ 1851.326551]  [<c1046f8b>] ? up_read+0x1b/0x30
[ 1851.326551]  [<c11dce1b>] ? xfs_iunlock+0xab/0x170
[ 1851.326551]  [<c117c948>] xfs_map_blocks+0x388/0x420
[ 1851.326551]  [<c117da45>] xfs_vm_writepage+0x355/0x6e0
[ 1851.326551]  [<c1050655>] ? sched_clock_cpu+0xa5/0x100
[ 1851.326551]  [<c105e30e>] ? trace_hardirqs_off_caller+0x1e/0xa0
[ 1851.326551]  [<c10a4567>] shrink_page_list+0x4c7/0x680
[ 1851.326551]  [<c10a4ba1>] shrink_inactive_list+0x181/0x2d0
[ 1851.326551]  [<c10a4e6a>] shrink_lruvec+0x17a/0x3b0
[ 1851.326551]  [<c1046f8b>] ? up_read+0x1b/0x30
[ 1851.326551]  [<c10a3901>] ? shrink_slab+0x1e1/0x2f0
[ 1851.326551]  [<c10a5a2e>] kswapd+0x30e/0x680
[ 1851.326551]  [<c10a5720>] ? try_to_free_pages+0x680/0x680
[ 1851.326551]  [<c10424d5>] kthread+0xa5/0xb0
[ 1851.326551]  [<c105cd1c>] ? lock_release_holdtime.part.27+0xbc/0xe0
[ 1851.326551]  [<c140e177>] ret_from_kernel_thread+0x1b/0x28
[ 1851.326551]  [<c1042430>] ? __kthread_parkme+0x70/0x70
 14s
[ 1859.123845] XFS (sdb5): Mounting Filesystem
[ 1859.439419] XFS (sdb5): Ending clean mount
xfs/016 62s ...[ 1862.733835] XFS (sdb6): Mounting Filesystem
[ 1863.049345] XFS (sdb6): Ending clean mount
[ 1866.823293] XFS (sdb6): Mounting Filesystem

<Prev in Thread] Current Thread [Next in Thread>