xfs-masters
[Top] [All Lists]

[Bug 852] New: deadlock in xfsconvertd

To: xfs-masters@xxxxxxxxxxx
Subject: [Bug 852] New: deadlock in xfsconvertd
From: bugzilla-daemon@xxxxxxxxxxx
Date: Mon, 5 Oct 2009 10:19:42 -0500
Auto-submitted: auto-generated
http://oss.sgi.com/bugzilla/show_bug.cgi?id=852

           Summary: deadlock in xfsconvertd
           Product: XFS
           Version: unspecified
          Platform: All
        OS/Version: Linux
            Status: NEW
          Severity: major
          Priority: P5
         Component: XFS kernel code
        AssignedTo: xfs-masters@xxxxxxxxxxx
        ReportedBy: tneumann@xxxxxxxxxxxxxxxxxxxxx
   Estimated Hours: 0.0
    Classification: Unclassified


I observe a semi-reproducible deadlock in xfsconvertd when creating massive I/O
(i.e., the deadlock is not perfectly reproducible but frequent). I included
parts of the kernel log below, apparently xfsconvertd deadlocks with my user
space program.
The program is performing massive ammounts of I/O using memory-mapped I/O, so
physical memory is scarce (everything is used for caching). During processing,
the user space program allocates new blocks using posix_fallocate64 (I am not
sure if this is relevant, but it shows up in the logs). Something in this
configuration frequently (but not always) leads to deadlocks.

The kernel version where I can semi-reproduce this is

Linux infno5602 2.6.31-11-generic #38-Ubuntu SMP Fri Oct 2 11:06:40 UTC 2009
x86_64 GNU/Linux

but I think I have seen this with other kernel versions, too. The kernel log
contains the following entries:

Oct  5 09:28:28 boxname kernel: [64440.820117] INFO: task xfsconvertd/0:420
blocked for more than 120 seconds.
Oct  5 09:28:28 boxname kernel: [64440.820122] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct  5 09:28:28 boxname kernel: [64440.820128] xfsconvertd/0 D 0000000000000000
    0   420      2 0x00000000
Oct  5 09:28:28 boxname kernel: [64440.820137]  ffff88007a01d490
0000000000000046 ffff88007a01d410 0000000000015580
Oct  5 09:28:28 boxname kernel: [64440.820147]  ffff88007b243110
0000000000015580 0000000000015580 0000000000015580
Oct  5 09:28:28 boxname kernel: [64440.820155]  0000000000015580
ffff88007b243110 0000000000015580 0000000000015580
Oct  5 09:28:28 boxname kernel: [64440.820163] Call Trace:
Oct  5 09:28:28 boxname kernel: [64440.820180]  [<ffffffff810d5730>] ?
sync_page+0x0/0x50
Oct  5 09:28:28 boxname kernel: [64440.820189]  [<ffffffff81521578>]
io_schedule+0x28/0x40
Oct  5 09:28:28 boxname kernel: [64440.820195]  [<ffffffff810d576d>]
sync_page+0x3d/0x50
Oct  5 09:28:28 boxname kernel: [64440.820201]  [<ffffffff81521ca7>]
__wait_on_bit+0x57/0x80
Oct  5 09:28:28 boxname kernel: [64440.820209]  [<ffffffff811151bd>] ?
__mem_cgroup_uncharge_common+0x9d/0x160
Oct  5 09:28:28 boxname kernel: [64440.820216]  [<ffffffff810d58de>]
wait_on_page_bit+0x6e/0x80
Oct  5 09:28:28 boxname kernel: [64440.820224]  [<ffffffff81073b70>] ?
wake_bit_function+0x0/0x40
Oct  5 09:28:28 boxname kernel: [64440.820233]  [<ffffffff810e3283>]
shrink_page_list+0x163/0x610
Oct  5 09:28:28 boxname kernel: [64440.820242]  [<ffffffff810ed539>] ?
congestion_wait+0x79/0x90
Oct  5 09:28:28 boxname kernel: [64440.820248]  [<ffffffff81073b30>] ?
autoremove_wake_function+0x0/0x40
Oct  5 09:28:28 boxname kernel: [64440.820255]  [<ffffffff810e3d51>]
shrink_inactive_list+0x621/0x660
Oct  5 09:28:28 boxname kernel: [64440.820263]  [<ffffffff810e3031>] ?
shrink_active_list+0x271/0x360
Oct  5 09:28:28 boxname kernel: [64440.820271]  [<ffffffff8104614c>] ?
__enqueue_entity+0x7c/0x80
Oct  5 09:28:28 boxname kernel: [64440.820279]  [<ffffffff810e3ddc>]
shrink_list+0x4c/0xe0
Oct  5 09:28:28 boxname kernel: [64440.820285]  [<ffffffff810e3fb7>]
shrink_zone+0x147/0x200
Oct  5 09:28:28 boxname kernel: [64440.820293]  [<ffffffff810e40eb>]
shrink_zones+0x7b/0x100
Oct  5 09:28:28 boxname kernel: [64440.820300]  [<ffffffff810e41ee>]
do_try_to_free_pages+0x7e/0x330
Oct  5 09:28:28 boxname kernel: [64440.820307]  [<ffffffff810e459a>]
try_to_free_pages+0x6a/0x70
Oct  5 09:28:28 boxname kernel: [64440.820314]  [<ffffffff810e1dc0>] ?
isolate_pages_global+0x0/0x60
Oct  5 09:28:28 boxname kernel: [64440.820321]  [<ffffffff810db65e>]
__alloc_pages_slowpath+0x2ce/0x4e0
Oct  5 09:28:28 boxname kernel: [64440.820328]  [<ffffffff810db9be>]
__alloc_pages_nodemask+0x14e/0x150
Oct  5 09:28:28 boxname kernel: [64440.820335]  [<ffffffff81107e32>]
alloc_pages_current+0x82/0xd0
Oct  5 09:28:28 boxname kernel: [64440.820343]  [<ffffffff8110d018>]
new_slab+0x238/0x300
Oct  5 09:28:28 boxname kernel: [64440.820350]  [<ffffffff8110fe69>]
__slab_alloc+0x169/0x2d0
Oct  5 09:28:28 boxname kernel: [64440.820405]  [<ffffffffa009ef72>] ?
kmem_zone_alloc+0x92/0xe0 [xfs]
Oct  5 09:28:28 boxname kernel: [64440.820413]  [<ffffffff8111040d>]
kmem_cache_alloc+0x13d/0x150
Oct  5 09:28:28 boxname kernel: [64440.820448]  [<ffffffffa009ef72>]
kmem_zone_alloc+0x92/0xe0 [xfs]
Oct  5 09:28:28 boxname kernel: [64440.820483]  [<ffffffffa009efd9>]
kmem_zone_zalloc+0x19/0x50 [xfs]
Oct  5 09:28:28 boxname kernel: [64440.820520]  [<ffffffffa009705f>]
_xfs_trans_alloc+0x2f/0x70 [xfs]
Oct  5 09:28:28 boxname kernel: [64440.820556]  [<ffffffffa0097202>]
xfs_trans_alloc+0x92/0xa0 [xfs]
Oct  5 09:28:28 boxname kernel: [64440.820563]  [<ffffffff810ec795>] ?
__dec_zone_page_state+0x25/0x30
Oct  5 09:28:28 boxname kernel: [64440.820571]  [<ffffffff8104d139>] ?
update_curr+0x189/0x190
Oct  5 09:28:28 boxname kernel: [64440.820606]  [<ffffffffa0085fb1>]
xfs_iomap_write_unwritten+0x61/0x210 [xfs]
Oct  5 09:28:28 boxname kernel: [64440.820615]  [<ffffffff81272da6>] ?
rb_erase+0xd6/0x160
Oct  5 09:28:28 boxname kernel: [64440.820623]  [<ffffffff8101062b>] ?
__switch_to+0xcb/0x370
Oct  5 09:28:28 boxname kernel: [64440.820630]  [<ffffffff8104a155>] ?
finish_task_switch+0x65/0x120
Oct  5 09:28:28 boxname kernel: [64440.820666]  [<ffffffffa009f970>] ?
xfs_end_bio_unwritten+0x0/0x80 [xfs]
Oct  5 09:28:28 boxname kernel: [64440.820700]  [<ffffffffa009f9d5>]
xfs_end_bio_unwritten+0x65/0x80 [xfs]
Oct  5 09:28:28 boxname kernel: [64440.820708]  [<ffffffff8106e7a5>]
run_workqueue+0x95/0x170
Oct  5 09:28:28 boxname kernel: [64440.820715]  [<ffffffff8106e924>]
worker_thread+0xa4/0x120
Oct  5 09:28:28 boxname kernel: [64440.820721]  [<ffffffff81073b30>] ?
autoremove_wake_function+0x0/0x40
Oct  5 09:28:28 boxname kernel: [64440.820728]  [<ffffffff8106e880>] ?
worker_thread+0x0/0x120
Oct  5 09:28:28 boxname kernel: [64440.820734]  [<ffffffff81073746>]
kthread+0xa6/0xb0
Oct  5 09:28:28 boxname kernel: [64440.820741]  [<ffffffff8101308a>]
child_rip+0xa/0x20
Oct  5 09:28:28 boxname kernel: [64440.820747]  [<ffffffff810736a0>] ?
kthread+0x0/0xb0
Oct  5 09:28:28 boxname kernel: [64440.820752]  [<ffffffff81013080>] ?
child_rip+0x0/0x20
Oct  5 09:28:28 boxname kernel: [64440.820794] INFO: task rdf3xload:15112
blocked for more than 120 seconds.
Oct  5 09:28:28 boxname kernel: [64440.820798] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct  5 09:28:28 boxname kernel: [64440.820803] dataload     D 0000000000000000 
   0 15112  15111 0x00000000
Oct  5 09:28:28 boxname kernel: [64440.820811]  ffff88006e2c9d08
0000000000000086 ffff88007100e608 0000000000015580
Oct  5 09:28:28 boxname kernel: [64440.820820]  ffff8800154b9a60
0000000000015580 0000000000015580 0000000000015580
Oct  5 09:28:28 boxname kernel: [64440.820828]  0000000000015580
ffff8800154b9a60 0000000000015580 0000000000015580
Oct  5 09:28:28 boxname kernel: [64440.820835] Call Trace:
Oct  5 09:28:28 boxname kernel: [64440.820875]  [<ffffffffa00a0495>]
xfs_ioend_wait+0x85/0xc0 [xfs]
Oct  5 09:28:28 boxname kernel: [64440.820883]  [<ffffffff81073b30>] ?
autoremove_wake_function+0x0/0x40
Oct  5 09:28:28 boxname kernel: [64440.820920]  [<ffffffffa009cf30>]
xfs_setattr+0x3c0/0xbb0 [xfs]
Oct  5 09:28:28 boxname kernel: [64440.820955]  [<ffffffffa009e367>] ?
xfs_change_file_space+0x247/0x300 [xfs]
Oct  5 09:28:28 boxname kernel: [64440.820991]  [<ffffffffa00a7589>]
xfs_vn_fallocate+0xf9/0x100 [xfs]
Oct  5 09:28:28 boxname kernel: [64440.820999]  [<ffffffff81032419>] ?
default_spin_lock_flags+0x9/0x10
Oct  5 09:28:28 boxname kernel: [64440.821007]  [<ffffffff811189c3>]
do_fallocate+0xf3/0x100
Oct  5 09:28:28 boxname kernel: [64440.821013]  [<ffffffff81118a19>]
sys_fallocate+0x49/0x70
Oct  5 09:30:28 boxname kernel: [64440.821020]  [<ffffffff81011fc2>]
system_call_fastpath+0x16/0x1b

and more entries of this kind. Afterwards a clean shutdown is no longer
possible.

-- 
Configure bugmail: http://oss.sgi.com/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

<Prev in Thread] Current Thread [Next in Thread>