xfs
[Top] [All Lists]

task hung over xfs

To: linux-xfs@xxxxxxxxxxx, linux-fsdevel@xxxxxxxxxxxxxxx
Subject: task hung over xfs
From: Raz <raziebe@xxxxxxxxx>
Date: Wed, 30 May 2012 21:44:45 +0300
Dkim-signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:from:date:message-id:subject:to:content-type :content-transfer-encoding; bh=rtPI4GWajzsEWUUIG1b7RPBN3Ux/k8edfHBhKUoC2kA=; b=lYF//Hdz6vBM49MjSW5JcW7GqTbYAdF5DEiyAXCHTQo0ThKkki/KO8iYYjNiyMcsL9 TrfNWVKc1pexOV2gdnS8uxBABfZ8HVby4tCAi/4Y0k+KOdqrx2FF3CBdR2OigfIpEBTf mjf2zdoNGVIcdWdc/nt11AvYq9wJJsrC93T04x3R3ISdx4Kz06+uFibQFn3YfLUgrsH5 UUPHXE/6glRulxRAFtVEouN2fqHnLolNCFBdh7xbAKKOEP5y6dxoYGDeGiDiE4jaln5G FjVGqdBBLOGuQ7cUvIn4mQtAUN3H3I2330ZBYmhiCjW78NTr+95hpWz84sI5eKdkO3EM DghA==
Hello
We using 2.6.32 gentoo 64bit.  and we're getting task_hung timeout stack.

Our server uses direct IO.  It reads files contents to buffers in
memory  and sends them by TCP.  In addition, data is received
by TCP and stored in files on disk.
Most of the IO is reading data and sending it by TCP sockets.

There are 4 threads reading data from disk into memory buffers. One
thread per partition.
There are about 20 threads reading data from the network and saving it
to disk.

In addition, there is an operation that is done on every file once it is
downloaded.  This operation maps data from file to memory.  It is done
in Java. I assume it is mmap.  The mapping is very short.

The bellow is the stack. Is this xfs bug  ? root file system is xfs as
well the data partition.
Was a fix made in this area  ? when was it ?
thank you
raz


Apr 20 04:11:50 hcs05 [499496.793804] INFO: task java:10439 blocked
for more than 120 seconds.
Apr 20 04:11:50 hcs05 [499496.793809] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 20 04:11:50 hcs05 [499496.793814] java          D 0000000000000001
    0 10439  10408 0x00000000
Apr 20 04:11:50 hcs05 [499496.793821]  ffff88042acd5e78
0000000000000086 0000000000000000 0000000000000081
Apr 20 04:11:50 hcs05 [499496.793828]  0000000000000001
0000000041365a28 ffff88042acd5ee8 ffffffff810a1b76
Apr 20 04:11:50 hcs05 [499496.793835]  0000000000015840
000000000000f9c8 ffff88042c9596b0 ffff88042e4c2d60
Apr 20 04:11:50 hcs05 [499496.793841] Call Trace:
Apr 20 04:11:50 hcs05 [499496.793852]  [<ffffffff810a1b76>] ?
do_futex+0x106/0xb90
Apr 20 04:11:50 hcs05 [499496.793861]  [<ffffffff8101aee6>] ? read_tsc+0x16/0x40
Apr 20 04:11:50 hcs05 [499496.793866]  [<ffffffff815f33f5>]
__down_write_nested+0xa5/0x100
Apr 20 04:11:50 hcs05 [499496.793869]  [<ffffffff815f346e>]
__down_write+0x1e/0x40
Apr 20 04:11:50 hcs05 [499496.793872]  [<ffffffff815f24ac>] down_write+0x1c/0x40
Apr 20 04:11:50 hcs05 [499496.793876]  [<ffffffff81127e39>]
sys_mprotect+0x109/0x2c0
Apr 20 04:11:50 hcs05 [499496.793880]  [<ffffffff81013302>]
system_call_fastpath+0x16/0x1b
Apr 20 04:11:50 hcs05 [499496.793883] INFO: task java:10449 blocked
for more than 120 seconds.
Apr 20 04:11:50 hcs05 [499496.793885] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 20 04:11:50 hcs05 [499496.793886] java          D 0007ffffffffe708
    0 10449  10408 0x00000000
Apr 20 04:11:50 hcs05 [499496.793890]  ffff88042acd1c28
0000000000000086 0000000000cd1b88 0000000000000000
Apr 20 04:11:50 hcs05 [499496.793893]  0000000000000000
ffff88042acd1d7c 000000002c95c410 0000000000000000
Apr 20 04:11:50 hcs05 [499496.793896]  0000000000015840
000000000000f9c8 ffff88042c95c410 ffff88042e4d96b0
Apr 20 04:11:50 hcs05 [499496.793899] Call Trace:
Apr 20 04:11:50 hcs05 [499496.793922]  [<ffffffffa060bd4a>] ?
kmem_zone_alloc+0xaa/0x110 [xfs]
Apr 20 04:11:50 hcs05 [499496.793926]  [<ffffffff815f33f5>]
__down_write_nested+0xa5/0x100
Apr 20 04:11:50 hcs05 [499496.793928]  [<ffffffff815f346e>]
__down_write+0x1e/0x40
Apr 20 04:11:50 hcs05 [499496.793931]  [<ffffffff815f24ac>] down_write+0x1c/0x40
Apr 20 04:11:50 hcs05 [499496.793947]  [<ffffffffa05e7e2c>]
xfs_ilock+0x9c/0xb0 [xfs]
Apr 20 04:11:50 hcs05 [499496.793962]  [<ffffffffa06093d6>]
xfs_free_eofblocks+0x256/0x290 [xfs]
Apr 20 04:11:50 hcs05 [499496.793978]  [<ffffffffa0609f1d>]
xfs_release+0x14d/0x210 [xfs]
Apr 20 04:11:50 hcs05 [499496.793993]  [<ffffffffa0612303>]
xfs_file_release+0x23/0x40 [xfs]
Apr 20 04:11:50 hcs05 [499496.793996]  [<ffffffff8114f1f9>] __fput+0xe9/0x210
Apr 20 04:11:50 hcs05 [499496.793999]  [<ffffffff8114f34b>] fput+0x2b/0x50
Apr 20 04:11:50 hcs05 [499496.794002]  [<ffffffff81124349>] remove_vma+0x49/0xb0
Apr 20 04:11:50 hcs05 [499496.794004]  [<ffffffff81125a7a>]
do_munmap+0x36a/0x3d0
Apr 20 04:11:50 hcs05 [499496.794007]  [<ffffffff815f346e>] ?
__down_write+0x1e/0x40
Apr 20 04:11:50 hcs05 [499496.794010]  [<ffffffff81125b3c>] sys_munmap+0x5c/0xa0
Apr 20 04:11:50 hcs05 [499496.794013]  [<ffffffff81013302>]
system_call_fastpath+0x16/0x1b
Apr 20 04:11:50 hcs05 [499496.794016] INFO: task java:10467 blocked
for more than 120 seconds.
Apr 20 04:11:50 hcs05 [499496.794017] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 20 04:11:50 hcs05 [499496.794019] java          D ffff88042cd5df58
    0 10467  10408 0x00000000
Apr 20 04:11:50 hcs05 [499496.794022]  ffff88042cd5de68
0000000000000086 ffff8804249016e0 000000004139b728
Apr 20 04:11:50 hcs05 [499496.794026]  0000000000000000
0000000000000081 0000000000000001 000000004139b728
Apr 20 04:11:50 hcs05 [499496.794029]  0000000000015840
000000000000f9c8 ffff88042c9fdac0 ffff88042e4d96b0
Apr 20 04:11:50 hcs05 [499496.794032] Call Trace:
Apr 20 04:11:50 hcs05 [499496.794035]  [<ffffffff815f353d>]
__down_read+0xad/0xfa
Apr 20 04:11:50 hcs05 [499496.794037]  [<ffffffff815f24ec>] down_read+0x1c/0x40
Apr 20 04:11:50 hcs05 [499496.794040]  [<ffffffff815f6e59>]
do_page_fault+0x379/0x3a0
Apr 20 04:11:50 hcs05 [499496.794043]  [<ffffffff815f4145>] page_fault+0x25/0x30
Apr 20 04:11:50 hcs05 [499496.794054] INFO: task java:10949 blocked
for more than 120 seconds.
Apr 20 04:11:50 hcs05 [499496.794056] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 20 04:11:50 hcs05 [499496.794058] java          D ffff880429dd3f58
    0 10949  10408 0x00000000
Apr 20 04:11:50 hcs05 [499496.794061]  ffff880429dd3e68
0000000000000086 0000000087237725 0000000000000000
Apr 20 04:11:50 hcs05 [499496.794064]  ffff880429dd3df8
ffffffff811498ab 0000000100022000 0000000087237725
Apr 20 04:11:50 hcs05 [499496.794067]  0000000000015840
000000000000f9c8 ffff880429dd8000 ffffffff81852600
Apr 20 04:11:50 hcs05 [499496.794070] Call Trace:
Apr 20 04:11:50 hcs05 [499496.794075]  [<ffffffff811498ab>] ?
mem_cgroup_charge_common+0x7b/0xb0
Apr 20 04:11:50 hcs05 [499496.794078]  [<ffffffff815f353d>]
__down_read+0xad/0xfa
Apr 20 04:11:50 hcs05 [499496.794081]  [<ffffffff815f24ec>] down_read+0x1c/0x40
Apr 20 04:11:50 hcs05 [499496.794083]  [<ffffffff815f6e59>]
do_page_fault+0x379/0x3a0
Apr 20 04:11:50 hcs05 [499496.794086]  [<ffffffff815f4145>] page_fault+0x25/0x30
Apr 20 04:11:50 hcs05 [499496.794089] INFO: task java:10952 blocked
for more than 120 seconds.
Apr 20 04:11:50 hcs05 [499496.794090] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 20 04:11:50 hcs05 [499496.794092] java          D ffff880429de5f58
    0 10952  10408 0x00000000
Apr 20 04:11:50 hcs05 [499496.794095]  ffff880429de5e68
0000000000000086 0000000000000000 00007fe2644c33f4
Apr 20 04:11:50 hcs05 [499496.794098]  0000000000006f84
000000000000541b ffff880429de5df8 ffffffff8151e0d8
Apr 20 04:11:50 hcs05 [499496.794101]  0000000000015840
000000000000f9c8 ffff880429ddc410 ffffffff81852600
Apr 20 04:11:50 hcs05 [499496.794104] Call Trace:
Apr 20 04:11:50 hcs05 [499496.794108]  [<ffffffff8151e0d8>] ?
tcp_ioctl+0xb8/0x1d0
Apr 20 04:11:50 hcs05 [499496.794113]  [<ffffffff8157120e>] ?
inet6_ioctl+0x3e/0x100
Apr 20 04:11:50 hcs05 [499496.794117]  [<ffffffff814c7c66>] ?
sock_ioctl+0x86/0x2b0
Apr 20 04:11:50 hcs05 [499496.794120]  [<ffffffff815f353d>]
__down_read+0xad/0xfa
Apr 20 04:11:50 hcs05 [499496.794123]  [<ffffffff815f24ec>] down_read+0x1c/0x40
Apr 20 04:11:50 hcs05 [499496.794125]  [<ffffffff815f6e59>]
do_page_fault+0x379/0x3a0
Apr 20 04:11:50 hcs05 [499496.794128]  [<ffffffff815f4145>] page_fault+0x25/0x30
Apr 20 04:11:50 hcs05 [499496.794139] INFO: task java:11768 blocked
for more than 120 seconds.
Apr 20 04:11:50 hcs05 [499496.794141] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 20 04:11:50 hcs05 [499496.794143] java          D ffff8803bc495b48
    0 11768  10408 0x00000000
Apr 20 04:11:50 hcs05 [499496.794146]  ffff8803bc495a58
0000000000000086 ffff8803bc4959a8 ffffffff815f38bc
Apr 20 04:11:50 hcs05 [499496.794149]  ffff8803bc4959e8
000000005c14da46 ffff8803bc4959d8 ffffffff811300a2
Apr 20 04:11:50 hcs05 [499496.794152]  0000000000015840
000000000000f9c8 ffff8803bc468000 ffff88042e4c2d60
Apr 20 04:11:50 hcs05 [499496.794155] Call Trace:
Apr 20 04:11:50 hcs05 [499496.794157]  [<ffffffff815f38bc>] ?
_spin_lock+0x1c/0x40
Apr 20 04:11:50 hcs05 [499496.794161]  [<ffffffff811300a2>] ?
swap_info_get+0x82/0x120
Apr 20 04:11:50 hcs05 [499496.794164]  [<ffffffff811488c1>] ?
mem_cgroup_commit_charge_swapin+0x21/0x40
Apr 20 04:11:50 hcs05 [499496.794167]  [<ffffffff815f353d>]
__down_read+0xad/0xfa
Apr 20 04:11:50 hcs05 [499496.794170]  [<ffffffff815f24ec>] down_read+0x1c/0x40
Apr 20 04:11:50 hcs05 [499496.794172]  [<ffffffff815f6e59>]
do_page_fault+0x379/0x3a0
Apr 20 04:11:50 hcs05 [499496.794175]  [<ffffffff815f4145>] page_fault+0x25/0x30
Apr 20 04:11:50 hcs05 [499496.794180]  [<ffffffff810fe39c>] ?
file_read_actor+0x6c/0x180
Apr 20 04:11:50 hcs05 [499496.794183]  [<ffffffff810fe437>] ?
file_read_actor+0x107/0x180
Apr 20 04:11:50 hcs05 [499496.794187]  [<ffffffff81100d02>]
generic_file_aio_read+0x492/0x6b0
Apr 20 04:11:50 hcs05 [499496.794202]  [<ffffffffa06178f8>]
xfs_read+0x138/0x2c0 [xfs]
Apr 20 04:11:50 hcs05 [499496.794218]  [<ffffffffa06124ce>]
xfs_file_aio_read+0x6e/0x90 [xfs]
Apr 20 04:11:50 hcs05 [499496.794221]  [<ffffffff8114d341>]
do_sync_read+0x101/0x160
Apr 20 04:11:50 hcs05 [499496.794226]  [<ffffffff8108c4a0>] ?
autoremove_wake_function+0x0/0x60
Apr 20 04:11:50 hcs05 [499496.794232]  [<ffffffff81276b94>] ?
security_file_permission+0x24/0x40
Apr 20 04:11:50 hcs05 [499496.794234]  [<ffffffff8114dc64>] vfs_read+0xe4/0x1c0
Apr 20 04:11:50 hcs05 [499496.794237]  [<ffffffff8114de5f>] sys_read+0x5f/0xc0
Apr 20 04:11:50 hcs05 [499496.794240]  [<ffffffff81013302>]
system_call_fastpath+0x16/0x1b
Apr 20 04:11:50 hcs05 [499496.794247] INFO: task ps:7313 blocked for
more than 120 seconds.
Apr 20 04:11:50 hcs05 [499496.794249] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 20 04:11:50 hcs05 [499496.794251] ps            D 0000000000000000
    0  7313   7312 0x00000000
Apr 20 04:11:50 hcs05 [499496.794254]  ffff88029df4fd18
0000000000000082 ffff88029df4fc78 0000003800000038
Apr 20 04:11:50 hcs05 [499496.794257]  0000880200000001
0000000000000001 0000000000011b00 0000000000000001
Apr 20 04:11:50 hcs05 [499496.794260]  0000000000015840
000000000000f9c8 ffff8801890d4410 ffff88042e4d96b0
Apr 20 04:11:50 hcs05 [499496.794264] Call Trace:
Apr 20 04:11:50 hcs05 [499496.794268]  [<ffffffff8116c1de>] ?
mntput_no_expire+0x3e/0x140
Apr 20 04:11:50 hcs05 [499496.794271]  [<ffffffff815f353d>]
__down_read+0xad/0xfa
Apr 20 04:11:50 hcs05 [499496.794274]  [<ffffffff815f24ec>] down_read+0x1c/0x40
Apr 20 04:11:50 hcs05 [499496.794277]  [<ffffffff8112219b>]
access_process_vm+0x5b/0x200
Apr 20 04:11:50 hcs05 [499496.794281]  [<ffffffff81068cfd>] ?
get_task_mm+0x4d/0xb0
Apr 20 04:11:50 hcs05 [499496.794285]  [<ffffffff811b07ba>]
proc_pid_cmdline+0xba/0x150
Apr 20 04:11:50 hcs05 [499496.794288]  [<ffffffff811b12d2>]
proc_info_read+0xc2/0x110
Apr 20 04:11:50 hcs05 [499496.794291]  [<ffffffff8114d407>] ?
rw_verify_area+0x67/0xf0
Apr 20 04:11:50 hcs05 [499496.794293]  [<ffffffff8114dc64>] vfs_read+0xe4/0x1c0

<Prev in Thread] Current Thread [Next in Thread>
  • task hung over xfs, Raz <=