xfs hang or slowness while removing files
Avi Kivity
avi at scylladb.com
Thu Dec 3 06:02:58 CST 2015
On 12/02/2015 11:09 PM, Dave Chinner wrote:
> On Wed, Dec 02, 2015 at 01:07:56PM +0200, Avi Kivity wrote:
>> Removing a directory with ~900 32MB files, we saw this:
>>
>> [ 5645.684464] INFO: task xfsaild/md0:12247 blocked for more than 120 seconds.
>> [ 5645.686488] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [ 5645.687713] xfsaild/md0 D ffff88103f9d3680 0 12247 2 0x00000080
>> [ 5645.687729] ffff8810136f7d40 0000000000000046 ffff882026d82220 ffff8810136f7fd8
>> [ 5645.687732] ffff8810136f7fd8 ffff8810136f7fd8 ffff882026d82220 ffff882026d82220
>> [ 5645.687734] ffff88103f9d44c0 0000000000000001 0000000000000000 ffff8820285aa928
>> [ 5645.687737] Call Trace:
>> [ 5645.687747] [<ffffffff816098d9>] schedule+0x29/0x70
>> [ 5645.687768] [<ffffffffa06cd880>] _xfs_log_force+0x230/0x290 [xfs]
>> [ 5645.687773] [<ffffffff810a9510>] ? wake_up_state+0x20/0x20
>> [ 5645.687796] [<ffffffffa06cd906>] xfs_log_force+0x26/0x80 [xfs]
>> [ 5645.687808] [<ffffffffa06d2380>] ? xfs_trans_ail_cursor_first+0x90/0x90 [xfs]
>> [ 5645.687818] [<ffffffffa06d24d1>] xfsaild+0x151/0x5e0 [xfs]
>> [ 5645.687828] [<ffffffffa06d2380>] ? xfs_trans_ail_cursor_first+0x90/0x90 [xfs]
>> [ 5645.687831] [<ffffffff8109727f>] kthread+0xcf/0xe0
>> [ 5645.687834] [<ffffffff810971b0>] ? kthread_create_on_node+0x140/0x140
>> [ 5645.687837] [<ffffffff81614358>] ret_from_fork+0x58/0x90
>> [ 5645.687852] [<ffffffff810971b0>] ? kthread_create_on_node+0x140/0x140
>>
>> 'rm' did not complete, but was killable. Nothing else was running
>> on the system at the time.
> Which means the filesystem was not hung, nor was rm blocked in XFS.
> That implies the directory/inode reads that rm does were running
> really slowly. Something else going on here.
>
>> The filesystem was mounted with the discard option set, but since
>> that is discouraged, we'll retry without it.
> Ah, yes, that could cause exactly these symptoms.
>
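For anyone wanting to confirm whether a filesystem really is mounted with
discard, a minimal sketch follows. It assumes the usual /proc/mounts layout
(device, mount point, filesystem type, comma-separated options); the function
name mounts_with_discard is made up for illustration.

```python
def mounts_with_discard(mounts_text, fstype="xfs"):
    """Return (device, mountpoint) pairs of `fstype` mounts that use 'discard'.

    `mounts_text` is expected to look like the contents of /proc/mounts:
    device, mount point, fs type, comma-separated options, dump, pass.
    """
    hits = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip empty or malformed lines
        device, mountpoint, fs, options = fields[:4]
        # Split on commas so that e.g. 'nodiscard' does not match 'discard'.
        if fs == fstype and "discard" in options.split(","):
            hits.append((device, mountpoint))
    return hits
```

On a Linux box this would be called as
mounts_with_discard(open("/proc/mounts").read()); an empty list means no XFS
filesystem is currently mounted with online discard.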
> I'd guess you are using storage that has unqueued TRIM operations
> (i.e. SATA 3.0 hardware somewhere in your storage path, as queued
> TRIM only came along with SATA 3.1 and AFAIA there's not a lot of
> 3.1 hardware out there yet) which means while discards are
> being issued all other IO tanks and goes really slow.
It's bare-metal cloud hardware so I don't immediately know (I don't
control the machine). I could find out -
> We have seen individual TRIM requests on some SSDs take tens of
> milliseconds to complete, regardless of their size. Hence if you
> have one of these devices and you're running thousands of TRIM
> commands across ~30GB of data being freed, then you'd see things
> like rm being really slow on the read side and log forces waiting an
> awful long time for journal IO completion processing to take
> place...
>
- but it's probably better to just drop discard and see if it happens again.
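
To get a feel for the numbers above, here is a back-of-the-envelope sketch.
It assumes unqueued TRIMs are fully serialized, so their latencies simply add
up. The file count and the "tens of milliseconds" figure come from this
thread; the number of discards per file and the helper name are hypothetical,
and I have not measured either on this hardware.

```python
def trim_stall_seconds(num_files, discards_per_file, latency_ms):
    """Rough total time spent in serialized TRIM commands, in seconds.

    Assumes each discard is issued one at a time (unqueued TRIM) and that
    every command takes `latency_ms` regardless of its size.
    """
    return num_files * discards_per_file * latency_ms / 1000.0


# ~900 files of 32MB each, as in the report.
# discards_per_file and latency_ms are illustrative guesses, not measurements.
for discards_per_file, latency_ms in [(1, 20), (4, 30), (8, 50)]:
    total = trim_stall_seconds(900, discards_per_file, latency_ms)
    print(f"{discards_per_file} discards/file at {latency_ms} ms: {total:.0f} s")
```

With a handful of discards per file, the total easily reaches the same order
of magnitude as the 120 second hung task timeout, which would fit the
explanation given above.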