
Flush hung in D state at fclose (after heavy XFS writes)

To: "xfs@xxxxxxxxxxx" <xfs@xxxxxxxxxxx>
Subject: Flush hung in D state at fclose (after heavy XFS writes)
From: "Cheung, Norman" <Norman.Cheung@xxxxxxxxxxxxxx>
Date: Sat, 29 Jun 2013 19:58:20 +0000
Accept-language: en-US
Cc: "Cheung, Norman" <Norman.Cheung@xxxxxxxxxxxxxx>
Delivered-to: xfs@xxxxxxxxxxx
Thread-index: Ac51AETDnirZnYY9SF6T8tOjsCWDcg==
Thread-topic: Flush hung in D state at fclose (after heavy XFS writes)

Hello,

I had this issue with kernel 3.0.13 and reported it in this forum ~4 months ago. 
I have been working with SUSE support to resolve it but am getting nowhere.  I 
would be grateful if someone could help and give me some pointers to solve or 
work around this hang.  SUSE asked me to reproduce it with 3.0.80-0.5 (their 
latest release), so all the attached traces were produced against that kernel.

My application has ~10+ threads running non-stop with high CPU usage (close to 
100%; each is locked to a specific CPU, and the server has 32 CPUs).  The 
processing runs in a 5-minute cycle.  After the second minute, 5 of these 
threads start writing to disk for 3 minutes.  Each thread writes a file ~16G in 
size.  The threads run with very high priority (-2, much higher than flush; I 
also tried running the threads at the same priority as pdflush, and it fails 
sooner).

The disk is a RAID-5 array.  I have also tried 5 x RAID-0, one array per 
disk-writing thread, with the same result.

Below are the traces from magic sysrq "w"

If I use O_DIRECT writes, my application runs for 5-6 days before hanging.  If 
I use buffered writes, it hangs after 3-4 hours.

And when I attach GDB to my application, flush-8:16 immediately comes out of D 
state, and so does my application.

Without knowing the internals of flush and XFS, this is my guess from the 
traces below: could it be that flush (in xfs_fs_write_inode) and my application 
(in xfs_file_release, at fclose) are holding the same lock?
Stack traces of a few threads of interest are below.  The entire dump is 
attached.

Jun 28 19:34:11 IMCNode000 kernel: [29127.580426] SysRq : Show Blocked State
Jun 28 19:34:11 IMCNode000 kernel: [29127.580434]   task                        
PC stack   pid father
Jun 28 19:34:11 IMCNode000 kernel: [29127.580629] flush-8:16      D 
0000000000000400     0  5502      2 0x00000000
Jun 28 19:34:11 IMCNode000 kernel: [29127.580636]  ffff883e5bbb7b90 
0000000000000046 ffff8819c91d25c0 ffff883e5bbb6010
Jun 28 19:34:11 IMCNode000 kernel: [29127.580643]  0000000000011180 
0000000000011180 ffff883e5bbb7fd8 ffff883e5bbb7fd8
Jun 28 19:34:11 IMCNode000 kernel: [29127.580651]  0000000000011180 
ffff883e5b8aa0c0 ffff8819c91d25c0 ffff8819dc99a3c0
Jun 28 19:34:11 IMCNode000 kernel: [29127.580658] Call Trace:
Jun 28 19:34:11 IMCNode000 kernel: [29127.580711]  [<ffffffffa04a2cd5>] 
xfs_ioend_wait+0x85/0xb0 [xfs]
Jun 28 19:34:11 IMCNode000 kernel: [29127.580800]  [<ffffffffa04acb28>] 
xfs_fs_write_inode+0xb8/0x190 [xfs]
Jun 28 19:34:11 IMCNode000 kernel: [29127.580880]  [<ffffffff81177793>] 
writeback_single_inode+0x2f3/0x360
Jun 28 19:34:11 IMCNode000 kernel: [29127.580889]  [<ffffffff81177ee4>] 
writeback_sb_inodes+0xc4/0x1a0
Jun 28 19:34:11 IMCNode000 kernel: [29127.580897]  [<ffffffff81178743>] 
writeback_inodes_wb+0xd3/0x160
Jun 28 19:34:11 IMCNode000 kernel: [29127.580904]  [<ffffffff81178b8b>] 
wb_writeback+0x3bb/0x400
Jun 28 19:34:11 IMCNode000 kernel: [29127.580911]  [<ffffffff81178e0b>] 
wb_do_writeback+0x23b/0x250
Jun 28 19:34:11 IMCNode000 kernel: [29127.580918]  [<ffffffff81178f04>] 
bdi_writeback_thread+0xe4/0x240
Jun 28 19:34:11 IMCNode000 kernel: [29127.580926]  [<ffffffff8107c646>] 
kthread+0x96/0xa0
Jun 28 19:34:11 IMCNode000 kernel: [29127.580936]  [<ffffffff81450e44>] 
kernel_thread_helper+0x4/0x10
Jun 28 19:34:11 IMCNode000 kernel: [29127.580954] Tigris_IMC.exe  D 
0000000100669478     0  6347   6269 0x00000000
Jun 28 19:34:11 IMCNode000 kernel: [29127.580960]  ffff8819dc9b5848 
0000000000000086 0000000000000000 ffff8819dc9b4010
Jun 28 19:34:11 IMCNode000 kernel: [29127.580968]  0000000000011180 
0000000000011180 ffff8819dc9b5fd8 ffff8819dc9b5fd8
Jun 28 19:34:11 IMCNode000 kernel: [29127.580975]  0000000000011180 
ffff8819dc9b2540 0000000085227080 ffff883e86942400
Jun 28 19:34:11 IMCNode000 kernel: [29127.580982] Call Trace:
Jun 28 19:34:11 IMCNode000 kernel: [29127.580991]  [<ffffffff81446555>] 
schedule_timeout+0x205/0x2a0
Jun 28 19:34:11 IMCNode000 kernel: [29127.580998]  [<ffffffff814472f8>] 
__down+0x68/0xb0
Jun 28 19:34:11 IMCNode000 kernel: [29127.581008]  [<ffffffff81082097>] 
down+0x47/0x50
Jun 28 19:34:11 IMCNode000 kernel: [29127.581030]  [<ffffffffa04a3ce9>] 
xfs_buf_lock+0x59/0xc0 [xfs]
Jun 28 19:34:11 IMCNode000 kernel: [29127.581104]  [<ffffffffa04a522b>] 
_xfs_buf_find+0x11b/0x210 [xfs]
Jun 28 19:34:11 IMCNode000 kernel: [29127.581176]  [<ffffffffa04a5356>] 
xfs_buf_get+0x36/0x1d0 [xfs]
Jun 28 19:34:11 IMCNode000 kernel: [29127.581248]  [<ffffffffa04a5514>] 
xfs_buf_read+0x24/0x110 [xfs]
Jun 28 19:34:11 IMCNode000 kernel: [29127.581322]  [<ffffffffa049a8ce>] 
xfs_trans_read_buf+0x28e/0x3e0 [xfs]
Jun 28 19:34:11 IMCNode000 kernel: [29127.581393]  [<ffffffffa0467f5e>] 
xfs_btree_read_buf_block+0x5e/0xd0 [xfs]
Jun 28 19:34:11 IMCNode000 kernel: [29127.581438]  [<ffffffffa0468682>] 
xfs_btree_lookup_get_block+0x82/0xf0 [xfs]
Jun 28 19:34:11 IMCNode000 kernel: [29127.581480]  [<ffffffffa0468d32>] 
xfs_btree_lookup+0xd2/0x470 [xfs]
Jun 28 19:34:11 IMCNode000 kernel: [29127.581522]  [<ffffffffa045f967>] 
xfs_bmap_del_extent+0x147/0xac0 [xfs]
Jun 28 19:34:11 IMCNode000 kernel: [29127.581560]  [<ffffffffa0463857>] 
xfs_bunmapi+0x5d7/0xbc0 [xfs]
Jun 28 19:34:11 IMCNode000 kernel: [29127.581601]  [<ffffffffa0482cc8>] 
xfs_itruncate_finish+0x168/0x330 [xfs]
Jun 28 19:34:11 IMCNode000 kernel: [29127.581657]  [<ffffffffa049e5ce>] 
xfs_free_eofblocks+0x1ce/0x2b0 [xfs]
Jun 28 19:34:11 IMCNode000 kernel: [29127.581725]  [<ffffffffa049f18c>] 
xfs_release+0x13c/0x230 [xfs]
Jun 28 19:34:11 IMCNode000 kernel: [29127.581793]  [<ffffffffa04a6f00>] 
xfs_file_release+0x10/0x20 [xfs]
Jun 28 19:34:11 IMCNode000 kernel: [29127.581855]  [<ffffffff81153e83>] 
__fput+0xb3/0x200
Jun 28 19:34:11 IMCNode000 kernel: [29127.581863]  [<ffffffff8114fefc>] 
filp_close+0x5c/0x90
Jun 28 19:34:11 IMCNode000 kernel: [29127.581870]  [<ffffffff8114ffd7>] 
sys_close+0xa7/0x100
Jun 28 19:34:11 IMCNode000 kernel: [29127.581879]  [<ffffffff8144fd12>] 
system_call_fastpath+0x16/0x1b
Jun 28 19:34:11 IMCNode000 kernel: [29127.581897]  [<00007f90392c2ea0>] 
0x7f90392c2e9f
Jun 28 19:34:11 IMCNode000 kernel: [29127.581903] Tigris_IMC.exe  D 
ffff887ef8994710     0  6352   6269 0x00000000
Jun 28 19:34:11 IMCNode000 kernel: [29127.581909]  ffff8819dcbe7848 
0000000000000086 ffffffffa0480c7e ffff8819dcbe6010
Jun 28 19:34:11 IMCNode000 kernel: [29127.581916]  0000000000011180 
0000000000011180 ffff8819dcbe7fd8 ffff8819dcbe7fd8
Jun 28 19:34:11 IMCNode000 kernel: [29127.581923]  0000000000011180 
ffff8819dcbe4680 ffff883e85249e80 ffff883e86a123c0
Jun 28 19:34:11 IMCNode000 kernel: [29127.581930] Call Trace:
Jun 28 19:34:11 IMCNode000 kernel: [29127.581940]  [<ffffffff81446555>] 
schedule_timeout+0x205/0x2a0
Jun 28 19:34:11 IMCNode000 kernel: [29127.581952]  [<ffffffff814472f8>] 
__down+0x68/0xb0
Jun 28 19:34:11 IMCNode000 kernel: [29127.581959]  [<ffffffff81082097>] 
down+0x47/0x50
Jun 28 19:34:11 IMCNode000 kernel: [29127.581979]  [<ffffffffa04a3ce9>] 
xfs_buf_lock+0x59/0xc0 [xfs]
Jun 28 19:34:11 IMCNode000 kernel: [29127.582050]  [<ffffffffa04a522b>] 
_xfs_buf_find+0x11b/0x210 [xfs]
Jun 28 19:34:11 IMCNode000 kernel: [29127.582121]  [<ffffffffa04a5356>] 
xfs_buf_get+0x36/0x1d0 [xfs]
Jun 28 19:34:11 IMCNode000 kernel: [29127.582191]  [<ffffffffa04a5514>] 
xfs_buf_read+0x24/0x110 [xfs]
Jun 28 19:34:11 IMCNode000 kernel: [29127.582264]  [<ffffffffa049a8ce>] 
xfs_trans_read_buf+0x28e/0x3e0 [xfs]
Jun 28 19:34:11 IMCNode000 kernel: [29127.582330]  [<ffffffffa0467f5e>] 
xfs_btree_read_buf_block+0x5e/0xd0 [xfs]
Jun 28 19:34:11 IMCNode000 kernel: [29127.582371]  [<ffffffffa0468682>] 
xfs_btree_lookup_get_block+0x82/0xf0 [xfs]
Jun 28 19:34:11 IMCNode000 kernel: [29127.582413]  [<ffffffffa0468d32>] 
xfs_btree_lookup+0xd2/0x470 [xfs]
Jun 28 19:34:11 IMCNode000 kernel: [29127.582454]  [<ffffffffa045f967>] 
xfs_bmap_del_extent+0x147/0xac0 [xfs]
Jun 28 19:34:11 IMCNode000 kernel: [29127.582494]  [<ffffffffa0463857>] 
xfs_bunmapi+0x5d7/0xbc0 [xfs]
Jun 28 19:34:11 IMCNode000 kernel: [29127.582541]  [<ffffffffa0482cc8>] 
xfs_itruncate_finish+0x168/0x330 [xfs]
Jun 28 19:34:11 IMCNode000 kernel: [29127.582597]  [<ffffffffa049e5ce>] 
xfs_free_eofblocks+0x1ce/0x2b0 [xfs]
Jun 28 19:34:11 IMCNode000 kernel: [29127.582665]  [<ffffffffa049f18c>] 
xfs_release+0x13c/0x230 [xfs]
Jun 28 19:34:11 IMCNode000 kernel: [29127.582733]  [<ffffffffa04a6f00>] 
xfs_file_release+0x10/0x20 [xfs]
Jun 28 19:34:11 IMCNode000 kernel: [29127.582794]  [<ffffffff81153e83>] 
__fput+0xb3/0x200
Jun 28 19:34:11 IMCNode000 kernel: [29127.582801]  [<ffffffff8114fefc>] 
filp_close+0x5c/0x90
Jun 28 19:34:11 IMCNode000 kernel: [29127.582807]  [<ffffffff8114ffd7>] 
sys_close+0xa7/0x100
Jun 28 19:34:11 IMCNode000 kernel: [29127.582814]  [<ffffffff8144fd12>] 
system_call_fastpath+0x16/0x1b
Jun 28 19:34:11 IMCNode000 kernel: [29127.582823]  [<00007f90392c2ea0>] 
0x7f90392c2e9f


xfsprogs-3.1.6-0.7.1

xfs_info /raid
meta-data=/dev/sdb1              isize=256    agcount=4, agsize=243896192 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=975584768, imaxpct=5
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=476359, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0


fdisk -l

WARNING: GPT (GUID Partition Table) detected on '/dev/sdb'! The util fdisk 
doesn't support GPT. Use GNU Parted.
Disk /dev/sdb: 3996.0 GB, 3995997306880 bytes
255 heads, 63 sectors/track, 485819 cylinders, total 7804682240 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x80560481

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb4               1           1           0+  ee  GPT

/proc/partitions
major minor  #blocks  name

   8       16 3902341120 sdb
   8       17 3902339072 sdb1
   8        0  124022784 sda
   8        1    1051648 sda1
   8        2    1051648 sda2
   8        3  121918464 sda3

/proc/meminfo
MemTotal:       526536468 kB
MemFree:        520264928 kB
Buffers:           16644 kB
Cached:           706520 kB
SwapCached:            0 kB
Active:           580108 kB
Inactive:         240800 kB
Active(anon):      98036 kB
Inactive(anon):       68 kB
Active(file):     482072 kB
Inactive(file):   240732 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:       1051644 kB
SwapFree:        1051644 kB
Dirty:                 0 kB
Writeback:             0 kB
AnonPages:         97868 kB
Mapped:            13212 kB
Shmem:               236 kB
Slab:             125960 kB
SReclaimable:      17020 kB
SUnreclaim:       108940 kB
KernelStack:        3264 kB
PageTables:         3120 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    522322744 kB
Committed_AS:     432276 kB
VmallocTotal:   34359738367 kB
VmallocUsed:     1100980 kB
VmallocChunk:   34090199776 kB
HardwareCorrupted:     0 kB
AnonHugePages:     16384 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:      129732 kB
DirectMap2M:     6133760 kB
DirectMap1G:    530579456 kB


Many thanks in advance,

Norman 


Attachment: mess_sysrq_6_28_7pm.txt
Description: mess_sysrq_6_28_7pm.txt
