<div dir="ltr">My initial message bounced because of its large size, and re-posting since then has not worked. I am trying a different email address in the hope that it helps. If you receive this in duplicate, my apologies.<div><br></div>
<div><div>I hit this issue with kernel 3.0.13 and reported it in this forum about 4 months ago. I have been working with SUSE support to resolve it but am getting nowhere. I would be grateful if someone could give me some pointers to solve or work around this hang. SUSE asked me to reproduce it with 3.0.80-0.5 (their latest release), so all the attached traces were produced against that kernel.</div>
<div><br></div><div>My application has 10+ threads running non-stop at high CPU usage (close to 100%; each is pinned to a specific CPU on a 32-CPU server). Processing runs in a 5-minute cycle. After the second minute, 5 of these threads start writing to disk for 3 minutes, each writing a file ~16 GB in size. The threads run at very high priority (-2, much higher than flush; I also tried running them at the same priority as pdflush, and that fails sooner).</div>
<div><br></div><div>The disk is a RAID-5; I have also tried 5 x RAID-0, one RAID per disk-writing thread, with the same result.</div><div><br></div><div>Below are the traces from magic SysRq "w".</div><div><br></div><div>If I use O_DIRECT writes, my application runs for 5-6 days and then hangs. If I use buffered writes, it hangs after 3-4 hours.</div>
<div><br></div><div>And when I attach GDB to my application, flush-8:16 immediately comes out of the D state, and so does my application.</div><div><br></div><div>Without knowing the internals of flush (writeback) and XFS, here is my guess from the traces below:</div>
<div>Could it be that flush (xfs_fs_write_inode) and my application (xfs_file_release at fclose) are holding the same lock? Or could some resource in the system (e.g. the page cache?) be exhausted, with both hanging while waiting for reclaim?</div>
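The blocked-task dump below was obtained with the magic SysRq "w" trigger, which amounts to:

```shell
# Requires root: dump all tasks in uninterruptible (D) sleep to the kernel log
echo w > /proc/sysrq-trigger
dmesg | tail -n 200    # or check /var/log/messages, as below
```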
<div> </div><div>Stack traces of a few threads of interest. If anyone wants the entire dump, I can send it in a separate mail:</div><div><br></div><div>Jun 28 19:34:11 IMCNode000 kernel: [29127.580426] SysRq : Show Blocked State</div>
<div>Jun 28 19:34:11 IMCNode000 kernel: [29127.580434] task PC stack pid father</div><div>Jun 28 19:34:11 IMCNode000 kernel: [29127.580629] flush-8:16 D 0000000000000400 0 5502 2 0x00000000</div>
<div>Jun 28 19:34:11 IMCNode000 kernel: [29127.580636] ffff883e5bbb7b90 0000000000000046 ffff8819c91d25c0 ffff883e5bbb6010</div><div>Jun 28 19:34:11 IMCNode000 kernel: [29127.580643] 0000000000011180 0000000000011180 ffff883e5bbb7fd8 ffff883e5bbb7fd8</div>
<div>Jun 28 19:34:11 IMCNode000 kernel: [29127.580651] 0000000000011180 ffff883e5b8aa0c0 ffff8819c91d25c0 ffff8819dc99a3c0</div><div>Jun 28 19:34:11 IMCNode000 kernel: [29127.580658] Call Trace:</div><div>Jun 28 19:34:11 IMCNode000 kernel: [29127.580711] [<ffffffffa04a2cd5>] xfs_ioend_wait+0x85/0xb0 [xfs]</div>
<div>Jun 28 19:34:11 IMCNode000 kernel: [29127.580800] [<ffffffffa04acb28>] xfs_fs_write_inode+0xb8/0x190 [xfs]</div><div>Jun 28 19:34:11 IMCNode000 kernel: [29127.580880] [<ffffffff81177793>] writeback_single_inode+0x2f3/0x360</div>
<div>Jun 28 19:34:11 IMCNode000 kernel: [29127.580889] [<ffffffff81177ee4>] writeback_sb_inodes+0xc4/0x1a0</div><div>Jun 28 19:34:11 IMCNode000 kernel: [29127.580897] [<ffffffff81178743>] writeback_inodes_wb+0xd3/0x160</div>
<div>Jun 28 19:34:11 IMCNode000 kernel: [29127.580904] [<ffffffff81178b8b>] wb_writeback+0x3bb/0x400</div><div>Jun 28 19:34:11 IMCNode000 kernel: [29127.580911] [<ffffffff81178e0b>] wb_do_writeback+0x23b/0x250</div>
<div>Jun 28 19:34:11 IMCNode000 kernel: [29127.580918] [<ffffffff81178f04>] bdi_writeback_thread+0xe4/0x240</div><div>Jun 28 19:34:11 IMCNode000 kernel: [29127.580926] [<ffffffff8107c646>] kthread+0x96/0xa0</div>
<div>Jun 28 19:34:11 IMCNode000 kernel: [29127.580936] [<ffffffff81450e44>] kernel_thread_helper+0x4/0x10</div><div>Jun 28 19:34:11 IMCNode000 kernel: [29127.580954] Tigris_IMC.exe D 0000000100669478 0 6347 6269 0x00000000</div>
<div>Jun 28 19:34:11 IMCNode000 kernel: [29127.580960] ffff8819dc9b5848 0000000000000086 0000000000000000 ffff8819dc9b4010</div><div>Jun 28 19:34:11 IMCNode000 kernel: [29127.580968] 0000000000011180 0000000000011180 ffff8819dc9b5fd8 ffff8819dc9b5fd8</div>
<div>Jun 28 19:34:11 IMCNode000 kernel: [29127.580975] 0000000000011180 ffff8819dc9b2540 0000000085227080 ffff883e86942400</div><div>Jun 28 19:34:11 IMCNode000 kernel: [29127.580982] Call Trace:</div><div>Jun 28 19:34:11 IMCNode000 kernel: [29127.580991] [<ffffffff81446555>] schedule_timeout+0x205/0x2a0</div>
<div>Jun 28 19:34:11 IMCNode000 kernel: [29127.580998] [<ffffffff814472f8>] __down+0x68/0xb0</div><div>Jun 28 19:34:11 IMCNode000 kernel: [29127.581008] [<ffffffff81082097>] down+0x47/0x50</div><div>Jun 28 19:34:11 IMCNode000 kernel: [29127.581030] [<ffffffffa04a3ce9>] xfs_buf_lock+0x59/0xc0 [xfs]</div>
<div>Jun 28 19:34:11 IMCNode000 kernel: [29127.581104] [<ffffffffa04a522b>] _xfs_buf_find+0x11b/0x210 [xfs]</div><div>Jun 28 19:34:11 IMCNode000 kernel: [29127.581176] [<ffffffffa04a5356>] xfs_buf_get+0x36/0x1d0 [xfs]</div>
<div>Jun 28 19:34:11 IMCNode000 kernel: [29127.581248] [<ffffffffa04a5514>] xfs_buf_read+0x24/0x110 [xfs]</div><div>Jun 28 19:34:11 IMCNode000 kernel: [29127.581322] [<ffffffffa049a8ce>] xfs_trans_read_buf+0x28e/0x3e0 [xfs]</div>
<div>Jun 28 19:34:11 IMCNode000 kernel: [29127.581393] [<ffffffffa0467f5e>] xfs_btree_read_buf_block+0x5e/0xd0 [xfs]</div><div>Jun 28 19:34:11 IMCNode000 kernel: [29127.581438] [<ffffffffa0468682>] xfs_btree_lookup_get_block+0x82/0xf0 [xfs]</div>
<div>Jun 28 19:34:11 IMCNode000 kernel: [29127.581480] [<ffffffffa0468d32>] xfs_btree_lookup+0xd2/0x470 [xfs]</div><div>Jun 28 19:34:11 IMCNode000 kernel: [29127.581522] [<ffffffffa045f967>] xfs_bmap_del_extent+0x147/0xac0 [xfs]</div>
<div>Jun 28 19:34:11 IMCNode000 kernel: [29127.581560] [<ffffffffa0463857>] xfs_bunmapi+0x5d7/0xbc0 [xfs]</div><div>Jun 28 19:34:11 IMCNode000 kernel: [29127.581601] [<ffffffffa0482cc8>] xfs_itruncate_finish+0x168/0x330 [xfs]</div>
<div>Jun 28 19:34:11 IMCNode000 kernel: [29127.581657] [<ffffffffa049e5ce>] xfs_free_eofblocks+0x1ce/0x2b0 [xfs]</div><div>Jun 28 19:34:11 IMCNode000 kernel: [29127.581725] [<ffffffffa049f18c>] xfs_release+0x13c/0x230 [xfs]</div>
<div>Jun 28 19:34:11 IMCNode000 kernel: [29127.581793] [<ffffffffa04a6f00>] xfs_file_release+0x10/0x20 [xfs]</div><div>Jun 28 19:34:11 IMCNode000 kernel: [29127.581855] [<ffffffff81153e83>] __fput+0xb3/0x200</div>
<div>Jun 28 19:34:11 IMCNode000 kernel: [29127.581863] [<ffffffff8114fefc>] filp_close+0x5c/0x90</div><div>Jun 28 19:34:11 IMCNode000 kernel: [29127.581870] [<ffffffff8114ffd7>] sys_close+0xa7/0x100</div><div>
Jun 28 19:34:11 IMCNode000 kernel: [29127.581879] [<ffffffff8144fd12>] system_call_fastpath+0x16/0x1b</div><div>Jun 28 19:34:11 IMCNode000 kernel: [29127.581897] [<00007f90392c2ea0>] 0x7f90392c2e9f</div><div>
Jun 28 19:34:11 IMCNode000 kernel: [29127.581903] Tigris_IMC.exe D ffff887ef8994710 0 6352 6269 0x00000000</div><div>Jun 28 19:34:11 IMCNode000 kernel: [29127.581909] ffff8819dcbe7848 0000000000000086 ffffffffa0480c7e ffff8819dcbe6010</div>
<div>Jun 28 19:34:11 IMCNode000 kernel: [29127.581916] 0000000000011180 0000000000011180 ffff8819dcbe7fd8 ffff8819dcbe7fd8</div><div>Jun 28 19:34:11 IMCNode000 kernel: [29127.581923] 0000000000011180 ffff8819dcbe4680 ffff883e85249e80 ffff883e86a123c0</div>
<div>Jun 28 19:34:11 IMCNode000 kernel: [29127.581930] Call Trace:</div><div>Jun 28 19:34:11 IMCNode000 kernel: [29127.581940] [<ffffffff81446555>] schedule_timeout+0x205/0x2a0</div><div>Jun 28 19:34:11 IMCNode000 kernel: [29127.581952] [<ffffffff814472f8>] __down+0x68/0xb0</div>
<div>Jun 28 19:34:11 IMCNode000 kernel: [29127.581959] [<ffffffff81082097>] down+0x47/0x50</div><div>Jun 28 19:34:11 IMCNode000 kernel: [29127.581979] [<ffffffffa04a3ce9>] xfs_buf_lock+0x59/0xc0 [xfs]</div><div>
Jun 28 19:34:11 IMCNode000 kernel: [29127.582050] [<ffffffffa04a522b>] _xfs_buf_find+0x11b/0x210 [xfs]</div><div>Jun 28 19:34:11 IMCNode000 kernel: [29127.582121] [<ffffffffa04a5356>] xfs_buf_get+0x36/0x1d0 [xfs]</div>
<div>Jun 28 19:34:11 IMCNode000 kernel: [29127.582191] [<ffffffffa04a5514>] xfs_buf_read+0x24/0x110 [xfs]</div><div>Jun 28 19:34:11 IMCNode000 kernel: [29127.582264] [<ffffffffa049a8ce>] xfs_trans_read_buf+0x28e/0x3e0 [xfs]</div>
<div>Jun 28 19:34:11 IMCNode000 kernel: [29127.582330] [<ffffffffa0467f5e>] xfs_btree_read_buf_block+0x5e/0xd0 [xfs]</div><div>Jun 28 19:34:11 IMCNode000 kernel: [29127.582371] [<ffffffffa0468682>] xfs_btree_lookup_get_block+0x82/0xf0 [xfs]</div>
<div>Jun 28 19:34:11 IMCNode000 kernel: [29127.582413] [<ffffffffa0468d32>] xfs_btree_lookup+0xd2/0x470 [xfs]</div><div>Jun 28 19:34:11 IMCNode000 kernel: [29127.582454] [<ffffffffa045f967>] xfs_bmap_del_extent+0x147/0xac0 [xfs]</div>
<div>Jun 28 19:34:11 IMCNode000 kernel: [29127.582494] [<ffffffffa0463857>] xfs_bunmapi+0x5d7/0xbc0 [xfs]</div><div>Jun 28 19:34:11 IMCNode000 kernel: [29127.582541] [<ffffffffa0482cc8>] xfs_itruncate_finish+0x168/0x330 [xfs]</div>
<div>Jun 28 19:34:11 IMCNode000 kernel: [29127.582597] [<ffffffffa049e5ce>] xfs_free_eofblocks+0x1ce/0x2b0 [xfs]</div><div>Jun 28 19:34:11 IMCNode000 kernel: [29127.582665] [<ffffffffa049f18c>] xfs_release+0x13c/0x230 [xfs]</div>
<div>Jun 28 19:34:11 IMCNode000 kernel: [29127.582733] [<ffffffffa04a6f00>] xfs_file_release+0x10/0x20 [xfs]</div><div>Jun 28 19:34:11 IMCNode000 kernel: [29127.582794] [<ffffffff81153e83>] __fput+0xb3/0x200</div>
<div>Jun 28 19:34:11 IMCNode000 kernel: [29127.582801] [<ffffffff8114fefc>] filp_close+0x5c/0x90</div><div>Jun 28 19:34:11 IMCNode000 kernel: [29127.582807] [<ffffffff8114ffd7>] sys_close+0xa7/0x100</div><div>
Jun 28 19:34:11 IMCNode000 kernel: [29127.582814] [<ffffffff8144fd12>] system_call_fastpath+0x16/0x1b</div><div>Jun 28 19:34:11 IMCNode000 kernel: [29127.582823] [<00007f90392c2ea0>] 0x7f90392c2e9f</div><div>
<br></div><div><br></div><div>xfsprogs-3.1.6-0.7.1</div><div><br></div><div>xfs_info /raid</div><div>meta-data=/dev/sdb1 isize=256 agcount=4, agsize=243896192 blks</div><div> = sectsz=512 attr=2</div>
<div>data = bsize=4096 blocks=975584768, imaxpct=5</div><div> = sunit=0 swidth=0 blks</div><div>naming =version 2 bsize=4096 ascii-ci=0</div>
<div>log =internal bsize=4096 blocks=476359, version=2</div><div> = sectsz=512 sunit=0 blks, lazy-count=1</div><div>realtime =none extsz=4096 blocks=0, rtextents=0</div>
<div><br></div><div><br></div><div>fdisk -l</div><div><br></div><div>WARNING: GPT (GUID Partition Table) detected on '/dev/sdb'! The util fdisk doesn't support GPT. Use GNU Parted.</div><div>Disk /dev/sdb: 3996.0 GB, 3995997306880 bytes</div>
<div>255 heads, 63 sectors/track, 485819 cylinders, total 7804682240 sectors</div><div>Units = sectors of 1 * 512 = 512 bytes</div><div>Sector size (logical/physical): 512 bytes / 512 bytes</div><div>I/O size (minimum/optimal): 512 bytes / 512 bytes</div><div>Disk identifier: 0x80560481</div>
<div><br></div><div> Device Boot Start End Blocks Id System</div><div>/dev/sdb4 1 1 0+ ee GPT</div><div><br></div><div> /proc/partitions</div><div>major minor #blocks name</div>
<div><br></div><div> 8 16 3902341120 sdb</div><div> 8 17 3902339072 sdb1</div><div> 8 0 124022784 sda</div><div> 8 1 1051648 sda1</div><div> 8 2 1051648 sda2</div><div> 8 3 121918464 sda3</div>
<div><br></div><div>/proc/meminfo</div><div>MemTotal: 526536468 kB</div><div>MemFree: 520264928 kB</div><div>Buffers: 16644 kB</div><div>Cached: 706520 kB</div><div>SwapCached: 0 kB</div>
<div>Active: 580108 kB</div><div>Inactive: 240800 kB</div><div>Active(anon): 98036 kB</div><div>Inactive(anon): 68 kB</div><div>Active(file): 482072 kB</div><div>Inactive(file): 240732 kB</div>
<div>Unevictable: 0 kB</div><div>Mlocked: 0 kB</div><div>SwapTotal: 1051644 kB</div><div>SwapFree: 1051644 kB</div><div>Dirty: 0 kB</div><div>Writeback: 0 kB</div>
<div>AnonPages: 97868 kB</div><div>Mapped: 13212 kB</div><div>Shmem: 236 kB</div><div>Slab: 125960 kB</div><div>SReclaimable: 17020 kB</div><div>SUnreclaim: 108940 kB</div>
<div>KernelStack: 3264 kB</div><div>PageTables: 3120 kB</div><div>NFS_Unstable: 0 kB</div><div>Bounce: 0 kB</div><div>WritebackTmp: 0 kB</div><div>CommitLimit: 522322744 kB</div>
<div>Committed_AS: 432276 kB</div><div>VmallocTotal: 34359738367 kB</div><div>VmallocUsed: 1100980 kB</div><div>VmallocChunk: 34090199776 kB</div><div>HardwareCorrupted: 0 kB</div><div>AnonHugePages: 16384 kB</div>
<div>HugePages_Total: 0</div><div>HugePages_Free: 0</div><div>HugePages_Rsvd: 0</div><div>HugePages_Surp: 0</div><div>Hugepagesize: 2048 kB</div><div>DirectMap4k: 129732 kB</div><div>
DirectMap2M: 6133760 kB</div><div>DirectMap1G: 530579456 kB</div><div><br></div><div><br></div><div>Many thanks in advance,</div><div><br></div><div>Norman </div></div><div><br></div></div>