<div dir="ltr">Thanks for your quick and clear response. Some comments bellow:<br><div class="gmail_extra"><br><br><div class="gmail_quote">2014-08-25 13:18 GMT+08:00 Dave Chinner <span dir="ltr"><<a href="mailto:david@fromorbit.com" target="_blank">david@fromorbit.com</a>></span>:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5">On Mon, Aug 25, 2014 at 11:34:34AM +0800, Zhang Qiang wrote:<br>
> Dear XFS community & developers,<br>
><br>
> I am using CentOS 6.3 and xfs as base file system and use RAID5 as hardware<br>
> storage.<br>
><br>
> Detail environment as follow:<br>
> OS: CentOS 6.3<br>
> Kernel: kernel-2.6.32-279.el6.x86_64<br>
> XFS option info(df output): /dev/sdb1 on /data type xfs<br>
> (rw,noatime,nodiratime,nobarrier)<br>
><br>
> Detail phenomenon:<br>
><br>
> # df<br>
> Filesystem Size Used Avail Use% Mounted on<br>
> /dev/sda1 29G 17G 11G 61% /<br>
> /dev/sdb1 893G 803G 91G 90% /data<br>
> /dev/sda4 2.2T 1.6T 564G 75% /data1<br>
><br>
> # time touch /data1/1111<br>
> real 0m23.043s<br>
> user 0m0.001s<br>
> sys 0m0.349s<br>
><br>
> # perf top<br>
> Events: 6K cycles<br>
> 16.96% [xfs] [k] xfs_inobt_get_rec<br>
> 11.95% [xfs] [k] xfs_btree_increment<br>
> 11.16% [xfs] [k] xfs_btree_get_rec<br>
> 7.39% [xfs] [k] xfs_btree_get_block<br>
> 5.02% [xfs] [k] xfs_dialloc<br>
> 4.87% [xfs] [k] xfs_btree_rec_offset<br>
> 4.33% [xfs] [k] xfs_btree_readahead<br>
> 4.13% [xfs] [k] _xfs_buf_find<br>
> 4.05% [kernel] [k] intel_idle<br>
> 2.89% [xfs] [k] xfs_btree_rec_addr<br>
> 1.04% [kernel] [k] kmem_cache_free<br>
><br>
><br>
> > It seems that some xfs kernel functions spend much time (xfs_inobt_get_rec,
> > xfs_btree_increment, etc.)
> >
> > I found a bug in bugzilla [1]; is that the same issue as this one?
>
> No.
<div class=""><br>
> It's very greatly appreciated if you can give constructive suggestion about<br>
> this issue, as It's really hard to reproduce from another system and it's<br>
> not possible to do upgrade on that online machine.<br>
<br>
</div>You've got very few free inodes, widely distributed in the allocated<br>
inode btree. The CPU time above is the btree search for the next<br>
free inode.<br>
<br>
This is the issue solved by this series of recent commits to add a<br>
new on-disk free inode btree index:<br></blockquote><div>[Qiang] This meas that if I want to fix this issue, I have to apply the following patches and build my own kernel.</div><div><br></div><div>As the on-disk structure has been changed, so should I also re-create xfs filesystem again? is there any user space tools to convert old disk filesystem to new one, and don't need to backup and restore currently data?</div>
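
For reference, if I do end up re-creating the filesystem on a kernel and
xfsprogs new enough to carry these patches, I assume the feature is enabled
at mkfs time with something roughly like the following (my guess at the
option names; the device is just the one from my setup above):

    # re-create /data1 with a v5 superblock and the free inode btree
    mkfs.xfs -m crc=1,finobt=1 /dev/sda4

and, going by the "report finobt status in fs geometry" commit below,
xfs_info should then report finobt=1 in the geometry output once it's
mounted. Please correct me if that's not how the feature is meant to be
enabled.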

> 53801fd xfs: enable the finobt feature on v5 superblocks
> 0c153c1 xfs: report finobt status in fs geometry
> a3fa516 xfs: add finobt support to growfs
> 3efa4ff xfs: update the finobt on inode free
> 2b64ee5 xfs: refactor xfs_difree() inobt bits into xfs_difree_inobt() helper
> 6dd8638 xfs: use and update the finobt on inode allocation
> 0aa0a75 xfs: insert newly allocated inode chunks into the finobt
> 9d43b18 xfs: update inode allocation/free transaction reservations for finobt
> aafc3c2 xfs: support the XFS_BTNUM_FINOBT free inode btree type
> 8e2c84d xfs: reserve v5 superblock read-only compat. feature bit for finobt
> 57bd3db xfs: refactor xfs_ialloc_btree.c to support multiple inobt numbers
>
> Which is of no help to you, however, because it's not available in
> any CentOS kernel.

[Qiang] Do you think it would be possible to just backport these patches to
the 2.6.32 kernel (CentOS 6.3) to fix this issue?

Or would it be better to backport them to the 3.10 kernel used in CentOS 7.0?

> There's really not much you can do to avoid the problem once you've
> punched random freespace holes in the allocated inode btree. It
> generally doesn't affect many people; those that it does affect are
> normally using XFS as an object store indexed by a hard link farm
> (e.g. various backup programs do this).

OK, I see.

Could you please guide me on how to reproduce this issue easily? I have
tried using a 500G xfs partition and filling about 98% of the space, but
still can't reproduce it. Is there an easier way you can think of?
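
This is roughly what I was planning to try next (an untested sketch;
directory names and file counts are only placeholders, and the fraction
deleted is my guess at what "very few, widely scattered free inodes" means
in practice):

    # allocate a large number of inodes on the test filesystem
    mkdir -p /data1/repro && cd /data1/repro
    for d in $(seq 0 99); do
        mkdir -p dir$d
        for f in $(seq 0 9999); do : > dir$d/file$f; done
    done

    # free a sparse, random ~1% of them, so the few free inodes end up
    # widely scattered through the allocated inode btree
    find . -type f | awk 'rand() < 0.01' | xargs rm

    # a new allocation should now have to hunt for those scattered slots
    time touch newfile

Does that look like the right direction, or is there a quicker way?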

> If you dump the superblock via xfs_db, the difference between icount
> and ifree will give you an idea of how much "needle in a haystack"
> searching is going on. You can probably narrow it down to a specific
> AG by dumping the AGI headers and checking the same thing. Filling
> in all the holes (by creating a bunch of zero-length files in the
> appropriate AGs) might take some time, but it should make the
> problem go away until you remove more files and create random
> free inode holes again...

I will try to investigate this in detail.

Thanks for your kind response.
Qiang
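
P.S. For my own notes, I believe the check you describe comes down to
something like this (my guess at the exact xfs_db invocation; the device
name and AG number are just from my setup above, and I haven't run this on
the live machine yet):

    # whole-filesystem view: allocated inodes vs. free inodes
    xfs_db -r -c "sb 0" -c "print icount ifree agcount" /dev/sda4

    # per-AG view: repeat for each AG from 0 to agcount-1
    xfs_db -r -c "agi 0" -c "print count freecount" /dev/sda4

An AG with a large count but only a handful in freecount would be where the
expensive searches happen, and where the zero-length files should be created
to fill the holes.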

> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com