bad performance on touch/cp file on XFS system

Zhang Qiang zhangqiang.buaa at gmail.com
Mon Aug 25 04:05:33 CDT 2014


Great, thank you.

From my xfs_db debugging, I found icount and ifree as follows:

icount = 220619904
ifree = 26202919

So free inodes are about 12% of the allocated inodes (26202919 / 220619904), which doesn't seem that few.
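
For reference, here is roughly how I pulled those two counters and computed the ratio in one shot (a quick sketch with the filesystem unmounted; output abbreviated from memory):

# umount /dev/sda4
# xfs_db -r -c 'sb 0' -c 'p icount' -c 'p ifree' /dev/sda4
icount = 220619904
ifree = 26202919
# echo 'scale=1; 26202919 * 100 / 220619904' | bc
11.8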

So, are you still sure the patches can fix this issue?

Here's the detailed xfs_db info:

# mount /dev/sda4 /data1/
# xfs_info /data1/
meta-data=/dev/sda4              isize=256    agcount=4, agsize=142272384 blks
         =                       sectsz=512   attr=2, projid32bit=0
data     =                       bsize=4096   blocks=569089536, imaxpct=5
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=277875, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
# umount /dev/sda4
# xfs_db /dev/sda4
xfs_db> sb 0
xfs_db> p
magicnum = 0x58465342
blocksize = 4096
dblocks = 569089536
rblocks = 0
rextents = 0
uuid = 13ecf47b-52cf-4944-9a71-885bddc5e008
logstart = 536870916
rootino = 128
rbmino = 129
rsumino = 130
rextsize = 1
agblocks = 142272384
agcount = 4
rbmblocks = 0
logblocks = 277875
versionnum = 0xb4a4
sectsize = 512
inodesize = 256
inopblock = 16
fname = "\000\000\000\000\000\000\000\000\000\000\000\000"
blocklog = 12
sectlog = 9
inodelog = 8
inopblog = 4
agblklog = 28
rextslog = 0
inprogress = 0
imax_pct = 5
icount = 220619904
ifree = 26202919
fdblocks = 147805479
frextents = 0
uquotino = 0
gquotino = 0
qflags = 0
flags = 0
shared_vn = 0
inoalignmt = 2
unit = 0
width = 0
dirblklog = 0
logsectlog = 0
logsectsize = 0
logsunit = 1
features2 = 0xa
bad_features2 = 0xa
xfs_db> sb 1
xfs_db> p
magicnum = 0x58465342
blocksize = 4096
dblocks = 569089536
rblocks = 0
rextents = 0
uuid = 13ecf47b-52cf-4944-9a71-885bddc5e008
logstart = 536870916
rootino = 128
rbmino = null
rsumino = null
rextsize = 1
agblocks = 142272384
agcount = 4
rbmblocks = 0
logblocks = 277875
versionnum = 0xb4a4
sectsize = 512
inodesize = 256
inopblock = 16
fname = "\000\000\000\000\000\000\000\000\000\000\000\000"
blocklog = 12
sectlog = 9
inodelog = 8
inopblog = 4
agblklog = 28
rextslog = 0
inprogress = 1
imax_pct = 5
icount = 0
ifree = 0
fdblocks = 568811645
frextents = 0
uquotino = 0
gquotino = 0
qflags = 0
flags = 0
shared_vn = 0
inoalignmt = 2
unit = 0
width = 0
dirblklog = 0
logsectlog = 0
logsectsize = 0
logsunit = 1
features2 = 0xa
bad_features2 = 0xa
xfs_db> sb 2
xfs_db> p
magicnum = 0x58465342
blocksize = 4096
dblocks = 569089536
rblocks = 0
rextents = 0
uuid = 13ecf47b-52cf-4944-9a71-885bddc5e008
logstart = 536870916
rootino = null
rbmino = null
rsumino = null
rextsize = 1
agblocks = 142272384
agcount = 4
rbmblocks = 0
logblocks = 277875
versionnum = 0xb4a4
sectsize = 512
inodesize = 256
inopblock = 16
fname = "\000\000\000\000\000\000\000\000\000\000\000\000"
blocklog = 12
sectlog = 9
inodelog = 8
inopblog = 4
agblklog = 28
rextslog = 0
inprogress = 1
imax_pct = 5
icount = 0
ifree = 0
fdblocks = 568811645
frextents = 0
uquotino = 0
gquotino = 0
qflags = 0
flags = 0
shared_vn = 0
inoalignmt = 2
unit = 0
width = 0
dirblklog = 0
logsectlog = 0
logsectsize = 0
logsunit = 1
features2 = 0xa
bad_features2 = 0xa
xfs_db> sb 3
xfs_db> p
magicnum = 0x58465342
blocksize = 4096
dblocks = 569089536
rblocks = 0
rextents = 0
uuid = 13ecf47b-52cf-4944-9a71-885bddc5e008
logstart = 536870916
rootino = 128
rbmino = null
rsumino = null
rextsize = 1
agblocks = 142272384
agcount = 4
rbmblocks = 0
logblocks = 277875
versionnum = 0xb4a4
sectsize = 512
inodesize = 256
inopblock = 16
fname = "\000\000\000\000\000\000\000\000\000\000\000\000"
blocklog = 12
sectlog = 9
inodelog = 8
inopblog = 4
agblklog = 28
rextslog = 0
inprogress = 1
imax_pct = 5
icount = 0
ifree = 0
fdblocks = 568811645
frextents = 0
uquotino = 0
gquotino = 0
qflags = 0
flags = 0
shared_vn = 0
inoalignmt = 2
unit = 0
width = 0
dirblklog = 0
logsectlog = 0
logsectsize = 0
logsunit = 1
features2 = 0xa
bad_features2 = 0xa
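
By the way, if re-creating the filesystem really is the only way forward, am I right that with xfsprogs 3.2.1 the dump / mkfs / restore cycle would look roughly like this? (Just my reading of the man pages; /backup/data1.dump is only a placeholder path, and I assume crc=1,finobt=1 are the right mkfs.xfs -m options to enable metadata CRCs and the free inode btree.)

# xfsdump -l 0 -f /backup/data1.dump /data1
# umount /data1
# mkfs.xfs -m crc=1,finobt=1 /dev/sda4
# mount /dev/sda4 /data1/
# xfsrestore -f /backup/data1.dump /data1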


Thanks
Qiang



2014-08-25 16:56 GMT+08:00 Dave Chinner <david at fromorbit.com>:

> On Mon, Aug 25, 2014 at 04:09:05PM +0800, Zhang Qiang wrote:
> > Thanks for your quick and clear response. Some comments below:
> >
> >
> > 2014-08-25 13:18 GMT+08:00 Dave Chinner <david at fromorbit.com>:
> >
> > > On Mon, Aug 25, 2014 at 11:34:34AM +0800, Zhang Qiang wrote:
> > > > Dear XFS community & developers,
> > > >
> > > > I am using CentOS 6.3 with xfs as the base file system, on RAID5
> > > > hardware storage.
> > > >
> > > > Detail environment as follow:
> > > >    OS: CentOS 6.3
> > > >    Kernel: kernel-2.6.32-279.el6.x86_64
> > > >    XFS option info (df output): /dev/sdb1 on /data type xfs
> > > > (rw,noatime,nodiratime,nobarrier)
> ....
>
> > > > It would be greatly appreciated if you could give constructive suggestions
> > > > about this issue, as it's really hard to reproduce on another system and
> > > > it's not possible to upgrade that online machine.
> > >
> > > You've got very few free inodes, widely distributed in the allocated
> > > inode btree. The CPU time above is the btree search for the next
> > > free inode.
> > >
> > > This is the issue solved by this series of recent commits to add a
> > > new on-disk free inode btree index:
> > >
> > [Qiang] This means that if I want to fix this issue, I have to apply the
> > following patches and build my own kernel.
>
> Yes. Good luck, even I wouldn't attempt to do that.
>
> And then use xfsprogs 3.2.1, and make a new filesystem that enables
> metadata CRCs and the free inode btree feature.
>
> > As the on-disk structure has changed, should I also re-create the xfs
> > filesystem?
>
> Yes, you need to download the latest xfsprogs (3.2.1) to be able to
> make it with the necessary feature bits set.
>
> > Are there any user space tools to convert the old on-disk
> > filesystem to the new one, without needing to back up and restore the
> > current data?
>
> No, we don't write utilities to mangle on disk formats. dump, mkfs
> and restore is far more reliable than any "in-place conversion" code
> we could write. It will probably be faster, too.
>
> > > Which is of no help to you, however, because it's not available in
> > > any CentOS kernel.
> > >
> > [Qiang] Do you think it's possible to just backport these patches to
> > kernel 2.6.32 (CentOS 6.3) to fix this issue?
> >
> > Or would it be better to backport them to the 3.10 kernel used in CentOS 7.0?
>
> You can try, but if you break it you get to keep all the pieces
> yourself. Eventually someone who maintains the RHEL code will do a
> backport that will trickle down to CentOS. If you need it any
> sooner, then you'll need to do it yourself, or upgrade to RHEL
> and ask your support contact for it to be included in RHEL 7.1....
>
> > > There's really not much you can do to avoid the problem once you've
> > > punched random freespace holes in the allocated inode btree. It
> > > generally doesn't affect many people; those that it does affect are
> > > normally using XFS as an object store indexed by a hard link farm
> > > (e.g. various backup programs do this).
> > >
> > OK, I see.
> >
> > Could you please guide me on how to reproduce this issue easily? I have tried
> > using a 500G xfs partition and filling about 98% of the space, but still can't
> > reproduce it. Is there any easy way you can think of?
>
> Search the archives for the test cases that were used for the patch
> set. There's a performance test case documented in the review
> discussions.
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david at fromorbit.com
>
> _______________________________________________
> xfs mailing list
> xfs at oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs
>