xfs_fsr question for improvement
Michael Monnerie
michael.monnerie at is.it-management.at
Mon May 3 01:49:43 CDT 2010
On Saturday, 17 April 2010, Dave Chinner wrote:
> They have thousands of extents in them and they are all between
> 8-10GB in size, and IO from my VMs is still capable of saturating
> the disks backing these files. While I'd normally consider these
> files fragmented and candidates for running fsr on them, the number
> of extents is not actually a performance limiting factor and so
> there's no point in defragmenting them. Especially as that requires
> shutting down the VMs...
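As an aside: per-file extent counts like those are easy to check, since
xfs_bmap prints one line per extent plus a header line (the image path
here is hypothetical):
# xfs_bmap /vms/disk0.img | wc -l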
I personally care less about file fragmentation than about
metadata/inode/directory fragmentation. This server is accessed by
numerous people:
# time find /mountpoint/ -inum 107901420
/mountpoint/some/dir/ectory/path/x.iso
real 7m50.732s
user 0m0.152s
sys 0m2.376s
It took nearly 8 minutes to search through that mount point, which is
6TB in size on a RAID-5 striped over seven 2TB disks, so search speed
should be high. Especially as there are only 765,000 files on that disk:
Filesystem      Inodes   IUsed       IFree  IUse%
/mountpoint 1258291200  765659  1257525541     1%
Wouldn't you say an 8-minute search over just 765,000 files is slow,
even when only using 7x 2TB 7200rpm disks in RAID-5?
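For comparison, pulling that inode straight off the disk with xfs_db is
nearly instant; it's the directory tree walk that eats all the time. A
hypothetical spot check, assuming the filesystem sits on /dev/vg0/data:
# xfs_db -r -c "inode 107901420" -c "print core.size" /dev/vg0/data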
> > Would it be possible xfs_fsr defrags the meta data in a way that
> > they are all together so seeks are faster?
>
> It's not related to fsr because fsr does not defragment metadata.
> Some metadata cannot be defragmented (e.g. inodes cannot be moved),
> some metadata cannot be manipulated directly (e.g. free space
> btrees), and some is just difficult to do (e.g. directory
> defragmentation) so hasn't ever been done.
I see. On this particular server I know it would be good for performance
to have the metadata defragmented, but that's not the aim of xfs_fsr.
But maybe some developer gets bored one day and finds a way to optimize
the search & find of files on an aged filesystem, i.e. metadata defrag :-)
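Side note: as far as I understand, an XFS inode number literally encodes
where the inode sits on disk (allocation group, block, and offset inside
the block), which is why inodes cannot be moved without renumbering them.
If I read the xfs_db man page right, its "convert" command can show the
allocation group an inode lives in (device path hypothetical):
# xfs_db -r -c "convert ino 107901420 agno" /dev/vg0/data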
I tried this twice:
# time find /mountpoint/ -inum 107901420
real 8m17.316s
user 0m0.148s
sys 0m1.964s
# time find /mountpoint/ -inum 107901420
real 0m30.113s
user 0m0.540s
sys 0m9.813s
Caching helps the 2nd time :-)
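To repeat the cold-cache timing without remounting, the page, dentry and
inode caches can be dropped first (needs root; /proc/sys/vm/drop_caches
has existed since kernel 2.6.16):
# sync
# echo 3 > /proc/sys/vm/drop_caches
# time find /mountpoint/ -inum 107901420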
> > Currently, when I do "find /this_big_fs -inum 1234", it takes
> > *ages* for a run, while there are not so many files on it:
> > # iostat -kx 5 555
> > Device:    r/s  rkB/s  avgrq-sz  avgqu-sz  await  svctm  %util
> > xvdb     23,20  92,80      8,00      0,42  15,28  18,17  42,16
> > xvdc     20,20  84,00      8,32      0,57  28,40  28,36  57,28
>
> Well, it's not XFS's fault that each read IO is taking 20-30ms. You
> can only do 30-50 IOs a second per drive at that rate, so:
>
> [...]
>
> > So I get 43 reads/second at 100% utilization. Well I can see up to
>
> This is right on the money - it's going as fast as your (slow) RAID-5
> volume will allow it to....
>
> > 150r/s, but still that's no "wow". A single run to find an inode
> > takes a very long time.
>
> RAID 5/6 generally provides the same IOPS performance as a single
> spindle, regardless of the width of the RAID stripe. A 2TB SATA
> drive might be able to do 150-200 IOPS, so a RAID-5 array made up of
> these drives will tend to max out at roughly the same....
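A quick back-of-the-envelope check with my numbers above supports that:
at ~43 reads/s and 100% utilization, the 7m50s (~470s) cold run works
out to roughly 470 x 43 = ~20,000 random reads, so the run time is
almost entirely seek-bound, right at the single-spindle IOPS ceiling
Dave describes.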
Running xfs_fsr, I can see up to 1200r + 1200w = 2400 I/Os per second:
Device:  rrqm/s  wrqm/s      r/s      w/s     rkB/s     wkB/s  avgrq-sz  avgqu-sz  await  svctm  %util
xvdc       0,00    0,00     0,00  1191,42      0,00  52320,16     87,83    121,23  96,77   0,71  84,63
xvde       0,00    0,00  1226,35     0,00  52324,15      0,00     85,33      0,77   0,62   0,13  15,33
But on average it's about 600-700 reads plus 600-700 writes per second,
so 1200-1400 IOPS in total.
Both "disks" are 2TB LVM volumes on the same raidset, I just had to
split it as XEN doesn't allow to create >2TB volumes.
So, the badly slow I/O I see during "find" are not happening during fsr.
How can that be?
I'm just running another "find" on a freshly remounted XFS, and I can
see the reads happening on two of the three 2TB volumes in parallel:
Device:     r/s   w/s   rkB/s  wkB/s  avgrq-sz  avgqu-sz  await  svctm  %util
xvdb     103,20  0,00  476,80   0,00      9,24      0,46   4,52   4,50  46,40
xvdc      97,80  0,00  455,20   0,00      9,31      0,52   5,29   5,30  51,84
When I created that XFS, I took two 2TB partitions and did pvcreate,
vgcreate and lvcreate. Could it be that lvcreate automatically decided
to stripe them like a RAID-0? All reads are split equally between the
two volumes. Later I added the third 2TB volume, and I don't see that
behaviour there. So maybe this is the source of all evil.
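As far as I know lvcreate defaults to linear allocation unless
-i/--stripes is given, but it's easy to verify: lvdisplay -m prints the
segment layout per LV, showing "Type striped" plus a "Stripes" count for
RAID-0-style volumes and "Type linear" otherwise (volume path
hypothetical):
# lvdisplay -m /dev/vg0/data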
BTW: I changed the mount options from "atime,diratime" to
"relatime,reldiratime" and the "find" runtime went from 8 minutes down
to 7m14s.
--
with kind regards,
Michael Monnerie, Ing. BSc
it-management Internet Services
http://proteger.at [pronounced: Prot-e-schee]
Tel: 0660 / 415 65 31
// We currently have two houses for sale:
// http://zmi.at/langegg/
// http://zmi.at/haus2009/