Re: unexpected high fragmentation, any ideas?

To: Marc Lehmann <schmorp@xxxxxxxxxx>
Subject: Re: unexpected high fragmentation, any ideas?
From: Russell Cattelan <cattelan@xxxxxxx>
Date: Sun, 03 Apr 2005 10:45:24 -0500
Cc: linux-xfs@xxxxxxxxxxx
In-reply-to: <20050403135805.GC24559@xxxxxxxxxx>
References: <20050403004653.GA981@xxxxxxxxxx> <20050403050542.GB5727@xxxxxxxxxxxxxxxxxxxxx> <20050403135805.GC24559@xxxxxxxxxx>
Sender: linux-xfs-bounce@xxxxxxxxxxx
User-agent: Mozilla Thunderbird 0.9 (Macintosh/20041103)
You are right about one thing: multiple writers will cause file interleaving.

It's interesting that delayed allocation is not helping as much as it should:
XFS should be able to cluster delayed-allocation pages together and thus
ask the allocator for a larger contiguous block in one go.

I think the problem you are running into is that, with a slow-writing app,
pdflush is pushing pages out to disk too quickly.
A way to test that is to increase the pdflush interval; I don't remember which
proc value you need to change for that, but dirty_writeback_centisecs, I think.
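If that theory holds, the writeback knobs can be inspected and raised; a minimal sketch, assuming a 2.6 kernel with the usual /proc/sys/vm names (values are in hundredths of a second):

```shell
# Assumed knobs, per Documentation/sysctl/vm.txt on 2.6 kernels;
# values are in hundredths of a second.
cat /proc/sys/vm/dirty_writeback_centisecs   # how often pdflush wakes up (500 = 5s default)
cat /proc/sys/vm/dirty_expire_centisecs      # how old dirty data may get before flushing
# As root, e.g. wake pdflush only every 30 seconds instead of every 5:
# echo 3000 > /proc/sys/vm/dirty_writeback_centisecs
```

A longer interval gives delayed allocation more dirty pages to cluster per flush, at the cost of more data at risk on a crash.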



Marc Lehmann wrote:

I was contacted in private to offer more information, and thought it might be
a good idea to let the list know:

* all files are being written into the same directory, by something like
 2-4 slow "dd of=/xfs/video.dat" processes

* the disk is used exclusively for this application, no other writers
 are present.

(fragmentation reduces i/o performance)

I just mentioned that in case xfs displayed a high extent count but the
file was, in fact, almost contiguous. I have now run xfs_bmap on a
recently-written file (2.2GB); it looks like this:

17_20050403135800_20050403150000.nuv:
EXT: FILE-OFFSET         BLOCK-RANGE          AG AG-OFFSET              TOTAL
  0: [0..895]:           161889248..161890143  8 (5598368..5599263)       896
  1: [896..1151]:        161882848..161883103  8 (5591968..5592223)       256
  2: [1152..101119]:     173099520..173199487  8 (16808640..16908607)   99968
  3: [101120..511231]:   195363656..195773767 10 (56..410167)          410112
  4: [511232..910975]:   214936240..215335983 11 (36280..436023)       399744
  5: [910976..987903]:   243994088..244071015 12 (9557768..9634695)     76928
  6: [987904..988927]:   238584712..238585735 12 (4148392..4149415)      1024
  7: [988928..989951]:   238583688..238584711 12 (4147368..4148391)      1024
  8: [989952..991999]:   238581640..238583687 12 (4145320..4147367)      2048
  9: [992000..994175]:   238579464..238581639 12 (4143144..4145319)      2176
 10: [994176..996351]:   238577280..238579455 12 (4140960..4143135)      2176
 11: [996352..998399]:   238575232..238577279 12 (4138912..4140959)      2048
 12: [998400..1000575]:  238573056..238575231 12 (4136736..4138911)      2176
 13: [1000576..1002623]: 238571008..238573055 12 (4134688..4136735)      2048
 14: [1002624..1003775]: 238569856..238571007 12 (4133536..4134687)      1152
 15: [1003776..1004799]: 238568832..238569855 12 (4132512..4133535)      1024
 16: [1004800..1005823]: 238567808..238568831 12 (4131488..4132511)      1024
 17: [1005824..1006847]: 238566784..238567807 12 (4130464..4131487)      1024
 18: [1006848..1007871]: 238565760..238566783 12 (4129440..4130463)      1024
 19: [1007872..1009023]: 238564608..238565759 12 (4128288..4129439)      1152
 20: [1009024..1010047]: 238563584..238564607 12 (4127264..4128287)      1024
 21: [1010048..1011071]: 238562560..238563583 12 (4126240..4127263)      1024
 22: [1011072..1012095]: 238561536..238562559 12 (4125216..4126239)      1024
 23: [1012096..1013119]: 238560512..238561535 12 (4124192..4125215)      1024
 24: [1013120..1014271]: 238559360..238560511 12 (4123040..4124191)      1152
 25: [1014272..1015295]: 238558336..238559359 12 (4122016..4123039)      1024
 26: [1015296..1016319]: 238557312..238558335 12 (4120992..4122015)      1024
 27: [1016320..1018367]: 238555264..238557311 12 (4118944..4120991)      2048
 28: [1018368..1019391]: 238554240..238555263 12 (4117920..4118943)      1024

... the remaining ~500 extents look very similar (~1024 blocks each).
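As an aside, extent listings like this can be summarized mechanically. A sketch (frag_summary is a made-up helper name; it assumes the `xfs_bmap -v` layout above, where the last column, TOTAL, is the extent length in 512-byte blocks):

```shell
# Made-up helper: count extents and average extent size from `xfs_bmap -v`
# output. Skips the filename line and the column header, then sums the
# last column (TOTAL, the extent length in 512-byte blocks).
frag_summary() {
    tail -n +3 | awk '{ n++; tot += $NF }
        END { printf "%d extents, avg %.0f blocks/extent\n", n, tot / n }'
}

# Live use would be:
#   xfs_bmap -v 17_20050403135800_20050403150000.nuv | frag_summary
# Demo on the first two extents of the listing above:
frag_summary <<'EOF'
17_20050403135800_20050403150000.nuv:
EXT: FILE-OFFSET         BLOCK-RANGE          AG AG-OFFSET              TOTAL
  0: [0..895]:           161889248..161890143  8 (5598368..5599263)       896
  1: [896..1151]:        161882848..161883103  8 (5591968..5592223)       256
EOF
```

On the two demo extents this prints "2 extents, avg 576 blocks/extent"; on the real file it would show the ~500 extents averaging ~1024 blocks.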

it looks as if there was only one writer initially (that's just a
conjecture), and that xfs simply interleaves write()s by multiple writers
(1024 blocks is probably the i/o size the writers use; they issue rather
large write()s).

looking at the extent map above, I also see this pattern quite often:

  when the block order is                   abcdefghi
  then xfs allocates extents in this order: abdcefhgi

i.e. it often swaps adjacent extents; see, for example, pairs 6&7 and
13&14. Looking at some other files, this is quite common.
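Such reversals can be spotted mechanically from the same `xfs_bmap -v` output; a sketch (swapped_pairs is a made-up name, the field splitting assumes the exact column layout above, and note it will also flag every step of a long descending run, not just pairwise swaps):

```shell
# Made-up helper: flag adjacent extents whose on-disk order is reversed
# relative to file order. Splits xfs_bmap -v lines on runs of ':' and '.',
# so $1 is the extent index and $4 the first disk block of the extent.
swapped_pairs() {
    tail -n +3 | awk -F'[:.]+' '{
        idx = $1 + 0; start = $4 + 0
        if (NR > 1 && start < prev)
            printf "extent %d precedes extent %d on disk\n", idx, pidx
        prev = start; pidx = idx
    }'
}

# Live use would be:
#   xfs_bmap -v 17_20050403135800_20050403150000.nuv | swapped_pairs
# Demo on the swapped pair 6 & 7 from the listing above:
swapped_pairs <<'EOF'
17_20050403135800_20050403150000.nuv:
EXT: FILE-OFFSET         BLOCK-RANGE          AG AG-OFFSET              TOTAL
  6: [987904..988927]:   238584712..238585735 12 (4148392..4149415)      1024
  7: [988928..989951]:   238583688..238584711 12 (4147368..4148391)      1024
EOF
```

The demo prints "extent 7 precedes extent 6 on disk", matching the abdc pattern described above.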

ext3 looks much better as it (seemingly) tries to allocate the files in
different block groups when multiple files are being written.

xfs_fsr, OTOH, does a perfect job - all files are single-extent files
after it has run, even when I run it while there are three other writers!

I'd run xfs_fsr continuously, but the i/o bandwidth lost is immense, and
xfs_fsr tends to copy gigabytes of a file only to then detect that the file
is being modified, which somewhat precludes its use on a busy filesystem.




