On Sunday, 31 March 2013 12:22:31 Dave Chinner wrote:
> On Fri, Mar 29, 2013 at 03:59:46PM -0400, Dave Hall wrote:
> > Dave, Stan,
> > Here is the link for perf top -U: http://pastebin.com/JYLXYWki.
> > The ag report is at http://pastebin.com/VzziSa4L. Interestingly,
> > the backups ran fast a couple times this week. Once under 9 hours.
> > Today it looks like it's running long again.
> 12.38% [xfs] [k] xfs_btree_get_rec
> 11.65% [xfs] [k] _xfs_buf_find
> 11.29% [xfs] [k] xfs_btree_increment
> 7.88% [xfs] [k] xfs_inobt_get_rec
> 5.40% [kernel] [k] intel_idle
> 4.13% [xfs] [k] xfs_btree_get_block
> 4.09% [xfs] [k] xfs_dialloc
> 3.21% [xfs] [k] xfs_btree_readahead
> 2.00% [xfs] [k] xfs_btree_rec_offset
> 1.50% [xfs] [k] xfs_btree_rec_addr
> Inode allocation searches, looking for an inode near to the parent directory.
> What this indicates is that you have lots of sparsely allocated inode
> chunks on disk. i.e. each 64 inode chunk has some free inodes in it,
> and some used inodes. This is likely due to random removal of inodes
> as you delete old backups and link counts drop to zero. Because we
> only index inodes on "allocated chunks", finding a chunk that has a
> free inode can be like finding a needle in a haystack. There are
> heuristics used to stop searches from consuming too much CPU, but it
> still can be quite slow when you repeatedly hit those paths....
> I don't have an answer that will magically speed things up for
> you right now...
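
To make the "needle in a haystack" effect above a bit more tangible, here is
a toy model in Python. It is only a sketch and not XFS's actual allocator:
the outward scan, the helper names (build_chunks, records_examined) and the
share of completely full chunks are all assumptions made for illustration.
Only the 64-inode chunk size comes from the explanation quoted above.

```python
import random

CHUNK_SIZE = 64  # inodes per chunk, per the quoted explanation


def build_chunks(num_chunks, full_fraction, rng):
    """Return per-chunk free-inode counts (hypothetical model).

    A share of the chunks is completely full; the rest are partly used.
    """
    chunks = []
    for _ in range(num_chunks):
        if rng.random() < full_fraction:
            chunks.append(0)  # fully used chunk: nothing to allocate here
        else:
            chunks.append(rng.randint(1, CHUNK_SIZE - 1))  # partly used chunk
    return chunks


def records_examined(chunks, start):
    """Scan outward from `start` and count the chunk records visited
    until one with a free inode is found (or the list is exhausted)."""
    n = len(chunks)
    examined = 0
    for distance in range(n):
        if distance == 0:
            candidates = (start,)
        else:
            candidates = (start - distance, start + distance)
        for idx in candidates:
            if 0 <= idx < n:
                examined += 1
                if chunks[idx] > 0:
                    return examined
    return examined


if __name__ == "__main__":
    rng = random.Random(2013)
    # Assumed scenario: 99.9% of the chunk records are full.
    chunks = build_chunks(100_000, 0.999, rng)
    starts = [rng.randrange(len(chunks)) for _ in range(1_000)]
    mean = sum(records_examined(chunks, s) for s in starts) / len(starts)
    print(f"average records examined per allocation: {mean:.1f}")
```

The fewer chunk records still have a free inode, the more records each
allocation has to walk in this model, which is the same shape of cost as the
btree walking functions at the top of the perf output.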
Hmm, unfortunately, this access pattern is pretty common; at least all "cp -al
& rsync" based backup solutions will suffer from it after a while. I noticed
that the "removing old backups" part also takes *ages* in this scenario.
I had to manually remove parts of a backup (subtrees with a few million
ordinary files, massively hardlinked as usual), and that took 4-5 hours for
each run on a Hitachi Ultrastar 7K4000 drive. For the 8 subtrees, that finally
took one and a half days, freeing about 500 GB of space. Oh well.
The question is: is it (logically) possible to reorganize the fragmented inode
allocation space with a specialized tool (to be implemented) that lays out
the allocation space in a way that matches XFS's earliest "expectations",
or does that violate some deeper FS logic I'm not aware of?
I have to mention that I haven't made any tests with other file systems, as
playing games with backups ranks very low on my scale of sensible tests, but
experience has shown that XFS usually sucks less than its alternatives, even
if the access pattern doesn't match its primary optimization domain.
Hence, implementing such a tool makes sense, where "least sucking" should be