On Wed, Mar 28, 2012 at 11:10:41AM -0400, Christoph Hellwig wrote:
> On Wed, Mar 28, 2012 at 11:53:37AM +1100, Dave Chinner wrote:
> > in IO patterns and performance under heavy load here with this
> > patch set. It doesn't, however, reduce the buffer cache lookups all
> > that much on such workloads - about 10% at most - as most of the
> > lookups are common from the directory and inode buffer
> > modifications. Here's a sample profile:
> 10% might not be extremely huge, but it's pretty significant.
Yes, I didn't mean to belittle the improvement it makes, as every
little bit helps, just that the buffer cache lookups are dominated
by other types of lookups.
> > This shows that 50% of the lookups come from the directory code, 25% from
> > the inode btree lookups, 12% from mapping inodes, and 10% from
> > reading the AGI buffer during inode allocation.
> > You know, I suspect that we could avoid almost all those AGI buffer
> > lookups by moving to a similar in-core log and flush technique that
> > the inodes use. We've already got all the information in the struct
> > xfs_perag - rearranging it to have "in-core on disk" structures
> > for the AGI, AGF and AGFL would make a lot of the "select an AG"
> > code much simpler than having to read and modify the AG buffers
> > directly. It might even be possible to do such a change without
> > needing to change the on-disk journal format for them...
I just had a crazy thought - it would be relatively easy to make
object based caches for finding buffers. Add an rbtree root to
various structures (e.g. inode, AGI, AGF, etc) and index all the
buffers associated with the btrees on that object in the object
rbtree. Need to find a directory/bmapbt/attr buffer? Look up the
rbtree on the inode. Need to find a freespace btree buffer? Look up
the rbtree on the AGF.
I suspect that this can be done without much API churn, and it would
remove the central per-AG buffer cache lookups for most operations.
Smaller caches mean less lookup overhead for most operations - with
10-11% of CPU time being spent in lookups on an 8p machine, that's
almost an entire CPU worth of time being used. Hence reducing the
rbtree lookup and modification overhead should be a significant win.
Crazy idea, yes, but I'm going to think about it some more,
especially as the shrinker operates off the LRU and is entirely
independent of the rbtree indexing.....
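As a rough illustration of the per-object index described above, here
is a userspace sketch. It is an assumption-laden toy, not kernel code:
struct buf, struct object, obj_buf_find and obj_buf_get are all
hypothetical names, and the POSIX tsearch()/tfind() binary tree
routines stand in for the kernel rbtree. Each object carries its own
small tree keyed by block number, so a lookup never touches a central
per-AG cache.

```c
#include <assert.h>
#include <search.h>
#include <stdlib.h>

/* Hypothetical stand-in for a cached metadata buffer. */
struct buf {
	unsigned long long blkno;	/* disk block the buffer covers */
	int refcount;			/* references handed out so far */
};

/* Hypothetical owner object (inode, AGI, AGF, ...) with its own index. */
struct object {
	void *buf_tree;			/* tsearch() root, NULL when empty */
};

/* Order buffers by block number. */
static int buf_cmp(const void *a, const void *b)
{
	const struct buf *x = a, *y = b;

	return (x->blkno > y->blkno) - (x->blkno < y->blkno);
}

/* Look up a buffer in this object's index only; NULL if not present. */
static struct buf *obj_buf_find(struct object *obj, unsigned long long blkno)
{
	struct buf key = { .blkno = blkno };
	void *node;

	assert(obj != NULL);
	node = tfind(&key, &obj->buf_tree, buf_cmp);
	return node ? *(struct buf **)node : NULL;
}

/* Find the buffer, or create and index it, then take a reference. */
static struct buf *obj_buf_get(struct object *obj, unsigned long long blkno)
{
	struct buf *bp = obj_buf_find(obj, blkno);

	if (!bp) {
		bp = calloc(1, sizeof(*bp));
		if (!bp)
			return NULL;
		bp->blkno = blkno;
		if (!tsearch(bp, &obj->buf_tree, buf_cmp)) {
			free(bp);	/* could not insert into the tree */
			return NULL;
		}
	}
	bp->refcount++;
	return bp;
}
```

A caller would do something like obj_buf_get(&inode_obj, blkno) for a
directory block and obj_buf_get(&agf_obj, blkno) for a freespace btree
block. The sketch ignores locking, reclaim and the LRU completely.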
> > I think I'll put that on my list of stuff to do - right next to
> > in-core unlinked inode lists....
> Sounds fine. A simple short-term fix might be to simply pin a reference
> to the AGI buffers and add a pointer from struct xfs_perag to them.
I'd prefer not to do that - filesystems with lots of AGs will then
pin significant amounts of memory that would otherwise be
reclaimable. Besides, I don't think the problem is significant
enough to need immediate resolution in such a way.