[Top] [All Lists]

Re: [PATCH 4/4] xfs: convert buffer cache hash to rbtree

To: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Subject: Re: [PATCH 4/4] xfs: convert buffer cache hash to rbtree
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Fri, 10 Sep 2010 13:22:00 +1000
Cc: xfs@xxxxxxxxxxx
In-reply-to: <20100909015150.GB11362@xxxxxxxxxxxxx>
References: <1283958778-28610-1-git-send-email-david@xxxxxxxxxxxxx> <1283958778-28610-5-git-send-email-david@xxxxxxxxxxxxx> <20100909015150.GB11362@xxxxxxxxxxxxx>
User-agent: Mutt/1.5.20 (2009-06-14)
On Wed, Sep 08, 2010 at 09:51:50PM -0400, Christoph Hellwig wrote:
> On Thu, Sep 09, 2010 at 01:12:58AM +1000, Dave Chinner wrote:
> > I have selected rbtrees for indexing becuse they can have O(log n)
> > search scalability, and insert and remove cost is not excessive,
> > even on large trees. Hence we should be able to cache large numbers
> > of buffers without incurring the excessive cache miss search
> > penalties that the hash is imposing on us.
> Once thing that worries me about the rbtrees is that the Linux
> implementation doesn't allow for lockless readers.  But in the end the
> buffer cache implementation is very well encapsulated, so if the need
> arises we could easily change the underlying data structure.

Agreed. I'm going for simplicity of implementation first - list to
rbtree conversion is pretty trivial and realtively easy to verify.
We can revisit the choice of rbtrees later on if/when we need to.

> > +   /*
> > +    * The buftarg cache should never be used by external devices.
> > +    * Ensure we catch any users with extreme prejudice.
> > +    */
> > +   btp->bt_mp = external ? NULL : mp;
> I'd much prefer to always initialize this field.  We currently have a
> b_mount field struct xfs_buf which is used only in a few places
> and initialized rather, ehmm, lazily.  If we could replace it with
> ->b_target->bt_mount we can shrink struct buf and make the information
> available much more consistently.  Just adding the mount argument
> to the buftarg and removing it from the buf would be a nice little
> preparatory patch.

Good idea. I'll run up a patch to do that - if we've got more
buffers around, giving them a diet makes sense.

> And yes, I think bt_mount would be much nicer name than bt_mp.

Agreed. call me lazy ;)

> > @@ -210,8 +210,6 @@ xfs_perag_get(struct xfs_mount *mp, xfs_agnumber_t agno)
> >     pag = radix_tree_lookup(&mp->m_perag_tree, agno);
> >     if (pag) {
> >             ASSERT(atomic_read(&pag->pag_ref) >= 0);
> > -           /* catch leaks in the positive direction during testing */
> > -           ASSERT(atomic_read(&pag->pag_ref) < 1000);
> Di you manage to hit this during testing?  Either way it should probably
> be a separate patch.

Not with xfstests. Takes about 0.5s for fsmark to hit it, though. ;)
I'll put it in a separate patch, too.


Dave Chinner

<Prev in Thread] Current Thread [Next in Thread>