On Tue, Sep 14, 2010 at 12:27:42PM -0400, Christoph Hellwig wrote:
> On Tue, Sep 14, 2010 at 08:56:04PM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@xxxxxxxxxx>
> > With delayed logging greatly increasing the sustained
> > parallelism of inode operations, the inode cache locking is
> > showing significant read vs write contention when inode reclaim
> > runs at the same time as lookups. There is also a lot more write
> > lock acquistions than there are read locks (4:1 ratio) so the
> > read locking is not really buying us much in the way of
> > parallelism.
> That's just for your parallel creates workload, isn't it? If we'd
> get that bad hit rates on normal workloads something is pretty
> wrong with the inode cache.
I see it on the unlink workloads, as well as any use-once workload I
care to run (e.g. du, find, grep, xfsdump, xfs_fsr, etc) on a
fileystem with more inodes in it that can fit in the currently
available memory. When we have userspace memory pressure, the amount
of memory available for caching can be pretty low, so this is quite
a common situation.
> For a workload with 4 times as many writelocks just changing the
> rwlock to a spinlock should provide more benefits. Did you test what
> effect this has on other workloads?
Not yet - I've run it through xfsqa on single CPU, 2p and 4p
machines for the past few days, but I haven't benchmarked it
comparitively on other workloads yet. The whole patchset needs
> In addition to that I feel really uneasy about changes to the inode
> cache locking without really heavy NFS server testing - we just had too
> many issues in this area in the past.
Agreed, and that's something I need to test against.
> > To avoid the read vs write contention, change the cache to use RCU locking
> > on
> > the read side. To avoid needing to RCU free every single inode, use the
> > built
> > in slab RCU freeing mechanism. This requires us to be able to detect
> > lookups of
> > freed inodes, so en??ure that ever freed inode has an inode number of zero
> > and
> > the XFS_IRECLAIM flag set. We already check the XFS_IRECLAIM flag in cache
> > hit
> > lookup path, but also add a check for a zero inode number as well.
> How does this interact with slab poisoning?
if (!(flags & SLAB_DESTROY_BY_RCU))
flags |= SLAB_POISON;
if (flags & SLAB_DESTROY_BY_RCU)
BUG_ON(flags & SLAB_POISON);
and SLUB does an equivalent setup where OBJECT_POISON is not set if
SLAB_DESTROY_BY_RCU is set. So it effectively turns off slab
poisoning, but leaves all the other debug checks active.