On Tue, Sep 14, 2010 at 01:48:26PM -0500, Alex Elder wrote:
> On Tue, 2010-09-14 at 20:56 +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@xxxxxxxxxx>
> > Having multiple CPUs trying to do the same cache shrinking work can
> > be actively harmful to performance when the shrinkers land in the
> > same AGs. They then lockstep on perag locks, causing contention and
> > slowing each other down. Reclaim walking is sufficiently efficient
> > that we do not need parallelism to make significant progress, so stop
> > parallel access at the door.
> > Instead, keep track of the number of objects the shrinkers want
> > cleaned and make sure the single running shrinker does not stop
> > until it has hit the threshold that the other shrinker calls have
> > built up.
> > This increases the cold-cache unlink rate of an 8-way parallel unlink
> > workload from about 15,000 unlinks/s to 60-70,000 unlinks/s for the
> > same CPU usage (~700%), resulting in the runtime for a 200M inode
> > unlink workload dropping from 4h50m to just under 1 hour.
> This is an aside, but...
> Shrinking still hits the first AGs more than the rest,
> right? I.e. if AG 0 has nr_to_scan reclaimable inodes, no
> other AGs get their inodes reclaimed?
It aggregates across all AGs, so if AG zero has none, then it moves
to AG 1...
I'm actually considering respinning this patch to be a little
different. I've got a prototype that just does a full non-blocking
reclaim run if nr_to_scan != 0 and then returns -1. It seems to
result in much better dentry/inode/xfs_cache balance, kswapd CPU
time drops dramatically, it doesn't affect create performance at all,
and unlink performance becomes much, much more consistent and drops
from ~14m30s down to ~11m30s for 50M inodes.
I only made this change late last night, so I'll do some more
testing before going any further with it.
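
For anyone following along, here is a rough userspace model of the scheme
in the patch description quoted above: every caller adds its nr_to_scan to
a shared backlog, only one caller gets through the door, and that caller
keeps walking until the backlog is drained. This is only a sketch under
assumptions, not the patch itself. The names (reclaim_state, reclaim_walk,
shrink) are made up for illustration, C11 atomics stand in for whatever
locking the real code uses, and a single counter stands in for the per-AG
reclaim walk.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical model; none of these names come from the XFS patch. */
struct reclaim_state {
    atomic_long pending;   /* objects all callers have asked to be reclaimed */
    atomic_flag running;   /* set while one caller is doing the reclaim walk */
    long reclaimable;      /* stand-in for the reclaimable inode count */
};

/* Stand-in for the reclaim walk: frees up to nr objects, returns how many. */
static long reclaim_walk(struct reclaim_state *rs, long nr)
{
    long done = nr < rs->reclaimable ? nr : rs->reclaimable;

    rs->reclaimable -= done;
    return done;
}

/*
 * Returns the number of objects this call reclaimed, or 0 if another
 * caller was already walking (the request is still added to the backlog).
 */
static long shrink(struct reclaim_state *rs, long nr_to_scan)
{
    long total = 0;

    assert(nr_to_scan >= 0);
    atomic_fetch_add(&rs->pending, nr_to_scan);

    /* Stop parallel access at the door: only one walker at a time. */
    if (atomic_flag_test_and_set(&rs->running))
        return 0;

    /* Keep going until the backlog built up by every caller is drained. */
    for (;;) {
        long want = atomic_exchange(&rs->pending, 0);
        long done;

        if (want == 0)
            break;
        done = reclaim_walk(rs, want);
        total += done;
        if (done < want)
            break;         /* nothing left to reclaim */
    }

    atomic_flag_clear(&rs->running);
    return total;
}
```

A request that arrives between the last backlog check and the flag being
cleared stays queued until the next call; the model accepts that to keep
the sketch short.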