I recently made the mistake of enabling quotas on one of my regular
scalability tests - concurrent file creates - and discovered that
the quota modification serialised the entire workload. Not good.

Only two of these patches are really scalability patches - the first
patch in the series is a cleanup that gets rid of dquot hints.

The first scalability change is to not require the dquot lock when
taking references to the dquot. This is done simply by converting
the reference count to an atomic and replacing all operations on it
with equivalent atomic variable operations. This means that we can
remove the dquot lock from xfs_qm_dqhold(). Further optimisations
can be done on the release of references, but that is not done in
this patch or in this patch set.
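To illustrate the first change, here is a minimal user-space sketch of the idea. It assumes C11 stdatomic.h and pthreads in place of the kernel's atomic_t and the dquot lock, and struct dquot_sketch, dq_hold(), dq_rele() and hold_worker() are hypothetical names, not the XFS code. Taking a reference no longer touches the lock, while dropping one still does.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical user-space stand-in for a dquot, reduced to the fields
 * needed to show the reference counting change. */
struct dquot_sketch {
	pthread_mutex_t lock;   /* still protects the rest of the dquot */
	atomic_int      nrefs;  /* reference count, now an atomic variable */
};

/* Take a reference without touching dq->lock. The caller must already
 * keep the dquot alive, so a relaxed atomic increment is sufficient. */
static void dq_hold(struct dquot_sketch *dq)
{
	atomic_fetch_add_explicit(&dq->nrefs, 1, memory_order_relaxed);
}

/* Drop a reference. The release side is deliberately left unoptimised:
 * it still takes the lock, so dropping the last reference is serialised
 * against other lock holders. Returns the number of references left. */
static int dq_rele(struct dquot_sketch *dq)
{
	int left;

	pthread_mutex_lock(&dq->lock);
	left = atomic_fetch_sub_explicit(&dq->nrefs, 1, memory_order_acq_rel) - 1;
	assert(left >= 0);
	/* left == 0: a real implementation would free or recycle dq here. */
	pthread_mutex_unlock(&dq->lock);
	return left;
}

/* Demo worker: takes many references concurrently with other workers. */
#define HOLDS_PER_THREAD 100000

static void *hold_worker(void *arg)
{
	struct dquot_sketch *dq = arg;

	for (int i = 0; i < HOLDS_PER_THREAD; i++)
		dq_hold(dq);
	return NULL;
}
```

Because dq_hold() never waits on the lock, any number of threads can take references at once without serialising; only the release path still contends on it.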

Getting rid of the dquot lock from the hold code moves the
contention point to the transaction subsystem - xfs_trans_dqresv() and
the transaction commit code. The second scalability change is to
make xfs_trans_dqresv() lockless by using cmpxchg rather than the
dquot lock for updating the reservations. We don't really need to
hold the dquot lock to check the quota limits as the limits almost
never change - it's really only the reservation that we care about
here, and if that changes between the check and the cmpxchg, then
we'll go around the loop and check the limits again with the newly
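The lockless reservation can be sketched as a compare-and-swap retry loop. This is again a hypothetical user-space analogue that uses C11 atomic_compare_exchange_weak_explicit() in place of the kernel's cmpxchg(), and struct dquot_resv_sketch, dq_reserve() and resv_worker() are made-up names. The real reservation code checks more than a single hard limit; this sketch keeps only one limit and one counter.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the dquot fields that a reservation touches. */
struct dquot_resv_sketch {
	_Atomic uint64_t hard_limit;  /* 0 means "no limit" in this sketch */
	_Atomic uint64_t reserved;    /* amount currently reserved */
};

/* Reserve n units without taking a lock. Returns false, leaving
 * 'reserved' untouched, if the new total would exceed the hard limit. */
static bool dq_reserve(struct dquot_resv_sketch *dq, uint64_t n)
{
	uint64_t old = atomic_load_explicit(&dq->reserved, memory_order_relaxed);
	uint64_t want;

	do {
		/* The limit is read without a lock: it almost never changes,
		 * so a relaxed load is good enough for this check. */
		uint64_t limit = atomic_load_explicit(&dq->hard_limit,
						      memory_order_relaxed);
		want = old + n;
		assert(want >= old);  /* no wraparound in this sketch */
		if (limit && want > limit)
			return false;
		/* If 'reserved' changed since 'old' was read, the CAS fails,
		 * 'old' is refreshed with the current value and we loop to
		 * check the limit again against the new reservation. */
	} while (!atomic_compare_exchange_weak_explicit(&dq->reserved, &old, want,
							memory_order_acq_rel,
							memory_order_relaxed));
	return true;
}

/* Demo worker: tries to reserve one unit 'tries' times, counting successes. */
struct resv_worker_arg {
	struct dquot_resv_sketch *dq;
	int tries;
	int ok;
};

static void *resv_worker(void *p)
{
	struct resv_worker_arg *a = p;

	for (int i = 0; i < a->tries; i++)
		if (dq_reserve(a->dq, 1))
			a->ok++;
	return NULL;
}
```

A failed limit check modifies nothing, so the caller can simply fail the reservation; and because every successful update goes through the CAS, concurrent reservers can never push the total past the limit.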

Overall, these patches improve workload performance from around
16,500 creates/s to about 24,000 creates/s. While a ~45% improvement is
nothing to complain about, performance without quotas is about
250,000 creates/s, so there's still a lot of ground to make up here.
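As a quick check on the figures above, the relative gain from the patches and the remaining gap to the no-quota case work out as:

```latex
\frac{24\,000 - 16\,500}{16\,500} = \frac{7\,500}{16\,500} \approx 0.45
\qquad\qquad
\frac{250\,000}{24\,000} \approx 10.4
```

That is, roughly a 45% gain, with quota-enabled creates still about ten times slower than creates without quotas.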

The patchset moves the contention almost entirely to the transaction
commit code, along with the xfs_qm_dqrele calls in xfs_create (about
15% of the overall lock contention). Fixing the transaction commit
code is a major piece of work and is where the order of magnitude
improvement will come from, but I haven't quite figured it all
out yet. The dqrele code is simpler, so I'll probably have a patch
soon for that - it'll give another 10% improvement on what we have