This series addresses the log grant lock contention seen by 8-way fs_mark
workloads. The log grant lock protects:
- reserve grant head and wait queue
- write grant head and wait queue
- tail lsn
- last sync lsn
While they are all currently protected by a single lock, there is no reason
that they need to all be under the same lock. As a result, one option for
scaling was simply to split the grant lock into three locks - one for each of
the above groups. However, this would mean that we'd need to nest locks inside
each other and it ignores the fact that we really only use the lock on the
tail and last sync lsn variables to protect against concurrent updates.
Hence we can make the tail and last sync LSN variables independent of the grant
lock by making them atomic variables. This means that when we are moving the
tail of the log, we can avoid all locking except when there are waiters queued
on the grant heads.
Making the grant heads scale better is a bit more of a challenge. Just
replacing the grant lock with reserve and write grant locks doesn't really help
improve scalability because we'd still need to take both locks in the hot
xfs_log_reserve() path. To improve scalability, we really need to make this
path lock free.
The first step to achieving this is to encode the grant heads as a 64 bit
variable and then convert it to an atomic variable. This means we can read
the grant heads without holding locks and that allows (in combination with the
above tail/last sync atomics) tail pushing calculations and available log space
calculations to operate lock free.
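
As a rough illustration of this encoding step, here is a user-space sketch
using C11 atomics in place of the kernel's atomic64_t. The cycle/bytes split
across the two 32 bit halves and all of the names (demo_grant_pack() and
friends) are assumptions made for the example, not the code in the patches.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Assumed layout: cycle in the upper 32 bits, byte offset in the lower 32. */
static inline uint64_t demo_grant_pack(uint32_t cycle, uint32_t bytes)
{
	return ((uint64_t)cycle << 32) | bytes;
}

static inline void demo_grant_unpack(uint64_t val, uint32_t *cycle,
				     uint32_t *bytes)
{
	*cycle = (uint32_t)(val >> 32);
	*bytes = (uint32_t)(val & 0xffffffffu);
}

/*
 * Take a consistent snapshot of a grant head without holding a lock.
 * Both halves come from a single atomic 64 bit load, so a reader can
 * never see a cycle from one update and a byte count from another.
 */
static inline void demo_grant_head_read(_Atomic uint64_t *head,
					uint32_t *cycle, uint32_t *bytes)
{
	demo_grant_unpack(atomic_load(head), cycle, bytes);
}
```

The point of packing both fields into one word is that a single load (or a
single cmpxchg) covers the whole grant head, which is what lets readers drop
the lock.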
The second step is to introduce a lock per grant queue that is used exclusively
to protect queue manipulations. With the use of list_empty_careful() we can
check whether the queue has waiters without holding the queue lock. Hence
in the case where the queues are empty we do not need to take the queue locks
in the fast path.
Finally, we need to make the grant head space calculations lockless. With the
grant heads already being atomic variables, we can change the calculation
algorithm to a lockless cmpxchg algorithm. This means we no longer need any
spinlocks in the transaction reserve fast path and hence the scalability of
this path should be significantly improved. There is one downside to this
change - the xlog_verify_head() debug code can no longer be reliably used to
detect accounting problems in the grant space allocations as it requires an
atomic sample of both grant heads. However, the tail verification and the
xlog_space_left() verification still work without problems, so we still have
some debug checking on the grant head locations.
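
The lockless space calculation described above can be sketched as a
compare-and-swap retry loop. This is a user-space approximation using C11
atomic_compare_exchange_weak() rather than the kernel's cmpxchg primitives. The
function name, the cycle/bytes layout and the assumption that a single
reservation never exceeds the log size are all hypothetical choices made for
the example.

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Add 'bytes' to a grant head packed as (cycle << 32 | space), wrapping
 * to the next cycle when the end of the log is reached.
 * Assumes bytes <= logsize and space < logsize (illustrative only).
 */
static void demo_grant_add_space(_Atomic uint64_t *head, uint32_t logsize,
				 uint32_t bytes)
{
	uint64_t old = atomic_load(head);
	uint64_t new;

	do {
		uint32_t cycle = (uint32_t)(old >> 32);
		uint32_t space = (uint32_t)(old & 0xffffffffu);
		uint32_t room = logsize - space; /* bytes left in this cycle */

		if (bytes < room) {
			space += bytes;
		} else {
			/* Wrap around the end of the log into the next cycle. */
			space = bytes - room;
			cycle++;
		}
		new = ((uint64_t)cycle << 32) | space;
		/* On failure 'old' is reloaded with the current value; retry. */
	} while (!atomic_compare_exchange_weak(head, &old, new));
}
```

If another CPU moves the grant head between the load and the exchange, the
exchange fails and the calculation is simply redone against the new value, so
no spinlock is needed to keep the cycle and byte count consistent.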
After all this, having converted the ticket queues to use generic wait queues
directly during the series, it seemed like a good idea to remove all the other
users of sv_t types in the log code. Hence there are three patches at the end to
do the conversion and remove the sv_t wrapper from the codebase completely.
- split into lots of patches
- clean up the code and comments
- add patches to clean up sv_t usage at the end of the series
Finally, there's a patch to split up the log grant lock. This needs splitting
into 4 or 5 smaller patches (as you can see it was originally from the commit
log). It splits the grant lock into two list locks (reserve and write queues),
and converts all the other variables that the grant lock protected into atomic
variables. Grant head calculations are made atomic by converting them into 64
bit "LSNs" and the use of cmpxchg loops on atomic 64 bit variables. All log
tail and sync LSN updates are made atomic via conversion to atomic variables.
With this, the grant lock goes away completely, and the transaction reserve
fast path now only has two cmpxchg loops instead of a heavily contended spin