[PATCH 00/16] xfs: current patch stack for 2.6.38 window

To: xfs@xxxxxxxxxxx
Subject: [PATCH 00/16] xfs: current patch stack for 2.6.38 window
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Mon, 8 Nov 2010 19:55:03 +1100

FYI, here is my current XFS patch stack that I'll be trying to get ready in
time for the 2.6.38 merge window.  Note that the first two patches are
candidates for 2.6.37-rc. They are a perag reference counting fix and the
movement of a trace point.

My tree is currently based on the VFS locking changes I have out for review,
so there are a couple of patches that won't apply sanely to a mainline or OSS
xfs dev tree. See below for a pointer to a git tree with all the patches in it.

First patch is a per-cpu superblock counter rewrite. This uses the generic
per-cpu counter infrastructure to do the heavy lifting. It needs to be split
into two patches.
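
For readers unfamiliar with batched per-cpu counters, here is a minimal
userspace sketch of the general idea: each CPU accumulates a local delta and
only folds it into the shared total once it reaches a batch threshold. This is
not code from the patch. All the sketch_* names, the fixed CPU count and the
batch size are made up for illustration, and the model is single-threaded, so
it leaves out the locking and preemption handling the kernel needs.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NR_CPUS 4   /* hypothetical fixed CPU count */
#define SKETCH_BATCH   32  /* hypothetical fold threshold */

struct sketch_counter {
    int64_t global;                 /* shared, approximate total */
    int32_t local[SKETCH_NR_CPUS];  /* per-CPU deltas not yet folded in */
};

/* Add to the local slot; fold into the shared total once it gets large. */
static void sketch_counter_add(struct sketch_counter *c, int cpu, int32_t amount)
{
    assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);

    int32_t delta = c->local[cpu] + amount;
    if (delta >= SKETCH_BATCH || delta <= -SKETCH_BATCH) {
        c->global += delta;     /* a real implementation serialises this */
        c->local[cpu] = 0;
    } else {
        c->local[cpu] = delta;
    }
}

/* Cheap read: may be off by the deltas still held in the local slots. */
static int64_t sketch_counter_read_fast(const struct sketch_counter *c)
{
    return c->global;
}

/* Accurate read: walks every local slot, so it costs more. */
static int64_t sketch_counter_sum(const struct sketch_counter *c)
{
    int64_t sum = c->global;
    for (int i = 0; i < SKETCH_NR_CPUS; i++)
        sum += c->local[i];
    return sum;
}
```

The trade-off is that the fast read can be stale by the deltas still sitting
in the local slots, so anything that needs an exact answer (for example, a
check close to a limit) has to pay for the full sum.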

Following this are the dynamic speculative allocation patches. These have been
rewritten to be based on the current inode size rather than a thumb-in-the-air
how-many-preallocs-have-we-already-done algorithm. There are also some fixes in
the second patch that fix assumptions about ip->i_delayed_blks being zero after
a flush.

Next up we have the inode cache RCU freeing and lookup patches, including one
that avoids putting the inode in the VFS hash (similar to Christoph's patch,
but using the different VFS code).

Then there are buffer cache reclaim changes. First is a per-buftarg shrinker
interface, followed by a lazily updated per-buftarg buffer LRU. Building on
this is a patch connecting up the prioritised buffer reclaim hooks, which
ensure more critical buffers are harder to reclaim.

AIL lock contention fixes are next, with bulk AIL insert and removal functions
being implemented and connected up to the transaction commit and inode buffer
IO completion routines. These significantly reduce AIL lock contention, and
combined with a reduction in the granularity of xfsaild push wakeups, the AIL
lock drops out of the "top 10" contended locks on 8-way workloads.
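
To illustrate why bulk operations help here, the following toy model contrasts
one lock round trip per item with one lock round trip per batch. It is only a
sketch: the sketch_* names are hypothetical, the "lock" is a counter standing
in for a real spin lock, and the array stands in for the real AIL list.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_AIL_MAX 64   /* hypothetical capacity of the toy list */

struct sketch_ail {
    unsigned long lock_acquisitions;        /* stands in for taking the lock */
    size_t nr_items;
    unsigned long long lsn[SKETCH_AIL_MAX]; /* stands in for the item list */
};

static void sketch_ail_lock(struct sketch_ail *ail)
{
    ail->lock_acquisitions++;
}

static void sketch_ail_unlock(struct sketch_ail *ail)
{
    (void)ail;  /* nothing to do in this single-threaded model */
}

/* Per-item insert: every item pays for its own lock round trip. */
static void sketch_ail_insert(struct sketch_ail *ail, unsigned long long lsn)
{
    sketch_ail_lock(ail);
    assert(ail->nr_items < SKETCH_AIL_MAX);
    ail->lsn[ail->nr_items++] = lsn;
    sketch_ail_unlock(ail);
}

/* Bulk insert: the whole batch is added under a single lock hold. */
static void sketch_ail_insert_bulk(struct sketch_ail *ail,
                                   const unsigned long long *lsns,
                                   size_t count)
{
    sketch_ail_lock(ail);
    assert(ail->nr_items + count <= SKETCH_AIL_MAX);
    for (size_t i = 0; i < count; i++)
        ail->lsn[ail->nr_items++] = lsns[i];
    sketch_ail_unlock(ail);
}
```

Inserting N items one at a time takes the lock N times, while the bulk variant
takes it once, so the number of chances to contend drops with the batch size.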

There's a fix to avoid error injection from burning CPU on debug kernels - with
a badly fragmented freespace tree, the btree block validation was taking ~60%
of the CPU time, with most of that running error injection checks. 

Finally, there's a patch to split up the log grant lock. This needs splitting
into 4 or 5 smaller patches (as you can see it was originally from the commit
log). It splits the grant lock into two list locks (reserve and write queues),
and converts all the other variables that the grant lock protected into atomic
variables. Grant head calculations are made atomic by converting them into 64
bit "LSNs" and the use of cmpxchg loops on atomic 64 bit variables. All log
tail and sync LSN updates are made atomic via conversion to atomic variables.
With this, the grant lock goes away completely, and the transaction reserve
fast path now only has two cmpxchg loops instead of a heavily contended spin
lock.
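
As a rough illustration of the cmpxchg loop technique, the sketch below packs a
cycle number and a byte offset into one 64 bit value and advances it with a C11
compare-and-exchange loop. This is userspace code, not the patch itself. The
packing layout (cycle in the upper 32 bits, bytes in the lower 32), the
wrap-around rule and all sketch_* names are assumptions made for the example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Assumed layout: cycle in the upper 32 bits, byte offset in the lower 32. */
static uint64_t sketch_pack(uint32_t cycle, uint32_t bytes)
{
    return ((uint64_t)cycle << 32) | bytes;
}

static uint32_t sketch_cycle(uint64_t v)
{
    return (uint32_t)(v >> 32);
}

static uint32_t sketch_bytes(uint64_t v)
{
    return (uint32_t)v;
}

/*
 * Advance the packed head by nbytes, wrapping at log_size and bumping the
 * cycle on wrap. No lock is taken: if another thread changed the head
 * between our read and our update, the exchange fails, 'old' is refreshed
 * with the current value, and the calculation is redone.
 */
static void sketch_grant_add(_Atomic uint64_t *head, uint32_t log_size,
                             uint32_t nbytes)
{
    assert(nbytes <= log_size);

    uint64_t old = atomic_load(head);
    uint64_t next;
    do {
        uint32_t cycle = sketch_cycle(old);
        uint64_t space = (uint64_t)sketch_bytes(old) + nbytes;

        if (space >= log_size) {
            space -= log_size;
            cycle++;
        }
        next = sketch_pack(cycle, (uint32_t)space);
    } while (!atomic_compare_exchange_weak(head, &old, next));
}
```

Because both halves live in a single 64 bit word, a reader always sees a
consistent (cycle, bytes) pair, which is what lets the spin lock go away.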

The result of all this is raw cpu bound 8-way create performance of just over
100,000 inodes/s, and unlink performance of over 90,000 inodes/s. 8-way dbench
performance is improved from ~1150MB/s to ~1650MB/s by this patchset.

For 8-way creation and unlink of small files (~50 million), the lockstat
profiles look like:

                                contended       total           Lock
                Lock            acquisitions acquisitions       Description
-----------------------------   -----------  ------------       
           inode_wb_list_lock:    496330785    836287347        VFS
                  dcache_lock:    116299583    681450027        VFS
        &(&vblk->lock)->rlock:     52829329    131054495        virtio block 
    &sb->s_type->i_lock_key#1:     41772196   2375571240        VFS 
  &(&cil->xc_cil_lock)->rlock:     29549897    410553961        XFS (CIL commit 
         &irq_desc_lock_class:     27520142     63908701        IRQ edge lock
 &(&pag->pag_buf_lock)->rlock:     11756249   1838039685        XFS (buffer cache lock)
    &(&dentry->d_lock)->rlock:      5735657   1225028487        VFS
 &(&parent->list_lock)->rlock:      4356293    249408696        VM (SLAB list 
           inode_sb_list_lock:      3616366    203712449        VFS
                        key#5:      2075310    139221312        XFS SB percpu 
              inode_hash_lock:      1529969    102359626        VFS
             rcu_node_level_0:      1363470     13730113        RCU
        &(&zone->lock)->rlock:      1247467     16469316        VM (free list 
 &(&pag->pag_ici_lock)->rlock:       770880    337090972        XFS (inode cache lock)
                    &rq->lock:       589111    184220946        Scheduler
               inode_lru_lock:       527163    102791204        VFS
g->l_grant_write_lock)->rlock:       526471     51279626        XFS (grant write lock)
    &(&pag->pagb_lock)->rlock:       402878    208861744        XFS (busy extent list)
    &(&zone->lru_lock)->rlock:       167692     25383748        VM (page cache 
              &on_slab_l3_key:       166183     58470153        VM (slab cache)
            semaphore->lock#2:       161321   3659173925        ???
     &(&ailp->xa_lock)->rlock:       143859    164470123        XFS (AIL lock)
          &cil->xc_ctx_lock-W:        32850       173279        XFS (CIL push 
          &cil->xc_ctx_lock-R:        90868    357572724        XFS (CIL push 
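
When reading the table above, the ratio of contended to total acquisitions is
often more telling than the raw contended count. A small helper along these
lines (hypothetical name, not part of lockstat or the patch set) computes it:

```c
#include <assert.h>
#include <stdint.h>

/* Percentage of lock acquisitions that had to wait for the lock. */
static double sketch_contention_pct(uint64_t contended, uint64_t total)
{
    assert(total > 0 && contended <= total);
    return 100.0 * (double)contended / (double)total;
}
```

Fed the figures from the table, inode_wb_list_lock comes out at roughly 59%
contended, while the AIL lock is under 0.1%.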

I've yet to determine whether I'll have the time to finish the removal of the
page cache from the buffer cache - for pure inode create/unlink workloads the
buftarg mapping tree lock is the second most heavily contended lock in the
system. Hence this definitely needs solving in some way or another....

Anyway, comments are welcome - just keep in mind that there is still some
polish required for these patches. ;)

If you want the git version, everything is here:

  git://git.kernel.org/pub/scm/linux/kernel/git/dgc/xfsdev.git working

Dave Chinner (16):
      xfs: fix per-ag reference counting in inode reclaim tree walking
      xfs: move delayed write buffer trace
      [RFC] xfs: use generic per-cpu counter infrastructure
      xfs: dynamic speculative EOF preallocation
      xfs: don't truncate prealloc from frequently accessed inodes
      patch xfs-inode-hash-fake
      xfs: convert inode cache lookups to use RCU locking
      xfs: convert pag_ici_lock to a spin lock
      xfs: convert xfsbufd shrinker to a per-buftarg shrinker.
      xfs: add a lru to the XFS buffer cache
      xfs: connect up buffer reclaim priority hooks
      xfs: bulk AIL insertion during transaction commit
      xfs: reduce the number of AIL push wakeups
      xfs: remove all the inodes on a buffer from the AIL in bulk
      xfs: only run xfs_error_test if error injection is active
      xfs: make xlog_space_left() independent of the grant lock

 fs/xfs/linux-2.6/xfs_buf.c     |  239 ++++++++----
 fs/xfs/linux-2.6/xfs_buf.h     |   43 ++-
 fs/xfs/linux-2.6/xfs_iops.c    |   11 +-
 fs/xfs/linux-2.6/xfs_linux.h   |    9 -
 fs/xfs/linux-2.6/xfs_super.c   |   22 +-
 fs/xfs/linux-2.6/xfs_sync.c    |   28 +-
 fs/xfs/linux-2.6/xfs_trace.h   |   36 +-
 fs/xfs/quota/xfs_dquot.c       |    2 +-
 fs/xfs/quota/xfs_qm_syscalls.c |    3 +
 fs/xfs/xfs_ag.h                |    2 +-
 fs/xfs/xfs_alloc.c             |    4 +-
 fs/xfs/xfs_bmap.c              |    9 +-
 fs/xfs/xfs_btree.c             |   11 +-
 fs/xfs/xfs_buf_item.c          |   17 +-
 fs/xfs/xfs_da_btree.c          |    4 +-
 fs/xfs/xfs_dfrag.c             |   13 +
 fs/xfs/xfs_error.c             |    3 +
 fs/xfs/xfs_error.h             |    5 +-
 fs/xfs/xfs_extfree_item.c      |   85 +++--
 fs/xfs/xfs_extfree_item.h      |   12 +-
 fs/xfs/xfs_fsops.c             |    4 +-
 fs/xfs/xfs_ialloc.c            |    2 +-
 fs/xfs/xfs_iget.c              |   55 ++-
 fs/xfs/xfs_inode.c             |   24 +-
 fs/xfs/xfs_inode.h             |    1 +
 fs/xfs/xfs_inode_item.c        |  112 +++++-
 fs/xfs/xfs_iomap.c             |   53 ++-
 fs/xfs/xfs_log.c               |  678 +++++++++++++++++---------------
 fs/xfs/xfs_log_cil.c           |    9 +-
 fs/xfs/xfs_log_priv.h          |   40 ++-
 fs/xfs/xfs_log_recover.c       |   27 +-
 fs/xfs/xfs_mount.c             |  837 +++++++++++-----------------------------
 fs/xfs/xfs_mount.h             |   80 +---
 fs/xfs/xfs_trans.c             |   70 ++++-
 fs/xfs/xfs_trans.h             |    2 +-
 fs/xfs/xfs_trans_ail.c         |  189 ++++++++-
 fs/xfs/xfs_trans_extfree.c     |    4 +-
 fs/xfs/xfs_trans_priv.h        |   13 +-
 fs/xfs/xfs_vnodeops.c          |   61 ++-
 include/linux/percpu_counter.h |   16 +
 lib/percpu_counter.c           |   79 ++++
 41 files changed, 1593 insertions(+), 1321 deletions(-)
