[PATCH 0/7] Delayed write metadata writeback V3

To: xfs@xxxxxxxxxxx
Subject: [PATCH 0/7] Delayed write metadata writeback V3
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Mon, 25 Jan 2010 17:22:37 +1100

(a.k.a. kill async inode writeback V3)

While I started with killing async inode writeback, the series has
grown. It's not really limited to inode writeback - it touches dquot
flushing, changes the way the AIL pushes on buffers, adds xfsbufd
sorting for delayed write buffers, adds a real non-blocking mode to
inode reclaim and avoids physical inode writeback from the VFS while
fixing bugs in handling delayed write inodes.  Hence this is more
about enabling efficient delayed write metadata than it is about
killing async inode writeback.

The idea behind this series is to make metadata buffers get
written from xfsbufd via the delayed write queue rather than being
issued asynchronously from all over the place. To do this, async
buffer writeback is almost entirely removed from XFS, replaced
instead by delayed writes and a method to expedite flushing of
delayed write buffers when required.
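
To illustrate the idea of expediting a delayed write buffer, here is a
minimal userspace sketch. It is not the code from this series: the names
(sketch_buf, sketch_promote, sketch_flush_pass) are hypothetical, and the
15 second aging interval is assumed from the V2 notes further down. A
flusher pass writes buffers that have aged long enough, and a promoted
buffer is written on the next pass regardless of its age.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_AGE_SECS 15	/* assumed aging interval (see V2 notes) */

/* Hypothetical stand-in for a buffer on the delayed write queue. */
struct sketch_buf {
	long	queued_at;	/* time the buffer joined the queue */
	bool	promoted;	/* write on the next pass, ignoring age */
	bool	written;	/* already issued for IO */
};

/* Ask for a buffer to be written without waiting for it to age. */
static void sketch_promote(struct sketch_buf *bp)
{
	bp->promoted = true;
}

/*
 * One pass of the flusher: issue every buffer that is either old
 * enough or has been promoted. Returns the number of buffers issued.
 */
static size_t sketch_flush_pass(struct sketch_buf *bufs, size_t n, long now)
{
	size_t issued = 0;

	for (size_t i = 0; i < n; i++) {
		struct sketch_buf *bp = &bufs[i];

		if (bp->written)
			continue;
		if (!bp->promoted && now - bp->queued_at < SKETCH_AGE_SECS)
			continue;	/* still aging, leave it queued */
		bp->written = true;	/* stands in for real IO submission */
		issued++;
	}
	return issued;
}
```

A caller that needs a particular buffer on disk soon would call
sketch_promote() on it and wake the flusher, rather than issuing the IO
itself.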

The result of funnelling all the buffer IO into a single place
is that we can more tightly control and therefore optimise the
submission of metadata IO. Aggregating the buffers before dispatch
allows much better sort efficiency of the buffers as the sort window
is not limited to the size of the elevator congestion hysteresis
limit. Hence we can approach 100% merge efficiency on large numbers
of buffers when dispatched for IO and greatly reduce the amount
of seeking metadata writeback causes.
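
As a rough illustration of why sorting the whole queue helps, the
following userspace sketch sorts a set of buffers by disk address and
counts how many IOs are left once physically adjacent buffers are merged.
It is only a model under stated assumptions: struct delwri_buf and both
functions are hypothetical names, and qsort() stands in for whatever list
sort the kernel code uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a delayed write buffer: a disk extent. */
struct delwri_buf {
	unsigned long long	blkno;	/* starting block on disk */
	unsigned int		nblks;	/* length in blocks */
};

/* Order buffers by ascending disk address. */
static int delwri_buf_cmp(const void *a, const void *b)
{
	const struct delwri_buf *x = a;
	const struct delwri_buf *y = b;

	if (x->blkno < y->blkno)
		return -1;
	if (x->blkno > y->blkno)
		return 1;
	return 0;
}

/*
 * Sort the whole queue, then count how many IOs remain once
 * physically adjacent buffers are merged into a single request.
 */
static size_t delwri_sort_and_count_ios(struct delwri_buf *bufs, size_t n)
{
	unsigned long long next = 0;
	size_t ios = 0;

	if (n == 0)
		return 0;
	assert(bufs != NULL);

	qsort(bufs, n, sizeof(*bufs), delwri_buf_cmp);
	for (size_t i = 0; i < n; i++) {
		if (i == 0 || bufs[i].blkno != next)
			ios++;		/* gap: a new request starts here */
		next = bufs[i].blkno + bufs[i].nblks;
	}
	return ios;
}
```

The larger the set of buffers sorted in one go, the more likely each
buffer has its on-disk neighbour in the same batch, which is the point of
aggregating before dispatch.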

The major change is to the inode flushing and reclaim code. Delayed
write inodes hold the flush lock for much longer than for async
writeback, and hence blocking on the flush lock can cause extremely
long latencies without other mechanisms to expedite the release of
the flush locks. To prevent needing to flush inodes immediately,
all operations are done non-blocking unless synchronous. This
required a significant rework of the inode reclaim code, but it
greatly simplified other pieces of code (e.g. log item pushing).
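
The "non-blocking unless synchronous" rule can be sketched in userspace
as a trylock on the flush lock. This is an assumption-laden model, not the
XFS implementation: the types and function names are hypothetical, a C11
atomic_flag stands in for the flush lock, and the lock is dropped straight
after the "write" rather than being held until IO completion.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for an inode with a flush lock. */
struct sketch_inode {
	atomic_flag	flush_lock;	/* set while a flush is in progress */
	bool		dirty;
};

enum flush_result { FLUSH_DONE, FLUSH_SKIPPED };

/* Returns true if the flush lock was taken without waiting. */
static bool sketch_flush_trylock(struct sketch_inode *ip)
{
	return !atomic_flag_test_and_set(&ip->flush_lock);
}

static void sketch_flush_unlock(struct sketch_inode *ip)
{
	atomic_flag_clear(&ip->flush_lock);
}

/*
 * Background callers pass sync == false and never wait on the flush
 * lock; only synchronous callers wait for the current holder.
 */
static enum flush_result sketch_flush(struct sketch_inode *ip, bool sync)
{
	if (!sketch_flush_trylock(ip)) {
		if (!sync)
			return FLUSH_SKIPPED;	/* retry on a later pass */
		while (!sketch_flush_trylock(ip))
			;			/* sync: wait for the holder */
	}
	ip->dirty = false;		/* stands in for the real write */
	sketch_flush_unlock(ip);
	return FLUSH_DONE;
}
```

A background reclaim pass would treat FLUSH_SKIPPED as "come back later"
and move on to the next inode instead of stalling on one held lock.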

Version 3:
- rework inode reclaim to:
        - separate it from xfs_iflush return values
        - provide a non-blocking mode for background operation
- apply delwri buffer promotion tricks to dquot flushing
- kill unneeded dquot flushing flags, similar to the inode flushing flags
- fix sync inode flush bug when trying to flush delwri inodes

Version 2:
- use generic list sort function
- when unmounting, push the delwri buffers first, then do sync inode
  reclaim so that reclaim doesn't block for 15 seconds waiting for
  delwri inode buffers to be aged and written before the inodes can
  be reclaimed.

Performance numbers for this version are the same as V2, which were
as follows:

Perf results (average of 3 runs) on a debug XFS build (means allocation
patterns are randomly varied, so runtimes are also a bit variable):

Untar 2.6.32 kernel tarball, sync, then remove:

                  Untar+sync     rm -rf
xfs-dev:           25.2s          13.0s
xfs-dev-delwri-1:  22.5s           9.1s
xfs-dev-delwri-2:  21.9s           8.4s

4 processes each creating 100,000 five-byte files in separate
directories concurrently, then 4 processes removing a directory each

                  create          rm -rf
xfs-dev:           8m32s           4m10s
xfs-dev-delwri-1:  4m55s           3m42s
xfs-dev-delwri-2:  4m56s           3m33s

The patch series (plus the couple of previous bug fixes) is
available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/dgc/xfs for-2.6.34

Dave Chinner (9):
      xfs: don't hold onto reserved blocks on remount,ro
      xfs: turn off sign warnings
      xfs: Make inode reclaim states explicit
      xfs: Use delayed write for inodes rather than async
      xfs: Don't issue buffer IO direct from AIL push
      xfs: Sort delayed write buffers before dispatch
      xfs: Use delay write promotion for dquot flushing
      xfs: kill the unused XFS_QMOPT_* flush flags
      xfs: xfs_fs_write_inode() can fail to write inodes synchronously

 fs/xfs/Makefile               |    2 +-
 fs/xfs/linux-2.6/xfs_buf.c    |  117 +++++++++++++++++++++++++++++---------
 fs/xfs/linux-2.6/xfs_buf.h    |    2 +
 fs/xfs/linux-2.6/xfs_super.c  |   72 ++++++++++++++++++------
 fs/xfs/linux-2.6/xfs_sync.c   |  124 ++++++++++++++++++++++++++++++++--------
 fs/xfs/linux-2.6/xfs_trace.h  |    1 +
 fs/xfs/quota/xfs_dquot.c      |   38 +++++-------
 fs/xfs/quota/xfs_dquot_item.c |   87 ++++------------------------
 fs/xfs/quota/xfs_dquot_item.h |    4 -
 fs/xfs/quota/xfs_qm.c         |   14 ++---
 fs/xfs/xfs_buf_item.c         |   64 ++++++++++++----------
 fs/xfs/xfs_inode.c            |   86 ++--------------------------
 fs/xfs/xfs_inode.h            |   11 +---
 fs/xfs/xfs_inode_item.c       |  108 +++++++----------------------------
 fs/xfs/xfs_inode_item.h       |    6 --
 fs/xfs/xfs_mount.c            |   13 ++++-
 fs/xfs/xfs_mount.h            |    1 +
 fs/xfs/xfs_quota.h            |    8 +--
 fs/xfs/xfs_trans_ail.c        |    7 ++
 19 files changed, 367 insertions(+), 398 deletions(-)