To: xfs@xxxxxxxxxxx
Subject: [XFS updates] XFS development tree branch, master, updated. v2.6.34-66-gccf7c23
From: xfs@xxxxxxxxxxx
Date: Mon, 24 May 2010 13:04:36 -0500
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "XFS development tree".

The branch, master has been updated
  ccf7c23 xfs: Ensure inode allocation buffers are fully replayed
  df80615 xfs: enable background pushing of the CIL
  9da1ab1 xfs: forced unmounts need to push the CIL
  71e330b xfs: Introduce delayed logging core code
  ed3b4d6 xfs: Improve scalability of busy extent tracking
  955833c xfs: make the log ticket ID available outside the log infrastructure
  169a7b0 xfs: clean up log ticket overrun debug output
  c115541 xfs: Clean up XFS_BLI_* flag namespace
  64fc35d xfs: modify buffer item reference counting
  3383ca5 xfs: allow log ticket allocation to take allocation flags
  524ee36 xfs: Don't reuse the same transaction ID for duplicated transactions.
      from  b4ed4626a9775cd8cb77209280d24839526f94f2 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
commit ccf7c23fc129e75ef60e6f59f60a485b7a056598
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Thu May 20 23:19:42 2010 +1000

    xfs: Ensure inode allocation buffers are fully replayed
    
    With delayed logging, we can get inode allocation buffers in the
    same transaction as inode unlink buffers. We don't currently mark inode
    allocation buffers in the log, so inode unlink buffers take
    precedence over allocation buffers.
    
    The result is that when they are combined into the same checkpoint,
    only the unlinked inode chain fields are replayed, resulting in
    uninitialised inode buffers being detected when the next inode
    modification is replayed.
    
    To fix this, we need to ensure that we do not set the inode buffer
    flag in the buffer log item format flags if the inode allocation has
    not already hit the log. To avoid requiring a change to log
    recovery, we really need to make this a modification that relies
    only on in-memory state.
    
    We can do this by checking during buffer log formatting (while the
    CIL cannot be flushed) if we are still in the same sequence when we
    commit the unlink transaction as the inode allocation transaction.
    If we are, then we do not add the inode buffer flag to the buffer
    log format item flags. This means the entire buffer will be
    replayed, not just the unlinked fields. We do this while
    CIL flushes are locked out to ensure that we don't race with the
    sequence numbers changing and hence fail to put the inode buffer
    flag in the buffer format flags when we really need to.
    
    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>
    Signed-off-by: Alex Elder <aelder@xxxxxxx>
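
The check described in this commit message can be sketched roughly as follows. This is an illustration only, not the code from the commit: the flag value, the structure, its fields and the function name are all hypothetical, and the caller is assumed to hold whatever lock keeps CIL flushes out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the "inode buffer" format flag. */
#define SKETCH_BLF_INODE_BUF 0x1u

/* Hypothetical in-memory state kept for a buffer log item. */
struct sketch_buf_item {
	bool     inode_alloc;  /* buffer was logged by an inode allocation */
	uint64_t alloc_seq;    /* checkpoint sequence of that allocation */
};

/*
 * Compute the format flags for a buffer. Assumed to run while CIL
 * flushes are locked out, so cur_seq cannot change underneath us.
 */
unsigned int sketch_format_flags(const struct sketch_buf_item *bip,
				 unsigned int flags, uint64_t cur_seq)
{
	assert(bip != NULL);

	if ((flags & SKETCH_BLF_INODE_BUF) &&
	    bip->inode_alloc && bip->alloc_seq == cur_seq) {
		/*
		 * The allocation has not hit the log yet: drop the flag
		 * so the entire buffer is replayed, not just the
		 * unlinked fields.
		 */
		flags &= ~SKETCH_BLF_INODE_BUF;
	}
	return flags;
}
```

Because the decision depends only on in-memory sequence numbers, nothing in this sketch would require a change to what recovery reads from the log.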

commit df806158b0f6eb24247773b4a19b8b59d7217e59
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Mon May 17 15:52:13 2010 +1000

    xfs: enable background pushing of the CIL
    
    If we let the CIL grow without bound, it will grow large enough to violate
    recovery constraints (must be at least one complete transaction in the log
    at all times) or take forever to write out through the log buffers. Hence
    we need a check during asynchronous transactions as to whether the CIL
    needs to be pushed.
    
    We track the amount of log space the CIL consumes, so it is relatively
    simple to limit it on a pure size basis. Make the limit the minimum of just
    under half the log size (recovery constraint) or 8MB of log space (which is
    an awful lot of metadata).
    
    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>
    Signed-off-by: Alex Elder <aelder@xxxxxxx>
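
As a worked illustration of the size limit, the arithmetic might look like the sketch below. The function names and the margin parameter are assumptions: the message only says "just under half", so how far under half is left to the caller here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 8MB cap on CIL log space, as stated in the commit message. */
#define SKETCH_CIL_MAX_BYTES (8ull * 1024 * 1024)

/*
 * Limit = min(half the log minus a margin, 8MB). The margin is a
 * hypothetical parameter standing in for "just under half".
 */
uint64_t sketch_cil_space_limit(uint64_t log_bytes, uint64_t margin)
{
	assert(margin < log_bytes / 2);

	uint64_t half = log_bytes / 2 - margin;

	return half < SKETCH_CIL_MAX_BYTES ? half : SKETCH_CIL_MAX_BYTES;
}

/* Checked on asynchronous transaction commit: is a push due? */
bool sketch_cil_needs_push(uint64_t cil_bytes, uint64_t limit)
{
	return cil_bytes >= limit;
}
```

With a 10MB log and a 4096 byte margin the recovery constraint dominates and the limit is 5MB minus 4096 bytes. With a 128MB log the 8MB cap applies instead.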

commit 9da1ab181ac1790f86528b86ba5876f037e8dcdc
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Mon May 17 15:51:59 2010 +1000

    xfs: forced unmounts need to push the CIL
    
    If the filesystem is being shut down and there is no log error,
    the current code forces out the current log buffers. This code now needs
    to push the CIL before it forces out the log buffers to achieve the same
    result.
    
    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>
    Signed-off-by: Alex Elder <aelder@xxxxxxx>

commit 71e330b593905e40d6c5afa824d38ee02d70ce5f
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Fri May 21 14:37:18 2010 +1000

    xfs: Introduce delayed logging core code
    
    The delayed logging code only changes in-memory structures and as
    such can be enabled and disabled with a mount option. Add the mount
    option and emit a warning that this is an experimental feature that
    should not be used in production yet.
    
    We also need infrastructure to track committed items that have not
    yet been written to the log. This is what the Committed Item List
    (CIL) is for.
    
    The log item also needs to be extended to track the current log
    vector, the associated memory buffer and its location in the Committed
    Item List. Extend the log item and log vector structures to enable
    this tracking.
    
    To maintain the current log format for transactions with delayed
    logging, we need to introduce a checkpoint transaction and a context
    for tracking each checkpoint from initiation to transaction
    completion.  This includes adding a log ticket for tracking the log
    space required/used by the context checkpoint.
    
    To track all the changes we need an io vector array per log item,
    rather than a single array for the entire transaction. Using the new
    log vector structure for this requires two passes - the first to
    allocate the log vector structures and chain them together, and the
    second to fill them out.  This log vector chain can then be passed
    to the CIL for formatting, pinning and insertion into the CIL.
    
    Formatting of the log vector chain is relatively simple - it's just
    a loop over the iovecs on each log vector, but it is made slightly
    more complex because we re-write the iovec after the copy to point
    back at the memory buffer we just copied into.
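
A minimal sketch of that formatting loop, assuming each log vector already has a buffer large enough for its regions. The structure and function names are hypothetical and are not taken from the commit.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical io vector: one region of a log item. */
struct sketch_iovec {
	void   *addr;
	size_t  len;
};

/* Hypothetical log vector: one per log item, chained together. */
struct sketch_log_vec {
	struct sketch_log_vec *next;
	struct sketch_iovec   *iov;
	int                    niovecs;
	char                  *buf;      /* preallocated copy buffer */
	size_t                 buf_len;
};

/* Copy every region into its vector's buffer; return bytes copied. */
size_t sketch_format_chain(struct sketch_log_vec *lv)
{
	size_t total = 0;

	for (; lv != NULL; lv = lv->next) {
		char *ptr = lv->buf;

		for (int i = 0; i < lv->niovecs; i++) {
			struct sketch_iovec *vec = &lv->iov[i];

			assert((size_t)(ptr - lv->buf) + vec->len <= lv->buf_len);
			memcpy(ptr, vec->addr, vec->len);
			/* Re-write the iovec to point at the copy. */
			vec->addr = ptr;
			ptr += vec->len;
		}
		total += (size_t)(ptr - lv->buf);
	}
	return total;
}
```

After the call each iovec refers to the private copy rather than the live object, which is what lets the original item change again before the checkpoint is written.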
    
    This code also needs to pin log items. If the log item is not
    already tracked in this checkpoint context, then it needs to be
    pinned. Otherwise it is already pinned and we don't need to pin it
    again.
    
    The only other complexity is calculating the amount of new log space
    the formatting has consumed. This needs to be accounted to the
    transaction in progress, and the accounting is made more complex
    because we also need to steal space from it for log metadata in the
    checkpoint transaction. Calculate all this at insert time and update
    all the tickets, counters, etc correctly.
    
    Once we've formatted all the log items in the transaction, attach
    the busy extents to the checkpoint context so the busy extents live
    until checkpoint completion and can be processed at that point in
    time. Transactions can then be freed at this point in time.
    
    Now we need to issue checkpoints - we are tracking the amount of log space
    used by the items in the CIL, so we can trigger background checkpoints when
    the space usage gets to a certain threshold. Otherwise, checkpoints need to
    be triggered when a log synchronisation point is reached - a log force event.
    
    Because the log write code already handles chained log vectors, writing the
    transaction is trivial, too. Construct a transaction header, add it
    to the head of the chain and write it into the log, then issue a
    commit record write. Then we can release the checkpoint log ticket
    and attach the context to the log buffer so it can be called during
    I/O completion to complete the checkpoint.
    
    We also need to allow for synchronising multiple in-flight
    checkpoints. This is needed for two things - the first is to ensure
    that checkpoint commit records appear in the log in the correct
    sequence order (so they are replayed in the correct order). The
    second is so that xfs_log_force_lsn() operates correctly and only
    flushes and/or waits for the specific sequence it was provided with.
    
    To do this we need a wait variable and a list tracking the
    checkpoint commits in progress. We can walk this list and wait for
    the checkpoints to change state or complete easily, and this provides
    the necessary synchronisation for correct operation in both cases.
    
    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>
    Signed-off-by: Alex Elder <aelder@xxxxxxx>
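
The ordering rule for in-flight checkpoints can be illustrated with the list walk below. It is only a sketch with hypothetical names: it finds the checkpoint a caller would have to wait on and leaves out the wait variable and locking that real code would need.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical checkpoint context on the "commits in progress" list. */
struct sketch_ctx {
	struct sketch_ctx *next;
	uint64_t           sequence;   /* sequence numbers start at 1 */
	bool               committed;  /* commit record has been written */
};

/*
 * Return the first checkpoint with a lower sequence number that has
 * not written its commit record yet, or NULL if the checkpoint with
 * sequence seq may write its own commit record now.
 */
const struct sketch_ctx *sketch_find_blocker(const struct sketch_ctx *list,
					     uint64_t seq)
{
	assert(seq >= 1);

	for (const struct sketch_ctx *c = list; c != NULL; c = c->next) {
		if (c->sequence < seq && !c->committed)
			return c;
	}
	return NULL;
}
```

A caller would sleep on the wait variable while this returns non-NULL and re-walk the list when woken. The same walk restricted to one sequence number is the shape a targeted force of a specific sequence would take.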

commit ed3b4d6cdc81e8feefdbfa3c584614be301b6d39
Author: Dave Chinner <david@xxxxxxxxxxxxx>
Date:   Fri May 21 12:07:08 2010 +1000

    xfs: Improve scalability of busy extent tracking
    
    When we free a metadata extent, we record it in the per-AG busy
    extent array so that it is not re-used before the freeing
    transaction hits the disk. This array is fixed size, so when it
    overflows we make further allocation transactions synchronous
    because we cannot track more freed extents until those transactions
    hit the disk and are completed. Under heavy mixed allocation and
    freeing workloads with large log buffers, we can overflow this array
    quite easily.
    
    Further, the array is sparsely populated, which means that inserts
    need to search for a free slot, and array searches often have to
    search many more slots than are actually used to check all the
    busy extents. Quite inefficient, really.
    
    To enable this aspect of extent freeing to scale better, we need
    a structure that can grow dynamically. While in other areas of
    XFS we have used radix trees, the extents being freed are at random
    locations on disk so are better suited to being indexed by an rbtree.
    
    So, use a per-AG rbtree indexed by block number to track busy
    extents.  This incurs a memory allocation when marking an extent
    busy, but should not occur too often in low memory situations. This
    should scale to an arbitrary number of extents so should not be a
    limitation for features such as in-memory aggregation of
    transactions.
    
    However, there are still situations where we can't avoid allocating
    busy extents (such as allocation from the AGFL). To minimise the
    overhead of such occurrences, we need to avoid doing a synchronous
    log force while holding the AGF locked to ensure that the previous
    transactions are safely on disk before we use the extent. We can do
    this by marking the transaction doing the allocation as synchronous
    rather than issuing a log force.
    
    Because of the locking involved and the ordering of transactions,
    the synchronous transaction provides the same guarantees as a
    synchronous log force because it ensures that all the prior
    transactions are already on disk when the synchronous transaction
    hits the disk. i.e. it preserves the free->allocate order of the
    extent correctly in recovery.
    
    By doing this, we avoid holding the AGF locked while log writes are
    in progress, hence reducing the length of time the lock is held and
    therefore we increase the rate at which we can allocate and free
    from the allocation group, thereby increasing overall throughput.
    
    The only problem with this approach is that when a metadata buffer is
    marked stale (e.g. a directory block is removed), then the buffer remains
    pinned and locked until the log goes to disk. The issue here is that
    if that stale buffer is reallocated in a subsequent transaction, the
    attempt to lock that buffer in the transaction will hang waiting for
    the log to go to disk to unlock and unpin the buffer. Hence if
    someone tries to lock a pinned, stale, locked buffer we need to
    push on the log to get it unlocked ASAP. Effectively we are trading
    off a guaranteed log force for a much less common trigger for log
    force to occur.
    
    Ideally we should not reallocate busy extents. That is a much more
    complex fix to the problem as it involves direct intervention in the
    allocation btree searches in many places. This is left to a future
    set of modifications.
    
    Finally, now that we track busy extents in allocated memory, we
    don't need the descriptors in the transaction structure to point to
    them. We can replace the complex busy chunk infrastructure with a
    simple linked list of busy extents. This allows us to remove a large
    chunk of code, making the overall change a net reduction in code
    size.
    
    Signed-off-by: Dave Chinner <david@xxxxxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>
    Signed-off-by: Alex Elder <aelder@xxxxxxx>
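
To illustrate tracking busy extents in a tree indexed by block number, here is a small sketch. Note the substitutions: a plain unbalanced binary search tree stands in for the rbtree, all names are hypothetical, and an overlapping insert is simply rejected, which is an assumption made to keep the example short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical busy extent, keyed by its first block number. */
struct sketch_busy {
	struct sketch_busy *left, *right;
	uint64_t            bno;   /* first block */
	uint64_t            len;   /* length in blocks */
};

/* Insert [bno, bno+len); NULL on overlap or allocation failure. */
struct sketch_busy *sketch_busy_insert(struct sketch_busy **root,
				       uint64_t bno, uint64_t len)
{
	struct sketch_busy **link = root;

	assert(len > 0);
	while (*link != NULL) {
		struct sketch_busy *cur = *link;

		if (bno + len <= cur->bno)
			link = &cur->left;        /* entirely below */
		else if (bno >= cur->bno + cur->len)
			link = &cur->right;       /* entirely above */
		else
			return NULL;              /* overlaps cur */
	}

	struct sketch_busy *node = calloc(1, sizeof(*node));
	if (node == NULL)
		return NULL;
	node->bno = bno;
	node->len = len;
	*link = node;
	return node;
}

/* Does [bno, bno+len) overlap any tracked busy extent? */
bool sketch_busy_search(const struct sketch_busy *root,
			uint64_t bno, uint64_t len)
{
	while (root != NULL) {
		if (bno + len <= root->bno)
			root = root->left;
		else if (bno >= root->bno + root->len)
			root = root->right;
		else
			return true;
	}
	return false;
}
```

Unlike the fixed-size array, insertion allocates memory per extent and never needs to hunt for a free slot, and a search only visits nodes along one path of the tree.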

commit 955833cf2ad0aa39b336e853cad212d867199984
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Fri May 14 21:41:46 2010 +1000

    xfs: make the log ticket ID available outside the log infrastructure
    
    The ticket ID is needed to uniquely identify transactions when doing busy
    extent matching. Delayed logging changes the lifecycle of busy extents with
    respect to the transaction structure lifecycle. Hence we can no longer use
    the transaction structure as a means of determining the owner of the busy
    extent as it may be freed and reused while the busy extent is still active.
    
    This commit provides the infrastructure to access the xlog_tid_t held in the
    ticket from a transaction handle. This avoids the need for callers to peek
    into the transaction and log structures to find this out.
    
    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>
    Signed-off-by: Alex Elder <aelder@xxxxxxx>
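
A sketch of the kind of accessor this describes, with hypothetical type and function names (only xlog_tid_t appears in the message, and its width is assumed here). The busy extent records the ticket ID rather than a pointer to the transaction, so ownership can still be tested after the transaction structure has been freed and reused.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed 32-bit stand-in for xlog_tid_t. */
typedef uint32_t sketch_tid_t;

struct sketch_ticket {
	sketch_tid_t tid;
};

struct sketch_trans {
	struct sketch_ticket *ticket;
};

/* Hypothetical busy extent that remembers its owner by ticket ID. */
struct sketch_busy_extent {
	uint64_t     bno;
	uint64_t     len;
	sketch_tid_t owner_tid;
};

/* Accessor so callers need not peek into the ticket themselves. */
sketch_tid_t sketch_trans_ident(const struct sketch_trans *tp)
{
	assert(tp != NULL && tp->ticket != NULL);
	return tp->ticket->tid;
}

/* Busy extent matching by ticket ID instead of transaction pointer. */
bool sketch_busy_owned_by(const struct sketch_busy_extent *be,
			  const struct sketch_trans *tp)
{
	return be->owner_tid == sketch_trans_ident(tp);
}
```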

commit 169a7b078eaa765e6bd09865c985298ee9084a89
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Fri May 7 11:05:31 2010 +1000

    xfs: clean up log ticket overrun debug output
    
    Push the error message output when a ticket overrun is detected
    into the ticket printing functions. Also remove the debug version
    of the code as the production version will still panic just as
    effectively on a debug kernel via the panic mask being set.
    
    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>
    Signed-off-by: Alex Elder <aelder@xxxxxxx>

commit c11554104f4dcb509fd43973389b097a04b9d51d
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Fri May 7 11:05:19 2010 +1000

    xfs: Clean up XFS_BLI_* flag namespace
    
    Clean up the buffer log format (XFS_BLI_*) flags because they have a
    polluted namespace. The XFS_BLI_ prefix is used for both in-memory
    and on-disk flag fields, which have overlapping values for different
    flags. Rename the buffer log format flags to use the XFS_BLF_*
    prefix to avoid confusing them with the in-memory XFS_BLI_* prefixed
    flags.
    
    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>
    Signed-off-by: Alex Elder <aelder@xxxxxxx>

commit 64fc35de60da3b1fe970168d10914bf1cf34a3e3
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Fri May 7 11:04:34 2010 +1000

    xfs: modify buffer item reference counting
    
    The buffer log item reference counts used to take references for every
    transaction, similar to the pin counting. This is symmetric (like the
    pin/unpin) with respect to transaction completion, but with delayed logging
    becomes asymmetric as the pinning becomes asymmetric w.r.t. transaction
    completion.
    
    To make both cases the same, allow the buffer pinning to take a reference to
    the buffer log item and always drop the reference the transaction has on it
    when being unlocked. This is balanced correctly because the unpin operation
    always drops a reference to the log item. Hence reference counting becomes
    symmetric w.r.t. item pinning as well as w.r.t active transactions and as a
    result the reference counting model remains consistent between normal and
    delayed logging.
    
    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>
    Signed-off-by: Alex Elder <aelder@xxxxxxx>
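
The reference counting model described above can be sketched with two counters. All names are hypothetical, and the sketch assumes that under delayed logging an item already tracked in the current checkpoint is not pinned again by later transactions.

```c
#include <assert.h>

/* Hypothetical buffer log item with the two counters discussed. */
struct sketch_bli {
	int refcount;
	int pincount;
};

/* A transaction joining the item takes a reference. */
void sketch_trans_join(struct sketch_bli *bip)
{
	bip->refcount++;
}

/* Pinning takes its own reference on the item. */
void sketch_pin(struct sketch_bli *bip)
{
	bip->pincount++;
	bip->refcount++;
}

/* Unlock always drops the reference the transaction held. */
void sketch_trans_unlock(struct sketch_bli *bip)
{
	assert(bip->refcount > 0);
	bip->refcount--;
}

/* Unpin always drops the reference taken by the pin. */
void sketch_unpin(struct sketch_bli *bip)
{
	assert(bip->pincount > 0 && bip->refcount > 0);
	bip->pincount--;
	bip->refcount--;
}
```

With this pairing, a second transaction that modifies an already pinned item only adds and removes its own reference, and the single unpin at checkpoint completion brings both counters back to zero.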

commit 3383ca5780f88bb2c119174045ed77d5ece08072
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Fri May 7 11:04:17 2010 +1000

    xfs: allow log ticket allocation to take allocation flags
    
    Delayed logging currently requires ticket allocation to succeed, so
    we need to be able to sleep on allocation. It also should not allow
    memory allocation to recurse into the filesystem. Hence we need to
    pass allocation flags directing the type of allocation the caller
    requires.
    
    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>
    Signed-off-by: Alex Elder <aelder@xxxxxxx>

commit 524ee36fa4661d745a467c3bba0e1034fd1f4b77
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Fri May 7 11:05:05 2010 +1000

    xfs: Don't reuse the same transaction ID for duplicated transactions.
    
    The transaction ID is written into the log as the unique identifier
    for transactions during recovery. When duplicating a transaction, we
    reuse the log ticket, which means it has the same transaction ID as
    the previous transaction.
    
    Rather than regenerating a random transaction ID for the duplicated
    transaction, just add one to the current ID so that duplicated
    transactions can be easily spotted in the log and during recovery
    when diagnosing problems.
    
    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>
    Signed-off-by: Alex Elder <aelder@xxxxxxx>
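
A tiny sketch of the increment, using hypothetical names and assuming a 32-bit unsigned ID (so the value wraps to zero at the maximum rather than overflowing):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical log ticket carrying the transaction ID. */
struct sketch_ticket {
	uint32_t tid;
};

/*
 * Called when a transaction is duplicated and the ticket is reused:
 * bump the ID by one instead of generating a new random value, so
 * the duplicate sits right next to its parent in the ID space.
 */
uint32_t sketch_ticket_dup_tid(struct sketch_ticket *tic)
{
	assert(tic != NULL);
	tic->tid += 1;
	return tic->tid;
}
```

Someone reading a log dump can then recognise a chain of duplicated transactions as a run of consecutive IDs.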

-----------------------------------------------------------------------

Summary of changes:
 fs/xfs/Makefile                 |    1 +
 fs/xfs/linux-2.6/xfs_buf.c      |    9 +
 fs/xfs/linux-2.6/xfs_quotaops.c |    1 +
 fs/xfs/linux-2.6/xfs_super.c    |   12 +-
 fs/xfs/linux-2.6/xfs_trace.h    |   83 +++--
 fs/xfs/quota/xfs_dquot.c        |    6 +-
 fs/xfs/xfs_ag.h                 |   24 +-
 fs/xfs/xfs_alloc.c              |  357 ++++++++++++++------
 fs/xfs/xfs_alloc.h              |    7 +-
 fs/xfs/xfs_alloc_btree.c        |    2 +-
 fs/xfs/xfs_buf_item.c           |  166 +++++-----
 fs/xfs/xfs_buf_item.h           |   18 +-
 fs/xfs/xfs_error.c              |    2 +-
 fs/xfs/xfs_log.c                |  120 +++++--
 fs/xfs/xfs_log.h                |   14 +-
 fs/xfs/xfs_log_cil.c            |  725 +++++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_log_priv.h           |  118 ++++++-
 fs/xfs/xfs_log_recover.c        |   46 ++--
 fs/xfs/xfs_log_recover.h        |    2 +-
 fs/xfs/xfs_mount.h              |    1 +
 fs/xfs/xfs_trans.c              |  144 ++++++--
 fs/xfs/xfs_trans.h              |   44 +--
 fs/xfs/xfs_trans_buf.c          |   46 ++--
 fs/xfs/xfs_trans_item.c         |  114 +------
 fs/xfs/xfs_trans_priv.h         |   15 +-
 fs/xfs/xfs_types.h              |    2 +
 26 files changed, 1566 insertions(+), 513 deletions(-)
 create mode 100644 fs/xfs/xfs_log_cil.c


hooks/post-receive
-- 
XFS development tree
