[XFS updates] XFS development tree branch, master, updated. v2.6.34-66-gccf7c23
xfs at oss.sgi.com
xfs at oss.sgi.com
Mon May 24 13:04:36 CDT 2010
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "XFS development tree".
The branch, master has been updated
ccf7c23 xfs: Ensure inode allocation buffers are fully replayed
df80615 xfs: enable background pushing of the CIL
9da1ab1 xfs: forced unmounts need to push the CIL
71e330b xfs: Introduce delayed logging core code
ed3b4d6 xfs: Improve scalability of busy extent tracking
955833c xfs: make the log ticket ID available outside the log infrastructure
169a7b0 xfs: clean up log ticket overrun debug output
c115541 xfs: Clean up XFS_BLI_* flag namespace
64fc35d xfs: modify buffer item reference counting
3383ca5 xfs: allow log ticket allocation to take allocation flags
524ee36 xfs: Don't reuse the same transaction ID for duplicated transactions.
from b4ed4626a9775cd8cb77209280d24839526f94f2 (commit)
Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.
- Log -----------------------------------------------------------------
commit ccf7c23fc129e75ef60e6f59f60a485b7a056598
Author: Dave Chinner <dchinner at redhat.com>
Date: Thu May 20 23:19:42 2010 +1000
xfs: Ensure inode allocation buffers are fully replayed
With delayed logging, we can get inode allocation buffers in the
same transaction inode unlink buffers. We don't currently mark inode
allocation buffers in the log, so inode unlink buffers take
precedence over allocation buffers.
The result is that when they are combined into the same checkpoint,
only the unlinked inode chain fields are replayed, resulting in
uninitialised inode buffers being detected when the next inode
modification is replayed.
To fix this, we need to ensure that we do not set the inode buffer
flag in the buffer log item format flags if the inode allocation has
not already hit the log. To avoid requiring a change to log
recovery, we really need to make this a modification that relies
only on in-memory sate.
We can do this by checking during buffer log formatting (while the
CIL cannot be flushed) if we are still in the same sequence when we
commit the unlink transaction as the inode allocation transaction.
If we are, then we do not add the inode buffer flag to the buffer
log format item flags. This means the entire buffer will be
replayed, not just the unlinked fields. We do this while
CIL flusheÑ are locked out to ensure that we don't race with the
sequence numbers changing and hence fail to put the inode buffer
flag in the buffer format flags when we really need to.
Signed-off-by: Dave Chinner <dchinner at redhat.com>
Reviewed-by: Christoph Hellwig <hch at lst.de>
Signed-off-by: Alex Elder <aelder at sgi.com>
commit df806158b0f6eb24247773b4a19b8b59d7217e59
Author: Dave Chinner <dchinner at redhat.com>
Date: Mon May 17 15:52:13 2010 +1000
xfs: enable background pushing of the CIL
If we let the CIL grow without bound, it will grow large enough to violate
recovery constraints (must be at least one complete transaction in the log at
all times) or take forever to write out through the log buffers. Hence we need
a check during asynchronous transactions as to whether the CIL needs to be
pushed.
We track the amount of log space the CIL consumes, so it is relatively simple
to limit it on a pure size basis. Make the limit the minimum of just under half
the log size (recovery constraint) or 8MB of log space (which is an awful lot
of metadata).
Signed-off-by: Dave Chinner <dchinner at redhat.com>
Reviewed-by: Christoph Hellwig <hch at lst.de>
Signed-off-by: Alex Elder <aelder at sgi.com>
commit 9da1ab181ac1790f86528b86ba5876f037e8dcdc
Author: Dave Chinner <dchinner at redhat.com>
Date: Mon May 17 15:51:59 2010 +1000
xfs: forced unmounts need to push the CIL
If the filesystem is being shut down and the there is no log error,
the current code forces out the current log buffers. This code now needs
to push the CIL before it forces out the log buffers to acheive the same
result.
Signed-off-by: Dave Chinner <dchinner at redhat.com>
Reviewed-by: Christoph Hellwig <hch at lst.de>
Signed-off-by: Alex Elder <aelder at sgi.com>
commit 71e330b593905e40d6c5afa824d38ee02d70ce5f
Author: Dave Chinner <dchinner at redhat.com>
Date: Fri May 21 14:37:18 2010 +1000
xfs: Introduce delayed logging core code
The delayed logging code only changes in-memory structures and as
such can be enabled and disabled with a mount option. Add the mount
option and emit a warning that this is an experimental feature that
should not be used in production yet.
We also need infrastructure to track committed items that have not
yet been written to the log. This is what the Committed Item List
(CIL) is for.
The log item also needs to be extended to track the current log
vector, the associated memory buffer and it's location in the Commit
Item List. Extend the log item and log vector structures to enable
this tracking.
To maintain the current log format for transactions with delayed
logging, we need to introduce a checkpoint transaction and a context
for tracking each checkpoint from initiation to transaction
completion. This includes adding a log ticket for tracking space
log required/used by the context checkpoint.
To track all the changes we need an io vector array per log item,
rather than a single array for the entire transaction. Using the new
log vector structure for this requires two passes - the first to
allocate the log vector structures and chain them together, and the
second to fill them out. This log vector chain can then be passed
to the CIL for formatting, pinning and insertion into the CIL.
Formatting of the log vector chain is relatively simple - it's just
a loop over the iovecs on each log vector, but it is made slightly
more complex because we re-write the iovec after the copy to point
back at the memory buffer we just copied into.
This code also needs to pin log items. If the log item is not
already tracked in this checkpoint context, then it needs to be
pinned. Otherwise it is already pinned and we don't need to pin it
again.
The only other complexity is calculating the amount of new log space
the formatting has consumed. This needs to be accounted to the
transaction in progress, and the accounting is made more complex
becase we need also to steal space from it for log metadata in the
checkpoint transaction. Calculate all this at insert time and update
all the tickets, counters, etc correctly.
Once we've formatted all the log items in the transaction, attach
the busy extents to the checkpoint context so the busy extents live
until checkpoint completion and can be processed at that point in
time. Transactions can then be freed at this point in time.
Now we need to issue checkpoints - we are tracking the amount of log space
used by the items in the CIL, so we can trigger background checkpoints when the
space usage gets to a certain threshold. Otherwise, checkpoints need ot be
triggered when a log synchronisation point is reached - a log force event.
Because the log write code already handles chained log vectors, writing the
transaction is trivial, too. Construct a transaction header, add it
to the head of the chain and write it into the log, then issue a
commit record write. Then we can release the checkpoint log ticket
and attach the context to the log buffer so it can be called during
Io completion to complete the checkpoint.
We also need to allow for synchronising multiple in-flight
checkpoints. This is needed for two things - the first is to ensure
that checkpoint commit records appear in the log in the correct
sequence order (so they are replayed in the correct order). The
second is so that xfs_log_force_lsn() operates correctly and only
flushes and/or waits for the specific sequence it was provided with.
To do this we need a wait variable and a list tracking the
checkpoint commits in progress. We can walk this list and wait for
the checkpoints to change state or complete easily, an this provides
the necessary synchronisation for correct operation in both cases.
Signed-off-by: Dave Chinner <dchinner at redhat.com>
Reviewed-by: Christoph Hellwig <hch at lst.de>
Signed-off-by: Alex Elder <aelder at sgi.com>
commit ed3b4d6cdc81e8feefdbfa3c584614be301b6d39
Author: Dave Chinner <david at fromorbit.com>
Date: Fri May 21 12:07:08 2010 +1000
xfs: Improve scalability of busy extent tracking
When we free a metadata extent, we record it in the per-AG busy
extent array so that it is not re-used before the freeing
transaction hits the disk. This array is fixed size, so when it
overflows we make further allocation transactions synchronous
because we cannot track more freed extents until those transactions
hit the disk and are completed. Under heavy mixed allocation and
freeing workloads with large log buffers, we can overflow this array
quite easily.
Further, the array is sparsely populated, which means that inserts
need to search for a free slot, and array searches often have to
search many more slots that are actually used to check all the
busy extents. Quite inefficient, really.
To enable this aspect of extent freeing to scale better, we need
a structure that can grow dynamically. While in other areas of
XFS we have used radix trees, the extents being freed are at random
locations on disk so are better suited to being indexed by an rbtree.
So, use a per-AG rbtree indexed by block number to track busy
extents. This incures a memory allocation when marking an extent
busy, but should not occur too often in low memory situations. This
should scale to an arbitrary number of extents so should not be a
limitation for features such as in-memory aggregation of
transactions.
However, there are still situations where we can't avoid allocating
busy extents (such as allocation from the AGFL). To minimise the
overhead of such occurences, we need to avoid doing a synchronous
log force while holding the AGF locked to ensure that the previous
transactions are safely on disk before we use the extent. We can do
this by marking the transaction doing the allocation as synchronous
rather issuing a log force.
Because of the locking involved and the ordering of transactions,
the synchronous transaction provides the same guarantees as a
synchronous log force because it ensures that all the prior
transactions are already on disk when the synchronous transaction
hits the disk. i.e. it preserves the free->allocate order of the
extent correctly in recovery.
By doing this, we avoid holding the AGF locked while log writes are
in progress, hence reducing the length of time the lock is held and
therefore we increase the rate at which we can allocate and free
from the allocation group, thereby increasing overall throughput.
The only problem with this approach is that when a metadata buffer is
marked stale (e.g. a directory block is removed), then buffer remains
pinned and locked until the log goes to disk. The issue here is that
if that stale buffer is reallocated in a subsequent transaction, the
attempt to lock that buffer in the transaction will hang waiting
the log to go to disk to unlock and unpin the buffer. Hence if
someone tries to lock a pinned, stale, locked buffer we need to
push on the log to get it unlocked ASAP. Effectively we are trading
off a guaranteed log force for a much less common trigger for log
force to occur.
Ideally we should not reallocate busy extents. That is a much more
complex fix to the problem as it involves direct intervention in the
allocation btree searches in many places. This is left to a future
set of modifications.
Finally, now that we track busy extents in allocated memory, we
don't need the descriptors in the transaction structure to point to
them. We can replace the complex busy chunk infrastructure with a
simple linked list of busy extents. This allows us to remove a large
chunk of code, making the overall change a net reduction in code
size.
Signed-off-by: Dave Chinner <david at fromorbit.com>
Reviewed-by: Christoph Hellwig <hch at lst.de>
Signed-off-by: Alex Elder <aelder at sgi.com>
commit 955833cf2ad0aa39b336e853cad212d867199984
Author: Dave Chinner <dchinner at redhat.com>
Date: Fri May 14 21:41:46 2010 +1000
xfs: make the log ticket ID available outside the log infrastructure
The ticket ID is needed to uniquely identify transactions when doing busy
extent matching. Delayed logging changes the lifecycle of busy extents with
respect to the transaction structure lifecycle. Hence we can no longer use
the transaction structure as a means of determining the owner of the busy
extent as it may be freed and reused while the busy extent is still active.
This commit provides the infrastructure to access the xlog_tid_t held in the
ticket from a transaction handle. This avoids the need for callers to peek
into the transaction and log structures to find this out.
Signed-off-by: Dave Chinner <dchinner at redhat.com>
Reviewed-by: Christoph Hellwig <hch at lst.de>
Signed-off-by: Alex Elder <aelder at sgi.com>
commit 169a7b078eaa765e6bd09865c985298ee9084a89
Author: Dave Chinner <dchinner at redhat.com>
Date: Fri May 7 11:05:31 2010 +1000
xfs: clean up log ticket overrun debug output
Push the error message output when a ticket overrun is detected
into the ticket printing functions. Also remove the debug version
of the code as the production version will still panic just as
effectively on a debug kernel via the panic mask being set.
Signed-off-by: Dave Chinner <dchinner at redhat.com>
Reviewed-by: Christoph Hellwig <hch at lst.de>
Signed-off-by: Alex Elder <aelder at sgi.com>
commit c11554104f4dcb509fd43973389b097a04b9d51d
Author: Dave Chinner <dchinner at redhat.com>
Date: Fri May 7 11:05:19 2010 +1000
xfs: Clean up XFS_BLI_* flag namespace
Clean up the buffer log format (XFS_BLI_*) flags because they have a
polluted namespace. They XFS_BLI_ prefix is used for both in-memory
and on-disk flag feilds, but have overlapping values for different
flags. Rename the buffer log format flags to use the XFS_BLF_*
prefix to avoid confusing them with the in-memory XFS_BLI_* prefixed
flags.
Signed-off-by: Dave Chinner <dchinner at redhat.com>
Reviewed-by: Christoph Hellwig <hch at lst.de>
Signed-off-by: Alex Elder <aelder at sgi.com>
commit 64fc35de60da3b1fe970168d10914bf1cf34a3e3
Author: Dave Chinner <dchinner at redhat.com>
Date: Fri May 7 11:04:34 2010 +1000
xfs: modify buffer item reference counting
The buffer log item reference counts used to take referenceÑ for every
transaction, similar to the pin counting. This is symmetric (like the
pin/unpin) with respect to transaction completion, but with dleayed logging
becomes assymetric as the pinning becomes assymetric w.r.t. transaction
completion.
To make both cases the same, allow the buffer pinning to take a reference to
the buffer log item and always drop the reference the transaction has on it
when being unlocked. This is balanced correctly because the unpin operation
always drops a reference to the log item. Hence reference counting becomes
symmetric w.r.t. item pinning as well as w.r.t active transactions and as a
result the reference counting model remain consistent between normal and
delayed logging.
Signed-off-by: Dave Chinner <dchinner at redhat.com>
Reviewed-by: Christoph Hellwig <hch at lst.de>
Signed-off-by: Alex Elder <aelder at sgi.com>
commit 3383ca5780f88bb2c119174045ed77d5ece08072
Author: Dave Chinner <dchinner at redhat.com>
Date: Fri May 7 11:04:17 2010 +1000
xfs: allow log ticket allocation to take allocation flags
Delayed logging currently requires ticket allocation to succeed, so
we need to be able to sleep on allocation. It also should not allow
memory allocation to recurse into the filesystem. hence we need to
pass allocation flags directing the type of allocation the caller
requires.
Signed-off-by: Dave Chinner <dchinner at redhat.com>
Reviewed-by: Christoph Hellwig <hch at lst.de>
Signed-off-by: Alex Elder <aelder at sgi.com>
commit 524ee36fa4661d745a467c3bba0e1034fd1f4b77
Author: Dave Chinner <dchinner at redhat.com>
Date: Fri May 7 11:05:05 2010 +1000
xfs: Don't reuse the same transaction ID for duplicated transactions.
The transaction ID is written into the log as the unique identifier
for transactions during recover. When duplicating a transaction, we
reuse the log ticket, which means it has the same transaction ID as
the previous transaction.
Rather than regenerating a random transaction ID for the duplicated
transaction, just add one to the current ID so that duplicated
transaction can be easily spotted in the log and during recovery
during problem diagnosis.
Signed-off-by: Dave Chinner <dchinner at redhat.com>
Reviewed-by: Christoph Hellwig <hch at lst.de>
Signed-off-by: Alex Elder <aelder at sgi.com>
-----------------------------------------------------------------------
Summary of changes:
fs/xfs/Makefile | 1 +
fs/xfs/linux-2.6/xfs_buf.c | 9 +
fs/xfs/linux-2.6/xfs_quotaops.c | 1 +
fs/xfs/linux-2.6/xfs_super.c | 12 +-
fs/xfs/linux-2.6/xfs_trace.h | 83 +++--
fs/xfs/quota/xfs_dquot.c | 6 +-
fs/xfs/xfs_ag.h | 24 +-
fs/xfs/xfs_alloc.c | 357 ++++++++++++++------
fs/xfs/xfs_alloc.h | 7 +-
fs/xfs/xfs_alloc_btree.c | 2 +-
fs/xfs/xfs_buf_item.c | 166 +++++-----
fs/xfs/xfs_buf_item.h | 18 +-
fs/xfs/xfs_error.c | 2 +-
fs/xfs/xfs_log.c | 120 +++++--
fs/xfs/xfs_log.h | 14 +-
fs/xfs/xfs_log_cil.c | 725 +++++++++++++++++++++++++++++++++++++++
fs/xfs/xfs_log_priv.h | 118 ++++++-
fs/xfs/xfs_log_recover.c | 46 ++--
fs/xfs/xfs_log_recover.h | 2 +-
fs/xfs/xfs_mount.h | 1 +
fs/xfs/xfs_trans.c | 144 ++++++--
fs/xfs/xfs_trans.h | 44 +--
fs/xfs/xfs_trans_buf.c | 46 ++--
fs/xfs/xfs_trans_item.c | 114 +------
fs/xfs/xfs_trans_priv.h | 15 +-
fs/xfs/xfs_types.h | 2 +
26 files changed, 1566 insertions(+), 513 deletions(-)
create mode 100644 fs/xfs/xfs_log_cil.c
hooks/post-receive
--
XFS development tree
More information about the xfs
mailing list