
To: xfs@xxxxxxxxxxx
Subject: [XFS updates] XFS development tree branch, master, updated. v2.6.37-rc4-53-gd0eb2f3
From: xfs@xxxxxxxxxxx
Date: Tue, 4 Jan 2011 20:36:08 -0600
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "XFS development tree".

The branch, master has been updated
  d0eb2f3 xfs: convert grant head manipulations to lockless algorithm
  3f16b98 xfs: introduce new locks for the log grant ticket wait queues
  c8a09ff xfs: convert log grant heads to atomic variables
  1c3cb9e xfs: convert l_tail_lsn to an atomic variable.
  84f3c68 xfs: convert l_last_sync_lsn to an atomic variable
  2ced19c xfs: make AIL tail pushing independent of the grant lock
  eb40a87 xfs: use wait queues directly for the log wait queues
  a69ed03 xfs: combine grant heads into a single 64 bit integer
  663e496 xfs: rework log grant space calculations
  3f336c6 xfs: fact out common grant head/log tail verification code
  1054794 xfs: convert log grant ticket queues to list heads
  9552e7f xfs: use AIL bulk delete function to implement single delete
  e605994 xfs: use AIL bulk update function to implement single updates
  3013683 xfs: remove all the inodes on a buffer from the AIL in bulk
  c90821a xfs: consume iodone callback items on buffers as they are processed
  e677d0f xfs: reduce the number of AIL push wakeups
  0e57f6a xfs: bulk AIL insertion during transaction commit
  eb3efa1 xfs: clean up xfs_ail_delete()
  b199c8a xfs: Pull EFI/EFD handling out from under the AIL lock
  9c5f841 xfs: fix EFI transaction cancellation.
  821eb21 xfs: connect up buffer reclaim priority hooks
  430cbeb xfs: add a lru to the XFS buffer cache
  ff57ab2 xfs: convert xfsbud shrinker to a per-buftarg shrinker.
  1a427ab xfs: convert pag_ici_lock to a spin lock
  1a3e8f3 xfs: convert inode cache lookups to use RCU locking
  d95b7aa xfs: rcu free inodes
  6e85756 xfs: don't truncate prealloc from frequently accessed inodes
  055388a xfs: dynamic speculative EOF preallocation
  622d814 xfs: use KM_NOFS for allocations during attribute list operations
  dcfcf20 xfs: provide a inode iolock lockdep class
      from  489a150f6454e2cd93d9e0ee6d7c5a361844f62a (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
commit d0eb2f38b250b7d6c993adf81b0e4ded0565497e
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Tue Dec 21 12:29:14 2010 +1100

    xfs: convert grant head manipulations to lockless algorithm
    
    The only thing that the grant lock still protects is the grant head
    manipulations when adding or removing space from the log. These calculations
    are already based on atomic variables, so we can already update them safely
    without locks. However, the grant head manipulations require atomic
    multi-step calculations to be executed, which the algorithms currently don't
    allow.
    
    To make these multi-step calculations atomic, convert the algorithms to
    compare-and-exchange loops on the atomic variables. That is, we sample the
    old value, perform the calculation and use atomic64_cmpxchg() to attempt to
    update the head with the new value. If the head has not changed since we
    sampled it, the update succeeds and we are done. Otherwise, we rerun the
    calculation from a new sample of the head.
    
    This allows us to remove the grant lock from around all the grant head space
    manipulations, which leaves the grant lock with nothing left to protect.
    Hence we can remove the grant lock from the log completely at this point.
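
The compare-and-exchange retry loop described above can be modelled in user-space C11. This is a minimal sketch under stated assumptions, not the XFS code: the grant head is assumed to pack the log cycle into the upper 32 bits and the byte offset into the lower 32 bits of one 64-bit value (as the "single 64 bit integer" commit below describes), and `crack_head`, `assign_head` and `grant_add_space` are hypothetical names. C11 `atomic_compare_exchange_weak()` stands in for the kernel's `atomic64_cmpxchg()`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Split a packed grant head into cycle (high 32 bits) and byte offset (low 32 bits). */
static void crack_head(uint64_t val, uint32_t *cycle, uint32_t *bytes)
{
	*cycle = (uint32_t)(val >> 32);
	*bytes = (uint32_t)val;
}

/* Pack a cycle and byte offset back into one 64-bit value. */
static uint64_t assign_head(uint32_t cycle, uint32_t bytes)
{
	return ((uint64_t)cycle << 32) | bytes;
}

/*
 * Lockless "add space" (hypothetical helper): sample the head, compute the
 * new value, and try to publish it with a compare-and-exchange. If another
 * thread changed the head in the meantime, the exchange fails, 'old' is
 * refreshed with the current value, and the calculation reruns from it.
 */
static void grant_add_space(_Atomic uint64_t *head, uint32_t log_size, uint32_t need)
{
	uint64_t old = atomic_load(head);
	uint64_t updated;

	assert(need <= log_size);
	do {
		uint32_t cycle, bytes, room;

		crack_head(old, &cycle, &bytes);
		room = log_size - bytes;
		if (room > need) {
			bytes += need;
		} else {
			/* Wrap around the circular log into the next cycle. */
			bytes = need - room;
			cycle++;
		}
		updated = assign_head(cycle, bytes);
	} while (!atomic_compare_exchange_weak(head, &old, updated));
}
```

Because `atomic_compare_exchange_weak()` writes the current value back into `old` when it fails, each retry recomputes from a fresh sample, which is the "rerun the calculation" step; spurious failures of the weak form are harmless for the same reason.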
    
    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>

commit 3f16b9850743b702380f098ab5e0308cd6af1792
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Tue Dec 21 12:29:01 2010 +1100

    xfs: introduce new locks for the log grant ticket wait queues
    
    The log grant ticket wait queues are currently protected by the log
    grant lock.  However, the queues are functionally independent from
    each other, and operations on them only require serialisation
    against other queue operations now that all of the other log
    variables they use are atomic values.
    
    Hence, we can make them independent of the grant lock by introducing
    new locks just to protect the list operations. Because the lists
    are independent, we can use a lock per list and ensure that reserve
    and write head queuing do not contend.
    
    To ensure forced shutdowns work correctly in conjunction with the
    new fast paths, ensure that we check whether the log has been shut
    down in the grant functions once we hold the relevant spin locks but
    before we go to sleep. This is needed to co-ordinate correctly with
    the wakeups that are issued on the ticket queues so we don't leave
    any processes sleeping on the queues during a shutdown.
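
The ordering requirement in the last paragraph (check for shutdown with the queue lock held, then sleep) can be sketched with POSIX threads. This is a hypothetical user-space model: `struct log_model`, `grant_wait()` and `log_force_shutdown()` are invented for illustration, and a mutex plus condition variable stand in for the kernel spin lock and wait queue.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* One grant wait queue with its own lock (hypothetical model). */
struct grant_queue {
	pthread_mutex_t lock;
	pthread_cond_t  wait;
};

struct log_model {
	struct grant_queue reserve_q;	/* reserve and write waiters never share a lock */
	struct grant_queue write_q;
	atomic_bool        shutdown;
	atomic_long        free_bytes;
};

static void queue_init(struct grant_queue *q)
{
	pthread_mutex_init(&q->lock, NULL);
	pthread_cond_init(&q->wait, NULL);
}

static void log_init(struct log_model *xlog, long free_bytes)
{
	queue_init(&xlog->reserve_q);
	queue_init(&xlog->write_q);
	atomic_init(&xlog->shutdown, false);
	atomic_init(&xlog->free_bytes, free_bytes);
}

/*
 * Sleep until enough space is free. The shutdown flag is checked with the
 * queue lock held, immediately before each sleep.
 */
static int grant_wait(struct log_model *xlog, struct grant_queue *q, long need)
{
	int error = 0;

	pthread_mutex_lock(&q->lock);
	while (atomic_load(&xlog->free_bytes) < need) {
		if (atomic_load(&xlog->shutdown)) {
			error = -EIO;
			break;
		}
		pthread_cond_wait(&q->wait, &q->lock);
	}
	pthread_mutex_unlock(&q->lock);
	return error;
}

/* Wake everyone on one queue; taking the lock orders us against waiters. */
static void queue_wake_all(struct grant_queue *q)
{
	pthread_mutex_lock(&q->lock);
	pthread_cond_broadcast(&q->wait);
	pthread_mutex_unlock(&q->lock);
}

static void log_force_shutdown(struct log_model *xlog)
{
	atomic_store(&xlog->shutdown, true);
	queue_wake_all(&xlog->reserve_q);
	queue_wake_all(&xlog->write_q);
}
```

A waiter that has seen `shutdown == false` still holds the queue lock until `pthread_cond_wait()` atomically releases it, so the shutdown path cannot take the lock and broadcast in that window, and the wakeup always reaches the sleeper.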
    
    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>

commit c8a09ff8ca2235bccdaea8a52fbd5349646a8ba4
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Sat Dec 4 00:02:40 2010 +1100

    xfs: convert log grant heads to atomic variables
    
    Convert the log grant heads to atomic64_t types in preparation for
    converting the accounting algorithms to atomic operations. This patch
    just converts the variables; the algorithmic changes are in a
    separate patch for clarity.
    
    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>

commit 1c3cb9ec07fabf0c0970adc46fd2a1f09c1186dd
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Tue Dec 21 12:28:39 2010 +1100

    xfs: convert l_tail_lsn to an atomic variable.
    
    log->l_tail_lsn is currently protected by the log grant lock. The
    lock is only needed for serialising readers against writers, so we
    don't really need the lock if we make the l_tail_lsn variable an
    atomic. Converting the l_tail_lsn variable to an atomic64_t means we
    can start to peel back the grant lock from various operations.
    
    Also, provide functions to safely crack an atomic LSN variable into
    its component pieces and to recombine the components into an
    atomic variable. Use them where appropriate.
    
    This also removes the need for explicitly holding a spinlock to read
    the l_tail_lsn on 32 bit platforms.
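
Cracking an atomic LSN into its components can be illustrated as follows. XFS LSNs carry the log cycle in the upper 32 bits and the block number in the lower 32 bits; the helper names below are hypothetical, and C11 atomics stand in for `atomic64_t`. The point of the sketch is that both halves come from a single 64-bit load, so a reader on a 32-bit platform cannot pair the cycle from one update with the block from another.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

static inline uint32_t lsn_cycle(uint64_t lsn) { return (uint32_t)(lsn >> 32); }
static inline uint32_t lsn_block(uint64_t lsn) { return (uint32_t)lsn; }

/* Hypothetical: read both components from one atomic 64-bit load. */
static void crack_atomic_lsn(_Atomic uint64_t *lsn, uint32_t *cycle, uint32_t *block)
{
	uint64_t val = atomic_load(lsn);	/* single load: halves are consistent */

	*cycle = lsn_cycle(val);
	*block = lsn_block(val);
}

/* Hypothetical: publish both components with one atomic 64-bit store. */
static void assign_atomic_lsn(_Atomic uint64_t *lsn, uint32_t cycle, uint32_t block)
{
	atomic_store(lsn, ((uint64_t)cycle << 32) | block);
}
```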
    
    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>

commit 84f3c683c4d3f36d3c3ed320babd960a332ac458
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Fri Dec 3 22:11:29 2010 +1100

    xfs: convert l_last_sync_lsn to an atomic variable
    
    log->l_last_sync_lsn is updated in only one critical spot - log
    buffer IO completion - and is protected by the grant lock here. This
    requires the grant lock to be taken for every log buffer IO
    completion. Converting the l_last_sync_lsn variable to an atomic64_t
    means that we do not need to take the grant lock in log buffer IO
    completion to update it.
    
    This also removes the need for explicitly holding a spinlock to read
    the l_last_sync_lsn on 32 bit platforms.
    
    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>

commit 2ced19cbae5448b720919a494606c62095d4f4db
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Tue Dec 21 12:09:20 2010 +1100

    xfs: make AIL tail pushing independent of the grant lock
    
    xlog_grant_push_ail() currently takes the grant lock internally to sample
    the tail lsn, last sync lsn and the reserve grant head. Most of the callers
    already hold the grant lock but have to drop it before calling
    xlog_grant_push_ail(). This is a leftover from when the AIL tail pushing was
    done inline and hence xlog_grant_push_ail had to drop the grant lock. AIL push
    is now done in another thread and hence we can safely hold the grant lock over
    the entire xlog_grant_push_ail call.
    
    Push the grant lock outside of xlog_grant_push_ail() to simplify the locking
    and synchronisation needed for tail pushing. This will reduce traffic on the
    grant lock by itself, but this is only one step in preparing for the complete
    removal of the grant lock.
    
    While there, clean up the formatting of xlog_grant_push_ail() to match the
    rest of the XFS code.
    
    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>

commit eb40a87500ac2f6be7eaf8ebb35610e6d0e60e9a
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Tue Dec 21 12:09:01 2010 +1100

    xfs: use wait queues directly for the log wait queues
    
    The log grant queues are one of the few places left using sv_t
    constructs for waiting. Given we are touching this code, we should
    convert them to plain wait queues. While there, convert all the
    other sv_t users in the log code as well.
    
    Seeing as this removes the last users of the sv_t type, remove the
    header file defining the wrapper and the fragments that still
    reference it.
    
    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>

commit a69ed03c24d4a336c23b7116127713d5a8c5ac4d
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Tue Dec 21 12:08:20 2010 +1100

    xfs: combine grant heads into a single 64 bit integer
    
    Prepare for switching the grant heads to atomic variables by
    combining the two 32 bit values that make up the grant head into a
    single 64 bit variable.  Provide wrapper functions to combine and
    split the grant heads appropriately for calculations and use them as
    necessary.
    
    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>

commit 663e496a720a3a9fc08ea70b29724e8906b34e43
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Tue Dec 21 12:06:05 2010 +1100

    xfs: rework log grant space calculations
    
    The log grant space calculations are repeated for both write and
    reserve grant heads. To make it simpler to convert the calculations
    to a different algorithm, factor them so both the grant heads use the
    same calculation functions. Once this is done we can drop the
    wrappers that are used in only a couple of places to update both
    grant heads at once as they don't provide any particular value.
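
A sketch of the factored calculation: one pair of helpers that works on any grant head, so the reserve and write heads share the same code. The `struct grant_head` layout (a cycle count plus a byte offset that must stay below the log size) and the function names are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical grant head: a position in the circular log. */
struct grant_head {
	uint32_t cycle;	/* how many times the head has wrapped */
	uint32_t bytes;	/* offset within the current cycle, always < log_size */
};

/* Move the head forward, wrapping into the next cycle when needed. */
static void grant_add_space(struct grant_head *h, uint32_t log_size, uint32_t need)
{
	uint32_t room = log_size - h->bytes;

	assert(h->bytes < log_size && need <= log_size);
	if (room > need) {
		h->bytes += need;
	} else {
		h->bytes = need - room;
		h->cycle++;
	}
}

/* Move the head backward, stepping into the previous cycle when needed. */
static void grant_sub_space(struct grant_head *h, uint32_t log_size, uint32_t give_back)
{
	assert(h->bytes < log_size && give_back <= log_size);
	if (h->bytes >= give_back) {
		h->bytes -= give_back;
	} else {
		h->bytes = log_size - (give_back - h->bytes);
		h->cycle--;
	}
}
```

Both heads are then updated by calling the same function on each, which is what makes a later switch to a different (lockless) algorithm a change in one place.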
    
    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>

commit 3f336c6fa17c2b3d14b3dd1bd6e64e9cc97b6359
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Tue Dec 21 12:02:52 2010 +1100

    xfs: fact out common grant head/log tail verification code
    
    Factor repeated debug code out of grant head manipulation functions into a
    separate function. This removes ifdef DEBUG spaghetti from the code and makes
    the code easier to follow.
    
    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>

commit 1054794198e39103cb986618c4c10ec2252b7089
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Tue Dec 21 12:02:25 2010 +1100

    xfs: convert log grant ticket queues to list heads
    
    The grant write and reserve queues use a roll-your-own double linked
    list, so convert it to a standard list_head structure and convert
    all the list traversals to use list_for_each_entry(). We can also
    get rid of the XLOG_TIC_IN_Q flag as we can use the list_empty()
    check to tell if the ticket is in a list or not.
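
The list conversion replaces an "in queue" flag with list membership itself. Since the kernel's `<linux/list.h>` is not available in user space, the sketch below re-implements just enough of `struct list_head` to show the idiom; `struct ticket` and `total_need()` are hypothetical. A node initialised to point at itself is not on any list, so `list_empty()` on the ticket's own node answers the question that `XLOG_TIC_IN_Q` used to.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal user-space stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *list)
{
	list->next = list;
	list->prev = list;
}

static int list_empty(const struct list_head *head)
{
	return head->next == head;
}

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* Unlink and re-initialise, so list_empty() on the node is true again. */
static void list_del_init(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	INIT_LIST_HEAD(entry);
}

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical ticket: the embedded node replaces the "queued" flag. */
struct ticket {
	int              need;	/* bytes this ticket is waiting for */
	struct list_head queue;	/* self-linked when not queued */
};

/* Walk a queue and sum what its tickets are waiting for. */
static int total_need(struct list_head *q)
{
	int sum = 0;

	for (struct list_head *pos = q->next; pos != q; pos = pos->next)
		sum += container_of(pos, struct ticket, queue)->need;
	return sum;
}
```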
    
    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>

commit 9552e7f2f3dd13a7580e488a7a3582332daad4f5
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Mon Dec 20 12:36:15 2010 +1100

    xfs: use AIL bulk delete function to implement single delete
    
    We now have two copies of AIL delete operations that are mostly
    duplicate functionality. The single log item deletes can be
    implemented via the bulk updates by turning xfs_trans_ail_delete()
    into a simple wrapper. This removes all the duplicate delete
    functionality and associated helpers.
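
Implementing the single-item operation as a one-element call to the bulk operation, as described above, can be sketched like this. The AIL here is reduced to an item count and a mutex, and all names are hypothetical; `lock_takes` counts lock round trips to show that a batch costs one acquisition regardless of its size, which is also the point of the bulk inode removal further down.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct log_item {
	int in_ail;
};

/* Hypothetical, heavily reduced AIL. */
struct ail {
	pthread_mutex_t lock;
	int             nitems;
	int             lock_takes;	/* lock round trips, for illustration */
};

/* Bulk delete: one lock acquisition for the whole batch. */
static void ail_delete_bulk(struct ail *ailp, struct log_item **items, size_t n)
{
	pthread_mutex_lock(&ailp->lock);
	ailp->lock_takes++;
	for (size_t i = 0; i < n; i++) {
		if (!items[i]->in_ail)
			continue;	/* already removed */
		items[i]->in_ail = 0;
		ailp->nitems--;
	}
	pthread_mutex_unlock(&ailp->lock);
}

/* Single delete is now just a thin wrapper: a one-element bulk delete. */
static void ail_delete(struct ail *ailp, struct log_item *lip)
{
	ail_delete_bulk(ailp, &lip, 1);
}
```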
    
    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>

commit e60599492990d1b52c70e9ed2f8e062fe11ca937
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Mon Dec 20 12:34:26 2010 +1100

    xfs: use AIL bulk update function to implement single updates
    
    We now have two copies of AIL insert operations that are mostly
    duplicate functionality. The single log item updates can be
    implemented via the bulk updates by turning xfs_trans_ail_update()
    into a simple wrapper. This removes all the duplicate insert
    functionality and associated helpers.
    
    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>

commit 3013683253ad04f67d8cfaa25be708353686b90a
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Mon Dec 20 12:03:17 2010 +1100

    xfs: remove all the inodes on a buffer from the AIL in bulk
    
    When inode buffer IO completes, usually all of the inodes are removed from the
    AIL. This involves processing them one at a time and taking the AIL lock once
    for every inode. When all CPUs are processing inode IO completions, this causes
    excessive amounts of contention on the AIL lock.
    
    Instead, change the way we process inode IO completion in the buffer
    IO done callback. Allow the inode IO done callback to walk the list
    of IO done callbacks and pull all the inodes off the buffer in one
    go and then process them as a batch.
    
    Once all the inodes for removal are collected, take the AIL lock
    once and do a bulk removal operation to minimise traffic on the AIL
    lock.
    
    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>

commit c90821a26a8c90ad1e3116393b8a8260ab46bffb
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Fri Dec 3 17:00:52 2010 +1100

    xfs: consume iodone callback items on buffers as they are processed
    
    To allow buffer iodone callbacks to consume multiple items off the
    callback list, first we need to convert the xfs_buf_do_callbacks()
    to consume items and always pull the next item from the head of the
    list.
    
    This means the item list walk is never dependent on knowing the
    next item on the list and hence allows callbacks to remove items
    from the list as well. This allows callbacks to do bulk operations
    by scanning the list for identical callbacks, consuming them all
    and then processing them in bulk, negating the need for multiple
    callbacks of that type.
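
The consume-from-head walk can be modelled with a singly linked list. In this hypothetical sketch (`struct iodone_item`, `buf_do_callbacks()`, `inode_iodone()` and `other_iodone()` are invented names), the walker always pops the current head, so a callback is free to unlink any remaining items; `inode_iodone()` uses that freedom to claim every other item with the same callback and process them as one batch.

```c
#include <assert.h>
#include <stddef.h>

struct iodone_item {
	struct iodone_item *next;
	void (*cb)(struct iodone_item **list, struct iodone_item *first);
};

static int batches, processed;

/* Batched callback: consume every remaining item that has the same callback. */
static void inode_iodone(struct iodone_item **list, struct iodone_item *first)
{
	struct iodone_item **pp = list;
	int n = 1;

	while (*pp) {
		if ((*pp)->cb == first->cb) {
			*pp = (*pp)->next;	/* unlink and consume */
			n++;
		} else {
			pp = &(*pp)->next;
		}
	}
	batches++;
	processed += n;
}

/* Ordinary callback: handles only its own item. */
static void other_iodone(struct iodone_item **list, struct iodone_item *first)
{
	(void)list;
	(void)first;
	batches++;
	processed++;
}

/* Always take the current head, so callbacks may remove any later items. */
static void buf_do_callbacks(struct iodone_item **list)
{
	struct iodone_item *lip;

	while ((lip = *list) != NULL) {
		*list = lip->next;
		lip->next = NULL;
		lip->cb(list, lip);
	}
}
```

With items `inode, other, inode` on the list, the first `inode_iodone()` call consumes both inode items, so the buffer sees two callback invocations instead of three.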
    
    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>

commit e677d0f9548e2245ee3c2977661ca8ca165af188
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Fri Dec 17 20:08:04 2010 +1100

    xfs: reduce the number of AIL push wakeups
    
    The xfsaild often tries to rest to wait for congestion to pass or for
    IO to complete, but is regularly woken in tail-pushing situations.
    In severe cases, the xfsaild is getting woken tens of thousands of
    times a second. Reduce the number of needless wakeups by only waking
    the xfsaild if the new target is larger than the old one. Further,
    make short sleeps uninterruptible as they occur when the xfsaild has
    decided it needs to back off to allow some IO to complete and being
    woken early is counter-productive.
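
The "only wake if the target moved forward" rule reduces to a comparison before the wakeup. This sketch assumes LSNs packed with the cycle in the high 32 bits and the block in the low 32 bits, so plain unsigned 64-bit comparison orders them; `struct ail_push` and `ail_push_to()` are hypothetical, and the wakeup itself is modelled by a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical push-target state for the AIL pushing thread. */
struct ail_push {
	uint64_t target;	/* packed LSN: cycle << 32 | block */
	unsigned wakeups;	/* stands in for actually waking xfsaild */
};

/* Returns true only when the target advanced and a wakeup was issued. */
static bool ail_push_to(struct ail_push *ap, uint64_t new_target)
{
	if (new_target <= ap->target)
		return false;	/* nothing new to push: let the thread rest */
	ap->target = new_target;
	ap->wakeups++;
	return true;
}
```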
    
    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>

commit 0e57f6a36f9be03e5abb755f524ee91c4aebe854
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Mon Dec 20 12:02:19 2010 +1100

    xfs: bulk AIL insertion during transaction commit
    
    When inserting items into the AIL from the transaction committed
    callbacks, we take the AIL lock for every single item that is to be
    inserted. For a CIL checkpoint commit, this can be tens of thousands
    of individual inserts, yet almost all of the items will be inserted
    at the same point in the AIL because they have the same index.
    
    To reduce the overhead and contention on the AIL lock for such
    operations, introduce a "bulk insert" operation which allows a list
    of log items with the same LSN to be inserted in a single operation
    via a list splice. To do this, we need to pre-sort the log items
    being committed into a temporary list for insertion.
    
    The complexity is that not every log item will end up with the same
    LSN, and not every item is actually inserted into the AIL. Items
    that don't match the commit LSN will be inserted and unpinned as per
    the current one-at-a-time method (relatively rare), while items that
    are not to be inserted will be unpinned and freed immediately. Items
    that are to be inserted at the given commit lsn are placed in a
    temporary array and inserted into the AIL in bulk each time the
    array fills up.
    
    As a result of this, we trade off AIL hold time for a significant
    reduction in traffic. lock_stat output shows that the worst case
    hold time is unchanged, but contention from AIL inserts drops by an
    order of magnitude and the number of lock traversals decreases
    significantly.
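
The bulk insert path can be sketched with a circular, LSN-sorted doubly linked list. This is a hypothetical model (`struct item`, `struct ail` and `ail_insert_bulk()` are invented names, and the AIL lock is reduced to a counter): the batch is chained together before the lock is taken, the insertion point is found once by scanning back from the tail, and the whole chain is linked in with four pointer updates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical log item on a circular, LSN-sorted list with a sentinel head. */
struct item {
	uint64_t     lsn;
	struct item *prev, *next;
};

struct ail {
	struct item head;	/* sentinel; head.next is the oldest item */
	int         lock_takes;	/* counts modelled lock round trips */
};

static void ail_init(struct ail *ailp)
{
	ailp->head.next = ailp->head.prev = &ailp->head;
	ailp->lock_takes = 0;
}

/*
 * Insert n items that all share 'lsn'. Assumes none of them is already on
 * the list. New items normally belong at the end, so scan back from the tail.
 */
static void ail_insert_bulk(struct ail *ailp, struct item **items, size_t n, uint64_t lsn)
{
	struct item *pos, *first, *last;

	if (n == 0)
		return;
	/* Chain the batch together first, outside the lock. */
	for (size_t i = 0; i < n; i++) {
		items[i]->lsn = lsn;
		items[i]->prev = i > 0 ? items[i - 1] : NULL;
		items[i]->next = i + 1 < n ? items[i + 1] : NULL;
	}
	first = items[0];
	last = items[n - 1];

	ailp->lock_takes++;	/* a spin_lock() would go here */
	for (pos = ailp->head.prev; pos != &ailp->head && pos->lsn > lsn; pos = pos->prev)
		;
	/* Splice first..last in after pos. */
	first->prev = pos;
	last->next = pos->next;
	pos->next->prev = last;
	pos->next = first;
	/* ...and the matching unlock here */
}
```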
    
    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>

commit eb3efa1249b6413be930bdf13d10b6238028a440
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Fri Dec 3 16:42:57 2010 +1100

    xfs: clean up xfs_ail_delete()
    
    xfs_ail_delete() has a needlessly complex interface. It returns the log item
    that was passed in for deletion (which the callers then assert is identical to
    the one passed in), and callers of xfs_ail_delete() still need to invalidate
    current traversal cursors.
    
    Make xfs_ail_delete() return void, move the cursor invalidation inside it, and
    clean up the callers just to use the log item pointer they passed in.
    
    While cleaning up, remove the messy and unnecessary "/* ARGSUSED */" comments
    around all these functions.
    
    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>

commit b199c8a4ba11879df87daad496ceee41fdc6aa82
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Mon Dec 20 11:59:49 2010 +1100

    xfs: Pull EFI/EFD handling out from under the AIL lock
    
    EFI/EFD interactions are protected from races by the AIL lock. They
    are the only type of log items that require the AIL lock to
    serialise internal state, so they need to be separated from the AIL
    lock before we can do bulk insert operations on the AIL.
    
    To achieve this, convert the counter of the number of extents in the
    EFI to an atomic so it can be safely manipulated by EFD processing
    without locks. Also, convert the EFI state flag manipulations to use
    atomic bit operations so no locks are needed to record state
    changes. Finally, use the state bits to determine when it is safe to
    free the EFI and clean up the code to do this neatly.
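
One way to free an object safely from two independent completion paths without a lock is sketched below. The scheme is an assumption based on the description above, not the XFS code: an atomic counter tracks extents not yet matched by EFDs, and a single state bit records whether the other path has finished. The first path to finish clears the bit; the second finds it already clear and frees the item. `struct efi`, `EFI_PENDING_BIT` and the function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define EFI_PENDING_BIT 1u	/* hypothetical: "the other path has not finished yet" */

struct efi {
	atomic_uint next_extent;	/* extents not yet matched by EFDs */
	atomic_uint flags;
	bool        freed;		/* stands in for freeing the log item */
};

static void efi_init(struct efi *efip, unsigned int nextents)
{
	atomic_init(&efip->next_extent, nextents);
	atomic_init(&efip->flags, EFI_PENDING_BIT);
	efip->freed = false;
}

/* Called exactly once by each of the two paths; the second caller frees. */
static void efi_release_ref(struct efi *efip)
{
	unsigned int old = atomic_fetch_and(&efip->flags, ~EFI_PENDING_BIT);

	if (!(old & EFI_PENDING_BIT))
		efip->freed = true;
}

/* EFD path: retire nextents extents; the last EFD drops this path's reference. */
static void efd_complete(struct efi *efip, unsigned int nextents)
{
	unsigned int old = atomic_fetch_sub(&efip->next_extent, nextents);

	assert(old >= nextents);
	if (old == nextents)
		efi_release_ref(efip);
}

/* Log path: the EFI has been written and unpinned. */
static void efi_unpin(struct efi *efip)
{
	efi_release_ref(efip);
}
```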
    
    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>

commit 9c5f8414efd5eeed9f498d4170337a3eb126341f
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Mon Dec 20 11:57:24 2010 +1100

    xfs: fix EFI transaction cancellation.
    
    XFS_EFI_CANCELED has not been set in the code base since
    xfs_efi_cancel() was removed back in 2006 by commit
    065d312e15902976d256ddaf396a7950ec0350a8 ("[XFS] Remove unused
    iop_abort log item operation"), and even then xfs_efi_cancel() was
    never called. I haven't tracked it back further than that (beyond
    git history), but it indicates that the handling of EFIs in
    cancelled transactions has been broken for a long time.
    
    Basically, when we get an IOP_UNPIN(lip, 1); call from
    xfs_trans_uncommit() (i.e. remove == 1), if we don't free the log
    item descriptor we leak it. Fix the behaviour to be correct and kill
    the XFS_EFI_CANCELED flag.
    
    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>

commit 821eb21d97a8b686649c08b7284d0b9f34d0e138
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Thu Dec 2 16:31:13 2010 +1100

    xfs: connect up buffer reclaim priority hooks
    
    Now that the buffer reclaim infrastructure can handle different reclaim
    priorities for different types of buffers, reconnect the hooks in the
    XFS code that has been sitting dormant since it was ported to Linux. This
    should finally give us reclaim prioritisation that is on a par with the
    functionality that Irix provided XFS 15 years ago.
    
    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>

commit 430cbeb86fdcbbdabea7d4aa65307de8de425350
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Thu Dec 2 16:30:55 2010 +1100

    xfs: add a lru to the XFS buffer cache
    
    Introduce a per-buftarg LRU for memory reclaim to operate on. This
    is the last piece we need to put in place so that we can fully
    control the buffer lifecycle. This allows XFS to be responsible for
    maintaining the working set of buffers under memory pressure instead
    of relying on the VM reclaim not to take pages we need out from
    underneath us.
    
    The implementation introduces a b_lru_ref counter into the buffer.
    This is currently set to 1 whenever the buffer is referenced and so is used to
    determine if the buffer should be added to the LRU or not when freed.
    Effectively it allows lazy LRU initialisation of the buffer so we do not need
    to touch the LRU list and locks in xfs_buf_find().
    
    Instead, when the buffer is being released and we drop the last
    reference to it, we check the b_lru_ref count and if it is non-zero
    we re-add the buffer reference and add the buffer to the LRU. The
    b_lru_ref counter is decremented by the shrinker, and whenever the
    shrinker comes across a buffer with a zero b_lru_ref counter, it
    releases the LRU reference on the buffer. In the absence of a lookup
    race, this will result in the buffer being freed.
    
    This counting mechanism is used instead of a reference flag so that
    it is simple to re-introduce buffer-type specific reclaim reference
    counts to prioritise reclaim more effectively. We still have all
    those hooks in the XFS code, so this will provide the infrastructure
    to re-implement that functionality.
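
The `b_lru_ref` scheme, including the per-type priorities connected up by the reclaim priority hooks commit above, can be modelled as follows. This is a hypothetical sketch, not `xfs_buf`: buffers live in a plain array standing in for the LRU, `buf_get()` sets the count to a per-type value, and each shrinker pass either ages an idle buffer or, once its count has reached zero, frees it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cached buffer. */
struct buf {
	int  hold;	/* active references */
	int  lru_ref;	/* shrinker passes an idle buffer survives */
	bool on_lru;
	bool freed;
};

/* Lookup: take a reference and (re)set lru_ref; higher means higher priority. */
static void buf_get(struct buf *bp, int lru_ref)
{
	bp->hold++;
	bp->lru_ref = lru_ref;
}

/* Dropping the last reference parks the buffer on the LRU or frees it. */
static void buf_rele(struct buf *bp)
{
	assert(bp->hold > 0);
	if (--bp->hold > 0)
		return;
	if (bp->lru_ref > 0)
		bp->on_lru = true;	/* the LRU now holds the buffer */
	else
		bp->freed = true;
}

/* One shrinker pass: age buffers with a non-zero count, free the rest. */
static size_t shrink_pass(struct buf **lru, size_t n)
{
	size_t nfreed = 0;

	for (size_t i = 0; i < n; i++) {
		struct buf *bp = lru[i];

		if (!bp->on_lru || bp->freed)
			continue;
		if (bp->lru_ref > 0) {
			bp->lru_ref--;	/* keep it for now */
			continue;
		}
		bp->on_lru = false;	/* drop the LRU reference */
		bp->freed = true;
		nfreed++;
	}
	return nfreed;
}
```

With a count of 1 (the default described above) an idle buffer survives one shrinker pass; a buffer type given a count of 3 survives three, which is how reclaim prioritisation falls out of the same mechanism.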
    
    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>

commit ff57ab21995a8636cfc72efeebb09cc6034d756f
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Tue Nov 30 17:27:57 2010 +1100

    xfs: convert xfsbud shrinker to a per-buftarg shrinker.
    
    Before we introduce per-buftarg LRU lists, split the shrinker
    implementation into per-buftarg shrinker callbacks. At the moment
    we wake all the xfsbufds to run the delayed write queues to free
    the dirty buffers and make their pages available for reclaim.
    However, with an LRU, we want to be able to free clean, unused
    buffers as well, so we need to separate the xfsbufd from the
    shrinker callbacks.
    
    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>
    Reviewed-by: Alex Elder <aelder@xxxxxxx>

commit 1a427ab0c1b205d1bda8da0b77ea9d295ac23c57
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Thu Dec 16 17:08:41 2010 +1100

    xfs: convert pag_ici_lock to a spin lock
    
    Now that we are using RCU protection for the inode cache lookups,
    the lock is only needed on the modification side. Hence it is not
    necessary for the lock to be a rwlock as there are no read side
    holders anymore. Convert it to a spin lock to reflect its exclusive
    nature.
    
    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Alex Elder <aelder@xxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>

commit 1a3e8f3da09c7082d25b512a0ffe569391e4c09a
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Fri Dec 17 17:29:43 2010 +1100

    xfs: convert inode cache lookups to use RCU locking
    
    With delayed logging greatly increasing the sustained parallelism of inode
    operations, the inode cache locking is showing significant read vs write
    contention when inode reclaim runs at the same time as lookups. There are
    also many more write lock acquisitions than read lock acquisitions (4:1 ratio),
    so the read locking is not really buying us much in the way of parallelism.
    
    To avoid the read vs write contention, change the cache to use RCU locking on
    the read side. To avoid needing to RCU free every single inode, use the built-in
    slab RCU freeing mechanism. This requires us to be able to detect lookups of
    freed inodes, so ensure that every freed inode has an inode number of zero and
    the XFS_IRECLAIM flag set. We already check the XFS_IRECLAIM flag in the cache
    hit lookup path, but also add a check for a zero inode number as well.
    
    We can then convert all the read locking lookups to use RCU read side locking
    and hence remove all read side locking.
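
The "detect lookups of freed inodes" requirement amounts to revalidating an object found without locks before trusting it. The sketch below leaves out RCU itself and models only that revalidation step; `struct inode_model`, `IRECLAIM`, `inode_free()` and `inode_validate()` are hypothetical stand-ins for the XFS structures and the `XFS_IRECLAIM` flag.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdint.h>

#define IRECLAIM (1u << 0)	/* hypothetical stand-in for XFS_IRECLAIM */

struct inode_model {
	pthread_mutex_t flags_lock;
	uint64_t        ino;	/* zeroed when the inode is freed */
	unsigned        flags;
};

/* Free an inode so that a racing lockless lookup can tell. */
static void inode_free(struct inode_model *ip)
{
	pthread_mutex_lock(&ip->flags_lock);
	ip->ino = 0;
	ip->flags |= IRECLAIM;
	pthread_mutex_unlock(&ip->flags_lock);
}

/*
 * Validate an inode found by a lockless cache lookup. It may have been freed
 * (or freed and reused) since the tree was walked, so recheck its identity
 * under its own lock before using it.
 */
static int inode_validate(struct inode_model *ip, uint64_t wanted)
{
	int error = 0;

	pthread_mutex_lock(&ip->flags_lock);
	if (ip->ino != wanted || (ip->flags & IRECLAIM))
		error = -EAGAIN;	/* caller retries the lookup */
	pthread_mutex_unlock(&ip->flags_lock);
	return error;
}
```

Comparing against the wanted inode number catches both a zeroed (freed) inode and one already reused for a different inode; the caller is expected to retry the lookup on `-EAGAIN`.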
    
    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Alex Elder <aelder@xxxxxxx>

commit d95b7aaf9ab6738bef1ebcc52ab66563085e44ac
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Thu Dec 16 16:41:39 2010 +1100

    xfs: rcu free inodes
    
    Introduce RCU freeing of XFS inodes so that we can convert lookup
    traversals to use rcu_read_lock() protection. This patch only
    introduces the RCU freeing to minimise the potential conflicts with
    mainline if this is merged into mainline via a VFS patchset. It
    abuses the i_dentry list for the RCU callback structure because the
    VFS patches make this a union so it is safe to use like this and
    simplifies any merge issues.
    
    This patch uses basic RCU freeing rather than SLAB_DESTROY_BY_RCU.
    The later lookup patches need the same "found free inode" protection
    regardless of the RCU freeing method used, so once again the RCU
    freeing method can be dealt with appropriately at merge time without
    affecting any other code.
    
    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>

commit 6e857567dbbfe14dd6cc3f7414671b047b1ff5c7
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Thu Dec 23 12:02:31 2010 +1100

    xfs: don't truncate prealloc from frequently accessed inodes
    
    A long-standing problem for streaming writes through the NFS server
    has been that the NFS server opens and closes file descriptors on an
    inode for every write. The result of this behaviour is that the
    ->release() function is called on every close and that results in
    XFS truncating speculative preallocation beyond the EOF.  This has
    an adverse effect on file layout when multiple files are being
    written at the same time - they interleave their extents and can
    result in severe fragmentation.
    
    To avoid this problem, keep track of ->release calls made on a dirty
    inode. For most cases, an inode is only going to be opened once for
    writing and then closed again during its lifetime in cache. Hence
    if there are multiple ->release calls when the inode is dirty, there
    is a good chance that the inode is being accessed by the NFS server.
    Hence set a flag the first time ->release is called while there are
    delalloc blocks still outstanding on the inode.
    
    If this flag is set when ->release is next called, then do not
    truncate away the speculative preallocation - leave it there so that
    subsequent writes do not need to reallocate the delalloc space. This
    will prevent interleaving of extents of different inodes written
    concurrently to the same AG.
    
    If we get this wrong, it is not a big deal as we truncate
    speculative allocation beyond EOF anyway in xfs_inactive() when the
    inode is thrown out of the cache.
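
The release heuristic can be sketched as a small state machine. This is a hypothetical model of the behaviour described above: `struct rel_inode` and `inode_release()` are invented, the EOF trim is reduced to a counter, and the first release on a dirty inode is assumed to still trim, as before the change.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-inode state for modelling ->release on a regular file. */
struct rel_inode {
	long delalloc_blocks;	/* delayed-allocation blocks still outstanding */
	bool dirty_release;	/* a previous release saw outstanding delalloc */
	int  eof_trims;		/* times the EOF preallocation was truncated */
};

static void inode_release(struct rel_inode *ip)
{
	assert(ip->delalloc_blocks >= 0);
	if (ip->dirty_release)
		return;		/* repeated open/write/close: keep the prealloc */
	if (ip->delalloc_blocks > 0)
		ip->dirty_release = true;
	ip->eof_trims++;	/* stands in for truncating blocks beyond EOF */
}
```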
    
    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>

commit 055388a3188f56676c21e92962fc366ac8b5cb72
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Tue Jan 4 11:35:03 2011 +1100

    xfs: dynamic speculative EOF preallocation
    
    Currently the size of the speculative preallocation during delayed
    allocation is fixed by either the allocsize mount option or a
    default size. We are seeing a lot of cases where we need to
    recommend using the allocsize mount option to prevent fragmentation
    when buffered writes land in the same AG.
    
    Rather than using a fixed preallocation size by default (up to 64k),
    make it dynamic by basing it on the current inode size. That way the
    EOF preallocation will increase as the file size increases.  Hence
    for streaming writes we are much more likely to get large
    preallocations exactly when we need them to reduce fragmentation.
    
    For default settings, the size of the initial extents is determined
    by the number of parallel writers and the amount of memory in the
    machine. For 4GB RAM and 4 concurrent 32GB file writes:
    
     EXT: FILE-OFFSET           BLOCK-RANGE          AG AG-OFFSET                 TOTAL
       0: [0..1048575]:         1048672..2097247      0 (1048672..2097247)      1048576
       1: [1048576..2097151]:   5242976..6291551      0 (5242976..6291551)      1048576
       2: [2097152..4194303]:   12583008..14680159    0 (12583008..14680159)    2097152
       3: [4194304..8388607]:   25165920..29360223    0 (25165920..29360223)    4194304
       4: [8388608..16777215]:  58720352..67108959    0 (58720352..67108959)    8388608
       5: [16777216..33554423]: 117440584..134217791  0 (117440584..134217791) 16777208
       6: [33554424..50331511]: 184549056..201326143  0 (184549056..201326143) 16777088
       7: [50331512..67108599]: 251657408..268434495  0 (251657408..268434495) 16777088
    
    and for 16 concurrent 16GB file writes:
    
     EXT: FILE-OFFSET           BLOCK-RANGE          AG AG-OFFSET                 TOTAL
       0: [0..262143]:          2490472..2752615      0 (2490472..2752615)       262144
       1: [262144..524287]:     6291560..6553703      0 (6291560..6553703)       262144
       2: [524288..1048575]:    13631592..14155879    0 (13631592..14155879)     524288
       3: [1048576..2097151]:   30408808..31457383    0 (30408808..31457383)    1048576
       4: [2097152..4194303]:   52428904..54526055    0 (52428904..54526055)    2097152
       5: [4194304..8388607]:   104857704..109052007  0 (104857704..109052007)  4194304
       6: [8388608..16777215]:  209715304..218103911  0 (209715304..218103911)  8388608
       7: [16777216..33554423]: 452984848..469762055  0 (452984848..469762055) 16777208
    
    Because it is hard to take back speculative preallocation, cases
    where there are large, slowly growing log files on a nearly full
    filesystem may cause premature ENOSPC. Hence as the filesystem nears
    full, the maximum dynamic prealloc size is reduced according to this
    table (based on 4k block size):
    
    freespace       max prealloc size
      >5%             full extent (8GB)
      4-5%             2GB (8GB >> 2)
      3-4%             1GB (8GB >> 3)
      2-3%           512MB (8GB >> 4)
      1-2%           256MB (8GB >> 5)
      <1%            128MB (8GB >> 6)
    
    This should reduce the amount of space held in speculative
    preallocation for such cases.
    
    The allocsize mount option turns off the dynamic behaviour and fixes
    the prealloc size to whatever the mount option specifies. i.e. the
    behaviour is unchanged.
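
The sizing policy can be sketched as a pure function. This is one reading of the description, with assumptions labelled: the dynamic size is taken as the file size (in 4 KiB blocks) rounded down to a power of two, the "full extent" is taken as 2^21 blocks (8 GiB), the free-space table above is applied as a shift on that maximum, and a 64 KiB default acts as the floor. All names are hypothetical, and the allocsize mount option would bypass this function entirely.

```c
#include <assert.h>
#include <stdint.h>

#define BLOCK_SIZE       4096u		/* 4 KiB blocks, as in the table */
#define MAX_PREALLOC_FSB (1ull << 21)	/* assumed "full extent": 8 GiB */
#define DEFAULT_FSB      16u		/* assumed 64 KiB default floor */

/* Largest power of two <= x, for x > 0. */
static uint64_t rounddown_pow2(uint64_t x)
{
	uint64_t p = 1;

	while (p <= x / 2)
		p <<= 1;
	return p;
}

/* Shift applied to the maximum: 0 above 5% free, then 2..6 per the table. */
static unsigned throttle_shift(uint64_t free_blocks, uint64_t total_blocks)
{
	unsigned shift = 0;

	if (free_blocks * 100 < total_blocks * 5) {
		shift = 2;
		for (unsigned pct = 4; pct >= 1; pct--)
			if (free_blocks * 100 < total_blocks * pct)
				shift++;
	}
	return shift;
}

/* Speculative preallocation size, in blocks, for a file of isize bytes. */
static uint64_t prealloc_blocks(uint64_t isize, uint64_t free_blocks, uint64_t total_blocks)
{
	uint64_t size_fsb = isize / BLOCK_SIZE;
	uint64_t max = MAX_PREALLOC_FSB >> throttle_shift(free_blocks, total_blocks);
	uint64_t alloc = size_fsb ? rounddown_pow2(size_fsb) : 0;

	if (alloc > max)
		alloc = max;
	if (alloc < DEFAULT_FSB)
		alloc = DEFAULT_FSB;
	return alloc;
}
```

For example, a 1 GiB file on a mostly empty filesystem gets a 1 GiB (2^18 block) preallocation, a 100 GiB file is capped at the 8 GiB full extent, and the same 100 GiB file at 3.5% free space is capped at 1 GiB (8 GiB >> 3).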
    
    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>

commit 622d81494fa32343a4b97b607619656c7a4a6d1a
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Thu Dec 23 11:57:37 2010 +1100

    xfs: use KM_NOFS for allocations during attribute list operations
    
    When listing attributes, we are doing memory allocations under the
    inode ilock using only KM_SLEEP. This allows memory allocation to
    recurse back into the filesystem and do writeback, which may need the
    ilock we already hold on the current inode. This will deadlock.
    Hence use KM_NOFS for such allocations outside of transaction
    context to ensure that reclaim recursion does not occur.
    
    Reported-by: Nick Piggin <npiggin@xxxxxxxxx>
    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>

commit dcfcf20512cb517ac18b9433b676183fa1257911
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Thu Dec 23 11:57:13 2010 +1100

    xfs: provide a inode iolock lockdep class
    
    The XFS iolock needs to be re-initialised to a new lock class before
    it enters reclaim to prevent lockdep false positives. Unfortunately,
    this is not sufficient protection as inodes in the XFS_IRECLAIMABLE
    state can be recycled and not re-initialised before being reused.
    
    We need to re-initialise the lock state when transferring out of
    XFS_IRECLAIMABLE state to XFS_INEW, but we need to keep the same
    class as if the inode was just allocated. Hence we need a specific
    lockdep class variable for the iolock so that both initialisations
    use the same class.
    
    While there, add a specific class for inodes in the reclaim state so
    that it is easy to tell from lockdep reports what state the inode
    was in that generated the report.
    
    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>

-----------------------------------------------------------------------

Summary of changes:
 fs/xfs/linux-2.6/sv.h        |   59 ----
 fs/xfs/linux-2.6/xfs_buf.c   |  235 +++++++++-----
 fs/xfs/linux-2.6/xfs_buf.h   |   22 +-
 fs/xfs/linux-2.6/xfs_linux.h |    1 -
 fs/xfs/linux-2.6/xfs_super.c |   22 +-
 fs/xfs/linux-2.6/xfs_sync.c  |   92 ++++--
 fs/xfs/linux-2.6/xfs_trace.h |   30 +-
 fs/xfs/quota/xfs_dquot.c     |    1 -
 fs/xfs/xfs_ag.h              |    2 +-
 fs/xfs/xfs_attr_leaf.c       |    4 +-
 fs/xfs/xfs_btree.c           |    9 +-
 fs/xfs/xfs_buf_item.c        |   32 ++-
 fs/xfs/xfs_extfree_item.c    |   97 +++---
 fs/xfs/xfs_extfree_item.h    |   11 +-
 fs/xfs/xfs_fsops.c           |    1 +
 fs/xfs/xfs_iget.c            |   90 ++++-
 fs/xfs/xfs_inode.c           |   54 +++-
 fs/xfs/xfs_inode.h           |   15 +-
 fs/xfs/xfs_inode_item.c      |   92 +++++-
 fs/xfs/xfs_iomap.c           |   84 +++++-
 fs/xfs/xfs_log.c             |  739 +++++++++++++++++++-----------------------
 fs/xfs/xfs_log_cil.c         |   17 +-
 fs/xfs/xfs_log_priv.h        |  121 ++++++--
 fs/xfs/xfs_log_recover.c     |   35 +--
 fs/xfs/xfs_mount.c           |   23 ++-
 fs/xfs/xfs_mount.h           |   14 +
 fs/xfs/xfs_trans.c           |   79 +++++-
 fs/xfs/xfs_trans.h           |    2 +-
 fs/xfs/xfs_trans_ail.c       |  232 +++++++-------
 fs/xfs/xfs_trans_extfree.c   |    8 +-
 fs/xfs/xfs_trans_priv.h      |   35 ++-
 fs/xfs/xfs_vnodeops.c        |   61 +++--
 32 files changed, 1403 insertions(+), 916 deletions(-)
 delete mode 100644 fs/xfs/linux-2.6/sv.h


hooks/post-receive
-- 
XFS development tree
