To: xfs@xxxxxxxxxxx
Subject: [XFS updates] XFS development tree branch, master, updated. v2.6.34-81-gf936972
From: xfs@xxxxxxxxxxx
Date: Fri, 4 Jun 2010 15:30:43 -0500
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "XFS development tree".

The branch, master has been updated
       via  f936972 xfs: improve xfs_isilocked
       via  070ecdc xfs: skip writeback from reclaim context
       via  5b257b4 xfs: fix race in inode cluster freeing failing to stale inodes
      from  fb3b504adeee942e55393396fea8fdf406acf037 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
commit f9369729496a0f4c607a4cc1ea4dfeddbbfc505a
Author: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Date:   Thu Jun 3 16:22:29 2010 +1000

    xfs: improve xfs_isilocked
    
    Use rwsem_is_locked to make the assertions for shared locks work.
    
    Signed-off-by: Christoph Hellwig <hch@xxxxxx>
    Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx>
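
As a minimal sketch of the idea (not the verbatim patch): assuming the XFS
mrlock_t of this era wraps an rw_semaphore plus an mr_writer flag, the
assertion helper can report exclusive locks via mr_writer and fall back to
rwsem_is_locked() for shared locks, which record no owner.

    int
    xfs_isilocked(
    	xfs_inode_t	*ip,
    	uint		lock_flags)
    {
    	if (lock_flags & (XFS_ILOCK_EXCL|XFS_ILOCK_SHARED)) {
    		if (!(lock_flags & XFS_ILOCK_SHARED))
    			return !!ip->i_lock.mr_writer;	/* exclusive holder recorded */
    		/* shared holders are anonymous; test the rwsem itself */
    		return rwsem_is_locked(&ip->i_lock.mr_lock);
    	}

    	/* (the iolock would be handled the same way) */
    	ASSERT(0);
    	return 0;
    }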

commit 070ecdca54dde9577d2697088e74e45568f48efb
Author: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Date:   Thu Jun 3 16:22:29 2010 +1000

    xfs: skip writeback from reclaim context
    
    Allowing writeback from reclaim context causes massive problems with stack
    overflows: random contexts that perform memory allocations can end up
    calling into the writeback code, which tends to be a heavy stack user both
    in the generic code and in XFS.
    
    Follow the example of btrfs (and, in slightly different form, ext4) and
    refuse to write out data from reclaim context.  This issue should really
    be handled by the VM so that we can tune better for this case, but until
    we get it sorted out there we have to hack around this in each filesystem
    with a complex writeback path.
    
    Signed-off-by: Christoph Hellwig <hch@xxxxxx>
    Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx>
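
A hedged sketch of the kind of check this implies in the writepage path (the
diffstat below touches fs/xfs/linux-2.6/xfs_aops.c; exact placement in the
real patch may differ): a task doing direct reclaim runs with PF_MEMALLOC
set, so the address_space writepage method can detect that and redirty the
page instead of starting IO.

    STATIC int
    xfs_vm_writepage(
    	struct page			*page,
    	struct writeback_control	*wbc)
    {
    	/*
    	 * Refuse to write the page out if we are called from reclaim
    	 * context: the writeback path is too deep to run safely on the
    	 * arbitrary stacks that direct reclaim is invoked from.
    	 */
    	if (current->flags & PF_MEMALLOC) {
    		redirty_page_for_writepage(wbc, page);
    		unlock_page(page);
    		return 0;
    	}

    	/* ... normal writeback path continues below ... */
    }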

commit 5b257b4a1f9239624c6b5e669763de04e482c2b3
Author: Dave Chinner <david@xxxxxxxxxxxxx>
Date:   Thu Jun 3 16:22:29 2010 +1000

    xfs: fix race in inode cluster freeing failing to stale inodes
    
    When an inode cluster is freed, it needs to mark all inodes in memory as
    XFS_ISTALE before marking the buffer as stale. This is needed because the
    inodes have a different life cycle to the buffer, and once the buffer is
    torn down during transaction completion, we must ensure none of the inodes
    get written back (which is what XFS_ISTALE does).

    Unfortunately, xfs_ifree_cluster() has some bugs that lead to inodes not
    being marked with XFS_ISTALE. This shows up when xfs_iflush() is called on
    these inodes either during inode reclaim or tail pushing on the AIL.  The
    buffer is read back, but no longer contains inodes and so triggers assert
    failures and shutdowns. This was reproducible with a run.dbench10
    invocation from xfstests.

    There are two main causes of xfs_ifree_cluster() failing. The first is
    simple - it checks in-memory inodes it finds in the per-ag icache to see
    if they are clean without holding the flush lock. If they are clean it
    skips them completely. However, if an inode is flushed delwri, it will
    appear clean, but is not guaranteed to be written back until the flush
    lock has been dropped. Hence we may have raced on the clean check and the
    inode may actually be dirty. Therefore, always mark inodes found in memory
    stale before properly checking whether they are clean.

    The second is more complex, and makes the first problem easier to hit.
    Basically, the in-memory inode scan is done in the full knowledge that it
    can be racing with inode flushing and AIL tail pushing, which means that
    inodes it can't get the flush lock on might not be attached to the buffer
    after the in-memory inode scan, due to IO completion occurring. This is
    actually documented in the code as "needs better interlocking"; i.e. this
    is a zero-day bug.

    Effectively, the in-memory scan must be done while the inode buffer is
    locked, with IO prevented on it for the duration of the scan. This
    ensures that inodes we couldn't get the flush lock on are guaranteed to
    be attached to the cluster buffer, so we can then catch all in-memory
    inodes and mark them stale.

    Now that the inode cluster buffer is locked before the in-memory scan is
    done, there is no need for the two-phase update of the in-memory inodes,
    so simplify the code into two loops and remove the allocation of the
    temporary buffer used to hold locked inodes across the phases.
    
    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>
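
To make the new ordering concrete, here is a loose, hypothetical sketch of
the restructured xfs_ifree_cluster() flow described above; the real function
carries considerably more detail, and the loop bodies are elided:

    /* For each cluster buffer backing the freed inode chunk: */
    bp = xfs_trans_get_buf(tp, mp->m_ddev_targp, blkno,
    		       mp->m_bsize * blks_per_cluster, XBF_LOCK);

    /*
     * The buffer is now locked, so IO completion cannot detach inodes
     * from it while we scan.
     *
     * Loop 1: walk the inode log items already attached to the locked
     * buffer and mark each one stale.
     *
     * Loop 2: walk the in-memory inodes for this cluster in the per-ag
     * icache and set XFS_ISTALE on every one (e.g. via
     * xfs_iflags_set(ip, XFS_ISTALE)) *before* any clean check, so a
     * racing delwri flush can never write a freed inode back.
     */

    xfs_trans_stale_inode_buf(tp, bp);
    xfs_trans_binval(tp, bp);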

-----------------------------------------------------------------------

Summary of changes:
 fs/xfs/linux-2.6/xfs_aops.c |   15 +++++
 fs/xfs/xfs_iget.c           |   26 +++-----
 fs/xfs/xfs_inode.c          |  142 +++++++++++++++++++------------------------
 3 files changed, 87 insertions(+), 96 deletions(-)


hooks/post-receive
-- 
XFS development tree
