[XFS updates] XFS development tree branch, master, updated. v2.6.38-10128-ge4d3c4a
xfs at oss.sgi.com
xfs at oss.sgi.com
Mon May 9 19:29:19 CDT 2011
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "XFS development tree".
The branch, master has been updated
e4d3c4a xfs: fix race condition in AIL push trigger
fd5670f xfs: make AIL target updates and compares 32bit safe.
cb64026 xfs: always push the AIL to the target
ea35a20 xfs: exit AIL push work correctly when AIL is empty
b223221 xfs: ensure reclaim cursor is reset correctly at end of AG
from 8c1fdd0be5498f852e00c5fbd9cb0c3969e46cc6 (commit)
Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.
- Log -----------------------------------------------------------------
commit e4d3c4a43b595d5124ae824d300626e6489ae857
Author: Dave Chinner <dchinner at redhat.com>
Date: Fri May 6 02:54:08 2011 +0000
xfs: fix race condition in AIL push trigger
The recent conversion of the xfsaild functionality to a work queue
introduced a hard-to-hit log space grant hang. One is caused by a
race condition in determining whether there is a psh in progress or
not.
The XFS_AIL_PUSHING_BIT is used to determine whether a push is
currently in progress. When the AIL push work completes, it checked
whether the target changed and cleared the PUSHING bit to allow a
new push to be requeued. The race condition is as follows:
Thread 1 push work
smp_wmb()
smp_rmb()
check ailp->xa_target unchanged
update ailp->xa_target
test/set PUSHING bit
does not queue
clear PUSHING bit
does not requeue
Now that the push target is updated, new attempts to push the AIL
will not trigger as the push target will be the same, and hence
despite trying to push the AIL we won't ever wake it again.
The fix is to ensure that the AIL push work clears the PUSHING bit
before it checks if the target is unchanged.
As a result, both push triggers operate on the same test/set bit
criteria, so even if we race in the push work and miss the target
update, the thread requesting the push will still set the PUSHING
bit and queue the push work to occur. For safety sake, the same
queue check is done if the push work detects the target change,
though only one of the two will will queue new work due to the use
of test_and_set_bit() checks.
Signed-off-by: Dave Chinner <dchinner at redhat.com>
Reviewed-by: Christoph Hellwig <hch at lst.de>
Reviewed-by: Alex Elder <aelder at sgi.com>
commit fd5670f22fce247754243cf2ed41941e5762d990
Author: Dave Chinner <dchinner at redhat.com>
Date: Fri May 6 02:54:07 2011 +0000
xfs: make AIL target updates and compares 32bit safe.
The recent conversion of the xfsaild functionality to a work queue
introduced a hard-to-hit log space grant hang. One of the problems
noticed was that updates of the push target are not 32 bit safe as
the target is a 64 bit value.
We cannot copy a 64 bit LSN without the possibility of corrupting
the result when racing with another updating thread. We have
function to do this update safely without needing to care about
32/64 bit issues - xfs_trans_ail_copy_lsn() - so use that when
updating the AIL push target.
Also move the reading of the target in the push work inside the AIL
lock, and use XFS_LSN_CMP() for the unlocked comparison during work
termination to close read holes as well.
Signed-off-by: Dave Chinner <david at fromorbit.com>
Reviewed-by: Christoph Hellwig <hch at lst.de>
Reviewed-by: Alex Elder <aelder at sgi.com>
commit cb64026b6e8af50db598ec7c3f59d504259b00bb
Author: Dave Chinner <dchinner at redhat.com>
Date: Fri May 6 02:54:06 2011 +0000
xfs: always push the AIL to the target
The recent conversion of the xfsaild functionality to a work queue
introduced a hard-to-hit log space grant hang. One of the problems
discovered is a target mismatch between the item pushing loop and
the target itself.
The push trigger checks for the target increasing (i.e. new target >
current) while the push loop only pushes items that have a LSN <
current. As a result, we can get the situation where the push target
is X, the items at the tail of the AIL have LSN X and they don't get
pushed. The push work then completes thinking it is done, and cannot
be restarted until the push target increases to >= X + 1. If the
push target then never increases (because the tail is not moving),
then we never run the push work again and we stall.
Fix it by making sure log items with a LSN that matches the target
exactly are pushed during the loop.
Signed-off-by: Dave Chinner <dchinner at redhat.com>
Reviewed-by: Christoph Hellwig <hch at lst.de>
Reviewed-by: Alex Elder <aelder at sgi.com>
commit ea35a20021f8497390d05b93271b4d675516c654
Author: Dave Chinner <dchinner at redhat.com>
Date: Fri May 6 02:54:05 2011 +0000
xfs: exit AIL push work correctly when AIL is empty
The recent conversion of the xfsaild functionality to a work queue
introduced a hard-to-hit log space grant hang. The main cause is a
regression where a work exit path fails to clear the PUSHING state
and recheck the target correctly.
Make both exit paths do the same PUSHING bit clearing and target
checking when the "no more work to be done" condition is hit.
Signed-off-by: Dave Chinner <dchinner at redhat.com>
Reviewed-by: Christoph Hellwig <hch at lst.de>
Reviewed-by: Alex Elder <aelder at sgi.com>
commit b223221956675ce8a7b436d198ced974bb388571
Author: Dave Chinner <dchinner at redhat.com>
Date: Fri May 6 02:54:04 2011 +0000
xfs: ensure reclaim cursor is reset correctly at end of AG
On a 32 bit highmem PowerPC machine, the XFS inode cache was growing
without bound and exhausting low memory causing the OOM killer to be
triggered. After some effort, the problem was reproduced on a 32 bit
x86 highmem machine.
The problem is that the per-ag inode reclaim index cursor was not
getting reset to the start of the AG if the radix tree tag lookup
found no more reclaimable inodes. Hence every further reclaim
attempt started at the same index beyond where any reclaimable
inodes lay, and no further background reclaim ever occurred from the
AG.
Without background inode reclaim the VM driven cache shrinker
simply cannot keep up with cache growth, and OOM is the result.
While the change that exposed the problem was the conversion of the
inode reclaim to use work queues for background reclaim, it was not
the cause of the bug. The bug was introduced when the cursor code
was added, just waiting for some weird configuration to strike....
Signed-off-by: Dave Chinner <dchinner at redhat.com>
Tested-By: Christian Kujau <lists at nerdbynature.de>
Reviewed-by: Christoph Hellwig <hch at lst.de>
Reviewed-by: Alex Elder <aelder at sgi.com>
-----------------------------------------------------------------------
Summary of changes:
fs/xfs/linux-2.6/xfs_sync.c | 1 +
fs/xfs/xfs_trans_ail.c | 47 +++++++++++++++++++++++-------------------
2 files changed, 27 insertions(+), 21 deletions(-)
hooks/post-receive
--
XFS development tree
More information about the xfs
mailing list