On Mon, May 21, 2012 at 01:21:26PM -0400, Brian Foster wrote:
> Running xfstests 273 in a loop reproduces an XFS lockup due to
> xfsaild entering idle mode indefinitely. The following
> high-level sequence of events leads to the hang:
> - xfsaild is running with a cached target lsn
> - xfs_ail_push() is invoked, updates ailp->xa_target_lsn and
> invokes wake_up_process(). wake_up_process() returns 0
> because xfsaild is already running.
> - xfsaild enters idle mode having met its current target.
> Once in the described state, xfs_ail_push() is invoked many
> more times with the already set threshold_lsn, but these calls
> do not lead to wake_up_process() calls because no further
> invocations result in moving the threshold_lsn forward. Add a
> flag to xfs_ail to capture whether an issued wake actually
> succeeds. If not, continue issuing wakes until we know one has
> been successful for the current target.
> Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx>
> fs/xfs/xfs_trans_ail.c | 4 ++--
> fs/xfs/xfs_trans_priv.h | 1 +
> 2 files changed, 3 insertions(+), 2 deletions(-)
> diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
> index 5818076..35ae0d3 100644
> --- a/fs/xfs/xfs_trans_ail.c
> +++ b/fs/xfs/xfs_trans_ail.c
> @@ -576,7 +576,7 @@ xfs_ail_push(
> lip = xfs_ail_min(ailp);
> if (!lip || XFS_FORCED_SHUTDOWN(ailp->xa_mount) ||
> - XFS_LSN_CMP(threshold_lsn, ailp->xa_target) <= 0)
> + ((XFS_LSN_CMP(threshold_lsn, ailp->xa_target) <= 0) &&
> @@ -587,7 +587,7 @@ xfs_ail_push(
> xfs_trans_ail_copy_lsn(ailp, &ailp->xa_target, &threshold_lsn);
> - wake_up_process(ailp->xa_task);
> + ailp->xa_pending_wake = !wake_up_process(ailp->xa_task);
That means xa_pending_wake is true when the aild is already running.
Which means that the above check for !ailp->xa_pending_wake mean
that we don't try a wakeup when the aild is not running. Is that
what you meant?
As it is, i don't think this is really all that sane - it has no
locking, so is going to be racy, and is really just working around
problems where the xfsaild goes to sleep without having completed
it's work. So if this patch is necessary, it is just papering over a
problem in the logic in the xfsaild where it is idling instead of
sleeping for a short time and starting again....