Re: [PATCH] xfs: Fix a deadlock in xfs_log_commit_cil() code path

To: Chandra Seetharaman <sekharan@xxxxxxxxxx>
Subject: Re: [PATCH] xfs: Fix a deadlock in xfs_log_commit_cil() code path
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Tue, 16 Jul 2013 10:54:55 +1000
Cc: XFS mailing list <xfs@xxxxxxxxxxx>
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <1373928754.20769.41.camel@xxxxxxxxxxxxxxxxxx>
References: <1373928754.20769.41.camel@xxxxxxxxxxxxxxxxxx>
User-agent: Mutt/1.5.21 (2010-09-15)
On Mon, Jul 15, 2013 at 05:52:34PM -0500, Chandra Seetharaman wrote:
> While testing and rearranging my pquota/gquota code, I stumbled
> on a xfs_shutdown() during a mount. But the mount just hung.
> 
> I debugged and found that there is a deadlock involving
> &log->l_cilp->xc_ctx_lock.
> 
> It is in a code path where &log->l_cilp->xc_ctx_lock is first
> acquired in read mode and some levels down the same semaphore
> is being acquired in write mode causing a deadlock.
> 
> This is the stack:
> xfs_log_commit_cil -> acquires &log->l_cilp->xc_ctx_lock in read mode
>   xlog_print_tic_res
>     xfs_force_shutdown
>       xfs_log_force_umount
>         xlog_cil_force
>           xlog_cil_force_lsn
>             xlog_cil_push_foreground
>               xlog_cil_push - tries to acquire same semaphore in write mode
> 
> This patch fixes the deadlock by not calling xfs_force_shutdown() while
> holding the semaphore, instead calling it after dropping the semaphore.
> 
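
To make the lock ordering problem described above concrete, here is a
minimal single-threaded sketch. It is a toy model, not the kernel's
rw_semaphore and not the XFS code: every toy_* name is hypothetical, and
a failed write trylock stands in for what would be an indefinite block in
the real code path.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a reader/writer semaphore (hypothetical, illustration only). */
struct toy_rwsem {
	int	readers;	/* number of read-mode holders */
	bool	writer;		/* true while held in write mode */
};

bool toy_down_read_trylock(struct toy_rwsem *s)
{
	if (s->writer)
		return false;
	s->readers++;
	return true;
}

void toy_up_read(struct toy_rwsem *s)
{
	assert(s->readers > 0);
	s->readers--;
}

bool toy_down_write_trylock(struct toy_rwsem *s)
{
	/* Write mode needs exclusive ownership: no readers, no writer. */
	if (s->writer || s->readers > 0)
		return false;
	s->writer = true;
	return true;
}

void toy_up_write(struct toy_rwsem *s)
{
	s->writer = false;
}

/* Stand-in for the shutdown path, which ends up needing write mode. */
bool toy_shutdown(struct toy_rwsem *s)
{
	if (!toy_down_write_trylock(s))
		return false;	/* the real code would block here forever */
	toy_up_write(s);
	return true;
}

/* Old ordering: shut down while still holding the lock in read mode. */
bool toy_commit_old(struct toy_rwsem *s, bool overrun)
{
	bool ok = true;

	if (!toy_down_read_trylock(s))
		return false;
	if (overrun)
		ok = toy_shutdown(s);	/* cannot succeed: we are a reader */
	toy_up_read(s);
	return ok;
}

/* New ordering: note the failure, drop the read lock, then shut down. */
bool toy_commit_new(struct toy_rwsem *s, bool overrun)
{
	if (!toy_down_read_trylock(s))
		return false;
	toy_up_read(s);
	return overrun ? toy_shutdown(s) : true;
}
```

With a zero-initialised struct toy_rwsem, toy_commit_old(&sem, true)
fails because the write-mode request is made by the thread that already
holds read mode, while toy_commit_new(&sem, true) succeeds because the
read lock has been dropped first.
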
> Thanks to Dave for suggesting this solution.
> 
> Signed-off-by: Chandra Seetharaman <sekharan@xxxxxxxxxx>
> 
> ---
>  fs/xfs/xfs_log.c      |    6 +++---
>  fs/xfs/xfs_log_cil.c  |   10 ++++++----
>  fs/xfs/xfs_log_priv.h |    2 +-
>  fs/xfs/xfs_trans.c    |    2 +-
>  4 files changed, 11 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> index d852a2b..b9fa2da 100644
> --- a/fs/xfs/xfs_log.c
> +++ b/fs/xfs/xfs_log.c
> @@ -1837,7 +1837,7 @@ xlog_state_finish_copy(
>   * print out info relating to regions written which consume
>   * the reservation
>   */
> -void
> +int
>  xlog_print_tic_res(
>       struct xfs_mount        *mp,
>       struct xlog_ticket      *ticket)
> @@ -1941,7 +1941,7 @@ xlog_print_tic_res(
>  
>       xfs_alert_tag(mp, XFS_PTAG_LOGRES,
>               "xlog_write: reservation ran out. Need to up reservation");
> -     xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
> +     return EFSCORRUPTED;

Note the "SHUTDOWN_CORRUPT_INCORE" reason given here....

> diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
> index 35a2299..d96022f 100644
> --- a/fs/xfs/xfs_trans.c
> +++ b/fs/xfs/xfs_trans.c
> @@ -1547,7 +1547,7 @@ xfs_trans_commit(
>       xfs_trans_apply_dquot_deltas(tp);
>  
>       error = xfs_log_commit_cil(mp, tp, &commit_lsn, flags);
> -     if (error == ENOMEM) {
> +     if (error) {
>               xfs_force_shutdown(mp, SHUTDOWN_LOG_IO_ERROR);

Which is different from the reason given here. The original shutdown
reason (SHUTDOWN_CORRUPT_INCORE) should be maintained for this particular
error, rather than being folded into SHUTDOWN_LOG_IO_ERROR....
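
One possible way to keep the reasons distinct is for the caller to pick
the shutdown reason from the error it gets back. The following is only a
standalone sketch of that idea, not a proposed hunk: the toy_* names and
the TOY_EFSCORRUPTED value are placeholders invented for illustration,
not the kernel's definitions.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for the kernel's shutdown reasons. */
enum toy_shutdown_reason {
	TOY_SHUTDOWN_NONE,
	TOY_SHUTDOWN_LOG_IO_ERROR,
	TOY_SHUTDOWN_CORRUPT_INCORE,
};

/* Placeholder value, chosen only to be distinct from ENOMEM. */
#define TOY_EFSCORRUPTED	990

/*
 * Map a commit error to the shutdown reason the caller should use.
 * The quoted patch returns positive error values, so this does too.
 */
enum toy_shutdown_reason toy_reason_for_commit_error(int error)
{
	assert(error >= 0);

	if (error == 0)
		return TOY_SHUTDOWN_NONE;
	if (error == TOY_EFSCORRUPTED)
		return TOY_SHUTDOWN_CORRUPT_INCORE;	/* reservation overrun */
	return TOY_SHUTDOWN_LOG_IO_ERROR;		/* e.g. ENOMEM */
}
```

Here a reservation overrun still shuts down as in-core corruption, and
ENOMEM keeps the log I/O error reason it had before the patch.
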

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx