[Top] [All Lists]

Re: deadlock with &log->l_cilp->xc_ctx_lock semaphone

To: Dave Chinner <david@xxxxxxxxxxxxx>
Subject: Re: deadlock with &log->l_cilp->xc_ctx_lock semaphone
From: Chandra Seetharaman <sekharan@xxxxxxxxxx>
Date: Tue, 18 Jun 2013 18:18:35 -0500
Cc: XFS mailing list <xfs@xxxxxxxxxxx>
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <20130522234129.GN29466@dastard>
Organization: IBM
References: <1369264363.10223.2994.camel@xxxxxxxxxxxxxxxxxx> <20130522234129.GN29466@dastard>
Reply-to: sekharan@xxxxxxxxxx
On Thu, 2013-05-23 at 09:41 +1000, Dave Chinner wrote:
> On Wed, May 22, 2013 at 06:12:43PM -0500, Chandra Seetharaman wrote:
> > Hello,
> > 
> > While testing and rearranging my pquota/gquota code, I stumbled on a
> > xfs_shutdown() during a mount. But the mount just hung.
> > 
> > I debugged and found that it is in a code path where
> > &log->l_cilp->xc_ctx_lock is first acquired in read mode and some levels
> > down the same semaphore is being acquired in write mode causing a
> > deadlock.
> > 
> > This is the stack:
> > xfs_log_commit_cil -> acquires &log->l_cilp->xc_ctx_lock in read mode
> >   xlog_print_tic_res
> >     xfs_force_shutdown
> >       xfs_log_force_umount
> >         xlog_cil_force
> >           xlog_cil_force_lsn
> >             xlog_cil_push_foreground
> >               xlog_cil_push - tries to acquire same semaphore in write mode
> Which means you had a transaction reservation overrun. Is it
> reproducable? iDo you have the output from xlog_print_tic_res()?
> Because:
> > xfs_trans_commit+0x79/0x270 [xfs]  
> > xfs_qm_write_sb_changes+0x61/0x90 [xfs]
> > xfs_qm_mount_quotas+0x82/0x180 [xfs]
> > xfs_mountfs+0x5f6/0x6b0 [xfs]
> This transaction only modifies the superblock, and it has a buffer
> reservation for a superblock sized buffer, and hence should never
> overrun.
> IOWs, I'm ifar more concerned about the fact there was a
> transaction overrun than they was a hang in the path that handles
> the overrun. The fact this hang has been there since 2.6.35 tells
> you how rare transactions overruns are....
> FWIW, the fix for the hang is to make xlog_print_tic_res() return an
> error and have the caller handle the shutdown.


There are few ways this can be done, but each of them seem to change the
code behavior. Wanted to get your opinion on which is the correct way.

(1) - don't shutdown in xlog_print_tic_res();
    - upon return from xlog_print_tic_res(), do a up_read, and shutdown
    - at the end of fucntion, up_read only if we haven't done it already
  Behavior change: The protection offered by the semaphore is not 
      available to the code block from shutdown to end of function
(2) - don't shoudown in xlog_print_tic_res()
    - upon return just set a flag
    - at the end of function, after up_read, do shutdown
  Behavior change: Shutdown is delayed to a later point.

I prefer (2) since (1) drops the protection. But, do not know the
ramifications of delaying the shutdown. Can you comment ?


> Cheers,
> Dave.

<Prev in Thread] Current Thread [Next in Thread>