deadlock with &log->l_cilp->xc_ctx_lock semaphore
Chandra Seetharaman
sekharan at us.ibm.com
Thu May 23 13:09:02 CDT 2013
On Thu, 2013-05-23 at 09:41 +1000, Dave Chinner wrote:
> On Wed, May 22, 2013 at 06:12:43PM -0500, Chandra Seetharaman wrote:
> > Hello,
> >
> > While testing and rearranging my pquota/gquota code, I stumbled on an
> > xfs_shutdown() during a mount. But the mount just hung.
> >
> > I debugged and found that it is in a code path where
> > &log->l_cilp->xc_ctx_lock is first acquired in read mode and some levels
> > down the same semaphore is being acquired in write mode causing a
> > deadlock.
> >
> > This is the stack:
> > xfs_log_commit_cil -> acquires &log->l_cilp->xc_ctx_lock in read mode
> > xlog_print_tic_res
> > xfs_force_shutdown
> > xfs_log_force_umount
> > xlog_cil_force
> > xlog_cil_force_lsn
> > xlog_cil_push_foreground
> > xlog_cil_push - tries to acquire same semaphore in write mode
>
> Which means you had a transaction reservation overrun. Is it
> reproducible? Do you have the output from xlog_print_tic_res()?
> Because:
Here it is:
May 23 10:48:52 test46 kernel: [ 77.500728] XFS (sdh8): xlog_write: reservation summary:
May 23 10:48:52 test46 kernel: [ 77.500728] trans type = QM_SBCHANGE (26)
May 23 10:48:52 test46 kernel: [ 77.500728] unit res = 2740 bytes
May 23 10:48:52 test46 kernel: [ 77.500728] current res = -48 bytes
May 23 10:48:52 test46 kernel: [ 77.500728] total reg = 0 bytes (o/flow = 0 bytes)
May 23 10:48:52 test46 kernel: [ 77.500728] ophdrs = 0 (ophdr space = 0 bytes)
May 23 10:48:52 test46 kernel: [ 77.500728] ophdr + reg = 0 bytes
May 23 10:48:52 test46 kernel: [ 77.500728] num regions = 0
May 23 10:48:52 test46 kernel: [ 77.500728]
Yes. I can readily reproduce the problem, but it is with my mangled up
patchsets :). There is a small change that makes this problem reproduce
consistently.
>
> > xfs_trans_commit+0x79/0x270 [xfs]
> > xfs_qm_write_sb_changes+0x61/0x90 [xfs]
> > xfs_qm_mount_quotas+0x82/0x180 [xfs]
> > xfs_mountfs+0x5f6/0x6b0 [xfs]
>
> This transaction only modifies the superblock, and it has a buffer
> reservation for a superblock sized buffer, and hence should never
> overrun.
>
> IOWs, I'm far more concerned about the fact there was a
> transaction overrun than that there was a hang in the path that handles
> the overrun. The fact this hang has been there since 2.6.35 tells
> you how rare transaction overruns are....
As I mentioned above, it may be an artifact of my tangled patchsets.
>
> FWIW, the fix for the hang is to make xlog_print_tic_res() return an
> error and have the caller handle the shutdown.
>
> Cheers,
>
> Dave.