Re: deadlock with &log->l_cilp->xc_ctx_lock semaphone

To: Dave Chinner <david@xxxxxxxxxxxxx>
Subject: Re: deadlock with &log->l_cilp->xc_ctx_lock semaphone
From: Chandra Seetharaman <sekharan@xxxxxxxxxx>
Date: Thu, 23 May 2013 13:09:02 -0500
Cc: XFS mailing list <xfs@xxxxxxxxxxx>
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <20130522234129.GN29466@dastard>
Organization: IBM
References: <1369264363.10223.2994.camel@xxxxxxxxxxxxxxxxxx> <20130522234129.GN29466@dastard>
Reply-to: sekharan@xxxxxxxxxx
On Thu, 2013-05-23 at 09:41 +1000, Dave Chinner wrote:
> On Wed, May 22, 2013 at 06:12:43PM -0500, Chandra Seetharaman wrote:
> > Hello,
> > 
> > While testing and rearranging my pquota/gquota code, I stumbled on an
> > xfs_shutdown() during a mount, but the mount just hung.
> > 
> > I debugged and found that it is in a code path where
> > &log->l_cilp->xc_ctx_lock is first acquired in read mode and some levels
> > down the same semaphore is being acquired in write mode causing a
> > deadlock.
> > 
> > This is the stack:
> > xfs_log_commit_cil -> acquires &log->l_cilp->xc_ctx_lock in read mode
> >   xlog_print_tic_res
> >     xfs_force_shutdown
> >       xfs_log_force_umount
> >         xlog_cil_force
> >           xlog_cil_force_lsn
> >             xlog_cil_push_foreground
> >               xlog_cil_push - tries to acquire same semaphore in write mode
> 
> Which means you had a transaction reservation overrun. Is it
> reproducible? Do you have the output from xlog_print_tic_res()?
> Because:

Here it is:

May 23 10:48:52 test46 kernel: [   77.500728] XFS (sdh8): xlog_write: reservation summary:
May 23 10:48:52 test46 kernel: [   77.500728]   trans type  = QM_SBCHANGE (26)
May 23 10:48:52 test46 kernel: [   77.500728]   unit res    = 2740 bytes
May 23 10:48:52 test46 kernel: [   77.500728]   current res = -48 bytes
May 23 10:48:52 test46 kernel: [   77.500728]   total reg   = 0 bytes (o/flow = 0 bytes)
May 23 10:48:52 test46 kernel: [   77.500728]   ophdrs      = 0 (ophdr space = 0 bytes)
May 23 10:48:52 test46 kernel: [   77.500728]   ophdr + reg = 0 bytes
May 23 10:48:52 test46 kernel: [   77.500728]   num regions = 0
May 23 10:48:52 test46 kernel: [   77.500728]

Yes, I can readily reproduce the problem, but only with my mangled-up
patchsets :). A small change in them makes the problem reproduce
consistently.
> 
> > xfs_trans_commit+0x79/0x270 [xfs]  
> > xfs_qm_write_sb_changes+0x61/0x90 [xfs]
> > xfs_qm_mount_quotas+0x82/0x180 [xfs]
> > xfs_mountfs+0x5f6/0x6b0 [xfs]
> 
> This transaction only modifies the superblock, and it has a buffer
> reservation for a superblock sized buffer, and hence should never
> overrun.
> 
> IOWs, I'm far more concerned about the fact there was a
> transaction overrun than that there was a hang in the path that handles
> the overrun. The fact this hang has been there since 2.6.35 tells
> you how rare transaction overruns are....

As I mentioned above, the overrun may be an artifact of my tangled patches.
> 
> FWIW, the fix for the hang is to make xlog_print_tic_res() return an
> error and have the caller handle the shutdown.
> 
> Cheers,
> 
> Dave.
