
To: Brian Foster <bfoster@xxxxxxxxxx>
Subject: Re: [PATCH RFC 0/2] fix spinlock recursion on xa_lock in xfs_buf_item_push
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Thu, 31 Jan 2013 08:59:34 +1100
Cc: Mark Tinguely <tinguely@xxxxxxx>, xfs@xxxxxxxxxxx
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <51094423.8000703@xxxxxxxxxx>
Mail-followup-to: Brian Foster <bfoster@xxxxxxxxxx>, Mark Tinguely <tinguely@xxxxxxx>, xfs@xxxxxxxxxxx
References: <1359492157-30521-1-git-send-email-bfoster@xxxxxxxxxx> <20130130060551.GG7255@xxxxxxxxxxxxxxxxxx> <5109291E.6090303@xxxxxxx> <51094423.8000703@xxxxxxxxxx>
User-agent: Mutt/1.5.21 (2010-09-15)
On Wed, Jan 30, 2013 at 11:02:43AM -0500, Brian Foster wrote:
> (added Dave and the list back on CC)
> 
> On 01/30/2013 09:07 AM, Mark Tinguely wrote:
> > On 01/30/13 00:05, Dave Chinner wrote:
> >> On Tue, Jan 29, 2013 at 03:42:35PM -0500, Brian Foster wrote:
> ...
> > 
> >> So essentially what is happening here is that we are trying to lock
> >> a stale buffer in the AIL to flush it. Well, we can't flush it from
> >> the AIL, and indeed the next line of code is this:
> >>
> >>     if (!xfs_buf_trylock(bp))
> >>         return XFS_ITEM_LOCKED;
> >>
> >>>>>>>     ASSERT(!(bip->bli_flags & XFS_BLI_STALE));
> >>
> >> The only reason this ASSERT is not firing is that we are failing to
> >> lock stale buffers. Hence we are relying on the failed lock to force
> >> the log, instead of detecting that we need to force the log after we
> >> drop the AIL lock and letting the caller handle it.
> >>
> >> So, wouldn't a better solution be to do something like simply like:
> >>
> >> +    if (bp->b_flags & XBF_STALE)
> >> +        return XFS_ITEM_PINNED;
> >> +
> >>     if (!xfs_buf_trylock(bp))
> >>         return XFS_ITEM_LOCKED;
> ...
> > 
> 
> Thanks guys. This certainly looks nicer than messing with the lock
> wrapper, but is it susceptible to the same problem? In other words, does
> this fix the problem or just tighten the window?

That's what I need to think about more - the only difference here is
that we are checking the flag before the down_trylock() instead of
after....

> So if the buf lock covers the pinned state (e.g., buffer gets locked,
> added to a transaction, the transaction gets committed and pins and
> unlocks the buffer, IIUC) and the stale state (buf gets locked, added to
> a new transaction and inval'd before the original transaction was
> written ?), but we don't hold the buf lock in xfs_buf_item_push(), how
> can we guarantee the state of either doesn't change between the time we
> check the flags and the time the lock fails?

... but the order in which they are set and checked may be
significant, and hence checking the stale flag first might be
sufficient to avoid the pin count race and therefore the log force.
This might just need a pair of memory barriers - one here and one in
xfs_buf_stale() - to ensure that we always see the XBF_STALE flag
without needing to lock the buffer first.
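As a sketch of that barrier pairing, using C11 atomics as userspace
stand-ins for the kernel's smp_wmb()/smp_rmb(); the struct and
function names here are illustrative, not the real XFS code:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative stand-ins for the real XFS structures and flags. */
#define XBF_STALE (1u << 0)

struct buf {
	_Atomic unsigned int b_flags;
};

/* Staling side (cf. xfs_buf_stale()): set XBF_STALE with release
 * semantics so everything written before the buffer was staled is
 * visible to any reader that observes the flag. */
static void buf_stale(struct buf *bp)
{
	atomic_fetch_or_explicit(&bp->b_flags, XBF_STALE,
				 memory_order_release);
}

/* Push side (cf. the proposed check in xfs_buf_item_push()): read
 * the flags with acquire semantics before attempting the trylock,
 * so a stale buffer is detected without taking the buffer lock. */
static bool push_sees_stale(struct buf *bp)
{
	return atomic_load_explicit(&bp->b_flags,
				    memory_order_acquire) & XBF_STALE;
}
```

The release/acquire pair plays the role of the wmb/rmb pair: a push
that sees the flag also sees everything the staling side wrote first.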

> > Makes sense. It would prevent the lock recursion. The more that I think
> > about, we do not want to release xa_lock during an AIL scan.
> > 
> 
> FWIW, the other log item abstractions appear to use this model (e.g.,
> xfs_inode_item_push()), where it appears safe to drop xa_lock once the
> actual object lock/ref is acquired and reacquire xa_lock before
> returning.

Right. We can't hold the AIL lock while doing blocking lock
operations because the AIL lock is an innermost lock.  That's
fundamentally the cause of this problem - the trylock is doing more
work than we can allow in the context it is being called from...

We also do not hold the AIL lock for the entire
traversal as long holds of the AIL lock result in serialisation
across the entire transaction subsystem. A successful IOP_PUSH()
typically drops and reacquires the AIL lock, but a failure doesn't.
Hence we limit the number of failed pushes under a single traversal
so that we don't spend a long time holding the lock doing no work
and instead back off and try again later....
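That backoff can be sketched like so; the constants and names are
made up for illustration, the real logic lives in the xfsaild push
loop:

```c
/* Illustrative AIL traversal: a successful push drops and retakes
 * the AIL lock inside push(); a failed one does not, so consecutive
 * failures are capped and the traversal backs off instead of
 * spinning with the lock held doing no work. */
enum push_result { ITEM_SUCCESS, ITEM_PINNED, ITEM_LOCKED,
		   ITEM_FLUSHING };

#define MAX_STUCK 100	/* give up after this many failed pushes */

static int ail_traverse(enum push_result (*push)(int), int nitems)
{
	int stuck = 0, pushed = 0;

	for (int i = 0; i < nitems; i++) {
		if (push(i) == ITEM_SUCCESS) {
			pushed++;
			stuck = 0;	/* progress resets the count */
		} else if (++stuck >= MAX_STUCK) {
			break;		/* back off, retry later */
		}
	}
	return pushed;
}

/* Example item sources for demonstration. */
static enum push_result push_locked(int i) { (void)i; return ITEM_LOCKED; }
static enum push_result push_ok(int i)     { (void)i; return ITEM_SUCCESS; }
```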

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxxx
