xfs
[Top] [All Lists]

Re: [PATCH 5/6] xfs: fix buffer shudown reference count mismatch

To: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Subject: Re: [PATCH 5/6] xfs: fix buffer shudown reference count mismatch
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Wed, 7 Nov 2012 06:59:40 +1100
Cc: xfs@xxxxxxxxxxx
In-reply-to: <20121106125949.GA32329@xxxxxxxxxxxxx>
References: <1351816724-3000-1-git-send-email-david@xxxxxxxxxxxxx> <1351816724-3000-6-git-send-email-david@xxxxxxxxxxxxx> <20121102024351.GU29378@dastard> <20121102131326.GG12578@xxxxxxxxxxxxx> <20121102234741.GB29378@dastard> <20121106125949.GA32329@xxxxxxxxxxxxx>
User-agent: Mutt/1.5.21 (2010-09-15)
On Tue, Nov 06, 2012 at 07:59:49AM -0500, Christoph Hellwig wrote:
> On Sat, Nov 03, 2012 at 10:47:41AM +1100, Dave Chinner wrote:
> > I think that's irrelevant here - there will *never* be an IO waiter
> > at this point in time.  This processing is in log buffer IO
> > completion context, so the buffers are still pinned in memory. Hence
> > anyone trying to do IO on it will be waiting in xfs_buf_wait_unpin()
> > and never get to xfs_buf_iowait(). And because xfs_buf_wait_unpin()
> > is called with the buffer lock held, we'll never do the failure
> > handling in xfs_buf_item_unpin until the buffer IO is completed and
> > it is unlocked.
> 
> How do we manage to submit it synchronously then?

I don't follow what problem you are talking about here.
Fundamentally, races with IO are resolved like this regardless of
whether the racing Io is sync or async

                                xfs_buf_lock
                                make modifications
                                .....
 xfs_buf_lock
 .....
                                xfs_trans_commit
                                ....
                                IOP_PIN()
                                IOP_UNLOCK()
 
 xfs_buf_iorequest
   xfs_buf_wait_unpin()
.....

<shutdown, no new buffers can get to xfs_buf_iorequest>

                                IOP_UNPIN(remove)
                                xfs_buf_item_unpin(remove)
                                wake_up_all(pin waiters)
                                xfs_buf_lock()
                                .....
 submit IO
 ......
 xfs_buf_ioend()
   wakeup(b_iowait)
 .....
 xfs_buf_relse()
                                xfs_buf_hold
                                xfs_buf_stale
                                ASYNC
                                xfs_buf_ioend()
                                  bp->b_iodone()
                                    xfs_buf_rele
                                    xfs_buf_ioend()
                                      xfs_buf_rele
                                        xfs_buf_free


What this also points out is that we shoul dbe checking for shutdown
after xfs_buf_wait_unpin(), too, because otherwise we are submitting
IO after the shutdown is initiated....

> The inode and dquot
> reclaim xfs_bwrite calls already wait for an unpin first, so I don't
> think these are the problem.  The only other call on a live fs seems
> to xfs_qm_shake -> xfs_buf_delwri_submit, but that one does wait
> for the complete() call on b_iowait.  I suspect we are hitting that
> and due to it skipping the wait if b_ioerror is set and waiting on
> multiple buffers that complete together might hide the issue.

We are in a shutdown situation. xfs_buf_delwri_submit() goes via
xfs_bdstrat_cb() which will stop any new IOs from being submitted
via this path. If it is blocked on the above case, then it is
also resolved by the above case...

> __xfs_buf_delwri_submit for the wait == true case also seems to be
> the only place that actually skips the ispinned check.

Sure, but that we're in a shutdown situation, so it doesn't matter -
the buffer will never get to xfs_buf_wait_unpin() because of the
shutdown check in xfs_bdstrat_cb().

I still don't see the problem you are trying to explain to me. Maybe
I'm just being dense....

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx

<Prev in Thread] Current Thread [Next in Thread>