[Top] [All Lists]

Re: [PATCH] xfs: fix buffer shudown reference count mismatch

To: Mark Tinguely <tinguely@xxxxxxx>
Subject: Re: [PATCH] xfs: fix buffer shudown reference count mismatch
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Thu, 1 Nov 2012 12:26:38 +1100
Cc: xfs@xxxxxxxxxxx
In-reply-to: <50915B51.3050309@xxxxxxx>
References: <1351556454-29723-1-git-send-email-david@xxxxxxxxxxxxx> <50915B51.3050309@xxxxxxx>
User-agent: Mutt/1.5.21 (2010-09-15)
On Wed, Oct 31, 2012 at 12:09:37PM -0500, Mark Tinguely wrote:
> On 10/29/12 19:20, Dave Chinner wrote:
> >From: Dave Chinner<dchinner@xxxxxxxxxx>
> >
> >When we shut down the filesystem, we have to unpin and free all the
> >buffers currently active in the CIL. To do this we unpin and remove
> >them in one operation as a result of a failed iclogbuf write. For
> >buffers, we do this removal via a simultated IO completion of after
> >marking the buffer stale.
> >
> >At the time we do this, we have two references to the buffer - the
> >active LRU reference and the buf log item.  The LRU reference is
> >removed by marking the buffer stale, and the active CIL reference is
> >by the xfs_buf_iodone() callback that is run by
> >xfs_buf_do_callbacks() during ioend processing (via the bp->b_iodone
> >callback).
> >
> >However, ioend processing requires one more reference - that of the
> >IO that it is completing. We don't have this reference, so we free
> >the buffer prematurely and use it after it is freed. This leads to
> >assert failures in xfs_buf_rele() on debug kernels because the
> >b_hold count is zero.
> >
> >Fix this by making sure we take the necessary IO reference before
> >starting IO completion processing on the stale buffer.
> >
> >Cc:<stable@xxxxxxxxxxxxxxx>
> >Signed-off-by: Dave Chinner<dchinner@xxxxxxxxxx>
> This seems to take care of one of the ASSERT that I experienced after
> the worker move series.

It should - it's the shutdown failure you've reported for some time
now ;)

> With this patch applied, there is a new ASSERT that the perag is not
> empty in filesystem unmount in test 179. I think this is related to
> the worker move series and not this patch. I will send the information
> in a different thread.

That's been around for a long, long time. I've never been able to
reproduce it reliably - I see it maybe once every couple of months -
so I've never been able to get to the bottom of it....

> Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>



Dave Chinner

<Prev in Thread] Current Thread [Next in Thread>