[PATCH 1/2] xfs: xfs_check_page_type buffer checks need help
Brian Foster
bfoster at redhat.com
Wed Mar 5 16:08:20 CST 2014
On Wed, Mar 05, 2014 at 12:11:32PM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner at redhat.com>
>
> xfs_aops_discard_page() was introduced in the following commit:
>
> xfs: truncate delalloc extents when IO fails in writeback
>
> ... to clean up left over delalloc ranges after I/O failure in
> ->writepage(). generic/224 tests for this scenario and occasionally
> reproduces panics on sub-4k blocksize filesystems.
>
> The cause of this is failure to clean up the delalloc range on a
> page where the first buffer does not match one of the expected
> states of xfs_check_page_type(). If a buffer is not unwritten,
> delayed or dirty&mapped, xfs_check_page_type() stops and
> immediately returns 0.
>
> The stress test of generic/224 creates a scenario where the first
> several buffers of a page with delayed buffers are mapped & uptodate
> and some subsequent buffer is delayed. If the ->writepage() happens
> to fail for this page, xfs_aops_discard_page() incorrectly skips
> the entire page.
>
> This then causes later failures either when direct IO maps the range
> and finds the stale delayed buffer, or we evict the inode and find
> that the inode still has a delayed block reservation accounted to
> it.
>
> We can easily fix this xfs_aops_discard_page() failure by making
> xfs_check_page_type() check all buffers, but this breaks
> xfs_convert_page() more than it is already broken. Indeed,
> xfs_convert_page() wants xfs_check_page_type() to tell it if the
> first buffers on the pages are of a type that can be aggregated into
> the contiguous IO that is already being built.
>
> xfs_convert_page() should not be writing random buffers out of a
> page, but the current behaviour will cause it to do so if there are
> buffers that don't match the current specification on the page.
> Hence for xfs_convert_page() we need to:
>
> a) return "not ok" if the first buffer on the page does not
> match the specification provided to we don't write anything;
> and
> b) abort it's buffer-add-to-io loop the moment we come
> across a buffer that does not match the specification.
>
> Hence we need to fix both xfs_check_page_type() and
> xfs_convert_page() to work correctly with pages that have mixed
> buffer types, whilst allowing xfs_aops_discard_page() to scan all
> buffers on the page for a type match.
>
> Reported-by: Brian Foster <bfoster at redhat.com>
> Signed-off-by: Dave Chinner <dchinner at redhat.com>
> ---
This looks pretty good to me as well. I notice one little thing that
might not be a real problem, but worth a quick thought...
> fs/xfs/xfs_aops.c | 81 ++++++++++++++++++++++++++++++++++---------------------
> 1 file changed, 50 insertions(+), 31 deletions(-)
>
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index ef62c6b..98016b3 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -632,38 +632,46 @@ xfs_map_at_offset(
> }
>
...
> /*
> @@ -697,7 +705,7 @@ xfs_convert_page(
> goto fail_unlock_page;
> if (page->mapping != inode->i_mapping)
> goto fail_unlock_page;
> - if (!xfs_check_page_type(page, (*ioendp)->io_type))
> + if (!xfs_check_page_type(page, (*ioendp)->io_type, false))
> goto fail_unlock_page;
>
> /*
> @@ -742,6 +750,15 @@ xfs_convert_page(
> p_offset = p_offset ? roundup(p_offset, len) : PAGE_CACHE_SIZE;
> page_dirty = p_offset / len;
>
> + /*
> + * The moment we find a buffer that doesn't match our current type
> + * specification or can't be written, abort the loop and start
> + * writeback. As per the above xfs_imap_valid() check, only
> + * xfs_vm_writepage() can handle partial page writeback fully - we are
> + * limited here to the buffers that are contiguous with the current
> + * ioend, and hence a buffer we can't write breaks that contiguity and
> + * we have to defer the rest of the IO to xfs_vm_writepage().
> + */
> bh = head = page_buffers(page);
> do {
> if (offset >= end_offset)
> @@ -750,7 +767,7 @@ xfs_convert_page(
> uptodate = 0;
> if (!(PageUptodate(page) || buffer_uptodate(bh))) {
> done = 1;
> - continue;
> + break;
> }
>
> if (buffer_unwritten(bh) || buffer_delay(bh) ||
> @@ -762,10 +779,11 @@ xfs_convert_page(
> else
> type = XFS_IO_OVERWRITE;
>
> - if (!xfs_imap_valid(inode, imap, offset)) {
> - done = 1;
> - continue;
> - }
> + /*
> + * imap should always be valid because of the above
> + * partial page end_offset check on the imap.
> + */
> + ASSERT(xfs_imap_valid(inode, imap, offset));
>
> lock_buffer(bh);
> if (type != XFS_IO_OVERWRITE)
> @@ -777,6 +795,7 @@ xfs_convert_page(
> count++;
> } else {
> done = 1;
> + break;
> }
> } while (offset += len, (bh = bh->b_this_page) != head);
>
The next couple lines after the loop are:
if (uptodate && bh == head)
SetPageUptodate(page);
Now that we can break out of the loop, the "bh == head" part of that
check might not necessarily mean what it used to mean. The uptodate
variable is initialized to 1 and we reset to 0 the moment we encounter a
!uptodate buffer. Do you think it's possible to get here on the first
buffer of the page, without having reset 'uptodate,' and potentially
incorrectly set the page uptodate?
Brian
> @@ -868,7 +887,7 @@ xfs_aops_discard_page(
> struct buffer_head *bh, *head;
> loff_t offset = page_offset(page);
>
> - if (!xfs_check_page_type(page, XFS_IO_DELALLOC))
> + if (!xfs_check_page_type(page, XFS_IO_DELALLOC, true))
> goto out_invalidate;
>
> if (XFS_FORCED_SHUTDOWN(ip->i_mount))
> --
> 1.9.0
>
> _______________________________________________
> xfs mailing list
> xfs at oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs
More information about the xfs
mailing list