xfs
[Top] [All Lists]

Re: [PATCH 1/2] xfs: xfs_check_page_type buffer checks need help

To: Dave Chinner <david@xxxxxxxxxxxxx>
Subject: Re: [PATCH 1/2] xfs: xfs_check_page_type buffer checks need help
From: Brian Foster <bfoster@xxxxxxxxxx>
Date: Wed, 5 Mar 2014 17:08:20 -0500
Cc: xfs@xxxxxxxxxxx
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <1393981893-2497-2-git-send-email-david@xxxxxxxxxxxxx>
References: <1393981893-2497-1-git-send-email-david@xxxxxxxxxxxxx> <1393981893-2497-2-git-send-email-david@xxxxxxxxxxxxx>
User-agent: Mutt/1.5.21 (2010-09-15)
On Wed, Mar 05, 2014 at 12:11:32PM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@xxxxxxxxxx>
> 
> xfs_aops_discard_page() was introduced in the following commit:
> 
>   xfs: truncate delalloc extents when IO fails in writeback
> 
> ... to clean up left over delalloc ranges after I/O failure in
> ->writepage(). generic/224 tests for this scenario and occasionally
> reproduces panics on sub-4k blocksize filesystems.
> 
> The cause of this is failure to clean up the delalloc range on a
> page where the first buffer does not match one of the expected
> states of xfs_check_page_type(). If a buffer is not unwritten,
> delayed or dirty&mapped, xfs_check_page_type() stops and
> immediately returns 0.
> 
> The stress test of generic/224 creates a scenario where the first
> several buffers of a page with delayed buffers are mapped & uptodate
> and some subsequent buffer is delayed. If the ->writepage() happens
> to fail for this page, xfs_aops_discard_page() incorrectly skips
> the entire page.
> 
> This then causes later failures either when direct IO maps the range
> and finds the stale delayed buffer, or we evict the inode and find
> that the inode still has a delayed block reservation accounted to
> it.
> 
> We can easily fix this xfs_aops_discard_page() failure by making
> xfs_check_page_type() check all buffers, but this breaks
> xfs_convert_page() more than it is already broken. Indeed,
> xfs_convert_page() wants xfs_check_page_type() to tell it if the
> first buffers on the pages are of a type that can be aggregated into
> the contiguous IO that is already being built.
> 
> xfs_convert_page() should not be writing random buffers out of a
> page, but the current behaviour will cause it to do so if there are
> buffers that don't match the current specification on the page.
> Hence for xfs_convert_page() we need to:
> 
>       a) return "not ok" if the first buffer on the page does not
>       match the specification provided to we don't write anything;
>       and
>       b) abort it's buffer-add-to-io loop the moment we come
>       across a buffer that does not match the specification.
> 
> Hence we need to fix both xfs_check_page_type() and
> xfs_convert_page() to work correctly with pages that have mixed
> buffer types, whilst allowing xfs_aops_discard_page() to scan all
> buffers on the page for a type match.
> 
> Reported-by: Brian Foster <bfoster@xxxxxxxxxx>
> Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
> ---

This looks pretty good to me as well. I notice one little thing that
might not be a real problem, but worth a quick thought...

>  fs/xfs/xfs_aops.c | 81 
> ++++++++++++++++++++++++++++++++++---------------------
>  1 file changed, 50 insertions(+), 31 deletions(-)
> 
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index ef62c6b..98016b3 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -632,38 +632,46 @@ xfs_map_at_offset(
>  }
>  
...
>  /*
> @@ -697,7 +705,7 @@ xfs_convert_page(
>               goto fail_unlock_page;
>       if (page->mapping != inode->i_mapping)
>               goto fail_unlock_page;
> -     if (!xfs_check_page_type(page, (*ioendp)->io_type))
> +     if (!xfs_check_page_type(page, (*ioendp)->io_type, false))
>               goto fail_unlock_page;
>  
>       /*
> @@ -742,6 +750,15 @@ xfs_convert_page(
>       p_offset = p_offset ? roundup(p_offset, len) : PAGE_CACHE_SIZE;
>       page_dirty = p_offset / len;
>  
> +     /*
> +      * The moment we find a buffer that doesn't match our current type
> +      * specification or can't be written, abort the loop and start
> +      * writeback. As per the above xfs_imap_valid() check, only
> +      * xfs_vm_writepage() can handle partial page writeback fully - we are
> +      * limited here to the buffers that are contiguous with the current
> +      * ioend, and hence a buffer we can't write breaks that contiguity and
> +      * we have to defer the rest of the IO to xfs_vm_writepage().
> +      */
>       bh = head = page_buffers(page);
>       do {
>               if (offset >= end_offset)
> @@ -750,7 +767,7 @@ xfs_convert_page(
>                       uptodate = 0;
>               if (!(PageUptodate(page) || buffer_uptodate(bh))) {
>                       done = 1;
> -                     continue;
> +                     break;
>               }
>  
>               if (buffer_unwritten(bh) || buffer_delay(bh) ||
> @@ -762,10 +779,11 @@ xfs_convert_page(
>                       else
>                               type = XFS_IO_OVERWRITE;
>  
> -                     if (!xfs_imap_valid(inode, imap, offset)) {
> -                             done = 1;
> -                             continue;
> -                     }
> +                     /*
> +                      * imap should always be valid because of the above
> +                      * partial page end_offset check on the imap.
> +                      */
> +                     ASSERT(xfs_imap_valid(inode, imap, offset));
>  
>                       lock_buffer(bh);
>                       if (type != XFS_IO_OVERWRITE)
> @@ -777,6 +795,7 @@ xfs_convert_page(
>                       count++;
>               } else {
>                       done = 1;
> +                     break;
>               }
>       } while (offset += len, (bh = bh->b_this_page) != head);
> 

The next couple lines after the loop are:

        if (uptodate && bh == head)
                SetPageUptodate(page);

Now that we can break out of the loop, the "bh == head" part of that
check might not necessarily mean what it used to mean. The uptodate
variable is initialized to 1 and we reset to 0 the moment we encounter a
!uptodate buffer. Do you think it's possible to get here on the first
buffer of the page, without having reset 'uptodate,'  and potentially
incorrectly set the page uptodate?

Brian

> @@ -868,7 +887,7 @@ xfs_aops_discard_page(
>       struct buffer_head      *bh, *head;
>       loff_t                  offset = page_offset(page);
>  
> -     if (!xfs_check_page_type(page, XFS_IO_DELALLOC))
> +     if (!xfs_check_page_type(page, XFS_IO_DELALLOC, true))
>               goto out_invalidate;
>  
>       if (XFS_FORCED_SHUTDOWN(ip->i_mount))
> -- 
> 1.9.0
> 
> _______________________________________________
> xfs mailing list
> xfs@xxxxxxxxxxx
> http://oss.sgi.com/mailman/listinfo/xfs

<Prev in Thread] Current Thread [Next in Thread>