On Wed, Jul 20, 2016 at 10:21:23AM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@xxxxxxxxxx>
>
> In xfs_finish_page_writeback(), we have a loop that looks like this:
>
> 	do {
> 		if (off < bvec->bv_offset)
> 			goto next_bh;
> 		if (off > end)
> 			break;
> 		bh->b_end_io(bh, !error);
> next_bh:
> 		off += bh->b_size;
> 	} while ((bh = bh->b_this_page) != head);
>
> The b_end_io function is end_buffer_async_write(), which will call
> end_page_writeback() once all the buffers have been marked as no longer
> under IO. The issue here is that the only thing currently
> protecting both the bufferhead chain and the page from being
> reclaimed is the PageWriteback state held on the page.
>
> While we attempt to limit the loop to just the buffers covered by
> the IO, we still read from the buffer size and follow the next
> pointer in the bufferhead chain. There is no guarantee that either
> of these are valid after the PageWriteback flag has been cleared.
> Hence, loops like this are completely unsafe, and result in
> use-after-free issues. One such problem was caught by Calvin Owens
> with KASAN:
>
...
>
>
> Where the access is occurring during IO completion after the buffer
> had been freed from direct memory reclaim.
>
> Prevent use-after-free accidents in this end_io processing loop by
> pre-calculating the loop conditionals before calling bh->b_end_io().
> The loop is already limited to just the bufferheads covered by the
> IO in progress, so the offset checks are sufficient to prevent
> accessing buffers in the chain after end_page_writeback() has been
> called by the bh->b_end_io() callout.
>
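
To illustrate the pattern described above, here is a minimal user-space sketch. Nothing in it is kernel code: struct fake_bh, fake_end_io(), read_size(), read_next() and run_demo() are hypothetical names invented for this illustration, and "reclaim" is modelled by setting a released flag rather than by freeing memory. It assumes the worst case, where the whole chain is released the instant the last buffer under IO completes. The cached loop tests off > end before reading the next pointer so that a partial-page IO does not touch a buffer after the final completion; that ordering is a choice made for this sketch.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_BUFS		4
#define BUF_SIZE	1024u

/* Hypothetical user-space stand-in for struct buffer_head. */
struct fake_bh {
	struct fake_bh *this_page;	/* circular chain, like b_this_page */
	unsigned int size;		/* like b_size */
	bool under_io;
	bool released;			/* models "reclaim freed this buffer" */
};

static unsigned int stale_reads;	/* field reads made after release */

static unsigned int read_size(const struct fake_bh *bh)
{
	if (bh->released)
		stale_reads++;
	return bh->size;
}

static struct fake_bh *read_next(const struct fake_bh *bh)
{
	if (bh->released)
		stale_reads++;
	return bh->this_page;
}

/*
 * Models b_end_io(): once no buffer in the chain is under IO, the page is
 * no longer under writeback and the whole chain is assumed to be released.
 */
static void fake_end_io(struct fake_bh *bh)
{
	struct fake_bh *tmp;

	bh->under_io = false;
	for (tmp = bh->this_page; tmp != bh; tmp = tmp->this_page)
		if (tmp->under_io)
			return;
	tmp = bh;
	do {
		tmp->released = true;
		tmp = tmp->this_page;
	} while (tmp != bh);
}

/* Loop shape from before the fix: reads bh fields after the callback. */
static void finish_uncached(struct fake_bh *head, unsigned int start,
			    unsigned int end)
{
	struct fake_bh *bh = head;
	unsigned int off = 0;

	do {
		if (off < start)
			goto next_bh;
		if (off > end)
			break;
		fake_end_io(bh);
next_bh:
		off += read_size(bh);
	} while ((bh = read_next(bh)) != head);
}

/* Cached variant: everything needed is read before the callback runs. */
static void finish_cached(struct fake_bh *head, unsigned int start,
			  unsigned int end)
{
	struct fake_bh *bh = head, *next;
	unsigned int off = 0;
	unsigned int bsize = read_size(bh);

	do {
		if (off > end)
			break;
		next = read_next(bh);
		if (off >= start)
			fake_end_io(bh);
		off += bsize;
	} while ((bh = next) != head);
}

/* Build a chain with IO on bytes [start, end], run one loop, count stale reads. */
static unsigned int run_demo(bool cached, unsigned int start, unsigned int end)
{
	struct fake_bh bufs[NR_BUFS];
	unsigned int i;

	assert(start <= end && end < NR_BUFS * BUF_SIZE);
	for (i = 0; i < NR_BUFS; i++) {
		unsigned int off = i * BUF_SIZE;

		bufs[i].this_page = &bufs[(i + 1) % NR_BUFS];
		bufs[i].size = BUF_SIZE;
		bufs[i].under_io = off >= start && off <= end;
		bufs[i].released = false;
	}
	stale_reads = 0;
	if (cached)
		finish_cached(bufs, start, end);
	else
		finish_uncached(bufs, start, end);
	return stale_reads;
}
```

With four 1024-byte buffers and IO over the whole page, run_demo(false, 0, 4095) counts two reads of a released buffer (the size and the next pointer of the last buffer), while run_demo(true, 0, 4095) counts none.
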
> Yet another example of why Bufferheads Must Die.
>
> Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
> Reported-and-Tested-by: Calvin Owens <calvinowens@xxxxxx>
> ---
Reviewed-by: Brian Foster <bfoster@xxxxxxxxxx>
> fs/xfs/xfs_aops.c | 15 ++++++++++++---
> 1 file changed, 12 insertions(+), 3 deletions(-)
>
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 80714eb..0cfb944 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -87,6 +87,12 @@ xfs_find_bdev_for_inode(
> * We're now finished for good with this page. Update the page state via the
> * associated buffer_heads, paying attention to the start and end offsets that
> * we need to process on the page.
> + *
> + * Landmine Warning: bh->b_end_io() will call end_page_writeback() on the last
> + * buffer in the IO. Once it does this, it is unsafe to access the bufferhead or
> + * the page at all, as we may be racing with memory reclaim and it can free both
> + * the bufferhead chain and the page as it will see the page as clean and
> + * unused.
> */
> static void
> xfs_finish_page_writeback(
> @@ -95,8 +101,9 @@ xfs_finish_page_writeback(
> 	int error)
> {
> 	unsigned int end = bvec->bv_offset + bvec->bv_len - 1;
> -	struct buffer_head *head, *bh;
> +	struct buffer_head *head, *bh, *next;
> 	unsigned int off = 0;
> +	unsigned int bsize;
>
> 	ASSERT(bvec->bv_offset < PAGE_SIZE);
> 	ASSERT((bvec->bv_offset & ((1 << inode->i_blkbits) - 1)) == 0);
> @@ -105,15 +112,17 @@ xfs_finish_page_writeback(
>
> 	bh = head = page_buffers(bvec->bv_page);
>
> +	bsize = bh->b_size;
> 	do {
> +		next = bh->b_this_page;
> 		if (off < bvec->bv_offset)
> 			goto next_bh;
> 		if (off > end)
> 			break;
> 		bh->b_end_io(bh, !error);
> next_bh:
> -		off += bh->b_size;
> -	} while ((bh = bh->b_this_page) != head);
> +		off += bsize;
> +	} while ((bh = next) != head);
> }
>
> /*
> --
> 2.8.0.rc3
>