[Top] [All Lists]

Re: [PATCH 03/27] xfs: use write_cache_pages for writeback clustering

To: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Subject: Re: [PATCH 03/27] xfs: use write_cache_pages for writeback clustering
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Thu, 30 Jun 2011 12:00:13 +1000
Cc: xfs@xxxxxxxxxxx
In-reply-to: <20110629140336.950805096@xxxxxxxxxxxxxxxxxxxxxx>
References: <20110629140109.003209430@xxxxxxxxxxxxxxxxxxxxxx> <20110629140336.950805096@xxxxxxxxxxxxxxxxxxxxxx>
User-agent: Mutt/1.5.20 (2009-06-14)
On Wed, Jun 29, 2011 at 10:01:12AM -0400, Christoph Hellwig wrote:
> Instead of implementing our own writeback clustering use write_cache_pages
> to do it for us.  This means the guts of the current writepage implementation
> become a new helper used both for implementing ->writepage and as a callback
> to write_cache_pages for ->writepages.  A new struct xfs_writeback_ctx
> is used to track block mapping state and the ioend chain over multiple
> invocation of it.
> The advantage over the old code is that we avoid a double pagevec lookup,
> and a more efficient handling of extent boundaries inside a page for
> small blocksize filesystems, as well as having less XFS specific code.

Yes, it should be, but I can't actually measure any noticable CPU
usage difference @800MB/s writeback. The profiles change shape
around the changed code, but overall cpu usage does not change. I
think this is because the second pagevec lookup is pretty much free
because the radix tree is already hot in cache when we do the second

> The downside is that we don't do writeback clustering when called from
> kswapd anyore, but that is a case that should be avoided anyway.  Note
> that we still convert the whole delalloc range from ->writepage, so
> the on-disk allocation pattern is not affected.

All the more reason to ensure the mm subsystem doesn't do this....

>  error:
> -     if (iohead)
> -             xfs_cancel_ioend(iohead);
> -
> -     if (err == -EAGAIN)
> -             goto redirty;
> -

Should this EAGAIN handling be dealt with in the removing-the-non-
blocking-mode patch?

> +STATIC int
>  xfs_vm_writepages(
>       struct address_space    *mapping,
>       struct writeback_control *wbc)
>  {
> +     struct xfs_writeback_ctx ctx = { };
> +     int ret;
> +
>       xfs_iflags_clear(XFS_I(mapping->host), XFS_ITRUNCATED);
> -     return generic_writepages(mapping, wbc);
> +
> +     ret = write_cache_pages(mapping, wbc, __xfs_vm_writepage, &ctx);
> +
> +     if (ctx.iohead) {
> +             if (ret)
> +                     xfs_cancel_ioend(ctx.iohead);
> +             else
> +                     xfs_submit_ioend(wbc, ctx.iohead);
> +     }

I think this error handling does not work. If we have put pages into
the ioend (i.e. successful ->writepage calls) and then have a
->writepage call fail, we'll get all the pages under writeback (i.e.
those on the ioend) remain in that state, and not ever get written
back (so move into the clean state) or redirtied (so written again

xfs_cancel_ioend() was only ever called for the first page sent down
to ->writepage, and on error that page was redirtied separately.
Hence it doesn't handle this case at all as it never occurs in the
existing code.

I'd suggest that regardless of whether an error is returned here,
the existence of ctx.iohead indicates a valid ioend that needs to be


Dave Chinner

<Prev in Thread] Current Thread [Next in Thread>