
Re: [PATCH 03/27] xfs: use write_cache_pages for writeback clustering

To: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Subject: Re: [PATCH 03/27] xfs: use write_cache_pages for writeback clustering
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Fri, 1 Jul 2011 19:20:21 +1000
Cc: xfs@xxxxxxxxxxx
In-reply-to: <20110701085958.GB30819@xxxxxxxxxxxxx>
References: <20110629140109.003209430@xxxxxxxxxxxxxxxxxxxxxx> <20110629140336.950805096@xxxxxxxxxxxxxxxxxxxxxx> <20110701022248.GM561@dastard> <20110701041851.GN561@dastard> <20110701085958.GB30819@xxxxxxxxxxxxx>
User-agent: Mutt/1.5.20 (2009-06-14)
On Fri, Jul 01, 2011 at 04:59:58AM -0400, Christoph Hellwig wrote:
> > xfs: writepage context needs to handle discontiguous page ranges
> > 
> > From: Dave Chinner <dchinner@xxxxxxxxxx>
> > 
> > If the pages sent down by write_cache_pages to the writepage
> > callback are discontiguous, we need to detect this and put each
> > discontiguous page range into individual ioends. This is needed to
> > ensure that the ioend accurately represents the range of the file
> > that it covers so that file size updates during IO completion set
> > the size correctly. Failure to take into account the discontiguous
> > ranges results in files being too small when writeback patterns are
> > non-sequential.
> 
> Looks good.  I still wonder why I haven't been able to hit this.
> Haven't seen any 180 failure for a long time, with both 4k and 512 byte
> filesystems and since yesterday 1k as well.

It requires the test to run the VM out of RAM and then force enough
memory pressure for kswapd to start writeback from the LRU. The
reproducer I have is a 1p, 1GB RAM VM with its disk image on a
100MB/s HW RAID1 w/ 512MB BBWC disk subsystem.

When kswapd starts doing writeback from the LRU, the iops rate goes
through the roof (from ~300iops @~320k/io to ~7000iops @4k/io) and
throughput drops from 100MB/s to ~30MB/s. BBWC is the only reason
the IOPS stays as high as it does - maybe that is why I saw this and
you haven't.

As it is, the kswapd writeback behaviour is utterly atrocious and,
ultimately, quite easy to provoke. I wish the MM folk would fix that
goddamn problem already - we've only been complaining about it for
the last 6 or 7 years. As such, I'm wondering if it's a bad idea to
even consider removing the .writepage clustering...

> I'll merge this, and to avoid bisect regressions it'll have to go into
> the main writepages patch.  That probably means folding the add_to_ioend
> cleanup into it as well to not make the calling convention too ugly.

Yup, I figured you'd want to do that.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
