On Mon, Sep 15, 2014 at 09:18:22AM -0400, Brian Foster wrote:
> On Mon, Sep 15, 2014 at 11:46:54AM +1000, Dave Chinner wrote:
> > xfs: ensure WB_SYNC_ALL writeback handles partial pages correctly
> > From: Dave Chinner <dchinner@xxxxxxxxxx>
> > XFS has been having trouble with stray delayed allocation extents
> > beyond EOF for a long time. Recent changes to the collapse range
> > code have triggered erroneous EBUSY errors on page invalidation for
> > block size smaller than page size filesystems. These
> > have been caused by dirty buffers beyond EOF on a partial page which
> > do not get written to disk during a sync.
> > The issue is that write-ahead in xfs_cluster_write() finds such a
> > partial page and handles it by leaving the page dirty but pushing it
> > into a writeback state. This used to work just fine, as the
> > write_cache_pages() code would then find the dirty partial page in
> > the next mapping tree lookup as the dirty tag is still set.
> > Unfortunately, when we moved to a mark and sweep approach to
> > writeback to fix other writeback sync issues, we broke this. The
> > act of marking the page as under writeback now clears the TOWRITE
> > tag in the radix tree, even though the page is still dirty. Hence
> > the next lookup on the mapping tree does not find the dirty partial
> > page and so doesn't try to write it again.
> > This same writeback bug was found recently in ext4 and fixed in
> > commit 1c8349a ("ext4: fix data integrity sync in ordered mode")
> > without communication to the wider filesystem community. We can use
> > exactly the same fix here so the TOWRITE flag is not cleared on
> > partial page writes.
> > cc: stable@xxxxxxxxxxxxxxx # dependent on
> > 1c8349a17137b93f0a83f276c764a6df1b9a116e
> > Root-cause-found-by: Brian Foster <bfoster@xxxxxxxxxx>
> > Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
> > ---
> Looks good and fixes the collapse failure in my test.
> Reviewed-by: Brian Foster <bfoster@xxxxxxxxxx>
> I suppose we should prepend the collapse rework series with this patch
> to avoid the regression as it pertains to collapse (obviously the
> failure to retain towrite goes further back).
Agreed, I will do that.
> I'll continue testing with this. Are you still seeing an increase in
> such failures with the xfs_free_file_space() patch or has this quieted
> those down?
Too early to say for sure, but signs are good - I've had xfstests
actually complete without any stray delalloc asserts occurring on
1k block size filesystems for the first time in a couple of weeks.
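
For anyone following along, the tag behaviour described in the commit
message can be sketched as a toy userspace model. This is not kernel
code: struct toy_page and the toy_*() helpers are made-up names for
illustration only, and the model assumes a sync pass visits only pages
that still carry the TOWRITE tag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a page cache page and its mapping tree tags. */
struct toy_page {
	bool dirty;        /* page still has dirty buffers */
	bool writeback;    /* page is under writeback */
	bool tag_dirty;    /* models the DIRTY tag in the mapping tree */
	bool tag_towrite;  /* models the TOWRITE tag in the mapping tree */
};

/* Mark phase of a sync pass: every dirty-tagged page is tagged TOWRITE. */
static void toy_tag_for_writeback(struct toy_page *page)
{
	assert(page != NULL);
	if (page->tag_dirty)
		page->tag_towrite = true;
}

/*
 * Start writeback on a page. With keep_write == false the TOWRITE tag
 * is cleared even if the page stays dirty (the problem case for a
 * partial page). With keep_write == true the tag survives, so a later
 * lookup in the same sync pass still finds the page.
 */
static void toy_start_writeback(struct toy_page *page, bool clear_dirty,
				bool keep_write)
{
	assert(page != NULL);
	if (clear_dirty) {
		page->dirty = false;
		page->tag_dirty = false;
	}
	page->writeback = true;
	if (!keep_write)
		page->tag_towrite = false;
}

/* Sweep phase lookup: only pages still tagged TOWRITE are visited. */
static bool toy_sync_lookup_finds(const struct toy_page *page)
{
	assert(page != NULL);
	return page->tag_towrite;
}
```

In this model, a partial page pushed into writeback with clear_dirty and
keep_write both false stays dirty but is no longer found by the sweep
lookup. With keep_write true it is still found, which is the behaviour
the fix is after.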