xfs
[Top] [All Lists]

Re: [PATCH v3 3/6] ext4: Online defrag not supported with DAX

To: Ross Zwisler <ross.zwisler@xxxxxxxxxxxxxxx>, Jan Kara <jack@xxxxxxx>, linux-kernel@xxxxxxxxxxxxxxx, "J. Bruce Fields" <bfields@xxxxxxxxxxxx>, Theodore Ts'o <tytso@xxxxxxx>, Alexander Viro <viro@xxxxxxxxxxxxxxxxxx>, Andreas Dilger <adilger.kernel@xxxxxxxxx>, Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>, Dan Williams <dan.j.williams@xxxxxxxxx>, Jan Kara <jack@xxxxxxxx>, Jeff Layton <jlayton@xxxxxxxxxxxxxxx>, Jens Axboe <axboe@xxxxxxxxx>, Matthew Wilcox <willy@xxxxxxxxxxxxxxx>, linux-block@xxxxxxxxxxxxxxx, linux-ext4@xxxxxxxxxxxxxxx, linux-fsdevel@xxxxxxxxxxxxxxx, linux-mm@xxxxxxxxx, linux-nvdimm@xxxxxxxxxxxx, xfs@xxxxxxxxxxx
Subject: Re: [PATCH v3 3/6] ext4: Online defrag not supported with DAX
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Thu, 18 Feb 2016 11:12:23 +1100
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <20160217215037.GB30126@xxxxxxxxxxxxxxx>
References: <1455680059-20126-1-git-send-email-ross.zwisler@xxxxxxxxxxxxxxx> <1455680059-20126-4-git-send-email-ross.zwisler@xxxxxxxxxxxxxxx> <20160217215037.GB30126@xxxxxxxxxxxxxxx>
User-agent: Mutt/1.5.21 (2010-09-15)
On Wed, Feb 17, 2016 at 02:50:37PM -0700, Ross Zwisler wrote:
> On Tue, Feb 16, 2016 at 08:34:16PM -0700, Ross Zwisler wrote:
> > Online defrag operations for ext4 are hard coded to use the page cache.
> > See ext4_ioctl() -> ext4_move_extents() -> move_extent_per_page()
> > 
> > When combined with DAX I/O, which circumvents the page cache, this can
> > result in data corruption.  This was observed with xfstests ext4/307 and
> > ext4/308.
> > 
> > Fix this by only allowing online defrag for non-DAX files.
> 
> Jan,
> 
> Thinking about this a bit more, it's probably the case that the data
> corruption I was observing was due to us skipping the writeback of the dirty
> page cache pages because S_DAX was set.
> 
> I do think we have a problem with defrag because it is doing the extent
> swapping using the page cache, and we won't flush the dirty pages due to
> S_DAX being set.
> 
> This patch is the quick and easy answer, and is perhaps appropriate for v4.5.
> 
> Looking forward, though, what do you think the correct solution is?  Making an
> extent swapper that doesn't use the page cache (as I believe XFS has? see
> xfs_swap_extents()),

XFS does the data copy in userspace using direct IO so we don't
care about whether DAX is enabled or not on either the source or
destination inode. i.e. xfs_swap_extents() is a pure
metadata operation, swapping the entire extent tree between two
inodes if the source data has not changed while the copy was in
progress.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx

<Prev in Thread] Current Thread [Next in Thread>