
Re: [PATCH v4 5/7] fs: prioritize and separate direct_io from dax_io

To: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Subject: Re: [PATCH v4 5/7] fs: prioritize and separate direct_io from dax_io
From: Dan Williams <dan.j.williams@xxxxxxxxx>
Date: Thu, 5 May 2016 09:24:20 -0700
Cc: Boaz Harrosh <boaz@xxxxxxxxxxxxx>, linux-block@xxxxxxxxxxxxxxx, linux-ext4 <linux-ext4@xxxxxxxxxxxxxxx>, Jan Kara <jack@xxxxxxx>, Matthew Wilcox <matthew@xxxxxx>, Dave Chinner <david@xxxxxxxxxxxxx>, "linux-kernel@xxxxxxxxxxxxxxx" <linux-kernel@xxxxxxxxxxxxxxx>, XFS Developers <xfs@xxxxxxxxxxx>, Jens Axboe <axboe@xxxxxx>, Linux MM <linux-mm@xxxxxxxxx>, Al Viro <viro@xxxxxxxxxxxxxxxxxx>, linux-nvdimm <linux-nvdimm@xxxxxxxxxxx>, linux-fsdevel <linux-fsdevel@xxxxxxxxxxxxxxx>, Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
In-reply-to: <20160505152230.GA3994@xxxxxxxxxxxxx>
References: <1461878218-3844-1-git-send-email-vishal.l.verma@xxxxxxxxx> <1461878218-3844-6-git-send-email-vishal.l.verma@xxxxxxxxx> <5727753F.6090104@xxxxxxxxxxxxx> <20160505142433.GA4557@xxxxxxxxxxxxx> <CAPcyv4gdmo5m=Arf5sp5izJfNaaAkaaMbOzud8KRcBEC8RRu1Q@xxxxxxxxxxxxxx> <20160505152230.GA3994@xxxxxxxxxxxxx>
On Thu, May 5, 2016 at 8:22 AM, Christoph Hellwig <hch@xxxxxxxxxxxxx> wrote:
> On Thu, May 05, 2016 at 08:15:32AM -0700, Dan Williams wrote:
>> > Agreed - making O_DIRECT less direct than not having it is plain stupid,
>> > and I somehow missed this initially.
>>
>> Of course I disagree: as Dave argues in the msync case, we
>> should do the correct thing first and make it fast later. But, also
>> like Dave, I find this arguing in circles tiresome.
>
> We should do the right thing first, and make it fast later.  But this
> proposal is not getting it right - it still does not handle errors
> for the fast path, but magically makes it work for direct I/O by
> generally using a less optimal path for O_DIRECT.  It's getting the
> worst of all choices.
>
> As far as I can tell the only sensible option is to:
>
>  - always try dax-like I/O first
>  - have a custom get_user_pages + rw_bytes fallback that handles bad
>    blocks when hitting EIO

If you're on board with more special fallbacks for dax-capable block
devices, that does open up the options.  The O_DIRECT approach was
meant to keep the error clearing model close to the traditional block
device case, but yes that does constrain the implementation in
sub-optimal ways.
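
For illustration only, the "try dax-like I/O first, fall back on EIO"
flow proposed above could be sketched roughly as follows. This is a
userspace toy model in C, not kernel code: struct toy_dev,
toy_dax_write(), toy_slow_write() and toy_write() are all hypothetical
names, and the slow path merely stands in for whatever
get_user_pages + rw_bytes fallback would actually be implemented.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define TOY_DEV_SIZE 4096u

/* Hypothetical toy device: a byte array plus one poisoned byte range. */
struct toy_dev {
	unsigned char data[TOY_DEV_SIZE];
	size_t bad_start;	/* first poisoned byte */
	size_t bad_len;		/* 0 means no known bad range */
};

/* True if [off, off + len) intersects the poisoned range. */
static int toy_overlaps_bad(const struct toy_dev *d, size_t off, size_t len)
{
	return d->bad_len != 0 && len != 0 &&
	       off < d->bad_start + d->bad_len &&
	       d->bad_start < off + len;
}

/* Fast path: plain memcpy that refuses to touch a poisoned range. */
static long toy_dax_write(struct toy_dev *d, size_t off,
			  const void *buf, size_t len)
{
	if (toy_overlaps_bad(d, off, len))
		return -EIO;
	memcpy(d->data + off, buf, len);
	return (long)len;
}

/*
 * Slow path: stand-in for a driver path that can clear errors.
 * Simplification: any overlap clears the whole bad range; the
 * size/alignment question is deliberately ignored here.
 */
static long toy_slow_write(struct toy_dev *d, size_t off,
			   const void *buf, size_t len)
{
	assert(off <= TOY_DEV_SIZE && len <= TOY_DEV_SIZE - off);
	memcpy(d->data + off, buf, len);
	if (toy_overlaps_bad(d, off, len))
		d->bad_len = 0;
	return (long)len;
}

/* Always try the fast path first; fall back only when it reports EIO. */
static long toy_write(struct toy_dev *d, size_t off,
		      const void *buf, size_t len)
{
	long ret;

	if (off > TOY_DEV_SIZE || len > TOY_DEV_SIZE - off)
		return -EINVAL;
	ret = toy_dax_write(d, off, buf, len);
	if (ret == -EIO)
		ret = toy_slow_write(d, off, buf, len);
	return ret;
}
```

In this model, writes that miss the bad range never leave the fast
path, and only a write that actually hits EIO pays for the fallback.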

However, we still have the alignment problem in the rw_bytes case: how
do we communicate to the application that only writes with a certain
size/alignment will clear errors?  That forced alignment assumption
was the other appeal of O_DIRECT.  Perhaps we can at least start with
hole punching and block reallocation as the error clearing method
while we think more about the write-to-clear case?
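
To make the alignment concern concrete, here is a minimal sketch of
which sectors an arbitrary byte-range write could clear, assuming
errors can only be cleared one whole sector at a time. The 512-byte
granularity, TOY_SECTOR_SIZE, struct sector_span and
fully_covered_sectors() are assumptions made for this example, not
anything defined by the patch set.

```c
#include <stdint.h>

#define TOY_SECTOR_SIZE 512u	/* assumed error-clearing granularity */

/* Run of whole sectors: index of the first one and how many follow. */
struct sector_span {
	uint64_t first;
	uint64_t count;
};

/*
 * Return the sectors that the byte range [off, off + len) covers
 * completely.  Sectors that are only partially overwritten are
 * excluded, since a partial overwrite could not clear their error
 * under the whole-sector assumption above.
 */
static struct sector_span fully_covered_sectors(uint64_t off, uint64_t len)
{
	struct sector_span s = { 0, 0 };
	/* Round the start up and the end down to sector boundaries. */
	uint64_t first = (off + TOY_SECTOR_SIZE - 1) / TOY_SECTOR_SIZE;
	uint64_t end = (off + len) / TOY_SECTOR_SIZE;

	s.first = first;
	if (end > first)
		s.count = end - first;
	return s;
}
```

Under these assumptions a 512-byte write at offset 100 covers no
sector completely and so clears nothing, even though it is as large as
a sector. That is the case an application would have no obvious way to
learn about.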

> And then we need to sort out the concurrent write synchronization.
> Again there I think we absolutely have to obey POSIX for the !O_DIRECT
> case and can avoid it for O_DIRECT, similar to the existing non-DAX
> semantics.  If we want any special additional semantics we _will_ need
> a special O_DAX flag.

Ok, makes sense.
