xfs

Re: [PATCH v2 2/3] mm, dax: add VM_DAX flag for DAX VMAs

To: Dave Chinner <david@xxxxxxxxxxxxx>
Subject: Re: [PATCH v2 2/3] mm, dax: add VM_DAX flag for DAX VMAs
From: Dan Williams <dan.j.williams@xxxxxxxxx>
Date: Thu, 15 Sep 2016 16:19:21 -0700
Cc: Christoph Hellwig <hch@xxxxxx>, Linux MM <linux-mm@xxxxxxxxx>, "linux-nvdimm@xxxxxxxxxxxx" <linux-nvdimm@xxxxxxxxxxxx>, "linux-kernel@xxxxxxxxxxxxxxx" <linux-kernel@xxxxxxxxxxxxxxx>, Nicholas Piggin <npiggin@xxxxxxxxx>, XFS Developers <xfs@xxxxxxxxxxx>, linux-fsdevel <linux-fsdevel@xxxxxxxxxxxxxxx>
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <20160915230748.GS30497@dastard>
References: <147392246509.9873.17750323049785100997.stgit@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx> <147392247875.9873.4205533916442000884.stgit@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx> <20160915082615.GA9772@xxxxxx> <CAPcyv4jTw3cXpmmJRh7t16Xy2uYofDe+fJ+X_jnz+Q=o0uGneg@xxxxxxxxxxxxxx> <20160915230748.GS30497@dastard>
On Thu, Sep 15, 2016 at 4:07 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> On Thu, Sep 15, 2016 at 10:01:03AM -0700, Dan Williams wrote:
>> On Thu, Sep 15, 2016 at 1:26 AM, Christoph Hellwig <hch@xxxxxx> wrote:
>> > On Wed, Sep 14, 2016 at 11:54:38PM -0700, Dan Williams wrote:
>> >> The DAX property of a VMA (page cache bypass) is only detectable via the
>> >> vma_is_dax() helper, which checks the S_DAX inode flag.  However, that
>> >> helper is only available inside the kernel, and this is a property that
>> >> userspace applications would like to interrogate.
>> >
>> > They have absolutely no business knowing such an implementation detail.
>>
>> Hasn't that train already left the station with FS_XFLAG_DAX?
>
> No, that's an admin flag, not a runtime hint for applications. Just
> because that flag is set on an inode, it does not mean that DAX is
> actually in use - it will be ignored if the backing dev is not dax
> capable.

Ok, but then VM_DAX does not suffer from that problem.  I'm trying to
understand why VM_DAX has no business being in the smaps "VmFlags"
line, but something ambiguous to userspace like VM_MIXEDMAP does?
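For context on the VmFlags comparison: /proc/<pid>/smaps prints each VMA's flags as two-letter mnemonics, and VM_MIXEDMAP appears there as `mm`. The sketch below shows how a userspace tool might test for one of these mnemonics. The helper name `has_vmflag` is hypothetical. This thread does not give the mnemonic a VM_DAX flag would use, so the example only uses the existing `mm`.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the two-letter mnemonic $2 appears
# in the smaps "VmFlags:" line passed as $1.
has_vmflag() {
    line=$1
    flag=$2
    # Drop the "VmFlags:" label; the unquoted expansion splits the rest
    # into individual mnemonics, each compared exactly.
    for f in ${line#VmFlags:}; do
        [ "$f" = "$flag" ] && return 0
    done
    return 1
}

# Example: list this shell's own mappings that carry VM_MIXEDMAP ("mm").
if [ -r "/proc/$$/smaps" ]; then
    grep '^VmFlags:' "/proc/$$/smaps" | while read -r vmline; do
        has_vmflag "$vmline" mm && echo "mixedmap mapping: $vmline"
    done
fi
```

Matching whole words rather than substrings keeps a flag such as `mm` from matching inside some longer, unrelated token.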

>
>> The other problem with hiding the DAX property is that it turns out to
>> not be a transparent acceleration feature.  See xfs/086 xfs/088
>> xfs/089 xfs/091 which fail with DAX and, as far as I understand, it is
>> due to the fact that DAX disallows delayed allocation behavior.
>
> Which is not a bug, nor is it something that app developers should
> be surprised by.
>
> i.e. Subtle differences in error reporting behaviour occur in
> filesystems /all the time/. Run the test on a non-dax filesystem
> with an extent size hint. It fails /exactly the same way as DAX/.
> Run it with direct IO - fails the same way as DAX. Run it
> with synchronous writes - it fails the same way as DAX.
>
> IOWs, if an app can't handle the way DAX reports errors, then they
> are /broken/. Delayed allocation requires checking the return value
> of fsync() or close() to capture the allocation error - many more
> apps get that wrong than the ones that expect the immediate errors
> from write()...
>
> Anyway: to demonstrate that nothing is actually broken, and that
> you might sometimes need to fix tests and send patches to
> fstests@xxxxxxxxxxxxxxx, this makes xfs/086 pass for me on DAX:
>
> --- a/tests/xfs/086
> +++ b/tests/xfs/086
> @@ -96,7 +96,8 @@ _scratch_mount
>
>  echo "+ modify files"
>  for x in `seq 1 64`; do
> -       $XFS_IO_PROG -f -c "pwrite -S 0x62 0 ${blksz}" "${TESTFILE}.${x}" >> $seqres.full
> +       $XFS_IO_PROG -f -c "pwrite -S 0x62 0 ${blksz}" "${TESTFILE}.${x}" \
> +               >> $seqres.full 2>&1
>  done
>  umount "${SCRATCH_MNT}"
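The added `2>&1` matters because fstests captures a test's stdout and stderr and compares them against the golden output. Any error message that escapes the `>> $seqres.full` redirection therefore shows up as a failure, even when the early error is legitimate DAX behaviour.

The sketch below shows the difference between the two forms. `fake_pwrite` is a hypothetical stand-in for the `$XFS_IO_PROG` call, and its error text is illustrative only.

```shell
#!/bin/sh
# Hypothetical stand-in for an xfs_io pwrite that fails: progress goes
# to stdout, the (illustrative) error message goes to stderr.
fake_pwrite() {
    echo "wrote 4096/4096 bytes at offset 0"
    echo "pwrite: Input/output error" >&2
    return 1
}

full=$(mktemp)

# Original form: only stdout is appended to the log. Here stderr is
# routed into the command substitution so we can see what would escape.
leaked=$(fake_pwrite 2>&1 >> "$full")

# Patched form: stdout and stderr both go to the log, so nothing escapes
# whether the error surfaces at write() time (as with DAX) or not at all.
escaped=$( { fake_pwrite >> "$full" 2>&1; } 2>&1 )

echo "leaked:  '$leaked'"
echo "escaped: '$escaped'"
rm -f "$full"
```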

Thanks for that!  Wasn't immediately obvious to me, and didn't get
that response when I asked on the list a while back.
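Dave's point about delayed allocation can be made concrete from the shell: with delalloc, block allocation errors are reported by fsync() or close() rather than by write(). The sketch below assumes GNU or BusyBox `dd`, whose `conv=fsync` makes dd call fsync() on the output before exiting. A failure deferred past write() then still turns into a nonzero exit status. The temporary file is only an example target.

```shell
#!/bin/sh
# Without conv=fsync, dd may exit 0 once the data is in the page cache,
# before any block allocation has happened. With it, dd flushes the
# output and reports a failed flush through its exit status.
out=$(mktemp)
if dd if=/dev/zero of="$out" bs=4096 count=1 conv=fsync 2>/dev/null; then
    status=ok
else
    status=failed
fi
echo "write+fsync: $status"
rm -f "$out"
```

The same reasoning applies to applications: checking write() alone is not enough unless the flush that performs the allocation is also checked.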
