Re: [PATCH 3/6] xfs: Don't use unwritten extents for DAX

To: Ross Zwisler <ross.zwisler@xxxxxxxxxxxxxxx>, Dave Chinner <david@xxxxxxxxxxxxx>, Dan Williams <dan.j.williams@xxxxxxxxx>, Brian Foster <bfoster@xxxxxxxxxx>, Jan Kara <jack@xxxxxxx>, xfs@xxxxxxxxxxx, linux-fsdevel <linux-fsdevel@xxxxxxxxxxxxxxx>, "linux-nvdimm@xxxxxxxxxxxx" <linux-nvdimm@xxxxxxxxxxxx>
Subject: Re: [PATCH 3/6] xfs: Don't use unwritten extents for DAX
From: Dan Williams <dan.j.williams@xxxxxxxxx>
Date: Tue, 3 Nov 2015 17:02:34 -0800
In-reply-to: <20151104005056.GA24710@xxxxxxxxxxxxxxx>
References: <1445225238-30413-1-git-send-email-david@xxxxxxxxxxxxx> <1445225238-30413-4-git-send-email-david@xxxxxxxxxxxxx> <20151029142950.GE11663@xxxxxxxxxxxxxxx> <20151029233756.GS19199@dastard> <20151030123657.GC54905@xxxxxxxxxxxxxxx> <20151102011433.GW19199@dastard> <20151102141509.GA29346@xxxxxxxxxxxxxxx> <20151102214424.GJ10656@dastard> <CAPcyv4i_D6TuV8B6WF-5JoBdgh9FZbeBim8=s45RnQfhWAVpYg@xxxxxxxxxxxxxx> <20151103050413.GB19199@dastard> <20151104005056.GA24710@xxxxxxxxxxxxxxx>
On Tue, Nov 3, 2015 at 4:50 PM, Ross Zwisler
<ross.zwisler@xxxxxxxxxxxxxxx> wrote:
> On Tue, Nov 03, 2015 at 04:04:13PM +1100, Dave Chinner wrote:
>> On Mon, Nov 02, 2015 at 07:53:27PM -0800, Dan Williams wrote:
>> > On Mon, Nov 2, 2015 at 1:44 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> <>
>> > > This comes back to the comments I made w.r.t. the pmem driver
>> > > implementation doing synchronous IO by immediately forcing CPU cache
>> > > flushes and barriers. It's obviously correct, but it looks like
>> > > there's going to be a major performance penalty associated with it.
>> > > This is why I recently suggested that a pmem driver that doesn't do
>> > > CPU cache writeback during IO but does it on REQ_FLUSH is an
>> > > architecture we'll likely have to support.
>> > >
>> >
>> > The only thing we can realistically delay is wmb_pmem() i.e. the final
>> > sync waiting for data that has *left* the cpu cache.  Unless/until we
>> > get an architecturally guaranteed method to write-back the entire
>> > cache, or flush the cache by physical-cache-way we're stuck with
>> > either non-temporal cycles or looping on potentially huge virtual
>> > address ranges.
>>
>> I'm missing something: why won't flushing the address range returned
>> by bdev_direct_access() during a fsync operation work? i.e. we're
>> working with exactly the same address as dax_clear_blocks() and
>> dax_do_io() use, so why can't we look up that address and flush it
>> from fsync?
>
> I could be wrong, but I don't see a reason why DAX can't use the strategy of
> writing data and marking it dirty in one step and then flushing later in
> response to fsync/msync.  I think this could be used everywhere we write or
> zero data - dax_clear_blocks(), dax_io() etc.  (I believe that lots of the
> block zeroing code will go away once we have the XFS and ext4 patches in that
> guarantee we will only get written and zeroed extents from the filesystem in
> response to get_block().)  I think the PMEM driver, lacking the ability to
> mark things as dirty in the radix tree, etc., will need to keep doing things
> synchronously.

Not without numbers comparing the performance of dirtying the cache and
flushing later against non-temporal stores + pcommit.

> Hmm...if we go this path, though, is that an argument against moving the
> zeroing from DAX down into the driver?  True, with BRD it makes things nice
> and efficient because you can zero and never flush, and the driver knows
> there's nothing else to do.
>
> For PMEM, though, you lose the ability to zero the data and then queue the
> flushing for later, as you would be able to do if you left the zeroing code in
> DAX.  This matters because if you are going to immediately re-write the
> newly zeroed data (which seems common), the PMEM driver will end up doing an
> extra cache flush of the zeroes, only to have them overwritten and marked as
> dirty by DAX.
> If we leave the zeroing to DAX we can mark it dirty once, zero it once, write
> it once, and flush it once.

Why do we lose the ability to flush later if the driver supports
blkdev_issue_zeroout?

> On the other hand, keeping the zeroing in DAX would lose us any future
> hardware-assisted flushing that requires driver-specific knowledge, though I
> don't think such hardware exists yet.

ioatdma has supported memset() for a while now, but I would prioritize
a non-temporal SIMD implementation.

> Perhaps we should leave the zeroing in DAX for now to take
> advantage of the single flush, and then move it down if a driver can improve
> performance with hardware assisted PMEM zeroing?

Not convinced.  I think we should implement the driver zeroing
solution and take a look at performance.
