xfs
[Top] [All Lists]

Re: [PATCH v5 4/7] dax: add support for fsync/sync

To: Ross Zwisler <ross.zwisler@xxxxxxxxxxxxxxx>, Dan Williams <dan.j.williams@xxxxxxxxx>, "linux-kernel@xxxxxxxxxxxxxxx" <linux-kernel@xxxxxxxxxxxxxxx>, "H. Peter Anvin" <hpa@xxxxxxxxx>, "J. Bruce Fields" <bfields@xxxxxxxxxxxx>, "Theodore Ts'o" <tytso@xxxxxxx>, Alexander Viro <viro@xxxxxxxxxxxxxxxxxx>, Andreas Dilger <adilger.kernel@xxxxxxxxx>, Dave Chinner <david@xxxxxxxxxxxxx>, Ingo Molnar <mingo@xxxxxxxxxx>, Jan Kara <jack@xxxxxxxx>, Jeff Layton <jlayton@xxxxxxxxxxxxxxx>, Matthew Wilcox <willy@xxxxxxxxxxxxxxx>, Thomas Gleixner <tglx@xxxxxxxxxxxxx>, linux-ext4 <linux-ext4@xxxxxxxxxxxxxxx>, linux-fsdevel <linux-fsdevel@xxxxxxxxxxxxxxx>, Linux MM <linux-mm@xxxxxxxxx>, "linux-nvdimm@xxxxxxxxxxxx" <linux-nvdimm@xxxxxxxxxxxx>, X86 ML <x86@xxxxxxxxxx>, XFS Developers <xfs@xxxxxxxxxxx>, Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>, Matthew Wilcox <matthew.r.wilcox@xxxxxxxxx>, Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
Subject: Re: [PATCH v5 4/7] dax: add support for fsync/sync
From: Dan Williams <dan.j.williams@xxxxxxxxx>
Date: Mon, 21 Dec 2015 11:27:35 -0800
Delivered-to: xfs@xxxxxxxxxxx
Dkim-signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=intel-com.20150623.gappssmtp.com; s=20150623; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type; bh=EKD6gTfso+WcJGZXV7hjlaywSAXawAdu51Op3aYpuLI=; b=N2uipnGEPCXd7lLw+SOnE6fzVE02cCPoRNZblS9fnd3TMZhcC5dIk439MKM+V0PFSN uw+r5zmgNBKElpfx1Uu5WNAajdquVd6M+vVbJMaSE9Q3cn4O19WMXILglDuWJ37KIvS0 tDOrMgTcMby/UbHfD0LJR5zu4PJ088XvWZFfySnmoHf6GBdm5urBiQqgpUQST3wKt/mm gMi/JSuRGhtnuOk3ZEE3y8ClusQrmQ3RK9y9YCqfEEWdFmenCXNUM+8FbJKy7Aas3DmL tPMCPGUSGMMf4WNpsgCz+HtYUSUGwyYA8BPB75Ze3yBfMuSto/ZWBbonVb6Fk4zxrqUy OSGA==
In-reply-to: <20151221170545.GA13494@xxxxxxxxxxxxxxx>
References: <1450502540-8744-1-git-send-email-ross.zwisler@xxxxxxxxxxxxxxx> <1450502540-8744-5-git-send-email-ross.zwisler@xxxxxxxxxxxxxxx> <CAPcyv4irspQEPVdYfLK+QfW4t-1_y1gFFVuBm00=i03PFQwEYw@xxxxxxxxxxxxxx> <20151221170545.GA13494@xxxxxxxxxxxxxxx>
On Mon, Dec 21, 2015 at 9:05 AM, Ross Zwisler
<ross.zwisler@xxxxxxxxxxxxxxx> wrote:
> On Sat, Dec 19, 2015 at 10:37:46AM -0800, Dan Williams wrote:
>> On Fri, Dec 18, 2015 at 9:22 PM, Ross Zwisler
>> <ross.zwisler@xxxxxxxxxxxxxxx> wrote:
>> > To properly handle fsync/msync in an efficient way DAX needs to track dirty
>> > pages so it is able to flush them durably to media on demand.
>> >
>> > The tracking of dirty pages is done via the radix tree in struct
>> > address_space.  This radix tree is already used by the page writeback
>> > infrastructure for tracking dirty pages associated with an open file, and
>> > it already has support for exceptional (non struct page*) entries.  We
>> > build upon these features to add exceptional entries to the radix tree for
>> > DAX dirty PMD or PTE pages at fault time.
>> >
>> > Signed-off-by: Ross Zwisler <ross.zwisler@xxxxxxxxxxxxxxx>
>> [..]
>> > +static void dax_writeback_one(struct address_space *mapping, pgoff_t 
>> > index,
>> > +               void *entry)
>> > +{
>> > +       struct radix_tree_root *page_tree = &mapping->page_tree;
>> > +       int type = RADIX_DAX_TYPE(entry);
>> > +       struct radix_tree_node *node;
>> > +       void **slot;
>> > +
>> > +       if (type != RADIX_DAX_PTE && type != RADIX_DAX_PMD) {
>> > +               WARN_ON_ONCE(1);
>> > +               return;
>> > +       }
>> > +
>> > +       spin_lock_irq(&mapping->tree_lock);
>> > +       /*
>> > +        * Regular page slots are stabilized by the page lock even
>> > +        * without the tree itself locked.  These unlocked entries
>> > +        * need verification under the tree lock.
>> > +        */
>> > +       if (!__radix_tree_lookup(page_tree, index, &node, &slot))
>> > +               goto unlock;
>> > +       if (*slot != entry)
>> > +               goto unlock;
>> > +
>> > +       /* another fsync thread may have already written back this entry */
>> > +       if (!radix_tree_tag_get(page_tree, index, PAGECACHE_TAG_TOWRITE))
>> > +               goto unlock;
>> > +
>> > +       radix_tree_tag_clear(page_tree, index, PAGECACHE_TAG_TOWRITE);
>> > +
>> > +       if (type == RADIX_DAX_PMD)
>> > +               wb_cache_pmem(RADIX_DAX_ADDR(entry), PMD_SIZE);
>> > +       else
>> > +               wb_cache_pmem(RADIX_DAX_ADDR(entry), PAGE_SIZE);
>>
>> Hi Ross, I should have realized this sooner, but what guarantees that
>> the address returned by RADIX_DAX_ADDR(entry) is still valid at this
>> point?  I think we need to store the sector in the radix tree and then
>> perform a new dax_map_atomic() operation to either lookup a valid
>> address or fail the sync request.  Otherwise, if the device is gone
>> we'll crash, or write into some other random vmalloc address space.
>
> Ah, good point, thank you.  v4 of this series is based on a version of
> DAX where we aren't properly dealing with PMEM device removal.  I've got an
> updated version that merges with your dax_map_atomic() changes, and I'll add
> this change into v5 which I will send out today.  Thank you for the
> suggestion.

To make the merge simpler you could skip the rebase for now and just
call blk_queue_enter() / blk_queue_exit() around the calls to
wb_cache_pmem.

<Prev in Thread] Current Thread [Next in Thread>