[PATCH v5 4/7] dax: add support for fsync/sync

Dan Williams dan.j.williams at intel.com
Sat Dec 19 12:37:46 CST 2015


On Fri, Dec 18, 2015 at 9:22 PM, Ross Zwisler
<ross.zwisler at linux.intel.com> wrote:
> To properly handle fsync/msync in an efficient way DAX needs to track dirty
> pages so it is able to flush them durably to media on demand.
>
> The tracking of dirty pages is done via the radix tree in struct
> address_space.  This radix tree is already used by the page writeback
> infrastructure for tracking dirty pages associated with an open file, and
> it already has support for exceptional (non struct page*) entries.  We
> build upon these features to add exceptional entries to the radix tree for
> DAX dirty PMD or PTE pages at fault time.
>
> Signed-off-by: Ross Zwisler <ross.zwisler at linux.intel.com>
[..]
> +static void dax_writeback_one(struct address_space *mapping, pgoff_t index,
> +               void *entry)
> +{
> +       struct radix_tree_root *page_tree = &mapping->page_tree;
> +       int type = RADIX_DAX_TYPE(entry);
> +       struct radix_tree_node *node;
> +       void **slot;
> +
> +       if (type != RADIX_DAX_PTE && type != RADIX_DAX_PMD) {
> +               WARN_ON_ONCE(1);
> +               return;
> +       }
> +
> +       spin_lock_irq(&mapping->tree_lock);
> +       /*
> +        * Regular page slots are stabilized by the page lock even
> +        * without the tree itself locked.  These unlocked entries
> +        * need verification under the tree lock.
> +        */
> +       if (!__radix_tree_lookup(page_tree, index, &node, &slot))
> +               goto unlock;
> +       if (*slot != entry)
> +               goto unlock;
> +
> +       /* another fsync thread may have already written back this entry */
> +       if (!radix_tree_tag_get(page_tree, index, PAGECACHE_TAG_TOWRITE))
> +               goto unlock;
> +
> +       radix_tree_tag_clear(page_tree, index, PAGECACHE_TAG_TOWRITE);
> +
> +       if (type == RADIX_DAX_PMD)
> +               wb_cache_pmem(RADIX_DAX_ADDR(entry), PMD_SIZE);
> +       else
> +               wb_cache_pmem(RADIX_DAX_ADDR(entry), PAGE_SIZE);

Hi Ross, I should have realized this sooner, but what guarantees that
the address returned by RADIX_DAX_ADDR(entry) is still valid at this
point?  I think we need to store the sector in the radix tree and then
perform a new dax_map_atomic() operation to either look up a valid
address or fail the sync request.  Otherwise, if the device is gone,
we'll crash or write into some other random vmalloc address space.
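
To make the suggestion concrete, here is a rough userspace model of the
idea, not kernel code and not a proposed patch.  Every name in it
(model_bdev, model_map_atomic, model_writeback_one, model_flushes) is
hypothetical; model_map_atomic only stands in for the role that
dax_map_atomic() would play, and the flush counter stands in for
wb_cache_pmem().  The point is that the entry carries a sector, and the
address is re-derived and validated at writeback time:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_SECTOR_SIZE 512
#define MODEL_NR_SECTORS  16

/* Hypothetical stand-in for a pmem block device. */
struct model_bdev {
	bool alive;	/* false once the device has gone away */
	unsigned char media[MODEL_NR_SECTORS * MODEL_SECTOR_SIZE];
};

/* Counts simulated cache writebacks (stand-in for wb_cache_pmem()). */
static int model_flushes;

/*
 * Hypothetical analogue of a map-at-use-time step: translate a sector
 * to an address, or fail if the device is gone or the range is bad.
 */
static int model_map_atomic(struct model_bdev *bdev, unsigned long sector,
		size_t len, void **addr)
{
	assert(bdev && addr);

	if (!bdev->alive)
		return -ENXIO;
	if (sector >= MODEL_NR_SECTORS ||
	    len > (MODEL_NR_SECTORS - sector) * MODEL_SECTOR_SIZE)
		return -ERANGE;

	*addr = bdev->media + sector * MODEL_SECTOR_SIZE;
	return 0;
}

/*
 * Writeback of one tracked entry.  The entry is identified by sector,
 * so a stale address is never dereferenced: either the mapping step
 * yields a currently valid address, or the sync request fails.
 */
static int model_writeback_one(struct model_bdev *bdev, unsigned long sector,
		size_t len)
{
	void *addr;
	int rc;

	rc = model_map_atomic(bdev, sector, len, &addr);
	if (rc)
		return rc;

	/* Real code would write back the cache lines at addr here. */
	(void)addr;
	model_flushes++;
	return 0;
}
```

In this model a sync against a removed device returns -ENXIO instead of
touching memory through an address that was cached at fault time.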


