[Top] [All Lists]

Re: [PATCH 3/6] xfs: add DAX file operations support

To: Jan Kara <jack@xxxxxxx>
Subject: Re: [PATCH 3/6] xfs: add DAX file operations support
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Thu, 5 Mar 2015 09:00:05 +1100
Cc: xfs@xxxxxxxxxxx, linux-fsdevel@xxxxxxxxxxxxxxx, willy@xxxxxxxxxxxxxxx
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <20150304161848.GB2799@xxxxxxxxxxxxx>
References: <1425425427-16283-1-git-send-email-david@xxxxxxxxxxxxx> <1425425427-16283-4-git-send-email-david@xxxxxxxxxxxxx> <20150304161848.GB2799@xxxxxxxxxxxxx>
User-agent: Mutt/1.5.21 (2010-09-15)
On Wed, Mar 04, 2015 at 05:18:48PM +0100, Jan Kara wrote:
> On Wed 04-03-15 10:30:24, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@xxxxxxxxxx>
> > 
> > Add the initial support for DAX file operations to XFS. This
> > includes the necessary block allocation and mmap page fault hooks
> > for DAX to function.
> > 
> > Note that the current block allocation code abuses the mapping
> > buffer head to provide a completion callback for unwritten extent
> > allocation when DAX is clearing blocks. The DAX interface needs to
> > be changed to provide a callback similar to get_blocks for this
> > callback.
> > 
> > Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
> > +static int
> > +xfs_filemap_dax_page_mkwrite(
> > +   struct vm_area_struct   *vma,
> > +   struct vm_fault         *vmf)
> > +{
> > +   struct xfs_inode        *ip = XFS_I(vma->vm_file->f_mapping->host);
> > +   int                     error;
> > +
> > +   trace_xfs_filemap_page_mkwrite(ip);
> > +
> > +   xfs_ilock(ip, XFS_MMAPLOCK_SHARED);
>   So I think the lock ordering of XFS_MMAPLOCK and freezing protection is
> suspicious (and actually so is for normal write faults as I'm looking -
> didn't realize that when I was first reading your MMAPLOCK patches).
> Because you take XFS_MMAPLOCK outside of freeze protection however usually
> we want freeze protection to be the outermost lock - in particular in
> xfs_file_fallocate() you take XFS_MMAPLOCK inside freeze protection I
> think.

OK, so why isn't lockdep triggering on that? lockdep is aware of
inode locks and the freeze states, supposedly to pick up these exact

Oh, probably because the sb freeze order is write, pagefault,

i.e. In the fallocate case, we do sb_start_write, MMAP_LOCK. If we are in
a freeze case, we aren't going to freeze page faults until we've
frozen all the writes have drained, so there isn't a lock order
dependency there. Same for any other mnt_want_write/sb-start_write
based modification. 

Hence the fallocate path and anything that runs through setattr will
complete and release the mmap lock and then be prevented from taking
it again by the time sb_start_pagefault() can block with the mmap
lock held.  So there isn't actually a deadlock there because of the
way freeze works, and that's why lockdep is staying silent.

Still, I probably need to fix it so I'm not leaving a potential
landmine around.

> So you'll need to do what ext4 needs to do - take freeze protection, take
> fs specific locks, and then call do_dax_fault(). Matthew has a patch to
> actually export do_dax_fault (as __dax_fault()) for filesystems.

pointer to it? if none, I'll just write my own....


Dave Chinner

<Prev in Thread] Current Thread [Next in Thread>