xfs
[Top] [All Lists]

Re: [PATCH 3/6] xfs: add DAX file operations support

To: Jan Kara <jack@xxxxxxx>
Subject: Re: [PATCH 3/6] xfs: add DAX file operations support
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Mon, 23 Mar 2015 10:02:10 +1100
Cc: xfs@xxxxxxxxxxx, linux-fsdevel@xxxxxxxxxxxxxxx, willy@xxxxxxxxxxxxxxx
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <20150305110504.GC2836@xxxxxxxxxxxxx>
References: <1425425427-16283-1-git-send-email-david@xxxxxxxxxxxxx> <1425425427-16283-4-git-send-email-david@xxxxxxxxxxxxx> <20150304161848.GB2799@xxxxxxxxxxxxx> <20150304220005.GV18360@dastard> <20150305110504.GC2836@xxxxxxxxxxxxx>
User-agent: Mutt/1.5.21 (2010-09-15)
On Thu, Mar 05, 2015 at 12:05:04PM +0100, Jan Kara wrote:
> On Thu 05-03-15 09:00:05, Dave Chinner wrote:
> > On Wed, Mar 04, 2015 at 05:18:48PM +0100, Jan Kara wrote:
> > > On Wed 04-03-15 10:30:24, Dave Chinner wrote:
> > > > From: Dave Chinner <dchinner@xxxxxxxxxx>
> > > > 
> > > > Add the initial support for DAX file operations to XFS. This
> > > > includes the necessary block allocation and mmap page fault hooks
> > > > for DAX to function.
> > > > 
> > > > Note that the current block allocation code abuses the mapping
> > > > buffer head to provide a completion callback for unwritten extent
> > > > allocation when DAX is clearing blocks. The DAX interface needs to
> > > > be changed to provide a callback similar to get_blocks for this
> > > > callback.
> > > > 
> > > > Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
> > .....
> > > > +static int
> > > > +xfs_filemap_dax_page_mkwrite(
> > > > +       struct vm_area_struct   *vma,
> > > > +       struct vm_fault         *vmf)
> > > > +{
> > > > +       struct xfs_inode        *ip = 
> > > > XFS_I(vma->vm_file->f_mapping->host);
> > > > +       int                     error;
> > > > +
> > > > +       trace_xfs_filemap_page_mkwrite(ip);
> > > > +
> > > > +       xfs_ilock(ip, XFS_MMAPLOCK_SHARED);
> > >   So I think the lock ordering of XFS_MMAPLOCK and freezing protection is
> > > suspicious (and actually so is for normal write faults as I'm looking -
> > > didn't realize that when I was first reading your MMAPLOCK patches).
> > > Because you take XFS_MMAPLOCK outside of freeze protection however usually
> > > we want freeze protection to be the outermost lock - in particular in
> > > xfs_file_fallocate() you take XFS_MMAPLOCK inside freeze protection I
> > > think.
> > 
> > OK, so why isn't lockdep triggering on that? lockdep is aware of
> > inode locks and the freeze states, supposedly to pick up these exact
> > issues...
> > 
> > Oh, probably because the sb freeze order is write, pagefault,
> > transaction.
> > 
> > i.e. In the fallocate case, we do sb_start_write, MMAP_LOCK. If we are in
> > a freeze case, we aren't going to freeze page faults until we've
> > frozen all the writes have drained, so there isn't a lock order
> > dependency there. Same for any other mnt_want_write/sb-start_write
> > based modification. 
> > 
> > Hence the fallocate path and anything that runs through setattr will
> > complete and release the mmap lock and then be prevented from taking
> > it again by the time sb_start_pagefault() can block with the mmap
> > lock held.  So there isn't actually a deadlock there because of the
> > way freeze works, and that's why lockdep is staying silent.
>   Yeah, you're right there isn't a deadlock possibility. After all the lock
> ranking of your MMAP_LOCk is currently the same as of mmap_sem (and the
> difficult lock ordering of that semaphore has been the reason why we have
> special type of freeze protection for page faults).
> 
> > Still, I probably need to fix it so I'm not leaving a potential
> > landmine around.
>   I would find it easier to grasp. Yes.

Finally getting back to this. Fixed this, but...
> 
> > > So you'll need to do what ext4 needs to do - take freeze protection, take
> > > fs specific locks, and then call do_dax_fault(). Matthew has a patch to
> > > actually export do_dax_fault (as __dax_fault()) for filesystems.
> > 
> > pointer to it? if none, I'll just write my own....
>   http://permalink.gmane.org/gmane.comp.file-systems.ext4/47866

I can't find any followup to this patch. Is it in any tree anywhere?

Right now, I've just pulled the dax instructure part of the patch
into my series and modified it to suit because I've fixed the
unwritten extent conversion problem differently (i.e. the extra
callback) and ext4 needs a lot more help to fix the problems than
in that patch.

I'll post the patches when I've at least smoke tested them....

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx

<Prev in Thread] Current Thread [Next in Thread>