
Re: [PATCH] dax: allow DAX to look up an inode's block device

To: Jared Hulbert <jaredeh@xxxxxxxxx>
Subject: Re: [PATCH] dax: allow DAX to look up an inode's block device
From: Dan Williams <dan.j.williams@xxxxxxxxx>
Date: Tue, 2 Feb 2016 15:41:37 -0800
Cc: Al Viro <viro@xxxxxxxxxxxxxxxxxx>, Ross Zwisler <ross.zwisler@xxxxxxxxxxxxxxx>, Jeff Layton <jlayton@xxxxxxxxxxxxxxx>, linux-nvdimm <linux-nvdimm@xxxxxxxxxxx>, Dave Chinner <david@xxxxxxxxxxxxx>, LKML <linux-kernel@xxxxxxxxxxxxxxx>, XFS Developers <xfs@xxxxxxxxxxx>, "J. Bruce Fields" <bfields@xxxxxxxxxxxx>, Jan Kara <jack@xxxxxxxx>, Linux FS Devel <linux-fsdevel@xxxxxxxxxxxxxxx>, Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
In-reply-to: <CA+ZsKJ5Xd1VyMD4KCTw4GLYn_stAUZX0OcQVju72+FPgYsGR6w@xxxxxxxxxxxxxx>
References: <1454454702-11889-1-git-send-email-ross.zwisler@xxxxxxxxxxxxxxx> <20160202231931.GR17997@xxxxxxxxxxxxxxxxxx> <CA+ZsKJ5Xd1VyMD4KCTw4GLYn_stAUZX0OcQVju72+FPgYsGR6w@xxxxxxxxxxxxxx>
On Tue, Feb 2, 2016 at 3:36 PM, Jared Hulbert <jaredeh@xxxxxxxxx> wrote:
> On Tue, Feb 2, 2016 at 3:19 PM, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
>>
>> On Tue, Feb 02, 2016 at 04:11:42PM -0700, Ross Zwisler wrote:
>>
>> > However, for raw block devices and for XFS with a real-time device, the
>> > value in inode->i_sb->s_bdev is not correct.  With the code as it is
>> > currently written, an fsync or msync to a DAX enabled raw block device will
>> > cause a NULL pointer dereference kernel BUG.  For this to work correctly we
>> > need to ask the block device or filesystem what struct block_device is
>> > appropriate for our inode.
>> >
>> > To that end, add a get_bdev(struct inode *) entry point to struct
>> > super_operations.  If this function pointer is non-NULL, this notifies DAX
>> > that it needs to use it to look up the correct block_device.  If
>> > i_sb->get_bdev() is NULL DAX will default to inode->i_sb->s_bdev.
>>
>> Umm...  It assumes that bdev will stay pinned for as long as inode is
>> referenced, presumably?  If so, that needs to be documented (and verified
>> for existing fs instances).  In principle, multi-disk fs might want to
>> support things like "silently move the inodes backed by that disk to other
>> ones"...
>
> Dan, this is exactly the kind of thing I'm talking about WRT the
> weirder device models and directly calling bdev_direct_access().
> Filesystems don't have the monogamous relationship with a device that
> is implicitly assumed in DAX, you have to ask the filesystem what the
> relationship is and is migrating to, and allow the filesystem to
> update DAX when the relationship is changing.

That's precisely what ->get_bdev() does.  When the inode->i_sb->s_bdev
lookup gives the wrong answer, use ->get_bdev().
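
For illustration only, here is a minimal userspace sketch of that
fallback, not the code from the patch.  The structs are cut-down
stand-ins for the kernel types, and dax_inode_bdev(), example_get_bdev()
and the rt_bdev field are hypothetical names made up for this example.
It assumes the hook sits in struct super_operations as the patch
description says, and it does nothing about the pinning question raised
above.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel types; only the fields used here. */
struct block_device { const char *name; };
struct inode;

struct super_operations {
	/* Optional hook from the patch description; NULL means "use s_bdev". */
	struct block_device *(*get_bdev)(struct inode *inode);
};

struct super_block {
	const struct super_operations *s_op;
	struct block_device *s_bdev;	/* the filesystem's default device */
};

struct inode {
	struct super_block *i_sb;
	/* Hypothetical per-inode device, e.g. a real-time device; may be NULL. */
	struct block_device *rt_bdev;
};

/*
 * Hypothetical helper: ask the filesystem which block device backs this
 * inode, falling back to inode->i_sb->s_bdev when no hook is provided.
 */
static struct block_device *dax_inode_bdev(struct inode *inode)
{
	const struct super_operations *ops = inode->i_sb->s_op;

	if (ops && ops->get_bdev)
		return ops->get_bdev(inode);
	return inode->i_sb->s_bdev;
}

/* Example hook: prefer the per-inode device when one is set. */
static struct block_device *example_get_bdev(struct inode *inode)
{
	return inode->rt_bdev ? inode->rt_bdev : inode->i_sb->s_bdev;
}
```

A filesystem that leaves get_bdev NULL keeps today's behavior, while one
that sets it decides per inode which device DAX should use.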

> As we start to see many
> DIMMs and 10s of TiB pmem systems this is going to be an even bigger deal
> as load balancing, wear leveling, and fault tolerance concerns are
> inevitably driven by the filesystem.

No, there are no plans on the horizon for an fs to manage these
media-specific concerns for persistent memory.
