Re: [RFC] unifying write variants for filesystems

To: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Subject: Re: [RFC] unifying write variants for filesystems
From: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
Date: Tue, 4 Feb 2014 12:44:09 +0000
Cc: Christoph Hellwig <hch@xxxxxxxxxxxxx>, Jens Axboe <axboe@xxxxxxxxx>, Mark Fasheh <mfasheh@xxxxxxxx>, Joel Becker <jlbec@xxxxxxxxxxxx>, linux-fsdevel <linux-fsdevel@xxxxxxxxxxxxxxx>, xfs@xxxxxxxxxxx, Sage Weil <sage@xxxxxxxxxxx>, Steve French <sfrench@xxxxxxxxx>, Anton Altaparmakov <anton@xxxxxxxxxx>, Zach Brown <zab@xxxxxxxxx>, Kent Overstreet <kmo@xxxxxxxxxxxxx>, Dave Kleikamp <dave.kleikamp@xxxxxxxxxx>
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <52EFC271.3090205@xxxxxxxxxx>
References: <20140118064040.GE10323@xxxxxxxxxxxxxxxxxx> <CA+55aFw4LgyYEkygxHUnpKZg3jMACGzsyENc9a9rWFmLcaRefQ@xxxxxxxxxxxxxx> <20140118074649.GF10323@xxxxxxxxxxxxxxxxxx> <CA+55aFzM0N7WjqnLNnuqTkbj3iws9f3bYxei=ZBCM8hvps4zYg@xxxxxxxxxxxxxx> <20140118201031.GI10323@xxxxxxxxxxxxxxxxxx> <20140119051335.GN10323@xxxxxxxxxxxxxxxxxx> <20140120135514.GA21567@xxxxxxxxxxxxx> <CA+55aFzEA-eM9v2PvsWx4v4ANaKXuRGYyGCkegJg++rhtHvnig@xxxxxxxxxxxxxx> <20140201224301.GS10323@xxxxxxxxxxxxxxxxxx> <52EFC271.3090205@xxxxxxxxxx>
Sender: Al Viro <viro@xxxxxxxxxxxxxxxx>
User-agent: Mutt/1.5.21 (2010-09-15)
On Mon, Feb 03, 2014 at 10:23:13AM -0600, Dave Kleikamp wrote:

> Thanks for the feedback. I'd been asking for feedback on this patchset
> for some time now, and have not received very much.
> This is all based on some years-old work by Zach Brown that he probably
> wishes would have disappeared by now. I pretty much left what I could
> alone since 1) it was working, and 2) I didn't hear any objections
> (until now).
> It's clear now that the patchset isn't close to mergeable, so treat it
> like a proof-of-concept and we can come up with a better container and
> read/write interface. I won't respond individually to your comments, but
> will take them all into consideration going forward.

FWIW, I suspect that the right way to deal with the dio side of things would
be a primitive along the lines of "get first N <page,start,len> for the
iov_iter", using get_user_pages_fast() for iovec-backed ones and "just
grab references" for array-of-page-subranges ones.
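
To make the shape of such a primitive concrete, here is a rough userspace
model of the iovec-backed case. It is only a sketch under stated assumptions:
the names first_n_segs, struct page_seg and SKETCH_PAGE_SIZE are hypothetical,
a page-aligned address stands in for a struct page pointer, and nothing is
pinned, whereas real kernel code would take page references (for example via
get_user_pages_fast()). It shows only how the first N <page,start,len>
triples fall out of an iovec array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>   /* struct iovec */

#define SKETCH_PAGE_SIZE 4096u  /* assumed page size for this model */

/* Hypothetical triple: one contiguous piece lying within a single page. */
struct page_seg {
    uintptr_t page;   /* page-aligned base address (stand-in for struct page *) */
    size_t start;     /* offset of the piece within that page */
    size_t len;       /* length of the piece; never crosses the page end */
};

/*
 * Fill out[] with at most max segments describing the leading bytes of
 * the iovec array. Returns the number of segments written and stores the
 * number of bytes they cover in *bytes. Nothing is pinned here; this
 * only models the address arithmetic.
 */
static size_t first_n_segs(const struct iovec *iov, size_t nr_iov,
                           struct page_seg *out, size_t max, size_t *bytes)
{
    size_t n = 0, covered = 0;

    assert(iov && out && bytes);
    for (size_t i = 0; i < nr_iov && n < max; i++) {
        uintptr_t addr = (uintptr_t)iov[i].iov_base;
        size_t left = iov[i].iov_len;

        while (left > 0 && n < max) {
            size_t off = addr % SKETCH_PAGE_SIZE;
            size_t chunk = SKETCH_PAGE_SIZE - off;  /* room left in this page */

            if (chunk > left)
                chunk = left;
            out[n].page = addr - off;
            out[n].start = off;
            out[n].len = chunk;
            n++;
            covered += chunk;
            addr += chunk;
            left -= chunk;
        }
    }
    *bytes = covered;
    return n;
}
```

A caller would consume the returned segments, advance its iterator by the
byte count, and call again for the next batch. In this model the
array-of-page-subranges case would need no address arithmetic at all, since
the triples are already there to be copied out.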

_IF_ direct-io.c can be massaged to use that (and it looks like it should
be able to - AFAICS, we don't really care if pages are mapped in userland or
kernel space there), we get something really neat out of that: not only can
we get rid of generic_file_splice_write(), but we get full zero-copy
sendfile() - just have the target opened with O_DIRECT and everything will
work; ->splice_read() will trigger reads to source pagecache and with that
massage done, ->splice_write() will issue writes directly from those
pages, with no memory-to-memory copying in sight...  We can also get rid of
that kmap() in __swap_writepage(), while we are at it.

I'm going through direct-io.c guts right now and so far that looks feasible,
but I'd really appreciate comments from the folks more familiar with the
damn thing.

The queue so far is in vfs.git#iov_iter; I've gone after the low-hanging
fruit from the review I posted upthread, and I more or less like the
results so far...

