Re: [RFC] unifying write variants for filesystems

To: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Subject: Re: [RFC] unifying write variants for filesystems
From: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
Date: Sun, 2 Feb 2014 19:21:04 +0000
Cc: Christoph Hellwig <hch@xxxxxxxxxxxxx>, Jens Axboe <axboe@xxxxxxxxx>, Mark Fasheh <mfasheh@xxxxxxxx>, Joel Becker <jlbec@xxxxxxxxxxxx>, linux-fsdevel <linux-fsdevel@xxxxxxxxxxxxxxx>, xfs@xxxxxxxxxxx, Sage Weil <sage@xxxxxxxxxxx>, Steve French <sfrench@xxxxxxxxx>, Dave Kleikamp <shaggy@xxxxxxxxxx>, Anton Altaparmakov <anton@xxxxxxxxxx>, Miklos Szeredi <miklos@xxxxxxxxxx>
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <20140201224301.GS10323@xxxxxxxxxxxxxxxxxx>
References: <20140114172033.GU10323@xxxxxxxxxxxxxxxxxx> <20140118064040.GE10323@xxxxxxxxxxxxxxxxxx> <CA+55aFw4LgyYEkygxHUnpKZg3jMACGzsyENc9a9rWFmLcaRefQ@xxxxxxxxxxxxxx> <20140118074649.GF10323@xxxxxxxxxxxxxxxxxx> <CA+55aFzM0N7WjqnLNnuqTkbj3iws9f3bYxei=ZBCM8hvps4zYg@xxxxxxxxxxxxxx> <20140118201031.GI10323@xxxxxxxxxxxxxxxxxx> <20140119051335.GN10323@xxxxxxxxxxxxxxxxxx> <20140120135514.GA21567@xxxxxxxxxxxxx> <CA+55aFzEA-eM9v2PvsWx4v4ANaKXuRGYyGCkegJg++rhtHvnig@xxxxxxxxxxxxxx> <20140201224301.GS10323@xxxxxxxxxxxxxxxxxx>
Sender: Al Viro <viro@xxxxxxxxxxxxxxxx>
User-agent: Mutt/1.5.21 (2010-09-15)
On Sat, Feb 01, 2014 at 10:43:01PM +0000, Al Viro wrote:

> * pipe_buffer_operations ->map()/->unmap() should die; let the caller do
> k{,un}map{,_atomic}().  All instances have the same method there and
> there's no point to make it different.  PIPE_BUF_FLAG_ATOMIC should also
> go.

BTW, another pile of code interesting in that respect (i.e. getting that
interface right) is fs/fuse/dev.c; I don't like the way it's playing
with get_user_pages_fast() there, and I doubt that sharing the code for
read and write side as it's done there makes much sense, but it's
definitely going to be a test for any API of that kind.  It *does*
try to unify write-from-iovec with write-from-array-of-pages, and
similarly for reads; the interesting issue is that unlike the usual
write-to-pagecache we can have many chunks picked from one page and
we'd rather avoid doing kmap_atomic/kunmap_atomic for each of those.

I suspect that the right answer is, in addition to a primitive that
does copying from iov_iter, to have "copy from iov_iter and be ready
to copy more soon after" + "done copying"; for the "array of pages"
case the former would be allowed to leave the current page mapped,
skipping kmap_atomic() on the next call.  And the latter would unmap,
of course.  The caller is responsible for not blocking or doing
unbalanced map/unmap until it has said "done copying".
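
To make the shape of that pair concrete, here is a rough userspace
sketch, not kernel code and not a proposed interface.  Everything in it
is hypothetical: struct page_dst, copy_more() and copy_done() are
made-up names, sim_map()/sim_unmap() merely count calls in place of
kmap_atomic()/kunmap_atomic(), and a flat buffer stands in for the
iov_iter source.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SZ 4096

/* Stand-ins for kmap_atomic()/kunmap_atomic(); they only count calls. */
static int nr_maps, nr_unmaps;

struct page_sim { char data[PAGE_SZ]; };

static char *sim_map(struct page_sim *p) { nr_maps++; return p->data; }
static void sim_unmap(char *addr) { (void)addr; nr_unmaps++; }

/* Hypothetical cursor over an array of destination pages. */
struct page_dst {
	struct page_sim **pages;
	size_t nr_pages;
	size_t idx;			/* index of the current page */
	size_t off;			/* offset within the current page */
	struct page_sim *mapped_page;	/* page left mapped, or NULL */
	char *addr;			/* its mapping while mapped_page is set */
};

/*
 * "Copy and be ready to copy more soon after": may return with the
 * current page still mapped, so the next call can skip the map step.
 */
static size_t copy_more(struct page_dst *d, const char *src, size_t len)
{
	size_t done = 0;

	while (done < len && d->idx < d->nr_pages) {
		struct page_sim *pg = d->pages[d->idx];
		size_t n = PAGE_SZ - d->off;

		if (n > len - done)
			n = len - done;
		if (d->mapped_page != pg) {
			/* Moved to another page: drop the old mapping first. */
			if (d->mapped_page)
				sim_unmap(d->addr);
			d->addr = sim_map(pg);
			d->mapped_page = pg;
		}
		memcpy(d->addr + d->off, src + done, n);
		done += n;
		d->off += n;
		if (d->off == PAGE_SZ) {
			d->idx++;
			d->off = 0;
		}
	}
	return done;
}

/* "Done copying": unmap whatever copy_more() left mapped. */
static void copy_done(struct page_dst *d)
{
	if (d->mapped_page) {
		sim_unmap(d->addr);
		d->mapped_page = NULL;
		d->addr = NULL;
	}
	assert(nr_maps == nr_unmaps);	/* map/unmap must balance here */
}
```

With that, several small chunks landing in the same page cost one
map/unmap pair in total rather than one pair per chunk; in the real
thing the caller would also have to avoid blocking between the first
copy_more() and copy_done().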

BTW, is there any reason why fuse/dev.c doesn't use atomic kmaps for
everything?  After all, as soon as we've done kmap() in there, we
grab a spinlock and don't drop it until just before kunmap().  With
nothing but memcpy() done in between...  Miklos?  AFAICS, we only win
from switching to kmap_atomic there - we can't block anyway, we don't
need it to be visible on other CPUs and nesting isn't a problem.
Looks like it'll be cheaper in highmem cases and do exactly the same
thing as now for non-highmem...  Comments?
