xfs
[Top] [All Lists]

Re: [RFC] unifying write variants for filesystems

To: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
Subject: Re: [RFC] unifying write variants for filesystems
From: Miklos Szeredi <miklos@xxxxxxxxxx>
Date: Mon, 3 Feb 2014 15:41:55 +0100
Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>, Christoph Hellwig <hch@xxxxxxxxxxxxx>, Jens Axboe <axboe@xxxxxxxxx>, Mark Fasheh <mfasheh@xxxxxxxx>, Joel Becker <jlbec@xxxxxxxxxxxx>, linux-fsdevel <linux-fsdevel@xxxxxxxxxxxxxxx>, xfs@xxxxxxxxxxx, Sage Weil <sage@xxxxxxxxxxx>, Steve French <sfrench@xxxxxxxxx>, Dave Kleikamp <shaggy@xxxxxxxxxx>, Anton Altaparmakov <anton@xxxxxxxxxx>
Delivered-to: xfs@xxxxxxxxxxx
Dkim-signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=szeredi.hu; s=google; h=date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; bh=YKpC5LPTDpHIPkbSFiMBg8VwA9xbd+IkoVH9D/3BmSQ=; b=G380geLK7R2OVy0rrTzFZ0F4u0mE8uf+7X73pwn/LBdk4mC8m/OibkDU6uW5aHNc2G 6yDPcdWs3ingQe4a3FNoz/CbXyqe/jD9mzOzothbES52+yyXAl4mNK3vJCgNDLDKfn+t Q0ZGjKOJszpq36phX1V4AcT8Xlmxd0mfyXbTQ=
In-reply-to: <20140202192104.GA21959@xxxxxxxxxxxxxxxxxx>
References: <20140118064040.GE10323@xxxxxxxxxxxxxxxxxx> <CA+55aFw4LgyYEkygxHUnpKZg3jMACGzsyENc9a9rWFmLcaRefQ@xxxxxxxxxxxxxx> <20140118074649.GF10323@xxxxxxxxxxxxxxxxxx> <CA+55aFzM0N7WjqnLNnuqTkbj3iws9f3bYxei=ZBCM8hvps4zYg@xxxxxxxxxxxxxx> <20140118201031.GI10323@xxxxxxxxxxxxxxxxxx> <20140119051335.GN10323@xxxxxxxxxxxxxxxxxx> <20140120135514.GA21567@xxxxxxxxxxxxx> <CA+55aFzEA-eM9v2PvsWx4v4ANaKXuRGYyGCkegJg++rhtHvnig@xxxxxxxxxxxxxx> <20140201224301.GS10323@xxxxxxxxxxxxxxxxxx> <20140202192104.GA21959@xxxxxxxxxxxxxxxxxx>
User-agent: Mutt/1.5.21 (2010-09-15)
On Sun, Feb 02, 2014 at 07:21:04PM +0000, Al Viro wrote:
> On Sat, Feb 01, 2014 at 10:43:01PM +0000, Al Viro wrote:
> 
> > * pipe_buffer_operations ->map()/->unmap() should die; let the caller do
> > k{,un}map{,_atomic}().  All instances have the same method there and
> > there's no point to make it different.  PIPE_BUF_FLAG_ATOMIC should also
> > go.
> 
> BTW, another pile of code interesting in that respect (i.e. getting that
> interface right) is fs/fuse/dev.c; I don't like the way it's playing
> with get_user_pages_fast() there,

It's trying to work around a deadlock by lock_page() recursion.  I.e. fuse
daemon is reading a request into a page for which the readpage is being serviced
by said request (no, that's not something that happens accidentally).

Do all archs set FAULT_FLAG_KILLABLE?  If so, that flag along with code
depending on the lack of it can go away and we can simply depend on page faults
being interruptible by fatal signals.

And that would simplify the fuse device I/O substantially.

>  and I doubt that sharing the code for
> read and write side as it's done there makes much sense, but it's
> definitely going to be a test for any API of that kind.  It *does*
> try to unify write-from-iovec with write-from-array-of-pages and
> similar for reads; the interesting issue is that unlike the usual
> write-to-pagecache we can have many chunks picked from one page and
> we'd rather avoid doing kmap_atomic/kunmap_atomic for each of those.
> 
> I suspect that the right answer is, in addition to a primitive that
> does copying from iov_iter to have "copy from iov_iter and be ready
> to copy more from soon after" + "done copying"; for the "array of
> pages" the former would be allowed to leave the current page mapped,
> skipping kmap_atomic() on the next call.  And the latter would unmap.
> of course.  The caller is responsible for not blocking or doing
> unbalanced map/unmap until it's said "done copying".
> 
> BTW, is there any reason why fuse/dev.c doesn't use atomic kmaps for
> everything?  After all, as soon as we'd done kmap() in there, we
> grab a spinlock and don't drop it until just before kunmap().  With
> nothing by memcpy() done in between...  Miklos?  AFAICS, we only win
> from switching to kmap_atomic there - we can't block anyway, we don't
> need it to be visible on other CPUs and nesting isn't a problem.
> Looks like it'll be cheaper in highmem cases and do exactly the same
> thing as now for non-highmem...  Comments?

We don't hold the spinlock.   But regardless, I don't see any reason why it
couldn't be atomic kmap.

Thanks,
Miklos

<Prev in Thread] Current Thread [Next in Thread>