[RFC] unifying write variants for filesystems

To: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Subject: [RFC] unifying write variants for filesystems
From: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
Date: Sun, 19 Jan 2014 05:13:35 +0000
Cc: Christoph Hellwig <hch@xxxxxxxxxxxxx>, Jens Axboe <axboe@xxxxxxxxx>, Mark Fasheh <mfasheh@xxxxxxxx>, Joel Becker <jlbec@xxxxxxxxxxxx>, linux-fsdevel <linux-fsdevel@xxxxxxxxxxxxxxx>, xfs@xxxxxxxxxxx, Sage Weil <sage@xxxxxxxxxxx>, Steve French <sfrench@xxxxxxxxx>
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <20140118201031.GI10323@xxxxxxxxxxxxxxxxxx>
References: <20131212181459.994196463@xxxxxxxxxxxxxxxxxxxxxx> <20140113141416.GA30117@xxxxxxxxxxxxx> <20140113235646.GR10323@xxxxxxxxxxxxxxxxxx> <20140114132207.GA25170@xxxxxxxxxxxxx> <20140114172033.GU10323@xxxxxxxxxxxxxxxxxx> <20140118064040.GE10323@xxxxxxxxxxxxxxxxxx> <CA+55aFw4LgyYEkygxHUnpKZg3jMACGzsyENc9a9rWFmLcaRefQ@xxxxxxxxxxxxxx> <20140118074649.GF10323@xxxxxxxxxxxxxxxxxx> <CA+55aFzM0N7WjqnLNnuqTkbj3iws9f3bYxei=ZBCM8hvps4zYg@xxxxxxxxxxxxxx> <20140118201031.GI10323@xxxxxxxxxxxxxxxxxx>
Sender: Al Viro <viro@xxxxxxxxxxxxxxxx>
User-agent: Mutt/1.5.21 (2010-09-15)
On Sat, Jan 18, 2014 at 08:10:31PM +0000, Al Viro wrote:

> Ouch...  No, I hadn't meant that kind of insanity, but I'd missed the
> problem with scarcity of mappings completely...

OK, that pretty much kills this approach.  Pity...

Folks, what do you think about the following:
        * a new data structure:
struct io_source {
        enum {IO_UIOVEC, IO_PAGEVEC} type;
        union {
                struct iovec *user;     /* IO_UIOVEC: userland iovec array */
                struct pvec {           /* IO_PAGEVEC: array of page segments */
                        struct page *page;
                        unsigned offset;
                        unsigned size;
                } *pvec;
        };
};
        * a new method that would look like aio_write, but take
struct io_source instead of iovec.
        * store the type in iov_iter (normally - IO_UIOVEC) and teach the
code dealing with it to do the right thing depending on type.  I.e. instead
of __copy_from_user_inatomic() do kmap_atomic()/memcpy()/kunmap_atomic() if
it's an IO_PAGEVEC.
        * generic_file_aio_write() analog for new method, converging with
generic_file_aio_write() almost immediately (basically, as soon as iov_iter
has been initialized).
        * new_aio_write() consisting of
{
        struct io_source source = {.type = IO_UIOVEC, .user = iov};
        return file->f_op-><new_method>(iocb, &source, nr_segs, pos);
}
        * new_sync_write(), doing what do_sync_write() does for files
that have new_aio_write() as ->aio_write().
        * new_splice_write() usable for files that provide that method -
it would collect pipe_buffers, put together a struct pvec array and pass
it to that method.  All mapping of the pages would happen one-by-one
and only around the actual copying of the data.  And, of course, the locking
would be identical to what we do for write()/writev()/aio write.
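
The type dispatch described in the list above can be sketched in userspace
C.  This is only an illustration under stated assumptions, not kernel code:
struct sketch_page, struct sketch_pvec, struct sketch_io_source and
sketch_copy_from_source() are hypothetical names, a plain byte array stands
in for struct page, and memcpy() stands in both for
__copy_from_user_inatomic() and for the kmap_atomic()/memcpy()/kunmap_atomic()
sequence.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>            /* struct iovec */

/* Hypothetical userspace stand-in for a kernel page. */
#define SKETCH_PAGE_SIZE 4096
struct sketch_page {
        unsigned char data[SKETCH_PAGE_SIZE];
};

/* One page segment, mirroring the proposed struct pvec. */
struct sketch_pvec {
        struct sketch_page *page;
        unsigned offset;
        unsigned size;
};

enum sketch_io_type { IO_UIOVEC, IO_PAGEVEC };

/* Tagged union mirroring the proposed struct io_source. */
struct sketch_io_source {
        enum sketch_io_type type;
        union {
                const struct iovec *user;
                const struct sketch_pvec *pvec;
        };
};

/*
 * Copy at most len bytes out of nr_segs segments of src into dst.
 * Returns the number of bytes copied.  Only the lookup of the segment
 * differs by type; the rest of the loop is shared.
 */
size_t sketch_copy_from_source(void *dst, size_t len,
                               const struct sketch_io_source *src,
                               unsigned long nr_segs)
{
        unsigned char *out = dst;
        size_t copied = 0;

        assert(src != NULL);
        for (unsigned long i = 0; i < nr_segs && copied < len; i++) {
                const unsigned char *from;
                size_t n;

                switch (src->type) {
                case IO_UIOVEC:
                        /* kernel would use __copy_from_user_inatomic() here */
                        from = src->user[i].iov_base;
                        n = src->user[i].iov_len;
                        break;
                case IO_PAGEVEC:
                        /* kernel would map the page only around the copy */
                        assert(src->pvec[i].offset + src->pvec[i].size
                               <= SKETCH_PAGE_SIZE);
                        from = src->pvec[i].page->data + src->pvec[i].offset;
                        n = src->pvec[i].size;
                        break;
                default:
                        return copied;
                }
                if (n > len - copied)
                        n = len - copied;       /* clamp to remaining space */
                memcpy(out + copied, from, n);
                copied += n;
        }
        return copied;
}
```

The point of the sketch is that callers see one copy routine regardless of
where the data comes from, which is what would let the write path converge
once the iterator has been set up.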

        Then filesystems can switch to that new method, flipping their
aio_write() instances to the new type and replacing ->aio_write with
new_aio_write, ->write with new_sync_write and ->splice_write with
new_splice_write.

        Actually, there's a possibility that it could be used for *all*
instances of ->splice_write() - we'd need to store something like
a pointer to a "call this to try and steal this page" function in pvec
and allow the method to do the actual stealing.  Note that pipe_buffer
->steal() instances only use the page argument - they all ignore which pipe
it's in (and there's nothing they could usefully do if they knew which pipe
it had been in in the first place).
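
One way the "steal" pointer could sit in pvec, again as a userspace sketch
rather than kernel code.  Everything here is an assumption made for
illustration: the names (sketch_page, sketch_pvec, sketch_steal_if_unshared,
sketch_consume), the refcount-based policy, the restriction to whole-page
segments, and the convention that the callback returns 0 on success.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096

/* Hypothetical userspace stand-in for a kernel page. */
struct sketch_page {
        int refcount;
        unsigned char data[SKETCH_PAGE_SIZE];
};

/* pvec extended with an optional callback that takes only the page. */
struct sketch_pvec {
        struct sketch_page *page;
        unsigned offset;
        unsigned size;
        int (*steal)(struct sketch_page *page); /* 0 on success, may be NULL */
};

/* Example policy: stealing succeeds only if nobody else holds the page. */
int sketch_steal_if_unshared(struct sketch_page *page)
{
        return page->refcount == 1 ? 0 : 1;
}

/*
 * Consume one segment into the page that *slot points to.
 * If the segment covers a whole page and the steal callback succeeds,
 * *slot is redirected to the source page and no data is copied.
 * Otherwise the bytes are copied.  Returns 1 if stolen, 0 if copied.
 */
int sketch_consume(struct sketch_page **slot, const struct sketch_pvec *seg)
{
        struct sketch_page *dst = *slot;

        assert(seg->offset + seg->size <= SKETCH_PAGE_SIZE);
        if (seg->offset == 0 && seg->size == SKETCH_PAGE_SIZE &&
            seg->steal && seg->steal(seg->page) == 0) {
                *slot = seg->page;      /* take the page itself */
                return 1;
        }
        memcpy(dst->data + seg->offset,
               seg->page->data + seg->offset, seg->size);
        return 0;
}
```

The callback sees nothing but the page, which matches the observation above
that the existing ->steal() instances never look at the pipe.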

        This is very preliminary, of course, and I might easily miss
something - the previous idea was unworkable, after all.  Comments
would be very welcome...
