xfs
[Top] [All Lists]

Re: [PATCH 0/5] splice: locking changes and code refactoring

To: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Subject: Re: [PATCH 0/5] splice: locking changes and code refactoring
From: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
Date: Sat, 18 Jan 2014 07:46:49 +0000
Cc: Christoph Hellwig <hch@xxxxxxxxxxxxx>, Jens Axboe <axboe@xxxxxxxxx>, Mark Fasheh <mfasheh@xxxxxxxx>, Joel Becker <jlbec@xxxxxxxxxxxx>, linux-fsdevel <linux-fsdevel@xxxxxxxxxxxxxxx>, xfs@xxxxxxxxxxx, Sage Weil <sage@xxxxxxxxxxx>, Steve French <sfrench@xxxxxxxxx>
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <CA+55aFw4LgyYEkygxHUnpKZg3jMACGzsyENc9a9rWFmLcaRefQ@xxxxxxxxxxxxxx>
References: <20131212181459.994196463@xxxxxxxxxxxxxxxxxxxxxx> <20140113141416.GA30117@xxxxxxxxxxxxx> <20140113235646.GR10323@xxxxxxxxxxxxxxxxxx> <20140114132207.GA25170@xxxxxxxxxxxxx> <20140114172033.GU10323@xxxxxxxxxxxxxxxxxx> <20140118064040.GE10323@xxxxxxxxxxxxxxxxxx> <CA+55aFw4LgyYEkygxHUnpKZg3jMACGzsyENc9a9rWFmLcaRefQ@xxxxxxxxxxxxxx>
Sender: Al Viro <viro@xxxxxxxxxxxxxxxx>
User-agent: Mutt/1.5.21 (2010-09-15)
On Fri, Jan 17, 2014 at 11:22:04PM -0800, Linus Torvalds wrote:
> On Fri, Jan 17, 2014 at 10:40 PM, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
> >
> > Objections, comments?
> 
> I certainly object to the "map, then unmap" approach. No VM games.

Um...
int pipe_to_file(struct pipe_inode_info *pipe, struct pipe_buffer *buf,
                 struct splice_desc *sd)
...
        if (buf->page != page) {
                char *src = buf->ops->map(pipe, buf, 1);
                char *dst = kmap_atomic(page);

                memcpy(dst + offset, src + buf->offset, this_len);
                flush_dcache_page(page);
                kunmap_atomic(dst);
                buf->ops->unmap(pipe, buf, src);
        }
...

->map() and ->unmap() (BTW, why are those methods, anyway?  They are
identical for all instances) are
void *generic_pipe_buf_map(struct pipe_inode_info *pipe,
                           struct pipe_buffer *buf, int atomic)
{  
        if (atomic) {
                buf->flags |= PIPE_BUF_FLAG_ATOMIC;
                return kmap_atomic(buf->page);
        }

        return kmap(buf->page);
}
and
void generic_pipe_buf_unmap(struct pipe_inode_info *pipe,
                            struct pipe_buffer *buf, void *map_data)
{
        if (buf->flags & PIPE_BUF_FLAG_ATOMIC) {
                buf->flags &= ~PIPE_BUF_FLAG_ATOMIC;
                kunmap_atomic(map_data);
        } else
                kunmap(buf->page);
}
resp.

If we are going to copy that data (and all users of generic_file_splice_write()
do that memcpy() to page cache), we have to kmap the source ;-/

> But if it can be done more naturally as a writev, then that may well
> be ok. As long as we're talking about just the
> default_file_splice_write() case, and people who want to do special
> things with page movement can continue to do so..

The thing is, after such change default_file_splice_write() is no worse than
generic_file_splice_write().  The only instances that really want something
else are the ones that try to steal pages (e.g. virtio_console, fuse miscdev)
or sockets, with their "do DMA from the sodding page, don't copy it at
anywhere" ->sendpage() method.  IOW, ones those special things you are
talking about.  Normal filesystems do not - not on pipe-to-file splice.
file-to-pipe - sure, that one plays with pagecache and tries hard to
do zero-copy, but that's ->splice_read(), not ->splice_write()...

_If_ somebody figures out how to deal with zero-copy on pipe-to-file - fine,
we'll be able to revisit that.  But there hadn't been one since 2007 and
there was zero activity in that area, so...

What I'm doing right now is taking do_readv_writev() apart and making the
stuff after rw_copy_check_uvector() non-static (visible in fs/internal.h).
As long as we do not go through rw_copy_check_uvector() (we'd just built
that iovec ourselves and it's already in kernel space), we should be fine -
single copy done straight to pagecache, with whatever locks fs wants to
take, etc.

<Prev in Thread] Current Thread [Next in Thread>