
Re: [PATCH 0/5] splice: locking changes and code refactoring

To: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Subject: Re: [PATCH 0/5] splice: locking changes and code refactoring
From: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
Date: Sat, 18 Jan 2014 06:40:40 +0000
Cc: Jens Axboe <axboe@xxxxxxxxx>, Mark Fasheh <mfasheh@xxxxxxxx>, Joel Becker <jlbec@xxxxxxxxxxxx>, linux-fsdevel@xxxxxxxxxxxxxxx, xfs@xxxxxxxxxxx, Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>, Sage Weil <sage@xxxxxxxxxxx>, Steve French <sfrench@xxxxxxxxx>
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <20140114172033.GU10323@xxxxxxxxxxxxxxxxxx>
References: <20131212181459.994196463@xxxxxxxxxxxxxxxxxxxxxx> <20140113141416.GA30117@xxxxxxxxxxxxx> <20140113235646.GR10323@xxxxxxxxxxxxxxxxxx> <20140114132207.GA25170@xxxxxxxxxxxxx> <20140114172033.GU10323@xxxxxxxxxxxxxxxxxx>
Sender: Al Viro <viro@xxxxxxxxxxxxxxxx>
User-agent: Mutt/1.5.21 (2010-09-15)
On Tue, Jan 14, 2014 at 05:20:33PM +0000, Al Viro wrote:
> On Tue, Jan 14, 2014 at 05:22:07AM -0800, Christoph Hellwig wrote:
> > On Mon, Jan 13, 2014 at 11:56:46PM +0000, Al Viro wrote:
> > > On Mon, Jan 13, 2014 at 06:14:16AM -0800, Christoph Hellwig wrote:
> > > > ping?  Would be nice to get this into 3.14
> > > 
> > > Umm...  The reason for pipe_lock outside of ->i_mutex is this:
> > > default_file_splice_write() calls splice_from_pipe() with
> > > write_pipe_buf for callback.  splice_from_pipe() calls that
> > > callback under pipe_lock(pipe).  And write_pipe_buf() calls
> > > __kernel_write(), which certainly might want to take ->i_mutex.
> > > 
> > > Now, this codepath isn't taken for files that have non-NULL
> > > ->splice_write(), so that's not an issue for XFS and OCFS2,
> > > but having pipe_lock nest between the ->i_mutex for filesystems
> > > that do and do not have ->splice_write()...  Ouch...
> > 
> > What would be the alternative?  Duplicating the code in even more
> > filesystems to enforce a non-natural locking order for filesystems
> > actually implementing splice?  There don't actually seem to be a whole
> > lot of real filesystems not implementing splice_write; the prime use
> > would be for device drivers or synthetic ones.  I'm not even sure
> > how much that fallback gets used in practice.

> Hmm...  In principle, the following would be no worse than what
> generic_file_splice_write() is doing: confirm and map the pages, build
> an iovec and use ->aio_write() to write it out, then unmap the suckers,
> release the ones entirely written to the file and adjust the partially
> written one.  All under pipe_lock().  Hell, if we introduce
> kernel_writev() (either by calling vfs_writev() or by taking do_readv_writev()
> sans iovec copying and using that under set_fs()), we could switch
> default_file_splice_write() to that and get rid of ->splice_write() for
> the majority of filesystems, if not all of them.
>
> Sure, it means copying from pipe buffers to pagecache, but
> generic_file_splice_write() does that copy anyway: the conditional memcpy()
> in pipe_to_file() is actually unconditional, since the if (page != buf->page)
> in there is a leftover that Nick forgot to remove back in 2007
> ("1/2 splice: dont steal").
>
> Objections, comments?
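A userspace sketch of the flow proposed above, under stated assumptions: `struct pipe_buf`, `consume_written()` and `splice_bufs_to_fd()` are hypothetical names, the "pages" are plain byte ranges that are already mapped, and POSIX `writev()` stands in for the proposed `kernel_writev()`. It is not kernel code; it only shows the bookkeeping the proposal relies on: one vectored write, then releasing the buffers written entirely and adjusting the partially written one.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define MAX_BUFS 16

/* Hypothetical stand-in for a pipe buffer: unconsumed bytes of one page. */
struct pipe_buf {
	const char *data;	/* start of the page's data */
	size_t offset;		/* first byte not yet written */
	size_t len;		/* bytes not yet written */
};

/*
 * After a write of `written` bytes, release the buffers that were written
 * entirely (advance *head past them) and adjust the partially written one.
 */
static void consume_written(struct pipe_buf *bufs, size_t nbufs,
			    size_t *head, size_t written)
{
	while (*head < nbufs && written > 0) {
		struct pipe_buf *b = &bufs[*head];

		if (written >= b->len) {	/* entirely written: release */
			written -= b->len;
			b->len = 0;
			(*head)++;
		} else {			/* partially written: adjust */
			b->offset += written;
			b->len -= written;
			written = 0;
		}
	}
	/* writev() never reports more bytes than were queued */
	assert(written == 0);
}

/*
 * Gather buffers [*head, nbufs) into an iovec, write them out with a
 * single writev(), then consume what was written.  Returns the number
 * of bytes written, or -1 on error.
 */
static ssize_t splice_bufs_to_fd(int fd, struct pipe_buf *bufs,
				 size_t nbufs, size_t *head)
{
	struct iovec iov[MAX_BUFS];
	int n = 0;

	for (size_t i = *head; i < nbufs && n < MAX_BUFS; i++, n++) {
		iov[n].iov_base = (void *)(bufs[i].data + bufs[i].offset);
		iov[n].iov_len = bufs[i].len;
	}
	if (n == 0)
		return 0;

	ssize_t written = writev(fd, iov, n);
	if (written < 0)
		return -1;

	consume_written(bufs, nbufs, head, (size_t)written);
	return written;
}
```

A short writev() is what makes the partial-buffer adjustment necessary: the caller can simply retry from `*head`. In the kernel all of this would run under pipe_lock(), which the sketch does not model.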

The problem Christoph was talking about is that generic_file_splice_write()
plays with ->i_mutex: it takes and drops it for each page of IO *and*
causes a PITA for any fs that wants some locks of its own taken in addition
to ->i_mutex on the write paths.  What ->splice_write() without page
stealing does is pretty much a writev() from an array of pages in kernel
space, so it looks like we might as well just reuse the writev() guts for that...
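To make the lock-ordering objection from earlier in the thread concrete, here is a toy lockdep-style check in userspace C. It is not the kernel's lockdep; `enum lock_class`, `record_path()` and the `seen` table are hypothetical. It assumes, reading the thread, that default_file_splice_write() ends up taking ->i_mutex under pipe_lock (via __kernel_write()), while under the proposed patches a filesystem's own ->splice_write() would take ->i_mutex first and pipe_lock inside it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy lock classes named after the locks discussed in the thread. */
enum lock_class { I_MUTEX, PIPE_LOCK, NR_CLASSES };

static const char *const class_name[NR_CLASSES] = { "i_mutex", "pipe_lock" };

/* seen[a][b] becomes true once some path acquires b while holding a. */
static bool seen[NR_CLASSES][NR_CLASSES];

/*
 * Record a path that takes the locks in seq[0..n) in order, each while
 * holding all earlier ones.  Returns false (and reports it) if any pair
 * is taken in the reverse of an order recorded earlier, i.e. a possible
 * ABBA deadlock.  Only direct two-lock inversions are detected.
 */
static bool record_path(const char *path, const enum lock_class *seq, int n)
{
	for (int i = 0; i < n; i++) {
		for (int j = i + 1; j < n; j++) {
			enum lock_class held = seq[i], taken = seq[j];

			/* recursive acquisition of one class is not modeled */
			assert(held != taken);
			if (seen[taken][held]) {
				printf("%s: %s -> %s inverts earlier %s -> %s\n",
				       path, class_name[held], class_name[taken],
				       class_name[taken], class_name[held]);
				return false;
			}
			seen[held][taken] = true;
		}
	}
	return true;
}
```

Recording the default path as `{ PIPE_LOCK, I_MUTEX }` and then the assumed patched path as `{ I_MUTEX, PIPE_LOCK }` makes the second call return false and print `patched: i_mutex -> pipe_lock inverts earlier pipe_lock -> i_mutex`. The real lockdep also catches longer cycles; the sketch only shows why mixing the two orders across filesystems is a problem.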
