xfs
[Top] [All Lists]

Re: xfs_file_splice_read: possible circular locking dependency detected

To: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Subject: Re: xfs_file_splice_read: possible circular locking dependency detected
From: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
Date: Sat, 3 Sep 2016 02:45:14 +0100
Cc: Dave Chinner <david@xxxxxxxxxxxxx>, CAI Qian <caiqian@xxxxxxxxxx>, linux-xfs <linux-xfs@xxxxxxxxxxxxxxx>, xfs@xxxxxxxxxxx
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <CA+55aFy17iuUYRSLO=e1e+W5oW8GRYy93myk816UjRgEFxQDOw@xxxxxxxxxxxxxx>
References: <723420070.1340881.1472835555274.JavaMail.zimbra@xxxxxxxxxx> <1832555471.1341372.1472835736236.JavaMail.zimbra@xxxxxxxxxx> <20160903003919.GI30056@dastard> <CA+55aFy17iuUYRSLO=e1e+W5oW8GRYy93myk816UjRgEFxQDOw@xxxxxxxxxxxxxx>
Sender: Al Viro <viro@xxxxxxxxxxxxxxxx>
User-agent: Mutt/1.6.1 (2016-04-27)
On Fri, Sep 02, 2016 at 05:57:04PM -0700, Linus Torvalds wrote:
> On Fri, Sep 2, 2016 at 5:39 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> >
> > Fundamentally a splice infrastructure problem.
> 
> Yeah, I don't really like how we handle the pipe lock.
> 
> It *might* be possible to instead just increment the reference
> counters as we build a kvec[] array of them, and simply do teh write
> without holding the pipe lock at all.
> 
> That has other problems, ie concurrect spices from the same pipe would
> possibly write the same data multiple times, though.
> 
> But yes, the fundamental problem is how splice wants to take the pipe
> lock both early and late. Very annoying.

We could, in principle, add another flavour of iov_iter, with bvec
array attached to it with copy_page_to_iter() sticking an extra ref to that
page into array.  Then, under pipe lock, feed that thing to ->read_iter()
and do an equivalent of splice_to_pipe() that would take bvec array instead
of struct page */struct partial_page arrays.

Hell, we could even have copy_to_iter() for these puppies allocate a page,
stick it into the next bvec and copy into it.  Especially if we have those
bvec zeroed, with copy_page_to_iter() leaving ->bvec pointing to the next
(unused) bvec and copy_to_iter() doing that only when a page had been
completely filled.  I.e.

copy_page_to_iter()
        if (!iter->nr_segs)
                return 0;
        if (iter->bvec->bv_page) {
                iter->bvec++;
                if (!--iter->nr_segs)
                        return 0;
        }
        stick (page,offset,bytes) into iter->bvec
        iter->bvec++;
        iter->nr_segs--;
        return bytes;

copy_to_iter():
        wanted = bytes;
        while (bytes && iter->nr_segs) {
                if (!iter->bvec->bv_page)
                        iter->bvec->bv_page = alloc_page()
                n = min(PAGE_SIZE - iter->bvec->bv_len, bytes);
                memcpy_to_page(addr, iter->bvec->bv_len, n);
                bytes -= n;
                iter->bvec->bv_len += n;
                if (iter->bvec->bv_len == PAGE_SIZE) {
                        iter->bvec++;
                        iter->nr_segs--;
                }
        }
        return wanted - bytes;

That should suffice for quite a few of read_iter-using file_operations,
if not for all of them.  pipe lock is on the outside, same as for write
side *and* for default_file_splice_read().  And filesystem gets the
same locking it would for read(2)/readv(2)/etc...

Comments?

<Prev in Thread] Current Thread [Next in Thread>