
Re: splice vs execve lockdep trace.

To: Dave Chinner <david@xxxxxxxxxxxxx>
Subject: Re: splice vs execve lockdep trace.
From: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Date: Tue, 16 Jul 2013 14:02:12 -0700
Cc: Ben Myers <bpm@xxxxxxx>, Peter Zijlstra <peterz@xxxxxxxxxxxxx>, Oleg Nesterov <oleg@xxxxxxxxxx>, Linux Kernel <linux-kernel@xxxxxxxxxxxxxxx>, Alexander Viro <viro@xxxxxxxxxxxxxxxxxx>, Dave Jones <davej@xxxxxxxxxx>, xfs@xxxxxxxxxxx
In-reply-to: <20130716204335.GH11674@dastard>
On Tue, Jul 16, 2013 at 1:43 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> Yes - IO is serialised based on the ip->i_iolock, not i_mutex. We
> don't use i_mutex for many things IO related, and so internal
> locking is needed to serialise against stuff like truncate, hole
> punching, etc, that are run through non-vfs interfaces.

Umm. But the page IO isn't serialized by i_mutex *either*. You don't
hold it across page faults. In fact you don't even take it at all
across page faults.

That's kind of my point. splice is about the page IO, and it's
serialized purely by the page lock. And then "->readpage()" will take
whatever IO mutex it needs in order to do that right, but think about
the case where things are already in the page cache. There's no reason
for any serialization whatsoever.
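A toy userspace model of that argument, offered only as a sketch: every name here (toy_page, toy_inode, toy_readpage, toy_splice_read) is hypothetical and none of it is kernel code. The page lock is a pthread mutex, and the filesystem's IO lock is taken only inside the fill path, so a read that finds the page already cached never touches the IO lock at all.

```c
#include <pthread.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical model, NOT kernel code: one cached "page" per inode. */
struct toy_page {
    pthread_mutex_t lock;    /* stands in for the page lock */
    bool uptodate;           /* page contents are valid (cached) */
    char data[16];
};

struct toy_inode {
    pthread_mutex_t io_lock; /* stands in for a filesystem IO lock */
    int io_lock_taken;       /* how many times the fill path ran */
    struct toy_page page;
};

static void toy_inode_init(struct toy_inode *ino)
{
    memset(ino, 0, sizeof *ino);
    pthread_mutex_init(&ino->io_lock, NULL);
    pthread_mutex_init(&ino->page.lock, NULL);
}

/* Hypothetical ->readpage(): the only place the IO lock is taken. */
static void toy_readpage(struct toy_inode *ino)
{
    pthread_mutex_lock(&ino->io_lock);
    ino->io_lock_taken++;
    strcpy(ino->page.data, "on-disk data");
    ino->page.uptodate = true;
    pthread_mutex_unlock(&ino->io_lock);
}

/* Splice-style read, serialized by the page lock alone on a cache hit.
 * len must be at least 1 (room for the terminating NUL). */
static void toy_splice_read(struct toy_inode *ino, char *out, size_t len)
{
    pthread_mutex_lock(&ino->page.lock);
    if (!ino->page.uptodate)
        toy_readpage(ino);   /* cache miss: fill under the IO lock */
    strncpy(out, ino->page.data, len - 1);
    out[len - 1] = '\0';
    pthread_mutex_unlock(&ino->page.lock);
}
```

Calling toy_splice_read() twice on a freshly initialized inode takes the IO lock once, on the first (uncached) call only.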

So this isn't about i_mutex. At all.

> Read isn't the problem - it's write that's the deadlock issue...

I agree, and I think your patches are needed, as I said in that email
you replied to. But due to this issue, I was looking at the XFS splice
support, and the read-side splice support seems inefficient and overly
complex. I'm not seeing why it needs that i_iolock.

And no, this really has nothing to do with i_mutex. Go look at
generic_file_splice_read(). There's no i_mutex there at all. It's more
like a series of magic page-faults without the actual page table
actions. Which is kind of the whole point of splice - zero-copy
without bothering with page table setup/teardown.
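For readers less familiar with splice, here is a minimal userspace sketch of the same read path seen from the syscall side: splice(2) moves file data into a pipe without passing through a userspace buffer. splice_file_to_buf() is a hypothetical helper, and the final read() out of the pipe exists only so the example can inspect the data. On kernels of that era, for a filesystem whose file ops point at generic_file_splice_read(), the splice() call below ends up in that function via ->splice_read.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: splice up to len bytes from file_fd into a fresh
 * pipe, then read() them back out so the caller can inspect them.
 * Returns the number of bytes read, or -1 on error. */
static ssize_t splice_file_to_buf(int file_fd, char *buf, size_t len)
{
    int p[2];
    ssize_t got = -1;

    if (pipe(p) < 0)
        return -1;

    /* NULL offset: start at, and advance, file_fd's current position.
     * For page-cache-backed files the kernel can hand cached pages to
     * the pipe instead of copying through a user buffer. */
    ssize_t n = splice(file_fd, NULL, p[1], NULL, len, 0);
    if (n >= 0)
        got = read(p[0], buf, (size_t)n);   /* copy out only for the demo */

    close(p[0]);
    close(p[1]);
    return got;
}
```

The file position advances just as it would with read(2), so a second call at end of file returns 0.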

Now, it's perfectly possible that XFS really needs some odd locking
here, but your reply about i_mutex makes me think that you did it
because you were confused about what it actually wants.

*Every* other local filesystem uses generic_file_splice_read() with
just a single

     .splice_read = generic_file_splice_read,

in the file ops initializer.  Sure, nfs and ocfs2 wrap things like xfs
does, but they basically do it to revalidate their caches.
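The contrast between the one-line initializer and the wrapping filesystems can be modeled in userspace. This is a hypothetical sketch, not the kernel's struct file_operations: toy_file_ops, toy_generic_splice_read() and toy_revalidating_splice_read() are illustrative names, and "revalidation" is reduced to a counter standing in for whatever cache check a network or cluster filesystem would do before delegating to the generic path.

```c
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical model of an ops table; not the kernel's definitions. */
struct toy_file;

struct toy_file_ops {
    ssize_t (*splice_read)(struct toy_file *f, size_t len);
};

struct toy_file {
    const struct toy_file_ops *ops;
    int revalidations;    /* how often the wrapper revalidated */
    size_t cached_bytes;  /* bytes already "in the page cache" */
};

/* Stand-in for the generic helper: hands out whatever is cached. */
static ssize_t toy_generic_splice_read(struct toy_file *f, size_t len)
{
    return (ssize_t)(len < f->cached_bytes ? len : f->cached_bytes);
}

/* Stand-in for an nfs/ocfs2-style wrapper: revalidate, then delegate. */
static ssize_t toy_revalidating_splice_read(struct toy_file *f, size_t len)
{
    f->revalidations++;   /* placeholder for a real cache check */
    return toy_generic_splice_read(f, len);
}

/* Typical local filesystem: just point at the generic helper. */
static const struct toy_file_ops toy_local_fops = {
    .splice_read = toy_generic_splice_read,
};

/* Filesystem that must revalidate its cache first. */
static const struct toy_file_ops toy_wrapped_fops = {
    .splice_read = toy_revalidating_splice_read,
};
```

The wrapper adds work only before delegating; it takes no extra lock around the generic path, which is the shape Linus contrasts with a wrapper that serializes on an IO lock.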

