Re: splice vs execve lockdep trace.

To: Dave Chinner <david@xxxxxxxxxxxxx>
Subject: Re: splice vs execve lockdep trace.
From: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Date: Tue, 16 Jul 2013 21:54:09 -0700
Cc: Ben Myers <bpm@xxxxxxx>, Peter Zijlstra <peterz@xxxxxxxxxxxxx>, Oleg Nesterov <oleg@xxxxxxxxxx>, Linux Kernel <linux-kernel@xxxxxxxxxxxxxxx>, Alexander Viro <viro@xxxxxxxxxxxxxxxxxx>, Dave Jones <davej@xxxxxxxxxx>, xfs@xxxxxxxxxxx
In-reply-to: <20130717040616.GI11674@dastard>
References: <20130716015305.GB30569@xxxxxxxxxx> <CA+55aFyLbqJp0-=7=HOF9sKGOHwsa7A7-V76b8tbsnra8Z2=-w@xxxxxxxxxxxxxx> <20130716023847.GA31481@xxxxxxxxxx> <CA+55aFxiGXht8+Dox=C2ezYYf1yMaLAzMYr40j=+peP8j5Ha6w@xxxxxxxxxxxxxx> <20130716060351.GE11674@dastard> <20130716193332.GB3572@xxxxxxx> <CA+55aFzTBUKStdZu1GhKoiYc2knybhiaUFr2By98QYew_STE=A@xxxxxxxxxxxxxx> <20130716204335.GH11674@dastard> <CA+55aFwHMQd-VDeTDh-gm3jyj+5+FSoAHOeU47mwU-mKtEj9RQ@xxxxxxxxxxxxxx> <20130717040616.GI11674@dastard>
Sender: linus971@xxxxxxxxx
On Tue, Jul 16, 2013 at 9:06 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> Right, and that's one of the biggest problems page based IO has - we
> can't serialise it against other IO and other page cache
> manipulation functions like hole punching. What happens when a
> splice read or mmap page fault races with a hole punch? You get
> stale data being left in the page cache because we can't serialise
> the page read with the page cache invalidation and underlying extent
> removal.

But Dave, that's *good*.

You call it "stale data".

I call it "the data was valid at some point".

This is what "splice()" is fundamentally all about.

Think of it this way: even if you are 100% serialized during the
"splice()" operation, what do you think happens afterwards?

Seriously, think it through.

That data is in a kernel buffer - the pipe. The fact that it was
serialized at the time of the original splice() doesn't make _one_
whit of a difference, because after the splice is over, the data still
sits around in that pipe buffer, and you're no longer serializing it.
Somebody else truncating the file or punching a hole in the file DOES
NOT MATTER. It's too late.

In other words, trying to "protect" against that kind of race is stupid.

You're missing the big picture because you're concentrating on the
details. Look beyond what happens inside XFS, and think about the
higher-level meaning of splice() itself.

So the only guarantee splice *should* give is entirely per-page. If
you think it gives any other serialization, you're fundamentally
wrong, because it *cannot*. See?
