Re: splice vs execve lockdep trace.

To: Ben Myers <bpm@xxxxxxx>
Subject: Re: splice vs execve lockdep trace.
From: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Date: Tue, 16 Jul 2013 13:18:06 -0700
Cc: Dave Chinner <david@xxxxxxxxxxxxx>, Peter Zijlstra <peterz@xxxxxxxxxxxxx>, Oleg Nesterov <oleg@xxxxxxxxxx>, Linux Kernel <linux-kernel@xxxxxxxxxxxxxxx>, Alexander Viro <viro@xxxxxxxxxxxxxxxxxx>, Dave Jones <davej@xxxxxxxxxx>, xfs@xxxxxxxxxxx
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <20130716193332.GB3572@xxxxxxx>
References: <20130716015305.GB30569@xxxxxxxxxx> <CA+55aFyLbqJp0-=7=HOF9sKGOHwsa7A7-V76b8tbsnra8Z2=-w@xxxxxxxxxxxxxx> <20130716023847.GA31481@xxxxxxxxxx> <CA+55aFxiGXht8+Dox=C2ezYYf1yMaLAzMYr40j=+peP8j5Ha6w@xxxxxxxxxxxxxx> <20130716060351.GE11674@dastard> <20130716193332.GB3572@xxxxxxx>
Sender: linus971@xxxxxxxxx
On Tue, Jul 16, 2013 at 12:33 PM, Ben Myers <bpm@xxxxxxx> wrote:
>> >
>> > And looking more at that, I'm actually starting to think this is an
>> > XFS locking problem. XFS really should not call back to splice while
>> > holding the inode lock.

.. that was misleading, normally "inode lock" would be i_lock, but
here I meant the XFS-specific i_iolock.

But you clearly picked up on it:

> CPU0                            CPU1                            CPU2
> ----                            ----                            ----
> lock(&sig->cred_guard_mutex);
>                                 lock(&pipe->mutex/1);
> lock(&(&ip->io_lock)->mr_lock);
> lock(&(&ip->io_lock)->mr_lock);
>                                 lock(&sig->cred_guard_mutex);
> lock(&pipe->mutex/1);


> I agree that fixing this in XFS seems to be the most reasonable plan,
> and Dave's approach looks ok to me.  Seems like patch 1 should go
> through Al's tree, but we'll also need to commit it to the xfs tree
> as a prerequisite to patch 2.

Btw, is there some reason why XFS couldn't just use
generic_file_splice_read() directly?

I'm not arguing against Dave's patches, since the splice_write case
would seem to want them regardless, but I'm not even seeing why XFS
couldn't just avoid the locking for the splice_read case entirely. And
the generic file-splice read function does all that read-ahead stuff
and should actually be better for performance. And because it does it
from the page cache, it can avoid the locking issues (because it gets
the splice pipe lock later, just holding on to the page references).
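[For reference, the splice_read path being discussed has roughly this
shape - a simplified sketch of the 3.x-era fs/xfs/xfs_file.c, not the
exact source:

    STATIC ssize_t
    xfs_file_splice_read(
            struct file             *infilp,
            loff_t                  *ppos,
            struct pipe_inode_info  *pipe,
            size_t                  count,
            unsigned int            flags)
    {
            struct xfs_inode        *ip = XFS_I(infilp->f_mapping->host);
            ssize_t                 ret;

            /* The wrapping lock in question: held across the whole
             * generic splice path, which is what lets it nest against
             * pipe->mutex and cred_guard_mutex in the lockdep report. */
            xfs_rw_ilock(ip, XFS_IOLOCK_SHARED);

            ret = generic_file_splice_read(infilp, ppos, pipe, count, flags);

            xfs_rw_iunlock(ip, XFS_IOLOCK_SHARED);
            return ret;
    }

The suggestion here would amount to pointing .splice_read in
xfs_file_operations straight at generic_file_splice_read() and dropping
the wrapper, so the iolock is never held while the pipe lock is taken.]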

And splice has mmap semantics - the whole point of splice is about
moving pages around, after all - so I *think* your current
"xfs_rw_ilock(ip, XFS_IOLOCK_SHARED);" is actually over-serialization.
The pages will be shared by the pipe buffers anyway, so splicing by
definition doesn't imply full data serialization (because the reading
of the data itself will happen much later).

So the per-page serialization done by readpage() should already be
sufficient, no?

I dunno. Maybe there's something I'm missing. XFS does seem to wrap
all the other generic functions in that lock too, but since mmap() etc
clearly have to be able to get things one page at a time (without any
wrapping at higher layers), I'm just wondering whether splice_read
might not be able to avoid it.

