
Re: XFS Lock debugging noise or real problem?

To: Linda Walsh <xfs@xxxxxxxxx>
Subject: Re: XFS Lock debugging noise or real problem?
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Thu, 14 Aug 2008 12:01:11 +1000
Cc: xfs-oss <xfs@xxxxxxxxxxx>, LKML <linux-kernel@xxxxxxxxxxxxxxx>, Eric Sandeen <sandeen@xxxxxxxxxxx>
In-reply-to: <48A38490.7090604@xxxxxxxxx>
Mail-followup-to: Linda Walsh <xfs@xxxxxxxxx>, xfs-oss <xfs@xxxxxxxxxxx>, LKML <linux-kernel@xxxxxxxxxxxxxxx>, Eric Sandeen <sandeen@xxxxxxxxxxx>
References: <48A093A7.40606@xxxxxxxxx> <48A09CA9.9080705@xxxxxxxxxxx> <48A0F686.2090700@xxxxxxxxx> <48A0F9FC.1070805@xxxxxxxxxxx> <48A20E9E.9090100@xxxxxxxxx> <20080813005852.GW6119@disturbed> <48A35A99.1080300@xxxxxxxxx> <20080814004101.GE6119@disturbed> <48A38490.7090604@xxxxxxxxx>
Sender: xfs-bounce@xxxxxxxxxxx
User-agent: Mutt/1.5.18 (2008-05-17)
On Wed, Aug 13, 2008 at 06:04:16PM -0700, Linda Walsh wrote:
> Dave Chinner wrote:
>> I've asked the lockdep ppl to treat stuff like memory reclaim and
>> the iprune_mutex specially because of this recursive calling nature
>> of memory reclaim, but so far nothing has happened....
> ---
>       So it's really a kernel bug, not an XFS bug...(?)
>> FWIW, I think that recent changes have resulted in the xfs_fsr case
>> (swap_extents) being annotated properly so that one should go
>> away.
> ---
>       If it was limited to xfs_fsr, that'd be tolerable -- but it's
> cropping up in random user-level-apps (imaps, sort, et al).

None of those applications use the swap extents code in XFS, so
if they are reporting problems related to xfs_fsr, then it's the
xfs_fsr locking that is triggering these later problems. Fixing
the swap extents lock annotations should make them go away.
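
One way to picture why a single code path can produce reports in
unrelated programs is a toy lock-order tracker. The sketch below is not
the kernel's lockdep implementation; the class LockOrderChecker and the
lock class names "A" and "B" are hypothetical and used only for
illustration. It assumes that ordering is tracked per lock class and is
remembered after the locks are released, so an ordering recorded by one
path is checked against every later acquisition.

```python
from collections import defaultdict


class LockOrderChecker:
    """Toy lock-class order tracker (hypothetical, not the kernel's lockdep)."""

    def __init__(self):
        # lock class -> classes that were acquired while it was held
        self.after = defaultdict(set)
        # lock classes currently held, in acquisition order
        self.held = []

    def _reachable(self, src, dst):
        # Depth-first search over the recorded "acquired after" edges.
        stack, seen = [src], set()
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self.after[node])
        return False

    def acquire(self, cls):
        """Record the acquisition; return (held, new) pairs that conflict."""
        warnings = []
        for h in self.held:
            # If an earlier path took `cls` before `h`, taking it after `h`
            # now is a potential inversion, even if nothing deadlocks today.
            if self._reachable(cls, h):
                warnings.append((h, cls))
            self.after[h].add(cls)
        self.held.append(cls)
        return warnings

    def release(self, cls):
        self.held.remove(cls)
```

In this model, once one path has taken "A" then "B", any later path
that takes "B" then "A" is flagged, whichever program happens to run it.
Taking the same class twice is flagged as well, which is the kind of
report that a lock annotation is meant to suppress when the nesting is
intentional.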

>> Well, any debugging code is really designed for test and dev systems,
>> not for production systems.....
> ---
>       The lock-correctness code is described as a feature to provide
> "provability".  It's not called "debugging" and I don't regard that as
> "debugging" -- but something that any production system that wants
> operational integrity over a minor 'speed hit', would "theoretically"
> want.
>       If it is "debug" code, it should be labeled as such -- but
> code that can mathematically guarantee that parts of the kernel operate
> correctly seems like a _reliability_ feature, not a debugging feature.

Ummm - that option is described as:

        "Lock debugging: prove locking correctness"

And from the help text in the menuconfig:

Symbol: PROVE_LOCKING [=n]
 Prompt: Lock debugging: prove locking correctness
   Defined at lib/Kconfig.debug:292
     -> Kernel hacking

Looks like it's well labelled as a debug option to me....
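
To see whether a given kernel was built with this option, the build
configuration can be inspected. The following is a minimal sketch,
assuming the usual .config text format in which an enabled boolean
symbol appears as CONFIG_<SYMBOL>=y and a disabled one as a comment
line. The helper name prove_locking_enabled is hypothetical.

```python
def prove_locking_enabled(config_text, symbol="PROVE_LOCKING"):
    """Return True if CONFIG_<symbol>=y appears in kernel .config text.

    Hypothetical helper: assumes the conventional .config format, where a
    disabled option is written as '# CONFIG_<symbol> is not set'.
    """
    target = f"CONFIG_{symbol}="
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith(target):
            # The value follows the '='; 'y' means built in.
            return line[len(target):] == "y"
    # Symbol absent or only present as a "not set" comment.
    return False
```

Where the running kernel's configuration lives varies by distribution
(often a config file under /boot, or /proc/config.gz when that is
enabled), so the caller is left to read the text and pass it in.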


Dave Chinner
