
Re: [PATCH 4/4] mm: numa: Slow PTE scan rate if migration failures occur

To: Ingo Molnar <mingo@xxxxxxxxxx>
Subject: Re: [PATCH 4/4] mm: numa: Slow PTE scan rate if migration failures occur
From: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Date: Sun, 8 Mar 2015 11:35:59 -0700
Cc: Mel Gorman <mgorman@xxxxxxx>, Dave Chinner <david@xxxxxxxxxxxxx>, Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>, Aneesh Kumar <aneesh.kumar@xxxxxxxxxxxxxxxxxx>, Linux Kernel Mailing List <linux-kernel@xxxxxxxxxxxxxxx>, Linux-MM <linux-mm@xxxxxxxxx>, xfs@xxxxxxxxxxx, ppc-dev <linuxppc-dev@xxxxxxxxxxxxxxxx>
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <20150308100223.GC15487@xxxxxxxxx>
References: <1425741651-29152-1-git-send-email-mgorman@xxxxxxx> <1425741651-29152-5-git-send-email-mgorman@xxxxxxx> <20150307163657.GA9702@xxxxxxxxx> <CA+55aFwDuzpL-k8LsV3touhNLh+TFSLKP8+-nPwMXkWXDYPhrg@xxxxxxxxxxxxxx> <20150308100223.GC15487@xxxxxxxxx>
Sender: linus971@xxxxxxxxx
On Sun, Mar 8, 2015 at 3:02 AM, Ingo Molnar <mingo@xxxxxxxxxx> wrote:
>
> Well, there's a difference in what we write to the pte:
>
>  #define _PAGE_BIT_NUMA          (_PAGE_BIT_GLOBAL+1)
>  #define _PAGE_BIT_PROTNONE      _PAGE_BIT_GLOBAL
>
> and our expectation was that the two should be equivalent methods from
> the POV of the NUMA balancing code, right?

Right.

But yes, we might have screwed something up. In particular, there
might be something that thinks it cares about the global bit, but
doesn't notice that the present bit isn't set, so it considers the
protnone mappings to be global and causes lots more TLB flushes, etc.
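
To make that concrete, here is a minimal sketch of the kind of check
being worried about. The bit values follow the x86 layout, but the
helpers below are invented names for illustration, not actual kernel
functions:

  #define _PAGE_PRESENT   (1UL << 0)
  #define _PAGE_GLOBAL    (1UL << 8)
  #define _PAGE_PROTNONE  _PAGE_GLOBAL   /* same bit, different meaning */

  /* Buggy: any pte with the bit set is treated as a global mapping ... */
  static int pte_global_buggy(unsigned long pteval)
  {
          return pteval & _PAGE_GLOBAL;  /* also true for protnone ptes */
  }

  /* ... but the bit only means "global" if the pte is actually present. */
  static int pte_global_checked(unsigned long pteval)
  {
          return (pteval & _PAGE_PRESENT) && (pteval & _PAGE_GLOBAL);
  }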

>> I don't like the old pmdp_set_numa() because it can drop dirty bits,
>> so I think the old code was actively buggy.
>
> Could we, as a silly testing hack not to be applied, write a
> hack-patch that re-introduces the racy way of setting the NUMA bit, to
> confirm that it is indeed this difference that changes pte visibility
> across CPUs enough to create so many more faults?

So one of Mel's patches did that, but I don't know if Dave tested it.

And thinking about it, it *may* be safe for huge-pages, if they always
already have the dirty bit set to begin with. And I don't see how we
could have a clean hugepage (apart from the special case of the
zeropage, which is read-only, so races on the dirty bit aren't an
issue).

So it might actually be that the non-atomic version is safe for
hpages. And we could possibly get rid of the "atomic read-and-clear"
even for the non-numa case.
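
To make the dirty-bit race concrete, a minimal sketch (not the real
pmdp_set_numa() code; the helper names are invented, and TLB flushing
is glossed over entirely):

  #define _PAGE_PRESENT  (1UL << 0)

  /*
   * Non-atomic read-modify-write: if the CPU sets the hardware dirty bit
   * between the load and the store, that update is silently overwritten.
   */
  static void set_numa_nonatomic(unsigned long *ptep, unsigned long numa_bits)
  {
          unsigned long old = *ptep;                   /* read             */
          /* another CPU can dirty the page right here */
          *ptep = (old & ~_PAGE_PRESENT) | numa_bits;  /* modify and write */
  }

  /*
   * Atomic read-and-clear: any dirty bit set so far is captured in 'old',
   * and (once the entry is clear and the TLB is flushed) the hardware
   * cannot set it behind our back before the new value is written.
   */
  static void set_numa_atomic(unsigned long *ptep, unsigned long numa_bits)
  {
          unsigned long old = __atomic_exchange_n(ptep, 0UL, __ATOMIC_SEQ_CST);

          *ptep = (old & ~_PAGE_PRESENT) | numa_bits;
  }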

I'd rather do it for both cases than for just one of them.

But:

> As a second hack (not to be applied), could we change:
>
>  #define _PAGE_BIT_PROTNONE      _PAGE_BIT_GLOBAL
>
> to:
>
>  #define _PAGE_BIT_PROTNONE      (_PAGE_BIT_GLOBAL+1)
>
> to double check that the position of the bit does not matter?

Agreed. We should definitely try that.

Dave?

Also, is there some sane way for me to actually see this behavior on a
regular machine with just a single socket? Dave is apparently running
in some fake-numa setup; I'm wondering if this is easy enough to
reproduce that I could see it myself.

                          Linus
