
Re: [PATCH 4/4] mm: numa: Slow PTE scan rate if migration failures occur

To: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Subject: Re: [PATCH 4/4] mm: numa: Slow PTE scan rate if migration failures occur
From: Ingo Molnar <mingo@xxxxxxxxxx>
Date: Sun, 8 Mar 2015 11:02:23 +0100
Cc: Mel Gorman <mgorman@xxxxxxx>, Dave Chinner <david@xxxxxxxxxxxxx>, Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>, Aneesh Kumar <aneesh.kumar@xxxxxxxxxxxxxxxxxx>, Linux Kernel Mailing List <linux-kernel@xxxxxxxxxxxxxxx>, Linux-MM <linux-mm@xxxxxxxxx>, xfs@xxxxxxxxxxx, ppc-dev <linuxppc-dev@xxxxxxxxxxxxxxxx>
In-reply-to: <CA+55aFwDuzpL-k8LsV3touhNLh+TFSLKP8+-nPwMXkWXDYPhrg@xxxxxxxxxxxxxx>
References: <1425741651-29152-1-git-send-email-mgorman@xxxxxxx> <1425741651-29152-5-git-send-email-mgorman@xxxxxxx> <20150307163657.GA9702@xxxxxxxxx> <CA+55aFwDuzpL-k8LsV3touhNLh+TFSLKP8+-nPwMXkWXDYPhrg@xxxxxxxxxxxxxx>
Sender: Ingo Molnar <mingo.kernel.org@xxxxxxxxx>
User-agent: Mutt/1.5.23 (2014-03-12)
* Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:

> On Sat, Mar 7, 2015 at 8:36 AM, Ingo Molnar <mingo@xxxxxxxxxx> wrote:
> >
> > And the patch Dave bisected to is a relatively simple patch. Why 
> > not simply revert it to see whether that cures much of the 
> > problem?
> 
> So the problem with that is that "pmd_set_numa()" and friends simply 
> no longer exist. So we can't just revert that one patch, it's the 
> whole series, and the whole point of the series.

Yeah.

> What confuses me is that the only real change that I can see in that 
> patch is the change to "change_huge_pmd()". Everything else is 
> pretty much a 100% equivalent transformation, afaik. Of course, I 
> may be wrong about that, and missing something silly.

Well, there's a difference in what we write to the pte:

 #define _PAGE_BIT_NUMA          (_PAGE_BIT_GLOBAL+1)
 #define _PAGE_BIT_PROTNONE      _PAGE_BIT_GLOBAL

and our expectation was that the two should be equivalent methods from 
the POV of the NUMA balancing code, right?

> And the changes to "change_huge_pmd()" were basically re-done
> differently by subsequent patches anyway.
> 
> The *only* change I see remaining is that change_huge_pmd() now does
> 
>    entry = pmdp_get_and_clear_notify(mm, addr, pmd);
>    entry = pmd_modify(entry, newprot);
>    set_pmd_at(mm, addr, pmd, entry);
> 
> for all changes. It used to do that "pmdp_set_numa()" for the
> prot_numa case, which did just
> 
>    pmd_t pmd = *pmdp;
>    pmd = pmd_mknuma(pmd);
>    set_pmd_at(mm, addr, pmdp, pmd);
> 
> instead.
> 
> I don't like the old pmdp_set_numa() because it can drop dirty bits,
> so I think the old code was actively buggy.

Could we, as a silly testing hack not to be applied, write a 
hack-patch that re-introduces the racy way of setting the NUMA bit, to 
confirm that it is indeed this difference that changes pte visibility 
across CPUs enough to create so many more faults?

Because if the answer is 'yes', then we can safely say: 'we regressed 
performance because correctness [not dropping dirty bits] comes before 
performance'.

If the answer is 'no', then we still have a mystery (and a regression) 
to track down.
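
To make the dirty-bit concern concrete, here is a minimal user-space sketch of the two update styles discussed above. It is not kernel code: the DEMO_* bit positions, the demo_* helper names and the single forced interleaving are all assumptions made for illustration, and the "hardware" is just a function call placed in the window between the read and the write.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit layout only; these are NOT the kernel's definitions. */
#define DEMO_PRESENT (UINT64_C(1) << 0)
#define DEMO_DIRTY   (UINT64_C(1) << 6)
#define DEMO_NUMA    (UINT64_C(1) << 9)

/*
 * Stand-in for another CPU writing to the page: the dirty bit is set
 * only while the entry is present. Returns true if the write landed;
 * false models the access faulting instead.
 */
static bool demo_hw_write(_Atomic uint64_t *entry)
{
    uint64_t old = atomic_load(entry);

    while (old & DEMO_PRESENT) {
        /* On failure, 'old' is refreshed with the current value. */
        if (atomic_compare_exchange_weak(entry, &old, old | DEMO_DIRTY))
            return true;
    }
    return false;
}

/* Old style: plain read, modify, write back. Returns true if a dirty bit was lost. */
static bool demo_racy_update_loses_dirty(_Atomic uint64_t *entry)
{
    uint64_t val = atomic_load(entry);          /* like: pmd = *pmdp      */
    bool landed = demo_hw_write(entry);         /* forced race window     */

    val = (val | DEMO_NUMA) & ~DEMO_PRESENT;    /* like: pmd_mknuma(pmd)  */
    atomic_store(entry, val);                   /* like: set_pmd_at(...)  */
    return landed && !(atomic_load(entry) & DEMO_DIRTY);
}

/* New style: atomically fetch and clear first, then modify and install. */
static bool demo_atomic_update_loses_dirty(_Atomic uint64_t *entry)
{
    uint64_t val = atomic_exchange(entry, 0);   /* like: get_and_clear    */
    bool landed = demo_hw_write(entry);         /* not present: no write  */

    val = (val | DEMO_NUMA) & ~DEMO_PRESENT;    /* stands in for modify   */
    atomic_store(entry, val);
    return landed && !(atomic_load(entry) & DEMO_DIRTY);
}
```

Starting from an entry that only has DEMO_PRESENT set, the racy variant reports a lost dirty bit under this interleaving, while the get-and-clear variant does not, because the simulated write cannot land on a cleared entry. The sketch says nothing about fault rates; it only shows why the old sequence could drop dirty bits.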

As a second hack (not to be applied), could we change:

 #define _PAGE_BIT_PROTNONE      _PAGE_BIT_GLOBAL

to:

 #define _PAGE_BIT_PROTNONE      (_PAGE_BIT_GLOBAL+1)

to double check that the position of the bit does not matter?
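
For reference, a small user-space sketch of what the two candidate positions work out to, assuming the x86 value of 8 for _PAGE_BIT_GLOBAL. The DEMO_* names and demo_bit_mask() are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: x86 places the global bit at position 8. */
#define DEMO_PAGE_BIT_GLOBAL        8
#define DEMO_PAGE_BIT_PROTNONE      DEMO_PAGE_BIT_GLOBAL        /* current definition   */
#define DEMO_PAGE_BIT_PROTNONE_ALT  (DEMO_PAGE_BIT_GLOBAL + 1)  /* proposed test hack   */

/* Turn a bit position into the mask that would be written to the entry. */
static uint64_t demo_bit_mask(unsigned int bit)
{
    return UINT64_C(1) << bit;
}
```

Under that assumption the current definition corresponds to mask 0x100 and the test hack to mask 0x200, which is the position the old NUMA bit occupied.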

I don't think we've exhausted all avenues of analysis here.

Thanks,

        Ingo
