xfs
[Top] [All Lists]

Re: [PATCH 1/4] mm: thp: Return the correct value for change_huge_pmd

To: Mel Gorman <mgorman@xxxxxxx>
Subject: Re: [PATCH 1/4] mm: thp: Return the correct value for change_huge_pmd
From: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Date: Sat, 7 Mar 2015 12:31:03 -0800
Cc: Dave Chinner <david@xxxxxxxxxxxxx>, Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>, Ingo Molnar <mingo@xxxxxxxxxx>, Aneesh Kumar <aneesh.kumar@xxxxxxxxxxxxxxxxxx>, Linux Kernel Mailing List <linux-kernel@xxxxxxxxxxxxxxx>, Linux-MM <linux-mm@xxxxxxxxx>, xfs@xxxxxxxxxxx, ppc-dev <linuxppc-dev@xxxxxxxxxxxxxxxx>
Delivered-to: xfs@xxxxxxxxxxx
Dkim-signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; bh=E2QNwXSrRKcBhJWerpuxCwB/SXpBhbTS6GTtYs3XZyo=; b=jEPy9k2uG1V8bvlEPK7ivTIUo+MBtjIE8YOdca7KOxRqQU90M2dpwfLOiRjxfx5cqt vmiTfo1EaxL/eKoMVUwHeaWc0dlWs3vpCJDxZ18+wlzf3Cm0QAZRDa9XjPlJPMB4HBkS lGyfjX/aDRIykeBjPECzPSxs5ANvLaoyh3j0g8hjkaGeFyMP8uQw0K+RPX6eByygmKcS IfXoIhvPjeFrvFUsjkVjqgXWO6IUXBtA+FgZoO/8iIaGq+LMIk+XiVuuhNEFBY7V/8Mr 5UZjzHp0s8ZfjyM/0hbMxKAzyE0mh6Oti1zu2ZnLvubcu3Ox6aAFOONCyaq62rB16oNk qtQg==
Dkim-signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux-foundation.org; s=google; h=mime-version:sender:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; bh=E2QNwXSrRKcBhJWerpuxCwB/SXpBhbTS6GTtYs3XZyo=; b=QzLRF7fCfwTPazznfnjzT2XA2kxbJsGOLy13oOR7acMHF3st45cv9xcTqgleCtvmUp JIpnm7rA13lASZOq5psmUnFfBwNpXL89Ibo6KBKeIrth2VJuYUCdACHEJ4huGQpWK1ff g1wB9R/87spmTiNqqBLFR/mTUuMUsKD/yrm8c=
In-reply-to: <1425741651-29152-2-git-send-email-mgorman@xxxxxxx>
References: <1425741651-29152-1-git-send-email-mgorman@xxxxxxx> <1425741651-29152-2-git-send-email-mgorman@xxxxxxx>
Sender: linus971@xxxxxxxxx
On Sat, Mar 7, 2015 at 7:20 AM, Mel Gorman <mgorman@xxxxxxx> wrote:
>
>                 if (!prot_numa || !pmd_protnone(*pmd)) {
> -                       ret = 1;
>                         entry = pmdp_get_and_clear_notify(mm, addr, pmd);
>                         entry = pmd_modify(entry, newprot);
>                         ret = HPAGE_PMD_NR;

Hmm. I know I acked this already, but the return value - which correct
- is still potentially something we could improve upon.

In particular, we don't need to flush the TLB's if the old entry was
not present. Sadly, we don't have a helper function for that.

But the code *could* do something like

    entry = pmdp_get_and_clear_notify(mm, addr, pmd);
    ret = pmd_tlb_cacheable(entry) ? HPAGE_PMD_NR : 1;
    entry = pmd_modify(entry, newprot);

where pmd_tlb_cacheable() on x86 would test if _PAGE_PRESENT (bit #0) is set.

In particular, that would mean that as we change *from* a protnone
(whether NUMA or really protnone) we wouldn't need to flush the TLB.

In fact, we could make it even more aggressive: it's not just an old
non-present TLB entry that doesn't need flushing - we can avoid the
flushing whenever we strictly increase the access rigths. So we could
have something that takes the old entry _and_ the new protections into
account, and avoids the TLB flush if the new entry is strictly more
permissive.

This doesn't explain the extra TLB flushes Dave sees, though, because
the old code didn't make those kinds of optimizations either. But
maybe something like this is worth doing.

                            Linus

<Prev in Thread] Current Thread [Next in Thread>