
Re: [PATCH 4/4] mm: numa: Slow PTE scan rate if migration failures occur

To: Dave Chinner <david@xxxxxxxxxxxxx>
Subject: Re: [PATCH 4/4] mm: numa: Slow PTE scan rate if migration failures occur
From: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Date: Thu, 19 Mar 2015 14:41:48 -0700
Cc: Mel Gorman <mgorman@xxxxxxx>, Ingo Molnar <mingo@xxxxxxxxxx>, Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>, Aneesh Kumar <aneesh.kumar@xxxxxxxxxxxxxxxxxx>, Linux Kernel Mailing List <linux-kernel@xxxxxxxxxxxxxxx>, Linux-MM <linux-mm@xxxxxxxxx>, xfs@xxxxxxxxxxx, ppc-dev <linuxppc-dev@xxxxxxxxxxxxxxxx>
In-reply-to: <CA+55aFy-Mw74rAdLMMMUgnsG3ZttMWVNGz7CXZJY7q9fqyRYfg@xxxxxxxxxxxxxx>
References: <CA+55aFywW5JLq=BU_qb2OG5+pJ-b1v9tiS5Ygi-vtEKbEZ_T5Q@xxxxxxxxxxxxxx> <20150309191943.GF26657@destitution> <CA+55aFzFt-vX5Jerci0Ty4Uf7K4_nQ7wyCp8hhU_dB0X4cBpVQ@xxxxxxxxxxxxxx> <20150312131045.GE3406@xxxxxxx> <CA+55aFx=81BGnQFNhnAGu6CetL7yifPsnD-+v7Y6QRqwgH47gQ@xxxxxxxxxxxxxx> <20150312184925.GH3406@xxxxxxx> <20150317070655.GB10105@dastard> <CA+55aFzdLnFdku-gnm3mGbeS=QauYBNkFQKYXJAGkrMd2jKXhw@xxxxxxxxxxxxxx> <20150317205104.GA28621@dastard> <CA+55aFzSPcNgxw4GC7aAV1r0P5LniyVVC66COz=3cgMcx73Nag@xxxxxxxxxxxxxx> <20150317220840.GC28621@dastard> <CA+55aFwne-fe_Gg-_GTUo+iOAbbNpLBa264JqSFkH79EULyAqw@xxxxxxxxxxxxxx> <CA+55aFy-Mw74rAdLMMMUgnsG3ZttMWVNGz7CXZJY7q9fqyRYfg@xxxxxxxxxxxxxx>
Sender: linus971@xxxxxxxxx
On Wed, Mar 18, 2015 at 10:31 AM, Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> So I think there's something I'm missing. For non-shared mappings, I
> still have the idea that pte_dirty should be the same as pte_write.
> And yet, your testing of 3.19 shows that it's a big difference.
> There's clearly something I'm completely missing.

Ahh. The normal page table scanning and page fault handling both clear
and set the dirty bit together with the writable one. But "fork()"
will clear the writable bit without clearing dirty. For some reason I
thought it moved the dirty bit into the struct page like the VM
scanning does, but that was just me having a brainfart. So yeah,
pte_dirty doesn't have to match pte_write even under perfectly normal
circumstances. Maybe there are other cases.
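
To make the fork case concrete, here is a toy model of it (the bit
constants below are made up for illustration, not the real pte layout):
the COW copy drops the write bit but leaves the dirty bit alone.

#include <stdio.h>

/* Toy model only: a "pte" as a plain bitmask with made-up constants,
 * just to show the write/dirty combination a fork can leave behind. */
#define TOY_PTE_WRITE 0x2
#define TOY_PTE_DIRTY 0x40

/* What a fork-style COW copy does for a private mapping: write-protect
 * the pte so the next write faults, but do not touch the dirty bit. */
static unsigned long toy_cow_copy(unsigned long pte, int shared)
{
	if (!shared)
		pte &= ~TOY_PTE_WRITE;
	return pte;
}

int main(void)
{
	unsigned long parent = TOY_PTE_WRITE | TOY_PTE_DIRTY; /* page was written to */
	unsigned long child = toy_cow_copy(parent, 0);

	/* Prints "write=0 dirty=1": dirty but not writable, the case above. */
	printf("write=%d dirty=%d\n",
	       !!(child & TOY_PTE_WRITE), !!(child & TOY_PTE_DIRTY));
	return 0;
}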

Not that I see a lot of forking in the xfs_repair case either, so...

Dave, mind re-running the plain 3.19 numbers to verify that the
pte_dirty/pte_write change really made that big of a difference?
Maybe your recollection of ~55,000 migrate_pages events was faulty.
If the pte_write -> pte_dirty change is the *only* difference, it's
still very odd how that one difference would make the migrate_pages
count go from ~55k to ~471k. That's nearly an order of magnitude, for
what really shouldn't be a big change.
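
(For reference, the one-liner in question is the kind of thing sketched
below -- a hypothetical reconstruction of the hinting-fault restore
step, not the actual patch: when the fault makes the pte present again,
restore the write bit based on pte_dirty() rather than on whether the
saved pte was still writable.)

	/* Hypothetical sketch of restoring a pte after a NUMA hinting
	 * fault; names follow the usual kernel pte helpers. */
	pte = pte_modify(pte, vma->vm_page_prot);
	pte = pte_mkyoung(pte);
	if (pte_dirty(pte))	/* the change: previously keyed off pte_write() */
		pte = pte_mkwrite(pte);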

I'm running a kernel right now with a hacky "update_mmu_cache()" that
warns if pte_dirty() ever differs from pte_write().

+/* Hack: for private mappings, warn whenever the dirty and write bits
+ * of a pte disagree.  Rate-limited so the log stays readable. */
+void update_mmu_cache(struct vm_area_struct *vma,
+               unsigned long addr, pte_t *ptep)
+{
+       if (!(vma->vm_flags & VM_SHARED)) {
+               pte_t now = READ_ONCE(*ptep);
+               if (!pte_write(now) != !pte_dirty(now)) {
+                       /* Warn at most 20 times, skipping back-to-back
+                        * repeats of the same low pte bits. */
+                       static int count = 20;
+                       static unsigned int prev = 0;
+                       unsigned int val = pte_val(now) & 0xfff;
+                       if (prev != val && count) {
+                               prev = val;
+                               count--;
+                               WARN(1, "pte value %x", val);
+                       }
+               }
+       }
+}

I haven't seen a single warning so far (and here I went and wrote all
that code to limit repeated warnings), although admittedly
update_mmu_cache() isn't called for every case where we change a pte
(not for the fork case, for example). But it *is* called for the page
faulting cases.

Maybe a system update has changed libraries and memory allocation
patterns, and there is something bigger than that one-liner
pte_dirty/write change going on?

                             Linus
