
Re: [PATCH 4/4] mm: numa: Slow PTE scan rate if migration failures occur

To: Dave Chinner <david@xxxxxxxxxxxxx>
Subject: Re: [PATCH 4/4] mm: numa: Slow PTE scan rate if migration failures occur
From: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Date: Wed, 18 Mar 2015 09:08:44 -0700
Cc: Mel Gorman <mgorman@xxxxxxx>, Ingo Molnar <mingo@xxxxxxxxxx>, Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>, Aneesh Kumar <aneesh.kumar@xxxxxxxxxxxxxxxxxx>, Linux Kernel Mailing List <linux-kernel@xxxxxxxxxxxxxxx>, Linux-MM <linux-mm@xxxxxxxxx>, xfs@xxxxxxxxxxx, ppc-dev <linuxppc-dev@xxxxxxxxxxxxxxxx>
On Tue, Mar 17, 2015 at 3:08 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>>
>> Damn. From a performance number standpoint, it looked like we zoomed
>> in on the right thing. But now it's migrating even more pages than
>> before. Odd.
>
> Throttling problem, like Mel originally suspected?

That doesn't make much sense for the original bisect you did, though.

Although if there are two different issues, maybe that bisect was
wrong. Or rather, incomplete.

>> Can you do a simple stupid test? Apply that commit 53da3bc2ba9e ("mm:
>> fix up numa read-only thread grouping logic") to 3.19, so that it uses
>> the same "pte_dirty()" logic as 4.0-rc4. That *should* make the 3.19
>> and 4.0-rc4 numbers comparable.
>
> patched 3.19 numbers on this test are slightly worse than stock
> 3.19, but nowhere near as bad as 4.0-rc4:
>
>         241,718      migrate:mm_migrate_pages           ( +-  5.17% )

Ok, that's still much worse than plain 3.19, which was ~55,000.
Assuming your memory/measurements were the same.

So apparently the pte_write() -> pte_dirty() check isn't equivalent at
all. My thinking was that for the common case (ie private mappings) it
would be *exactly* the same, because all normal COW pages turn dirty
at the same time they turn writable (and, in page_mkclean_one(), turn
clean and read-only again at the same time). But if the numbers change
that much, then clearly my simplistic "they are the same in practice"
is just complete BS.
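
For concreteness, the check in question is the thread-grouping test in
do_numa_page(). A simplified before/after sketch (paraphrased rather
than quoted verbatim from the commit):

      /* 3.19: don't group threads on pages that aren't writable */
      if (!pte_write(pte))
              flags |= TNF_NO_GROUP;

      /* 53da3bc2ba9e: test the dirty bit instead, on the theory that
       * for private COW pages "dirty" and "writable" flip together */
      if (!pte_dirty(pte))
              flags |= TNF_NO_GROUP;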

So why am I wrong? Why is testing for dirty not the same as testing
for writable?

I can see a few cases:

 - your load has lots of writable (but not written-to) shared memory,
and maybe the test should be something like

      pte_dirty(pte) || ((vma->vm_flags & (VM_WRITE|VM_SHARED)) ==
                         (VM_WRITE|VM_SHARED))

   and we really should have some helper function for this logic (a
   rough sketch follows below this list).

 - something completely different that I am entirely missing
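
For illustration, a minimal sketch of the helper hinted at above. The
name pte_effectively_writable() is made up for this sketch, not
anything that exists in the tree:

      /* True if the page has demonstrably been written to, or if it
       * sits in a writable shared mapping, where a clean pte proves
       * nothing (stores can happen at any time without COW). */
      static inline bool pte_effectively_writable(pte_t pte,
                                                  struct vm_area_struct *vma)
      {
              if (pte_dirty(pte))
                      return true;
              return (vma->vm_flags & (VM_WRITE|VM_SHARED)) ==
                     (VM_WRITE|VM_SHARED);
      }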

What am I missing?

                          Linus
