
Re: [PATCH 4/4] mm: numa: Slow PTE scan rate if migration failures occur

To: Dave Chinner <david@xxxxxxxxxxxxx>
Subject: Re: [PATCH 4/4] mm: numa: Slow PTE scan rate if migration failures occur
From: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Date: Tue, 17 Mar 2015 09:53:57 -0700
Cc: Mel Gorman <mgorman@xxxxxxx>, Ingo Molnar <mingo@xxxxxxxxxx>, Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>, Aneesh Kumar <aneesh.kumar@xxxxxxxxxxxxxxxxxx>, Linux Kernel Mailing List <linux-kernel@xxxxxxxxxxxxxxx>, Linux-MM <linux-mm@xxxxxxxxx>, xfs@xxxxxxxxxxx, ppc-dev <linuxppc-dev@xxxxxxxxxxxxxxxx>
In-reply-to: <20150317070655.GB10105@dastard>
References: <CA+55aFwDuzpL-k8LsV3touhNLh+TFSLKP8+-nPwMXkWXDYPhrg@xxxxxxxxxxxxxx> <20150308100223.GC15487@xxxxxxxxx> <CA+55aFyQyZXu2fi7X9bWdSX0utk8=sccfBwFaSoToROXoE_PLA@xxxxxxxxxxxxxx> <20150309112936.GD26657@destitution> <CA+55aFywW5JLq=BU_qb2OG5+pJ-b1v9tiS5Ygi-vtEKbEZ_T5Q@xxxxxxxxxxxxxx> <20150309191943.GF26657@destitution> <CA+55aFzFt-vX5Jerci0Ty4Uf7K4_nQ7wyCp8hhU_dB0X4cBpVQ@xxxxxxxxxxxxxx> <20150312131045.GE3406@xxxxxxx> <CA+55aFx=81BGnQFNhnAGu6CetL7yifPsnD-+v7Y6QRqwgH47gQ@xxxxxxxxxxxxxx> <20150312184925.GH3406@xxxxxxx> <20150317070655.GB10105@dastard>
Sender: linus971@xxxxxxxxx
On Tue, Mar 17, 2015 at 12:06 AM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>
> To close the loop here, now I'm back home and can run tests:
>
> config                            3.19      4.0-rc1     4.0-rc4
> defaults                         8m08s        9m34s       9m14s
> -o ag_stride=-1                  4m04s        4m38s       4m11s
> -o bhash=101073                  6m04s       17m43s       7m35s
> -o ag_stride=-1,bhash=101073     4m54s        9m58s       7m50s
>
> It's better but there are still significant regressions, especially
> for the large memory footprint cases. I haven't had a chance to look
> at any stats or profiles yet, so I don't know yet whether this is
> still page fault related or some other problem....

Ok. I'd love to see some data on what changed between 3.19 and rc4 in
the profiles, just to see whether it's "more page faults due to extra
COW", or whether it's due to "more TLB flushes because of the
pte_write() vs pte_dirty()" differences. I'm *guessing* a lot of the
remaining issues are due to extra page fault overhead because I'd
expect write/dirty to be fairly 1:1, but there could be differences
due to shared memory use and/or just writebacks of dirty pages that
become clean.

I guess you can also see in vmstat.mm_migrate_pages whether it's
because of excessive migration (because of bad grouping) or not. So
not just profile data.
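(For anyone reproducing this: the system-wide counters live in /proc/vmstat, and the usual way to attribute a regression to migration is to snapshot them before and after the run and diff. A minimal sketch below; the counter names `pgmigrate_success`, `pgmigrate_fail`, `numa_pte_updates`, and `numa_hint_faults` are what recent kernels export, but the sample numbers are made up purely for illustration.)

```python
# Toy sketch: diff NUMA-balancing/migration counters across a test run.
# The counter names match /proc/vmstat on recent kernels; the snapshot
# contents below are invented numbers, not real measurements.

def parse_vmstat(text):
    """Parse '/proc/vmstat'-style 'name value' lines into a dict."""
    stats = {}
    for line in text.splitlines():
        name, _, value = line.partition(" ")
        if value.strip().isdigit():
            stats[name] = int(value.strip())
    return stats

def migration_delta(before, after,
                    keys=("pgmigrate_success", "pgmigrate_fail",
                          "numa_pte_updates", "numa_hint_faults")):
    """Per-counter increase over the run."""
    return {k: after.get(k, 0) - before.get(k, 0) for k in keys}

before = parse_vmstat("pgmigrate_success 1000\npgmigrate_fail 10\n"
                      "numa_pte_updates 5000\nnuma_hint_faults 800")
after = parse_vmstat("pgmigrate_success 41000\npgmigrate_fail 2010\n"
                     "numa_pte_updates 905000\nnuma_hint_faults 120800")
print(migration_delta(before, after))
```

A large `pgmigrate_success` delta with poor runtime points at over-migration (bad grouping); a flat migration delta with a big fault count points back at fault-path overhead instead.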

At the same time, I feel fairly happy about the situation - we at
least understand what is going on, and the "3x worse performance" case
is at least gone.  Even if that last case still looks horrible.

So it's still a bad performance regression, but at the same time I
think your test setup (big 500 TB filesystem, but then a fake-numa
thing with just 4GB per node) is specialized and unrealistic enough
that I don't feel it's all that relevant from a *real-world*
standpoint, and so I wouldn't be uncomfortable saying "ok, the page
table handling cleanup caused some issues, but we know about them and
how to fix them longer-term".  So I don't consider this a 4.0
showstopper or a "we need to revert for now" issue.

If it's a case of "we take a lot more page faults because we handle
the NUMA fault and then have a COW fault almost immediately", then the
fix is likely to do the same early-cow that the normal non-numa-fault
case does. In fact, my gut feel is that we should try to unify that
numa/regular fault handling path a bit more, but that would be a pretty
invasive patch.
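(A toy model of the double-fault pattern being described, just to make the cost argument concrete. This is not kernel code and the names are made up: a write to a page that is both NUMA-hinted and COW-protected normally traps twice, once for the hinting fault and then again for the COW write fault, while breaking COW eagerly inside the hinting fault folds the two into one.)

```python
# Toy model (NOT kernel code) of the two-fault vs. early-COW pattern
# for a write to a page that is both NUMA-hinted and COW-protected.

def write_fault(page, early_cow):
    """Return how many faults a single write takes against this page."""
    faults = 0
    if page["numa_hint"]:            # first trap: NUMA hinting fault
        faults += 1
        page["numa_hint"] = False    # (migrate if needed, restore access)
        if early_cow and page["cow"]:
            page["cow"] = False      # break COW while already in the fault
    if page["cow"]:                  # second trap: COW write fault
        faults += 1
        page["cow"] = False
    return faults

page = {"numa_hint": True, "cow": True}
print(write_fault(dict(page), early_cow=False))  # two faults
print(write_fault(dict(page), early_cow=True))   # one fault
```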

                     Linus
