To: Dave Chinner <david@xxxxxxxxxxxxx>, Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>, Ingo Molnar <mingo@xxxxxxxxxx>, Matt B <jackdachef@xxxxxxxxx>
Subject: Re: [regression v4.0-rc1] mm: IPIs from TLB flushes causing significant performance degradation.
From: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Date: Mon, 2 Mar 2015 11:47:52 -0800
Cc: Linux Kernel Mailing List <linux-kernel@xxxxxxxxxxxxxxx>, linux-mm <linux-mm@xxxxxxxxx>, xfs@xxxxxxxxxxx
In-reply-to: <20150302010413.GP4251@dastard>
References: <20150302010413.GP4251@dastard>
Sender: linus971@xxxxxxxxx
On Sun, Mar 1, 2015 at 5:04 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>
> Across the board the 4.0-rc1 numbers are much slower, and the
> degradation is far worse when using the large memory footprint
> configs. Perf points straight at the cause - this is from 4.0-rc1
> on the "-o bhash=101073" config:
>
> -   56.07%    56.07%  [kernel]            [k] default_send_IPI_mask_sequence_phys
>       - 99.99% physflat_send_IPI_mask
>          - 99.37% native_send_call_func_ipi
..
>
> And the same profile output from 3.19 shows:
>
> -    9.61%     9.61%  [kernel]            [k] default_send_IPI_mask_sequence_phys
>      - 99.98% physflat_send_IPI_mask
>          - 96.26% native_send_call_func_ipi
...
>
> So either there's been a massive increase in the number of IPIs
> being sent, or the cost per IPI has greatly increased. Either way,
> the result is a pretty significant performance degradation.

And on Mon, Mar 2, 2015 at 11:17 AM, Matt <jackdachef@xxxxxxxxx> wrote:
>
> Linus already posted a fix to the problem, however I can't seem to
> find the matching commit in his tree (searching for "TLC regression"
> or "TLB cache").

That was commit f045bbb9fa1b, which was then refined by commit
721c21c17ab9, because it turned out that ARM64 had a very subtle
relationship with tlb->end and fullmm.
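
(For context, a rough, from-memory sketch of the idea behind those two
commits, simplified from the 3.19-era mm/memory.c rather than quoted from
the actual diffs: skip only the TLB invalidation when nothing was gathered,
so the flush IPIs go away for empty ranges, while the page-freeing side
still runs, which is where the arm64 tlb->end/fullmm subtlety came in.)

        /* Sketch, not the literal upstream code */
        static void tlb_flush_mmu_tlbonly(struct mmu_gather *tlb)
        {
                if (!tlb->end)          /* nothing unmapped: no flush, no IPIs */
                        return;

                tlb_flush(tlb);         /* arch hook; may IPI other CPUs */
                __tlb_reset_range(tlb);
        }

        static void tlb_flush_mmu(struct mmu_gather *tlb)
        {
                tlb_flush_mmu_tlbonly(tlb);     /* may be skipped entirely */
                tlb_flush_mmu_free(tlb);        /* always free the gathered pages */
        }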

But both of those hit 3.19, so none of this should affect 4.0-rc1.
There's something else going on.

I assume it's the mm queue from Andrew, so adding him to the cc. There
are changes to page migration etc., which could explain it.

There are also a fair number of APIC changes in 4.0-rc1, so I guess it
really could be just that the IPI sending itself has gotten much
slower. Adding Ingo for that, although I don't think
default_send_IPI_mask_sequence_phys() itself has actually changed,
only other things around the APIC. So I'd be inclined to blame the mm
changes.
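
(For reference, default_send_IPI_mask_sequence_phys() is basically a loop
doing one unicast IPI per destination CPU, so its cost scales with the
number of CPUs in the mask; a simplified, from-memory sketch along the
lines of arch/x86/kernel/apic/ipi.c:)

        /* Simplified sketch of default_send_IPI_mask_sequence_phys():
         * one ICR write per destination CPU, with interrupts disabled
         * for the whole walk, so large cpumasks make this expensive. */
        void default_send_IPI_mask_sequence_phys(const struct cpumask *mask,
                                                 int vector)
        {
                unsigned long query_cpu;
                unsigned long flags;

                local_irq_save(flags);
                for_each_cpu(query_cpu, mask)
                        __default_send_IPI_dest_field(
                                per_cpu(x86_cpu_to_apicid, query_cpu),
                                vector, APIC_DEST_PHYSICAL);
                local_irq_restore(flags);
        }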

Obviously bisection would find it..

                          Linus
