
RE: [PATCH] mm/vmscan: Do not block forever at shrink_inactive_list().

To: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx>, "david@xxxxxxxxxxxxx" <david@xxxxxxxxxxxxx>, "riel@xxxxxxxxxx" <riel@xxxxxxxxxx>
Subject: RE: [PATCH] mm/vmscan: Do not block forever at shrink_inactive_list().
From: Motohiro Kosaki <Motohiro.Kosaki@xxxxxxxxxxxxxx>
Date: Tue, 20 May 2014 09:12:07 -0700
Accept-language: en-US
Cc: Motohiro Kosaki JP <kosaki.motohiro@xxxxxxxxxxxxxx>, "fengguang.wu@xxxxxxxxx" <fengguang.wu@xxxxxxxxx>, "kamezawa.hiroyu@xxxxxxxxxxxxxx" <kamezawa.hiroyu@xxxxxxxxxxxxxx>, "akpm@xxxxxxxxxxxxxxxxxxxx" <akpm@xxxxxxxxxxxxxxxxxxxx>, "hch@xxxxxxxxxxxxx" <hch@xxxxxxxxxxxxx>, "linux-kernel@xxxxxxxxxxxxxxx" <linux-kernel@xxxxxxxxxxxxxxx>, "xfs@xxxxxxxxxxx" <xfs@xxxxxxxxxxx>
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <201405202358.ADF10119.SMOFOQLFtOVHJF@xxxxxxxxxxxxxxxxxxx>
References: <201405192340.FCD48964.OFQHOOJLVSFFMt@xxxxxxxxxxxxxxxxxxx> <20140520004449.GE18954@dastard> <20140519225915.3370328d.akpm@xxxxxxxxxxxxxxxxxxxx> <20140520063024.GH18954@dastard> <201405202358.ADF10119.SMOFOQLFtOVHJF@xxxxxxxxxxxxxxxxxxx>
Thread-index: Ac90PBjBis44RDgCTnaBUDFH5AOokQACcNNg
Thread-topic: [PATCH] mm/vmscan: Do not block forever at shrink_inactive_list().

> -----Original Message-----
> From: Tetsuo Handa [mailto:penguin-kernel@xxxxxxxxxxxxxxxxxxx]
> Sent: Tuesday, May 20, 2014 11:58 PM
> To: david@xxxxxxxxxxxxx; riel@xxxxxxxxxx
> Cc: Motohiro Kosaki JP; fengguang.wu@xxxxxxxxx; 
> kamezawa.hiroyu@xxxxxxxxxxxxxx; akpm@xxxxxxxxxxxxxxxxxxxx;
> hch@xxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; xfs@xxxxxxxxxxx
> Subject: Re: [PATCH] mm/vmscan: Do not block forever at shrink_inactive_list().
> 
> Today I discussed this issue with Kosaki-san at LinuxCon Japan 2014.
> He does not like the idea of adding a timeout to the throttle loop. As Dave
> posted a patch that fixes a bug in XFS delayed allocation, I updated my patch
> accordingly.
> 
> Although the bug in XFS was fixed by Dave's patch, other kernel code may
> have bugs that fall into this infinite throttle loop.
> But to keep the possibility of triggering the OOM killer to a minimum, can we
> agree on this updated patch (and, in the future, on adding some warning
> mechanism like /proc/sys/kernel/hung_task_timeout_secs for detecting memory
> allocation stalls)?
> 
> Dave, if you are OK with this updated patch, please let me know the commit ID
> of your patch.
> 
> Regards.
> ----------
> From 408e65d9025e8e24838e7bf6ac9066ba8a9391a6 Mon Sep 17 00:00:00 2001
> From: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx>
> Date: Tue, 20 May 2014 23:34:34 +0900
> Subject: [PATCH] mm/vmscan: Do not throttle kswapd at shrink_inactive_list().
> 
> I can observe that commit 35cd7815 "vmscan: throttle direct reclaim when too
> many pages are isolated already" causes a RHEL7 environment to stall with 0%
> CPU usage when a certain type of memory pressure is applied.
> This is because nobody can reclaim memory due to the rules listed below.
> 
>   (a) XFS uses a kernel worker thread for delayed allocation
>   (b) kswapd wakes up the kernel worker thread for delayed allocation
>   (c) the kernel worker thread is throttled due to commit 35cd7815
> 
> This patch and commit XXXXXXXX "xfs: block allocation work needs to be
> kswapd aware" will solve rule (c).
> 
> Signed-off-by: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx>
> ---
>  mm/vmscan.c |   20 +++++++++++++++-----
>  1 files changed, 15 insertions(+), 5 deletions(-)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 32c661d..5c6960e 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1460,12 +1460,22 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
>       struct zone *zone = lruvec_zone(lruvec);
>       struct zone_reclaim_stat *reclaim_stat = &lruvec->reclaim_stat;
> 
> -     while (unlikely(too_many_isolated(zone, file, sc))) {
> -             congestion_wait(BLK_RW_ASYNC, HZ/10);
> +     /*
> +      * Throttle only direct reclaimers. Allocations by kswapd (and
> +      * allocation workqueue on behalf of kswapd) should not be
> +      * throttled here; otherwise memory allocation will deadlock.
> +      */
> +     if (!sc->hibernation_mode && !current_is_kswapd()) {
> +             while (unlikely(too_many_isolated(zone, file, sc))) {
> +                     congestion_wait(BLK_RW_ASYNC, HZ/10);
> 
> -             /* We are about to die and free our memory. Return now. */
> -             if (fatal_signal_pending(current))
> -                     return SWAP_CLUSTER_MAX;
> +                     /*
> +                      * We are about to die and free our memory.
> +                      * Return now.
> +                      */
> +                     if (fatal_signal_pending(current))
> +                             return SWAP_CLUSTER_MAX;
> +             }
>       }
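
For readers following the thread, the control flow of the quoted patch can be
approximated in user space. This is only a sketch under stated assumptions:
struct reclaim_ctx, enum throttle_result, model_throttle(), and the
rounds_until_clear and max_rounds parameters are hypothetical names made up
for illustration and do not exist in mm/vmscan.c. The real loop calls
congestion_wait() and has no round limit.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the reclaim context; not the kernel's scan_control. */
struct reclaim_ctx {
    bool hibernation_mode; /* models sc->hibernation_mode */
    bool is_kswapd;        /* models current_is_kswapd() */
    bool fatal_signal;     /* models fatal_signal_pending(current) */
};

enum throttle_result {
    THROTTLE_SKIPPED,  /* caller never entered the throttle loop */
    THROTTLE_RELEASED, /* modelled too_many_isolated() became false */
    THROTTLE_ABORTED,  /* fatal signal pending; the patch returns early here */
    THROTTLE_STUCK     /* still throttled after max_rounds (model-only outcome) */
};

/*
 * rounds_until_clear: number of wait rounds after which the modelled
 * too_many_isolated() turns false; a negative value means "never".
 * max_rounds: cap on wait rounds so the model terminates; the kernel
 * loop in the patch has no such cap.
 */
enum throttle_result model_throttle(const struct reclaim_ctx *ctx,
                                    int rounds_until_clear, int max_rounds)
{
    assert(max_rounds >= 0);

    /* As in the patch: only direct reclaimers are throttled. */
    if (ctx->hibernation_mode || ctx->is_kswapd)
        return THROTTLE_SKIPPED;

    for (int round = 0; ; round++) {
        bool too_many = rounds_until_clear < 0 || round < rounds_until_clear;

        if (!too_many)
            return THROTTLE_RELEASED;
        if (round >= max_rounds)
            return THROTTLE_STUCK;

        /* The kernel would sleep here: congestion_wait(BLK_RW_ASYNC, HZ/10). */

        if (ctx->fatal_signal)
            return THROTTLE_ABORTED;
    }
}
```

In this model a kswapd context returns THROTTLE_SKIPPED at once even when the
isolated-page condition never clears, while a direct reclaimer in the same
situation stays throttled until the condition clears or a fatal signal arrives.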


Acked-by: KOSAKI Motohiro <kosaki.motohiro@xxxxxxxxxxxxxx>


Dave, I don't like Tetsuo's first patch because the too_many_isolated() check
exists to prevent false OOM kills, so a simple timeout would resurrect them.
Please let me know if you need further MM enhancements to solve the XFS issue.
I'd like to join and assist with this.

Thanks.





