xfs
[Top] [All Lists]

Re: How to handle TIF_MEMDIE stalls?

To: Johannes Weiner <hannes@xxxxxxxxxxx>
Subject: Re: How to handle TIF_MEMDIE stalls?
From: Michal Hocko <mhocko@xxxxxxx>
Date: Thu, 19 Feb 2015 13:29:14 +0100
Cc: Dave Chinner <david@xxxxxxxxxxxxx>, Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx>, dchinner@xxxxxxxxxx, linux-mm@xxxxxxxxx, rientjes@xxxxxxxxxx, oleg@xxxxxxxxxx, akpm@xxxxxxxxxxxxxxxxxxxx, mgorman@xxxxxxx, torvalds@xxxxxxxxxxxxxxxxxxxx, xfs@xxxxxxxxxxx
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <20150219110124.GC15569@xxxxxxxxxxxxxxxxxxxxxx>
References: <201502102258.IFE09888.OVQFJOMSFtOLFH@xxxxxxxxxxxxxxxxxxx> <20150210151934.GA11212@xxxxxxxxxxxxxxxxxxxxxx> <201502111123.ICD65197.FMLOHSQJFVOtFO@xxxxxxxxxxxxxxxxxxx> <201502172123.JIE35470.QOLMVOFJSHOFFt@xxxxxxxxxxxxxxxxxxx> <20150217125315.GA14287@xxxxxxxxxxxxxxxxxxxxxx> <20150217225430.GJ4251@dastard> <20150218082502.GA4478@xxxxxxxxxxxxxx> <20150218104859.GM12722@dastard> <20150218121602.GC4478@xxxxxxxxxxxxxx> <20150219110124.GC15569@xxxxxxxxxxxxxxxxxxxxxx>
User-agent: Mutt/1.5.23 (2014-03-12)
On Thu 19-02-15 06:01:24, Johannes Weiner wrote:
[...]
> Preferrably, we'd get rid of all nofail allocations and replace them
> with preallocated reserves.  But this is not going to happen anytime
> soon, so what other option do we have than resolving this on the OOM
> killer side?

As I've mentioned in other email, we might give GFP_NOFAIL allocator
access to memory reserves (by giving it __GFP_HIGH). This is still not a
100% solution because reserves could get depleted but this risk is there
even with multiple oom victims. I would still argue that this would be a
better approach because selecting more victims might hit pathological
case more easily (other victims might be blocked on the very same lock
e.g.).

Something like the following:
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 8d52ab18fe0d..4b5cf28a13f4 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2599,6 +2599,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
        enum migrate_mode migration_mode = MIGRATE_ASYNC;
        bool deferred_compaction = false;
        int contended_compaction = COMPACT_CONTENDED_NONE;
+       int oom = 0;
 
        /*
         * In the slowpath, we sanity check order to avoid ever trying to
@@ -2628,6 +2629,15 @@ retry:
                wake_all_kswapds(order, ac);
 
        /*
+        * __GFP_NOFAIL allocations cannot fail but yet the current context
+        * might be blocking resources needed by the OOM victim to terminate.
+        * Allow the caller to dive into memory reserves to succeed the
+        * allocation and break out from a potential deadlock.
+        */
+       if (oom > 10 && (gfp_mask & __GFP_NOFAIL))
+               gfp_mask |= __GFP_HIGH;
+
+       /*
         * OK, we're below the kswapd watermark and have kicked background
         * reclaim. Now things get more complex, so set up alloc_flags according
         * to how we want to proceed.
@@ -2759,6 +2769,8 @@ retry:
                                goto got_pg;
                        if (!did_some_progress)
                                goto nopage;
+
+                       oom++;
                }
                /* Wait for some write requests to complete then retry */
                wait_iff_congested(ac->preferred_zone, BLK_RW_ASYNC, HZ/50);
-- 
Michal Hocko
SUSE Labs

<Prev in Thread] Current Thread [Next in Thread>