On Thu 19-02-15 06:01:24, Johannes Weiner wrote:
[...]
> Preferably, we'd get rid of all nofail allocations and replace them
> with preallocated reserves. But this is not going to happen anytime
> soon, so what other option do we have than resolving this on the OOM
> killer side?
As I've mentioned in another email, we might give the GFP_NOFAIL allocator
access to memory reserves (by giving it __GFP_HIGH). This is still not a
100% solution because the reserves could get depleted, but that risk is
there even with multiple OOM victims. I would still argue that this would
be a better approach because selecting more victims might hit a
pathological case more easily (e.g. the other victims might be blocked on
the very same lock).
Something like the following:
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 8d52ab18fe0d..4b5cf28a13f4 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2599,6 +2599,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	enum migrate_mode migration_mode = MIGRATE_ASYNC;
 	bool deferred_compaction = false;
 	int contended_compaction = COMPACT_CONTENDED_NONE;
+	int oom = 0;
 
 	/*
 	 * In the slowpath, we sanity check order to avoid ever trying to
@@ -2628,6 +2629,15 @@ retry:
 		wake_all_kswapds(order, ac);
 
 	/*
+	 * __GFP_NOFAIL allocations cannot fail but yet the current context
+	 * might be blocking resources needed by the OOM victim to terminate.
+	 * Allow the caller to dive into memory reserves to succeed the
+	 * allocation and break out from a potential deadlock.
+	 */
+	if (oom > 10 && (gfp_mask & __GFP_NOFAIL))
+		gfp_mask |= __GFP_HIGH;
+
+	/*
 	 * OK, we're below the kswapd watermark and have kicked background
 	 * reclaim. Now things get more complex, so set up alloc_flags according
 	 * to how we want to proceed.
@@ -2759,6 +2769,8 @@ retry:
 				goto got_pg;
 			if (!did_some_progress)
 				goto nopage;
+
+			oom++;
 		}
 		/* Wait for some write requests to complete then retry */
 		wait_iff_congested(ac->preferred_zone, BLK_RW_ASYNC, HZ/50);
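
To make the intended behaviour easier to reason about, here is a rough
userspace model of the escalation above. It is only a sketch and not kernel
code: struct model_zone, the MODEL_GFP_* bits, model_watermark_ok(),
model_alloc_slowpath() and the max_rounds bound are all made up for
illustration. It also assumes that the high-priority flag lets the request
dip into half of the min watermark, which is a simplification of the real
watermark check.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits for this model only; not the kernel's values. */
#define MODEL_GFP_HIGH   0x1u
#define MODEL_GFP_NOFAIL 0x2u

/* Mirrors the "oom > 10" check in the patch. */
#define OOM_RETRY_THRESHOLD 10

struct model_zone {
	long free_pages;	/* pages currently free */
	long min_wmark;		/* pages normally held back as a reserve */
};

/*
 * An ordinary request must leave min_wmark pages free. With MODEL_GFP_HIGH
 * the request may use half of that reserve (an assumption of this model).
 */
static bool model_watermark_ok(const struct model_zone *z, unsigned int flags,
			       long nr)
{
	long min = z->min_wmark;

	if (flags & MODEL_GFP_HIGH)
		min -= min / 2;
	return z->free_pages - nr >= min;
}

/*
 * Returns the number of failed rounds before the allocation succeeded, or
 * -1 if it gave up. max_rounds exists only so that the model terminates;
 * a real nofail request would keep looping.
 */
static int model_alloc_slowpath(struct model_zone *z, unsigned int flags,
				long nr, int max_rounds)
{
	int oom = 0;

	assert(nr > 0);
	for (;;) {
		/* After enough fruitless OOM rounds, open up the reserves. */
		if (oom > OOM_RETRY_THRESHOLD && (flags & MODEL_GFP_NOFAIL))
			flags |= MODEL_GFP_HIGH;

		if (model_watermark_ok(z, flags, nr)) {
			z->free_pages -= nr;
			return oom;
		}
		/* Requests that are allowed to fail bail out right away. */
		if (!(flags & MODEL_GFP_NOFAIL))
			return -1;
		/* Reserves are depleted as well; the model stops here. */
		if (oom >= max_rounds)
			return -1;
		oom++;	/* stands in for an OOM round that freed nothing */
	}
}
```

With 100 free pages and a min watermark of 100, a nofail request for 20
pages fails for the first 11 rounds in this model and then succeeds once
the reserve is opened up. The same request against a zone with only 10 free
pages never succeeds, which is the "reserves could get depleted" case
mentioned above.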
--
Michal Hocko
SUSE Labs