xfs
[Top] [All Lists]

Re: How to handle TIF_MEMDIE stalls?

To: mhocko@xxxxxxx
Subject: Re: How to handle TIF_MEMDIE stalls?
From: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx>
Date: Fri, 20 Feb 2015 21:20:58 +0900
Cc: hannes@xxxxxxxxxxx, david@xxxxxxxxxxxxx, dchinner@xxxxxxxxxx, linux-mm@xxxxxxxxx, rientjes@xxxxxxxxxx, oleg@xxxxxxxxxx, akpm@xxxxxxxxxxxxxxxxxxxx, mgorman@xxxxxxx, torvalds@xxxxxxxxxxxxxxxxxxxx, xfs@xxxxxxxxxxx, linux-fsdevel@xxxxxxxxxxxxxxx, fernando_b1@xxxxxxxxxxxxx
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <20150220091001.GC21248@xxxxxxxxxxxxxx>
References: <20150218121602.GC4478@xxxxxxxxxxxxxx> <20150219110124.GC15569@xxxxxxxxxxxxxxxxxxxxxx> <20150219122914.GH28427@xxxxxxxxxxxxxx> <201502192229.FCJ73987.MFQLOHSJFFtOOV@xxxxxxxxxxxxxxxxxxx> <20150220091001.GC21248@xxxxxxxxxxxxxx>
Michal Hocko wrote:
> On Thu 19-02-15 22:29:37, Tetsuo Handa wrote:
> > Michal Hocko wrote:
> > > On Thu 19-02-15 06:01:24, Johannes Weiner wrote:
> > > [...]
> > > > Preferrably, we'd get rid of all nofail allocations and replace them
> > > > with preallocated reserves.  But this is not going to happen anytime
> > > > soon, so what other option do we have than resolving this on the OOM
> > > > killer side?
> > > 
> > > As I've mentioned in other email, we might give GFP_NOFAIL allocator
> > > access to memory reserves (by giving it __GFP_HIGH). This is still not a
> > > 100% solution because reserves could get depleted but this risk is there
> > > even with multiple oom victims. I would still argue that this would be a
> > > better approach because selecting more victims might hit pathological
> > > case more easily (other victims might be blocked on the very same lock
> > > e.g.).
> > > 
> > Does "multiple OOM victims" mean "select next if first does not die"?
> > Then, I think my timeout patch 
> > http://marc.info/?l=linux-mm&m=142002495532320&w=2
> > does not deplete memory reserves. ;-)
> 
> It doesn't because
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2603,9 +2603,7 @@ gfp_to_alloc_flags(gfp_t gfp_mask)
>                       alloc_flags |= ALLOC_NO_WATERMARKS;
>               else if (in_serving_softirq() && (current->flags & PF_MEMALLOC))
>                       alloc_flags |= ALLOC_NO_WATERMARKS;
> -             else if (!in_interrupt() &&
> -                             ((current->flags & PF_MEMALLOC) ||
> -                              unlikely(test_thread_flag(TIF_MEMDIE))))
> +             else if (!in_interrupt() && (current->flags & PF_MEMALLOC))
>                       alloc_flags |= ALLOC_NO_WATERMARKS;
> 
> you disabled the TIF_MEMDIE heuristic and use it only for OOM exclusion
> and break out from the allocator. Exiting task might need a memory to do
> so and you make all those allocations fail basically. How do you know
> this is not going to blow up?
> 

Well, treat exiting tasks to imply __GFP_NOFAIL for clean up?

We cannot determine correct task to kill + allow access to memory reserves
based on lock dependency. Therefore, this patch evenly allow no tasks to
access to memory reserves.

Exiting task might need some memory to exit, and not allowing access to
memory reserves can retard exit of that task. But that task will eventually
get memory released by other tasks killed by timeout-based kill-more
mechanism. If no more killable tasks or expired panic-timeout, it is
the same result with depletion of memory reserves.

I think that this situation (automatically making forward progress as if
the administrator is periodically doing SysRq-f until the OOM condition
is solved, or is doing SysRq-c if no more killable tasks or stalled too
long) is better than current situation (not making forward progress since
the exiting task cannot exit due to lock dependency, caused by failing to
determine correct task to kill + allow access to memory reserves).

> > If we change to permit invocation of the OOM killer for GFP_NOFS / GFP_NOIO,
> > those who do not want to fail (e.g. journal transaction) will start passing
> > __GFP_NOFAIL?
> > 

<Prev in Thread] Current Thread [Next in Thread>