xfs
[Top] [All Lists]

Re: How to handle TIF_MEMDIE stalls?

To: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx>
Subject: Re: How to handle TIF_MEMDIE stalls?
From: Michal Hocko <mhocko@xxxxxxx>
Date: Fri, 20 Feb 2015 13:38:55 +0100
Cc: hannes@xxxxxxxxxxx, david@xxxxxxxxxxxxx, dchinner@xxxxxxxxxx, linux-mm@xxxxxxxxx, rientjes@xxxxxxxxxx, oleg@xxxxxxxxxx, akpm@xxxxxxxxxxxxxxxxxxxx, mgorman@xxxxxxx, torvalds@xxxxxxxxxxxxxxxxxxxx, xfs@xxxxxxxxxxx, linux-fsdevel@xxxxxxxxxxxxxxx, fernando_b1@xxxxxxxxxxxxx
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <201502202120.GHE87026.OFSHLFFOJMVtOQ@xxxxxxxxxxxxxxxxxxx>
References: <20150218121602.GC4478@xxxxxxxxxxxxxx> <20150219110124.GC15569@xxxxxxxxxxxxxxxxxxxxxx> <20150219122914.GH28427@xxxxxxxxxxxxxx> <201502192229.FCJ73987.MFQLOHSJFFtOOV@xxxxxxxxxxxxxxxxxxx> <20150220091001.GC21248@xxxxxxxxxxxxxx> <201502202120.GHE87026.OFSHLFFOJMVtOQ@xxxxxxxxxxxxxxxxxxx>
User-agent: Mutt/1.5.23 (2014-03-12)
On Fri 20-02-15 21:20:58, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > On Thu 19-02-15 22:29:37, Tetsuo Handa wrote:
> > > Michal Hocko wrote:
> > > > On Thu 19-02-15 06:01:24, Johannes Weiner wrote:
> > > > [...]
> > > > > Preferrably, we'd get rid of all nofail allocations and replace them
> > > > > with preallocated reserves.  But this is not going to happen anytime
> > > > > soon, so what other option do we have than resolving this on the OOM
> > > > > killer side?
> > > > 
> > > > As I've mentioned in other email, we might give GFP_NOFAIL allocator
> > > > access to memory reserves (by giving it __GFP_HIGH). This is still not a
> > > > 100% solution because reserves could get depleted but this risk is there
> > > > even with multiple oom victims. I would still argue that this would be a
> > > > better approach because selecting more victims might hit pathological
> > > > case more easily (other victims might be blocked on the very same lock
> > > > e.g.).
> > > > 
> > > Does "multiple OOM victims" mean "select next if first does not die"?
> > > Then, I think my timeout patch 
> > > http://marc.info/?l=linux-mm&m=142002495532320&w=2
> > > does not deplete memory reserves. ;-)
> > 
> > It doesn't because
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -2603,9 +2603,7 @@ gfp_to_alloc_flags(gfp_t gfp_mask)
> >                     alloc_flags |= ALLOC_NO_WATERMARKS;
> >             else if (in_serving_softirq() && (current->flags & PF_MEMALLOC))
> >                     alloc_flags |= ALLOC_NO_WATERMARKS;
> > -           else if (!in_interrupt() &&
> > -                           ((current->flags & PF_MEMALLOC) ||
> > -                            unlikely(test_thread_flag(TIF_MEMDIE))))
> > +           else if (!in_interrupt() && (current->flags & PF_MEMALLOC))
> >                     alloc_flags |= ALLOC_NO_WATERMARKS;
> > 
> > you disabled the TIF_MEMDIE heuristic and use it only for OOM exclusion
> > and break out from the allocator. Exiting task might need a memory to do
> > so and you make all those allocations fail basically. How do you know
> > this is not going to blow up?
> > 
> 
> Well, treat exiting tasks to imply __GFP_NOFAIL for clean up?
> 
> We cannot determine correct task to kill + allow access to memory reserves
> based on lock dependency. Therefore, this patch evenly allow no tasks to
> access to memory reserves.
> 
> Exiting task might need some memory to exit, and not allowing access to
> memory reserves can retard exit of that task. But that task will eventually
> get memory released by other tasks killed by timeout-based kill-more
> mechanism. If no more killable tasks or expired panic-timeout, it is
> the same result with depletion of memory reserves.
> 
> I think that this situation (automatically making forward progress as if
> the administrator is periodically doing SysRq-f until the OOM condition
> is solved, or is doing SysRq-c if no more killable tasks or stalled too
> long) is better than current situation (not making forward progress since
> the exiting task cannot exit due to lock dependency, caused by failing to
> determine correct task to kill + allow access to memory reserves).

If you really believe this is an improvement then send a proper patch
with justification. But I am _really_ skeptical about such a change to
be honest.
-- 
Michal Hocko
SUSE Labs

<Prev in Thread] Current Thread [Next in Thread>