xfs
[Top] [All Lists]

Re: How to handle TIF_MEMDIE stalls?

To: Johannes Weiner <hannes@xxxxxxxxxxx>
Subject: Re: How to handle TIF_MEMDIE stalls?
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Wed, 18 Feb 2015 10:32:43 +1100
Cc: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx>, dchinner@xxxxxxxxxx, oleg@xxxxxxxxxx, xfs@xxxxxxxxxxx, mhocko@xxxxxxx, linux-mm@xxxxxxxxx, mgorman@xxxxxxx, rientjes@xxxxxxxxxx, akpm@xxxxxxxxxxxxxxxxxxxx, torvalds@xxxxxxxxxxxxxxxxxxxx
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <20150217225430.GJ4251@dastard>
References: <20141230112158.GA15546@xxxxxxxxxxxxxx> <201502092044.JDG39081.LVFOOtFHQFOMSJ@xxxxxxxxxxxxxxxxxxx> <201502102258.IFE09888.OVQFJOMSFtOLFH@xxxxxxxxxxxxxxxxxxx> <20150210151934.GA11212@xxxxxxxxxxxxxxxxxxxxxx> <201502111123.ICD65197.FMLOHSQJFVOtFO@xxxxxxxxxxxxxxxxxxx> <201502172123.JIE35470.QOLMVOFJSHOFFt@xxxxxxxxxxxxxxxxxxx> <20150217125315.GA14287@xxxxxxxxxxxxxxxxxxxxxx> <20150217225430.GJ4251@dastard>
User-agent: Mutt/1.5.21 (2010-09-15)
On Wed, Feb 18, 2015 at 09:54:30AM +1100, Dave Chinner wrote:
> On Tue, Feb 17, 2015 at 07:53:15AM -0500, Johannes Weiner wrote:
> > On Tue, Feb 17, 2015 at 09:23:26PM +0900, Tetsuo Handa wrote:
> > > --- a/mm/page_alloc.c
> > > +++ b/mm/page_alloc.c
> > > @@ -2381,9 +2381,6 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int 
> > > order,
> > >                 /* The OOM killer does not needlessly kill tasks for 
> > > lowmem */
> > >                 if (high_zoneidx < ZONE_NORMAL)
> > >                         goto out;
> > > -               /* The OOM killer does not compensate for light reclaim */
> > > -               if (!(gfp_mask & __GFP_FS))
> > > -                       goto out;
> > >                 /*
> > >                  * GFP_THISNODE contains __GFP_NORETRY and we never hit 
> > > this.
> > >                  * Sanity check for bare calls of __GFP_THISNODE, not 
> > > real OOM.
> > 
> > Again, we don't want to OOM kill on behalf of allocations that can't
> > initiate IO, or even actively prevent others from doing it.  Not per
> > default anyway, because most callers can deal with the failure without
> > having to resort to killing tasks, and NOFS reclaim *can* easily fail.
> > It's the exceptions that should be annotated instead:
> > 
> > void *
> > kmem_alloc(size_t size, xfs_km_flags_t flags)
> > {
> >     int     retries = 0;
> >     gfp_t   lflags = kmem_flags_convert(flags);
> >     void    *ptr;
> > 
> >     do {
> >             ptr = kmalloc(size, lflags);
> >             if (ptr || (flags & (KM_MAYFAIL|KM_NOSLEEP)))
> >                     return ptr;
> >             if (!(++retries % 100))
> >                     xfs_err(NULL,
> >             "possible memory allocation deadlock in %s (mode:0x%x)",
> >                                     __func__, lflags);
> >             congestion_wait(BLK_RW_ASYNC, HZ/50);
> >     } while (1);
> > }
> > 
> > This should use __GFP_NOFAIL, which is not only designed to annotate
> > broken code like this, but also recognizes that endless looping on a
> > GFP_NOFS allocation needs the OOM killer after all to make progress.
> > 
> > diff --git a/fs/xfs/kmem.c b/fs/xfs/kmem.c
> > index a7a3a63bb360..17ced1805d3a 100644
> > --- a/fs/xfs/kmem.c
> > +++ b/fs/xfs/kmem.c
> > @@ -45,20 +45,12 @@ kmem_zalloc_greedy(size_t *size, size_t minsize, size_t 
> > maxsize)
> >  void *
> >  kmem_alloc(size_t size, xfs_km_flags_t flags)
> >  {
> > -   int     retries = 0;
> >     gfp_t   lflags = kmem_flags_convert(flags);
> > -   void    *ptr;
> >  
> > -   do {
> > -           ptr = kmalloc(size, lflags);
> > -           if (ptr || (flags & (KM_MAYFAIL|KM_NOSLEEP)))
> > -                   return ptr;
> > -           if (!(++retries % 100))
> > -                   xfs_err(NULL,
> > -           "possible memory allocation deadlock in %s (mode:0x%x)",
> > -                                   __func__, lflags);
> > -           congestion_wait(BLK_RW_ASYNC, HZ/50);
> > -   } while (1);
> > +   if (!(flags & (KM_MAYFAIL | KM_NOSLEEP)))
> > +           lflags |= __GFP_NOFAIL;
> > +
> > +   return kmalloc(size, lflags);
> >  }
> 
> Hmmm - the only reason there is a focus on this loop is that it
> emits warnings about allocations failing. It's obvious that the
> problem being dealt with here is a fundamental design issue w.r.t.
> to locking and the OOM killer, but the proposed special casing
> hack^H^H^H^Hband aid^W^Wsolution is not "working" because some code
> in XFS started emitting warnings about allocations failing more
> often.
>
> So the answer is to remove the warning?  That's like killing the
> canary to stop the methane leak in the coal mine. No canary? No
> problems!

I'll also point out that there are two other identical allocation
loops in XFS, one of which is only 30 lines below this one. That's
further indication that this is a "silence the warning" patch rather
than something that actually fixes a problem....

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx

<Prev in Thread] Current Thread [Next in Thread>