
Re: How to handle TIF_MEMDIE stalls?

To: Theodore Ts'o <tytso@xxxxxxx>
Subject: Re: How to handle TIF_MEMDIE stalls?
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Thu, 5 Mar 2015 10:17:40 +1100
Cc: Johannes Weiner <hannes@xxxxxxxxxxx>, Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>, Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx>, mhocko@xxxxxxx, dchinner@xxxxxxxxxx, linux-mm@xxxxxxxxx, rientjes@xxxxxxxxxx, oleg@xxxxxxxxxx, mgorman@xxxxxxx, torvalds@xxxxxxxxxxxxxxxxxxxx, xfs@xxxxxxxxxxx
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <20150304173841.GB15669@xxxxxxxxx>
References: <20150221235227.GA25079@xxxxxxxxxxxxxxxxxxxxxx> <20150223004521.GK12722@dastard> <20150222172930.6586516d.akpm@xxxxxxxxxxxxxxxxxxxx> <20150223073235.GT4251@dastard> <20150302202228.GA15089@xxxxxxxxxxxxxxxxxxxxxx> <20150302231206.GK18360@dastard> <20150303025023.GA22453@xxxxxxxxxxxxxxxxxxxxxx> <20150304065242.GR18360@dastard> <20150304150436.GA16442@xxxxxxxxxxxxxxxxxxxxxx> <20150304173841.GB15669@xxxxxxxxx>
User-agent: Mutt/1.5.21 (2010-09-15)
On Wed, Mar 04, 2015 at 12:38:41PM -0500, Theodore Ts'o wrote:
> On Wed, Mar 04, 2015 at 10:04:36AM -0500, Johannes Weiner wrote:
> > Yes, we can make this work if you can tell us which allocations have
> > limited/controllable lifetime.
> It may be helpful to be a bit precise about definitions here.  There
> are a number of different object lifetimes:
> a) will be released before the kernel thread returns control to
> userspace
> b) will be released once the current I/O operation finishes.  (In the
> case of nbd where the remote server has unexpectedly gone away, this
> might be quite a while, but I'm not sure how much we care about that
> scenario)
> c) can be trivially released if the mm subsystem asks via calling a
> shrinker
> d) can be released only after doing some amount of bounded work (i.e.,
> cleaning a dirty page)
> e) impossible to predict when it can be released (e.g., dcache, inodes
> attached to open file descriptors, buffer heads that won't be freed
> until the file system is umounted, etc.)
> I'm guessing that what you mean is (b), but what about cases such as
> (c)?

The thing is, in the XFS transaction case we are hitting e) for
every allocation, and only after IO and/or some processing do we
know whether it will fall into c), d) or whether it will be
permanently consumed.

> Would the mm subsystem find it helpful if it had more information
> about object lifetime?  For example, the CMA folks seem to really care
> about knowing whether memory allocations fall in category (e) or not.

The problem is that most filesystem allocations fall into category
(e). Worse is that the state of an object can change without any
allocation having taken place, e.g. an object on a reclaimable LRU
can be found via a cache lookup, then joined to and modified in a
transaction. Hence objects can change state from "reclaimable" to
"permanently consumed" without actually going through memory reclaim
and allocation.
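
A rough sketch of that state change, as a userspace model rather than
kernel code. The enum mirrors the a) to e) list quoted above; the
struct and function names (cached_obj, obj_join_and_modify,
obj_is_reclaimable) are made up for illustration and do not exist in
XFS or the mm subsystem. The point is only that the lifetime class
flips without the allocator ever being called:

```c
#include <assert.h>
#include <stdbool.h>

/* Lifetime classes a) to e) from the quoted list. */
enum obj_lifetime {
	LIFETIME_SYSCALL,	/* a) released before return to userspace */
	LIFETIME_IO,		/* b) released when the current I/O finishes */
	LIFETIME_SHRINKABLE,	/* c) trivially released via a shrinker */
	LIFETIME_BOUNDED,	/* d) released after bounded work */
	LIFETIME_UNKNOWN,	/* e) impossible to predict */
};

/* Hypothetical cached object; not a real kernel structure. */
struct cached_obj {
	enum obj_lifetime lifetime;
	bool in_transaction;
};

/* Reclaim can only make progress on c) and d) objects. */
static bool obj_is_reclaimable(const struct cached_obj *obj)
{
	return obj->lifetime == LIFETIME_SHRINKABLE ||
	       obj->lifetime == LIFETIME_BOUNDED;
}

/*
 * Join an object found by cache lookup to a transaction and modify it.
 * No memory is allocated here, yet the object stops being reclaimable
 * for as long as the transaction holds it.
 */
static void obj_join_and_modify(struct cached_obj *obj)
{
	obj->in_transaction = true;
	obj->lifetime = LIFETIME_UNKNOWN;
}
```

An allocation-time hint about lifetime would have recorded this object
as c) or d) and never seen the change.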

IOWs, what is really required is the ability to say "this amount of
allocation reserve is now consumed" /some time after/ we've done the
allocation. i.e. when we join the object to the transaction and
modify it, that's when we need to be able to reduce the reservation
limit as that memory is now permanently consumed by the transaction
context. Objects that fall into c) and d) don't need to have anything
special done, because reclaim will eventually free the memory they
hold once the allocating context releases them.

Indeed, this model works even when we find those c) and d) objects
in cache rather than allocating them. They would get correctly
accounted as "consumed reserve" because we no longer need to
allocate that memory in transaction context and so that reserve can
be released back to the free pool....
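
To make the accounting concrete, here is a minimal userspace sketch of
one way to read the model. Nothing here is an existing kernel API:
free_pool, tx_reserve and the tx_* functions are hypothetical names,
and sizes are plain byte counts. The assumption is that a reservation
is carved out of the free pool up front, and that "consuming" happens
later, at join-and-modify time, with a flag saying whether the object
was freshly allocated or found in cache:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pool of free memory that reservations are drawn from. */
struct free_pool {
	size_t free;
};

/* Hypothetical per-transaction reservation. */
struct tx_reserve {
	size_t limit;	/* reserve still held for this transaction */
};

/* Take the reservation up front; fail rather than overcommit. */
static int tx_reserve_take(struct free_pool *pool, struct tx_reserve *r,
			   size_t bytes)
{
	if (pool->free < bytes)
		return -1;
	pool->free -= bytes;
	r->limit = bytes;
	return 0;
}

/*
 * Called when an object is joined to the transaction and modified,
 * i.e. some time after it was allocated or looked up.  The reserve
 * limit drops by the object size in both cases.  If the object came
 * from cache, no reserved memory was used for it, so that part of
 * the reserve goes back to the free pool.
 */
static int tx_reserve_consume(struct free_pool *pool, struct tx_reserve *r,
			      size_t bytes, bool from_cache)
{
	if (bytes > r->limit)
		return -1;
	r->limit -= bytes;
	if (from_cache)
		pool->free += bytes;
	return 0;
}

/* Transaction done: return whatever reserve is left unused. */
static void tx_reserve_release(struct free_pool *pool, struct tx_reserve *r)
{
	pool->free += r->limit;
	r->limit = 0;
}
```

With a pool of 100 and a reservation of 40, consuming 16 for a newly
allocated object and 8 for a cached one leaves a limit of 16, and the
8 that was not needed is already back in the pool before the
transaction finishes.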


Dave Chinner
