On 02/23/2015 08:32 AM, Dave Chinner wrote:
On Sun, Feb 22, 2015 at 05:29:30PM -0800, Andrew Morton wrote:
On Mon, 23 Feb 2015 11:45:21 +1100 Dave Chinner <david@xxxxxxxxxxxxx> wrote:
Yes, as we do for __GFP_HIGH and PF_MEMALLOC etc. Add a dynamic
reserve. So to reserve N pages we increase the page allocator dynamic
reserve by N, do some reclaim if necessary then deposit N tokens into
the caller's task_struct (it'll be a set of zone/nr-pages tuples I
When allocating pages the caller should drain its reserves in
preference to dipping into the regular freelist. This guy has already
done his reclaim and shouldn't be penalised a second time. I guess
Johannes's preallocation code should switch to doing this for the same
reason, plus the fact that snipping a page off
task_struct.prealloc_pages is super-fast and needs to be done sometime
anyway so why not do it by default.
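
A minimal userspace sketch of the scheme described above, as a toy model
rather than kernel code. Every name here (toy_allocator, toy_task,
toy_reserve, toy_alloc_page) is hypothetical, and the "do some reclaim if
necessary" step is replaced by a plain failure:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model; none of these names are real kernel symbols. */
struct toy_allocator {
	long free_pages;      /* pages on the regular freelist */
	long dynamic_reserve; /* pages set aside for reservation holders */
};

struct toy_task {
	long reserve_tokens;  /* pages this task may take from the reserve */
};

/*
 * Reserve n pages: grow the dynamic reserve by n and deposit n tokens
 * with the caller. A real implementation would try reclaim when the
 * freelist is short; this model simply fails instead.
 */
static bool toy_reserve(struct toy_allocator *a, struct toy_task *t, long n)
{
	assert(n >= 0);
	if (a->free_pages < n)
		return false;
	a->free_pages -= n;
	a->dynamic_reserve += n;
	t->reserve_tokens += n;
	return true;
}

/* Allocate one page, draining the caller's tokens before the freelist. */
static bool toy_alloc_page(struct toy_allocator *a, struct toy_task *t)
{
	if (t->reserve_tokens > 0) {
		t->reserve_tokens--;
		a->dynamic_reserve--;
		return true;
	}
	if (a->free_pages > 0) {
		a->free_pages--;
		return true;
	}
	return false;
}
```

In this model a task that reserved 4 pages out of 10 takes its next 4
allocations from its own tokens and leaves the 6 freelist pages untouched.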
That is at odds with the requirements of demand paging, which
allocates for objects that are reclaimable within the course of the
allocate for objects that are reclaimable within the course of the
transaction. The reserve is there to ensure forward progress for
allocations for objects that aren't freed until after the
transaction completes, but if we drain it for reclaimable objects we
then have nothing left in the reserve pool when we actually need it.
We do not know ahead of time if the object we are allocating is
going to be modified and hence locked into the transaction. Hence we
can't say "use the reserve for this *specific* allocation", and so
the only guidance we can really give is "we will need to allocate and
*permanently consume* this much memory", and the reserve pool needs
to cover that consumption to guarantee forwards progress.
I'm not sure I understand properly. You don't know if a specific
allocation is permanent or reclaimable, but you can tell in advance how
much in total will be permanent? Is it because you are conservative and
assume everything will be permanent, or how?
Can you at least at some later point in transaction recognize that "OK,
this object was not permanent after all" and tell mm that it can lower
Forwards progress for all other allocations is guaranteed because
they are reclaimable objects - they are either freed directly back to
their source (slab, heap, page lists) or they are freed by shrinkers
once they have been released from the transaction.
Which are the "all other allocations"? Above you wrote that all
allocations are treated as potentially permanent. Also, how does the fact
that an object is later reclaimable affect forward progress during its
allocation? Or are you talking about allocations from contexts that
don't use reserves?
Hence we need allocations to come from the free list and trigger
reclaim, regardless of the fact there is a reserve pool there. The
reserve pool needs to be a last resort once there are no other
avenues to allocate memory. i.e. it would be used to replace the OOM
killer for GFP_NOFAIL allocations.
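
The "last resort" ordering described above can be sketched as a toy
userspace model. The names (lr_pool, lr_source, lr_alloc_page) and the
boolean nofail flag are hypothetical stand-ins, not kernel interfaces, and
reclaim is reduced to a counter of pages that could be freed:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model; none of these names are real kernel symbols. */
struct lr_pool {
	long free_pages;        /* regular freelist */
	long reclaimable_pages; /* pages reclaim could free on demand */
	long reserve_pages;     /* reserve pool, touched only as a last resort */
};

enum lr_source { LR_FAILED, LR_FREELIST, LR_RECLAIM, LR_RESERVE };

/*
 * Allocate one page and report where it came from. The reserve is only
 * considered after the freelist and reclaim are exhausted, and only for
 * callers that cannot tolerate failure (nofail).
 */
static enum lr_source lr_alloc_page(struct lr_pool *p, bool nofail)
{
	assert(p->reserve_pages >= 0);
	if (p->free_pages > 0) {
		p->free_pages--;
		return LR_FREELIST;
	}
	if (p->reclaimable_pages > 0) {
		p->reclaimable_pages--; /* reclaimed page goes to the caller */
		return LR_RECLAIM;
	}
	if (nofail && p->reserve_pages > 0) {
		p->reserve_pages--;
		return LR_RESERVE;
	}
	return LR_FAILED;
}
```

With this ordering the reserve stays intact for as long as reclaim can
still make progress, which is the property asked for above.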
That's probably going to result in a lot of wasted memory and I still
don't understand why it's needed, if your reserve estimate is guaranteed
to cover the worst-case.
Both reservation and preallocation are vulnerable to deadlocks - 10,000
tasks all trying to reserve/prealloc 100 pages, they all have 50 pages
and we ran out of memory. Whoops.
Yes, that's the big problem with preallocation, as well as your
proposed "deplete the reserved memory first" approach. They
*require* up front "preallocation" of free memory, either directly
by the application, or internally by the mm subsystem.
I don't see why it would deadlock, if during reserve time the mm can
return ENOMEM as the reserver should be able to back out at that point.
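
To illustrate the point about backing out, here is a toy userspace model
of an all-or-nothing reservation. The names (rsv_pool, rsv_reserve,
rsv_release, rsv_count_granted) are hypothetical, and the model assumes a
reservation never blocks or holds a partial grant, which may not match
what the other proposals in this thread intend:

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical toy model; none of these names are real kernel symbols. */
struct rsv_pool {
	long free_pages;
};

/*
 * All-or-nothing: either the whole request is granted, or nothing is
 * taken and -ENOMEM is returned so the caller can back out.
 */
static int rsv_reserve(struct rsv_pool *p, long n)
{
	assert(n > 0);
	if (p->free_pages < n)
		return -ENOMEM;
	p->free_pages -= n;
	return 0;
}

/* Return a finished task's reservation to the pool. */
static void rsv_release(struct rsv_pool *p, long n)
{
	p->free_pages += n;
}

/* Count how many of ntasks obtain a full reservation of n pages each. */
static long rsv_count_granted(struct rsv_pool *p, long ntasks, long n)
{
	long granted = 0;

	for (long i = 0; i < ntasks; i++)
		if (rsv_reserve(p, n) == 0)
			granted++;
	return granted;
}
```

Taking the numbers from the example above, 10,000 tasks each holding 50
pages correspond to 500,000 pages. In this model, 5,000 tasks get their
full 100 pages and the other 5,000 see -ENOMEM while holding nothing, so
the tasks that succeeded can finish and release their pages.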