Steve Lord wrote:
> So allocating extents because of memory pressure alone is not really the
> best solution - you could write out some important data and walk away from
> your machine, after a day of being idle the power goes out, and because
> nothing was pushing on memory your data goes bye bye.
> So I suspect even with a flush callout we still need another mechanism
> to go around pushing on delayed allocate pages.
> As for reservation, we do have a scheme in place at the moment, but it
> needs some work. Probably when requesting a new page we need to tell the
> VM system that it will be allocated delayed alloc.
Rik, allocate-on-flush may be a little tricky, esp. when it is triggered
by memory pressure. I'm assuming you're thinking of it as part of the
shrink_mmap() path? Perhaps something similar to try_to_free_buffers()?
Note that delalloc pages in XFS don't even have a buffer_head associated
with them; these pages are pure data containers with no place in the
backing store (yet).
One other issue with allocate-on-flush triggered by memory pressure is
that the allocation may be (and usually is) complicated, esp. in a
journaling FS, involving non-trivial transactions, tail-pushing, etc.
Further, for the reasons Steve points out, the data needs to get out
sooner than memory pressure alone would force it. To this end, the daemon
Steve was talking about runs periodically; right now it is set to run
twice a second.
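A minimal, single-threaded sketch of that periodic push, in C. The names (struct flushd, flushd_tick, FLUSHD_PERIOD_MS) are hypothetical, and the daemon body is reduced to "convert every delalloc page" as a stand-in for real allocation and I/O. The only point it illustrates is that a timer, not memory pressure, bounds how long data can sit without backing store, which covers the idle-machine case Steve describes.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: "twice a second" expressed as a period in milliseconds. */
#define FLUSHD_PERIOD_MS 500UL

struct flushd {
    unsigned long last_run_ms;  /* time of the last daemon pass */
    size_t delalloc_pages;      /* pages still without on-disk blocks */
    size_t flushed_total;       /* pages allocated and written so far */
};

/*
 * Called from a periodic timer. Runs one daemon pass once a full period
 * has elapsed since the last pass, regardless of memory pressure.
 * Unsigned subtraction keeps the comparison correct across counter
 * wrap-around, as with jiffies. Returns true if a pass ran.
 */
static bool flushd_tick(struct flushd *d, unsigned long now_ms)
{
    if (now_ms - d->last_run_ms < FLUSHD_PERIOD_MS)
        return false;
    d->last_run_ms = now_ms;
    /* Stand-in for allocating blocks and issuing writes for each page. */
    d->flushed_total += d->delalloc_pages;
    d->delalloc_pages = 0;
    return true;
}
```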
In the scheme we have right now, shrink_mmap wakes up this daemon to do
the allocation in the background, but doesn't otherwise wait for it.
Currently, only a limited number of pages can be delayed-allocate (up to
25% of memory).
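The 25% cap can be pictured as a simple reservation counter that the write path consults before leaving a page delayed-allocate, roughly the "tell the VM system" reservation Steve mentions. This is a sketch only: struct delalloc_acct and its functions are hypothetical, and what the caller does when a reservation fails (for example, allocating blocks up front instead) is an assumption, not something stated here.

```c
#include <stdbool.h>

/* Hypothetical accounting for delayed-allocate pages. */
struct delalloc_acct {
    unsigned long total_pages;     /* pages of RAM */
    unsigned long delalloc_pages;  /* pages currently delayed-allocate */
};

/* Cap at 25% of memory. */
static unsigned long delalloc_limit(const struct delalloc_acct *a)
{
    return a->total_pages / 4;
}

/*
 * Try to reserve one more delayed-allocate page. Returns false once the
 * cap is reached; the caller must then take some other path.
 */
static bool delalloc_reserve(struct delalloc_acct *a)
{
    if (a->delalloc_pages >= delalloc_limit(a))
        return false;
    a->delalloc_pages++;
    return true;
}

/* Drop the reservation once the page has real blocks on disk. */
static void delalloc_release(struct delalloc_acct *a)
{
    if (a->delalloc_pages > 0)
        a->delalloc_pages--;
}
```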
Maybe in the "final" stages of memory pressure, the call to wake up the
daemon could synchronously wait for the page to be allocated and flushed.
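The two reclaim-side behaviors, today's wake-and-return and the suggested synchronous wait in the final stages, can be contrasted in a small single-threaded model. The pressure levels, struct and function names are all hypothetical, and the synchronous "wait" is modelled by doing the daemon's work inline rather than by blocking on a real thread.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pressure levels: ordinary reclaim vs. the "final" stages. */
enum mem_pressure { PRESSURE_NORMAL, PRESSURE_FINAL };

struct delalloc_state {
    size_t delalloc_pages;  /* pages with no on-disk blocks yet */
    bool daemon_wakeup;     /* set when the flush daemon has been poked */
};

/*
 * Stand-in for the daemon's work: allocate blocks for every delalloc
 * page and write them out. Returns the number of pages converted.
 */
static size_t flushd_allocate_and_flush(struct delalloc_state *s)
{
    size_t n = s->delalloc_pages;
    s->delalloc_pages = 0;
    s->daemon_wakeup = false;
    return n;
}

/* Called from the reclaim path (shrink_mmap in the discussion above). */
static size_t reclaim_delalloc(struct delalloc_state *s, enum mem_pressure p)
{
    s->daemon_wakeup = true;                  /* always poke the daemon */
    if (p == PRESSURE_FINAL)
        return flushd_allocate_and_flush(s);  /* wait for the result */
    return 0;                                 /* background: return at once */
}
```

Waiting only in the final stage keeps ordinary reclaim from stalling on transactions and tail-pushing, at the cost of letting memory get tighter before the allocation actually happens.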