On Thu, Aug 18, 2011 at 04:54:28PM -0700, Andrew Morton wrote:
> On Wed, 10 Aug 2011 11:47:19 +0100
> Mel Gorman <mgorman@xxxxxxx> wrote:
> > The percentage that must be in writeback depends on the priority. At
> > default priority, all of them must be dirty. At DEF_PRIORITY-1, 50%
> > of them must be, DEF_PRIORITY-2, 25% etc. i.e. as pressure increases
> > the greater the likelihood the process will get throttled to allow
> > the flusher threads to make some progress.
> It'd be nice if the code comment were to capture this piece of implicit
> arithmetic. After all, it's a magic number and magic numbers should
> stick out like sore thumbs.
> And.. how do we know that the chosen magic numbers were optimal?
Good question. The short answer is "we don't know, but it's not important
to get this particular decision perfect because the real throttling
should happen earlier".
Now the long answer:
For the value to be used, pages under writeback must be reaching the
end of the LRU. This implies that the rate of page consumption is
exceeding the writing speed of the backing storage. Regardless of
what decision is made, the rate of page allocation must be reduced
as the system is already in a sub-optimal state of requiring more
resources than are available.
The values are based on a simple exponential backoff function with a useful
range of DEF_PRIORITY to DEF_PRIORITY-2, which is the point where
"kswapd is getting into trouble". However, any decreasing function
within that range is sufficient because while there might be an optimal
choice, it makes little difference overall as the decision is made
too late with no guarantee the process doing the dirtying is throttled.
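As a rough illustration of that arithmetic, here is a standalone sketch. It
is not the patch itself: should_throttle() is a made-up name, DEF_PRIORITY is
assumed to be 12, and nr_taken/nr_writeback stand for the pages isolated from
the LRU and the subset of those found under writeback.

```c
#include <limits.h>
#include <stdbool.h>

/* Assumed value for illustration; check the tree you are working on. */
#define DEF_PRIORITY 12

/*
 * Hypothetical helper showing the exponential backoff described above.
 *
 * The threshold is nr_taken >> (DEF_PRIORITY - priority), so:
 *   DEF_PRIORITY     100% of the isolated pages must be under writeback
 *   DEF_PRIORITY-1    50%
 *   DEF_PRIORITY-2    25%
 * and so on. Lower priority values mean more reclaim pressure, hence a
 * greater likelihood that the caller is throttled.
 */
static bool should_throttle(int priority, unsigned long nr_taken,
			    unsigned long nr_writeback)
{
	int shift = DEF_PRIORITY - priority;
	unsigned long threshold;

	/* Clamp out-of-range priorities so the shift stays well defined. */
	if (shift < 0)
		shift = 0;
	if (shift >= (int)(sizeof(unsigned long) * CHAR_BIT))
		threshold = 0;
	else
		threshold = nr_taken >> shift;

	/* Never throttle when nothing at all is under writeback. */
	return nr_writeback && nr_writeback >= threshold;
}
```

With 32 pages isolated, this asks for 32 pages under writeback at
DEF_PRIORITY, 16 at DEF_PRIORITY-1 and 8 at DEF_PRIORITY-2. Any other
decreasing function over that range would do about as well, for the
reasons given above.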
The truly optimal decision is to throttle writers to slow storage
earlier in balance_dirty_pages() and have dirty_ratio scaled
in proportion to the estimated writeback speed of the underlying storage,
but we do not have that yet. This patch's throttling decision is
fairly close to the best we can do from reclaim context.