To: Linux-MM <linux-mm@xxxxxxxxx>
Subject: [PATCH 1/8] mm: vmscan: Do not writeback filesystem pages in direct reclaim
From: Mel Gorman <mgorman@xxxxxxx>
Date: Thu, 21 Jul 2011 17:28:43 +0100
Cc: LKML <linux-kernel@xxxxxxxxxxxxxxx>, XFS <xfs@xxxxxxxxxxx>, Dave Chinner <david@xxxxxxxxxxxxx>, Christoph Hellwig <hch@xxxxxxxxxxxxx>, Johannes Weiner <jweiner@xxxxxxxxxx>, Wu Fengguang <fengguang.wu@xxxxxxxxx>, Jan Kara <jack@xxxxxxx>, Rik van Riel <riel@xxxxxxxxxx>, Minchan Kim <minchan.kim@xxxxxxxxx>, Mel Gorman <mgorman@xxxxxxx>
In-reply-to: <1311265730-5324-1-git-send-email-mgorman@xxxxxxx>
References: <1311265730-5324-1-git-send-email-mgorman@xxxxxxx>
From: Mel Gorman <mel@xxxxxxxxx>

When kswapd is failing to keep zones above the min watermark, a process
will enter direct reclaim in the same manner kswapd does. If a dirty
page is encountered during the scan, this page is written to backing
storage using mapping->writepage.

This causes two problems. First, it can result in very deep call
stacks, particularly if the target storage or filesystem is complex.
Some filesystems ignore write requests from direct reclaim as a result.
Second, a single-page flush is inefficient in terms of IO. While there
is an expectation that the elevator will merge requests, this does not
always happen. Quoting Christoph Hellwig:

        The elevator has a relatively small window it can operate on,
        and can never fix up a bad large scale writeback pattern.

This patch prevents direct reclaim from writing back filesystem pages
by checking whether current is kswapd. Anonymous pages are still
written to swap as there is no equivalent of a flusher thread for
anonymous pages. If the dirty pages cannot be written back, they are
placed back on the LRU lists. There is now a direct dependency on dirty
page balancing to prevent too many pages in the system being dirtied,
which would prevent reclaim from making forward progress.

Signed-off-by: Mel Gorman <mgorman@xxxxxxx>
---
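
As a rough illustration of the changelog above, the decision taken for
a dirty page can be modelled in userspace as below. This is only a
sketch: the enum, the function name dirty_page_action() and the plain
counter are hypothetical stand-ins, not kernel identifiers. The two
booleans stand in for page_is_file_cache(page) and current_is_kswapd().

```c
#include <stdbool.h>

/* Hypothetical outcome labels for illustration, not kernel identifiers. */
enum dirty_action {
	DIRTY_KEEP_ON_LRU,	/* skip writeback, page goes back on the LRU */
	DIRTY_TRY_WRITEBACK	/* continue to the existing writeback checks */
};

/* Hypothetical stand-in for the per-zone NR_VMSCAN_WRITE_SKIP counter. */
static unsigned long nr_write_skip;

/*
 * Model of the check added to the PageDirty() branch: a dirty file
 * page seen by anything other than kswapd is counted and kept, while
 * anonymous pages and kswapd fall through to the existing checks.
 */
static enum dirty_action dirty_page_action(bool is_file_page, bool is_kswapd)
{
	if (is_file_page && !is_kswapd) {
		nr_write_skip++;
		return DIRTY_KEEP_ON_LRU;
	}
	return DIRTY_TRY_WRITEBACK;
}
```

Note that DIRTY_TRY_WRITEBACK does not mean the page is written; the
PAGEREF_RECLAIM_CLEAN and may_enter_fs checks that follow in
shrink_page_list() still apply.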
 include/linux/mmzone.h |    1 +
 mm/vmscan.c            |    9 +++++++++
 mm/vmstat.c            |    1 +
 3 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 9f7c3eb..b70a0c0 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -100,6 +100,7 @@ enum zone_stat_item {
        NR_UNSTABLE_NFS,        /* NFS unstable pages */
+       NR_VMSCAN_WRITE_SKIP,
        NR_WRITEBACK_TEMP,      /* Writeback using temporary buffers */
        NR_ISOLATED_ANON,       /* Temporary isolated pages from anon lru */
        NR_ISOLATED_FILE,       /* Temporary isolated pages from file lru */
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 5ed24b9..ee00c94 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -825,6 +825,15 @@ static unsigned long shrink_page_list(struct list_head 
                if (PageDirty(page)) {
+                       /*
+                        * Only kswapd can writeback filesystem pages to
+                        * avoid risk of stack overflow
+                        */
+                       if (page_is_file_cache(page) && !current_is_kswapd()) {
+                               inc_zone_page_state(page, NR_VMSCAN_WRITE_SKIP);
+                               goto keep_locked;
+                       }
                        if (references == PAGEREF_RECLAIM_CLEAN)
                                goto keep_locked;
                        if (!may_enter_fs)
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 20c18b7..fd109f3 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -702,6 +702,7 @@ const char * const vmstat_text[] = {
+       "nr_vmscan_write_skip",
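
The new counter is exported through /proc/vmstat, which lists one
"name value" pair per line. A minimal userspace sketch for picking the
value out of such text follows. The helper name vmstat_lookup() is
hypothetical and the caller is assumed to have read the file into a
NUL-terminated buffer already.

```c
#include <stdio.h>
#include <string.h>

/*
 * Look up a counter in vmstat-style text ("name value" per line).
 * Returns 0 and stores the value on success, -1 if the name is absent
 * or its value cannot be parsed. Hypothetical helper for illustration.
 */
static int vmstat_lookup(const char *text, const char *name,
			 unsigned long long *value)
{
	size_t len = strlen(name);
	const char *line = text;

	while (line && *line) {
		/* Require a space after the name so prefixes do not match. */
		if (strncmp(line, name, len) == 0 && line[len] == ' ')
			return sscanf(line + len, "%llu", value) == 1 ? 0 : -1;

		/* Advance to the start of the next line, if any. */
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}
```

Sampling nr_vmscan_write_skip before and after a workload gives a rough
idea of how often direct reclaim met dirty file pages it would
previously have tried to write.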
