[PATCH 2/2] xfs: kick inode writeback when low on memory

To: xfs@xxxxxxxxxxx
Subject: [PATCH 2/2] xfs: kick inode writeback when low on memory
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Thu, 7 Apr 2011 16:19:56 +1000
Cc: linux-fsdevel@xxxxxxxxxxxxxxx
In-reply-to: <1302157196-1988-1-git-send-email-david@xxxxxxxxxxxxx>
References: <1302157196-1988-1-git-send-email-david@xxxxxxxxxxxxx>
From: Dave Chinner <dchinner@xxxxxxxxxx>

When the inode cache shrinker runs, we may have lots of dirty inodes queued up
in the VFS dirty queues that have not been expired. The typical case for this
with XFS is atime updates. The result is that a highly concurrent workload that
copies files and then later reads them (say to verify checksums) dirties all
the inodes again, even when relatime is used.

In a constrained memory environment, this results in a large number of dirty
inodes using all of the available memory and memory reclaim being unable to free
them, as dirty inodes are considered active. This problem was uncovered by Chris
Mason during recent low memory stress testing.

The fix is to trigger VFS level writeback from the XFS inode cache shrinker if
there isn't already writeback in progress. This ensures that when we enter a
low memory situation we start cleaning inodes (via the flusher thread) on the
filesystem immediately, thereby making it more likely that we will be able to
evict those dirty inodes from the VFS in the near future.
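
To illustrate the ordering described above, here is a minimal user-space
sketch, not kernel code. Every name in it (model_sb, model_shrink,
model_writeback_if_idle, model_flusher_run, MODEL_GFP_FS) is hypothetical and
exists only for this illustration. model_writeback_if_idle() merely stands in
for the "start writeback only if none is running" behaviour that the patch
relies on, and the return value of model_shrink() is simplified to the number
of inodes reclaimed in that call.

```c
/*
 * User-space model of the shrinker ordering; NOT kernel code.
 * All identifiers are hypothetical and chosen for illustration only.
 */
#include <stdbool.h>

#define MODEL_GFP_FS 0x1u		/* stand-in for __GFP_FS */

struct model_sb {
	int  dirty_inodes;		/* dirty in the VFS, cannot be reclaimed */
	int  clean_inodes;		/* clean, eligible for reclaim */
	bool writeback_in_progress;	/* flusher already working on this sb */
	int  writeback_nr;		/* inodes requested by the last kick */
	int  writeback_kicks;		/* times the shrinker started writeback */
};

/* Start writeback only when none is running; returns true if started. */
static bool model_writeback_if_idle(struct model_sb *sb, int nr)
{
	if (sb->writeback_in_progress)
		return false;
	sb->writeback_in_progress = true;
	sb->writeback_nr = nr;
	sb->writeback_kicks++;
	return true;
}

/* Pretend the flusher thread ran: clean up to writeback_nr dirty inodes. */
static void model_flusher_run(struct model_sb *sb)
{
	int n;

	if (!sb->writeback_in_progress)
		return;
	n = sb->writeback_nr < sb->dirty_inodes ?
		sb->writeback_nr : sb->dirty_inodes;
	sb->dirty_inodes -= n;
	sb->clean_inodes += n;
	sb->writeback_in_progress = false;
}

/*
 * Shrinker model: bail out without filesystem context, then kick
 * writeback, then reclaim whatever is already clean.
 * Returns -1 if it cannot run, else the number of inodes reclaimed.
 */
static int model_shrink(struct model_sb *sb, int nr_to_scan,
			unsigned int gfp_mask)
{
	int n;

	if (!(gfp_mask & MODEL_GFP_FS))
		return -1;

	/* Only after the GFP check, as the kick takes VFS level locks. */
	model_writeback_if_idle(sb, nr_to_scan);

	n = nr_to_scan < sb->clean_inodes ? nr_to_scan : sb->clean_inodes;
	sb->clean_inodes -= n;
	return n;
}
```

In this model the first shrinker call on a filesystem holding only dirty
inodes reclaims nothing but starts writeback. A later call, made after the
flusher has run, finds clean inodes to reclaim, which is the effect the
patch is after.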

The mechanism is not perfect - it only acts on the current filesystem, so if
all the dirty inodes are on a different filesystem it won't help. However, it
seems to be a valid assumption that the filesystem with lots of dirty inodes
is going to have the shrinker called very soon after the memory shortage
begins, so this shouldn't be an issue.

The other flaw is that there is no guarantee that the flusher thread will make
progress fast enough to clean the dirty inodes so they can be reclaimed in the
near future. However, this mechanism does improve the resilience of the
filesystem under the test conditions - instead of reliably triggering the OOM
killer 20 minutes into the stress test, it took more than 6 hours before it
happened.

This small addition definitely improves the low memory resilience of XFS on
this type of workload, and best of all it has no impact on performance when
memory is not constrained.

Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
---
 fs/xfs/linux-2.6/xfs_sync.c |   11 +++++++++++
 1 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/fs/xfs/linux-2.6/xfs_sync.c b/fs/xfs/linux-2.6/xfs_sync.c
index 9ad9560..c240d46 100644
--- a/fs/xfs/linux-2.6/xfs_sync.c
+++ b/fs/xfs/linux-2.6/xfs_sync.c
@@ -1038,6 +1038,17 @@ xfs_reclaim_inode_shrink(
                if (!(gfp_mask & __GFP_FS))
                        return -1;
+               /*
+                * make sure VFS is cleaning inodes so they can be pruned
+                * and marked for reclaim in the XFS inode cache. If we don't
+                * do this the VFS can accumulate dirty inodes and we can OOM
+                * before they are cleaned by the periodic VFS writeback.
+                *
+                * This takes VFS level locks, so we can only do this after
+                * the __GFP_FS checks otherwise lockdep gets really unhappy.
+                */
+               writeback_inodes_sb_nr_if_idle(mp->m_super, nr_to_scan);
                xfs_reclaim_inodes_ag(mp, SYNC_TRYLOCK | SYNC_WAIT,
                /* terminate if we don't exhaust the scan */
