
[PATCH 36/37] repair: Increase default repair parallelism on large filesystems

To: xfs@xxxxxxxxxxx
Subject: [PATCH 36/37] repair: Increase default repair parallelism on large filesystems
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Wed, 6 Nov 2013 12:07:22 +1100
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <1383700043-32305-1-git-send-email-david@xxxxxxxxxxxxx>
References: <1383700043-32305-1-git-send-email-david@xxxxxxxxxxxxx>
From: Dave Chinner <dchinner@xxxxxxxxxx>

Large filesystems or high AG count filesystems generally have more
inherent parallelism in the backing storage. We should make use of
this by default to speed up repair times. Make xfs_repair use an
"auto-stride" configuration on filesystems with enough AGs to be
considered "multidisk" configurations.

The difference in elapsed time to repair a 100TB filesystem with
50 million inodes in it, with all metadata in flash, is:

                Time    IOPS    BW      CPU     RAM
vanilla:        2719s   2900    55MB/s  25%     0.95GB
patched:         908s   varied  varied  varied  2.33GB

With the patched xfs_repair, there were IO peaks of over 1.3GB/s during
AG scanning. Some phases now run at noticeably different speeds:
        - phase 3 ran at ~180% CPU, 18,000 IOPS and 130MB/s,
        - phase 4 ran at ~280% CPU, 12,000 IOPS and 100MB/s,
        - the other phases were similar to the vanilla repair.

Memory usage increases because concurrent AG scanning keeps more
buffers in the buffer cache at once, so the cache grows larger.

Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
---
 repair/xfs_repair.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/repair/xfs_repair.c b/repair/xfs_repair.c
index 78f8363..a863337 100644
--- a/repair/xfs_repair.c
+++ b/repair/xfs_repair.c
@@ -614,6 +614,23 @@ main(int argc, char **argv)
        inodes_per_cluster = MAX(mp->m_sb.sb_inopblock,
                        XFS_INODE_CLUSTER_SIZE(mp) >> mp->m_sb.sb_inodelog);
+       /*
+        * Automatic striding for high agcount filesystems.
+        *
+        * More AGs indicates that the filesystem is either large or can handle
+        * more IO parallelism. Either way, we should try to process multiple
+        * AGs at a time in such a configuration to try to saturate the
+        * underlying storage and speed the repair process. Only do this if
+        * prefetching is enabled.
+        *
+        * Given mkfs defaults for 16AGs for "multidisk" configurations, we want
+        * to target these for an increase in thread count. Hence a stride value
+        * of 15 is chosen to ensure we get at least 2 AGs being scanned at once
+        * on such filesystems.
+        */
+       if (!ag_stride && glob_agcount >= 16 && do_prefetch)
+               ag_stride = 15;
        if (ag_stride) {
                thread_count = (glob_agcount + ag_stride - 1) / ag_stride;
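
The stride arithmetic in the hunk above can be checked in isolation. The following is a minimal sketch, not code from xfs_repair: repair_thread_count() is a hypothetical helper that copies the two conditions shown in the patch, and returning 0 when no stride is in effect is an assumption made here only to mark "stride path not taken" (the hunk does not show what the real code does in that case).

```c
/*
 * Hypothetical helper (not part of xfs_repair) mirroring the logic in
 * the hunk: pick a default stride of 15 when none was given, the
 * filesystem has at least 16 AGs and prefetching is enabled, then
 * derive the thread count by ceiling division.
 */
int repair_thread_count(int agcount, int ag_stride, int do_prefetch)
{
	/* Same condition as the patch: only auto-stride with prefetch on. */
	if (!ag_stride && agcount >= 16 && do_prefetch)
		ag_stride = 15;

	/* Assumption: 0 here just means "no striding was selected". */
	if (!ag_stride)
		return 0;

	/* Ceiling of agcount / ag_stride, as in the patched main(). */
	return (agcount + ag_stride - 1) / ag_stride;
}
```

Under these assumptions, a 16 AG filesystem gets (16 + 14) / 15 = 2 threads, which matches the "at least 2 AGs being scanned at once" goal stated in the comment; 32 AGs gives 3 and 100 AGs gives 7. A filesystem with 15 AGs, or any run with prefetching disabled, is left without an automatic stride.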
