[Top] [All Lists]

[PATCH 2/2] repair: AGFL rebuild fails if btree split required

To: xfs@xxxxxxxxxxx
Subject: [PATCH 2/2] repair: AGFL rebuild fails if btree split required
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Wed, 29 Oct 2014 14:09:04 +1100
Cc: bvowk@xxxxxxx
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <1414552144-12627-1-git-send-email-david@xxxxxxxxxxxxx>
References: <1414552144-12627-1-git-send-email-david@xxxxxxxxxxxxx>
From: Dave Chinner <dchinner@xxxxxxxxxx>

In phase 5 we rebuild the freelist btrees, the AGF and the AGFL from
all the free space information we have gathered and resolved during
pahses 3 and 4. If the freespace information is laid out just right,
we end up having to allocate free space for the AGFL blocks.

If the size of the free space we allocate from is larger than the
space we need, then we have to insert the remainder back into the
freespace btree. For the by-size tree, this means we are likely to
be removing a record from one leaf, and then inserting the remainder
- a smaller size - into another leaf.

The issue is that the leaf blocks to the left of the original leaf
block we removed the extent record from are full and hence require a
split to insert the new record. That, of course, requires a free
block in the AGFL to allocate from, and now we have a chicken and
egg situation: there are no free blocks in the AGFL because we are
setting it up.

As a result, setting up the free list silently fails, leaving the
freespace btrees in an inconsistent state and the AGFL in question
empty. When the filesystem is next mounted, the first allocation
from that AGF results in attempting to fix the AGFL, and it then
does exactly the same thing as the repair code, fails to allocate a
block during the split and fails. This results in an immediate
shutdown because the transaction doing the allocation is dirty by
this stage.

The fix for the problem is to make repair handle rebulding the btree
differently. If we leave ispace for a couple of records in each
btree leaf and node, there is never a need for a split to occur when
initially setting up the AGFL. This results in repair doing the
right thing, and hence the runtime problems after mount don't occur.
Further, add error checking the the AGFL setup code and abort repair
if we have a failure to correctly set up the AGFL so we catch this
problem at repair time, not mount time...

Reported-by: Barkley Vowk <bvowk@xxxxxxx>
Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
 repair/phase5.c | 44 +++++++++++++++++++++++++++++++++++---------
 1 file changed, 35 insertions(+), 9 deletions(-)

diff --git a/repair/phase5.c b/repair/phase5.c
index d6d3c6d..3d58936 100644
--- a/repair/phase5.c
+++ b/repair/phase5.c
@@ -335,11 +335,22 @@ finish_cursor(bt_status_t *curs)
+ * We need to leave some free records in the tree for the corner case of
+ * setting up the AGFL. This may require allocation of blocks, and as
+ * such can require insertion of new records into the tree (e.g. moving
+ * a record in the by-count tree when a long extent is shortened). If we
+ * pack the records into the leaves with no slack space, this requires a
+ * leaf split to occur and a block to be allocated from the free list.
+ * If we don't have any blocks on the free list (because we are setting
+ * it up!), then we fail, and the filesystem will fail with the same
+ * failure at runtime. Hence leave a couple of records slack space in
+ * each block to allow immediate modification of the tree without
+ * requiring splits to be done.
+ *
  * XXX(hch): any reason we don't just look at mp->m_alloc_mxr?
 #define XR_ALLOC_BLOCK_MAXRECS(mp, level) \
-                       xfs_allocbt_maxrecs((mp), (mp)->m_sb.sb_blocksize, \
-                                               (level) == 0)
+       (xfs_allocbt_maxrecs((mp), (mp)->m_sb.sb_blocksize, (level) == 0) - 2)
  * this calculates a freespace cursor for an ag.
@@ -361,10 +372,6 @@ calculate_freespace_cursor(xfs_mount_t *mp, xfs_agnumber_t 
        bt_stat_level_t         *p_lptr;
        extent_tree_node_t      *ext_ptr;
        int                     level;
-       fprintf(stderr,
-               "in init_freespace_cursor, agno = %d\n", agno);
        num_extents = *extents;
        extents_used = 0;
@@ -385,6 +392,13 @@ calculate_freespace_cursor(xfs_mount_t *mp, xfs_agnumber_t 
        lptr->num_recs_tot = num_extents;
        level = 1;
+       fprintf(stderr, "%s 0 %d %d %d %d\n", __func__,
+                       lptr->num_blocks,
+                       lptr->num_recs_pb,
+                       lptr->modulo,
+                       lptr->num_recs_tot);
         * if we need more levels, set them up.  # of records
         * per level is the # of blocks in the level below it
@@ -402,6 +416,14 @@ calculate_freespace_cursor(xfs_mount_t *mp, xfs_agnumber_t 
                        lptr->num_recs_pb = p_lptr->num_blocks
                                        / lptr->num_blocks;
                        lptr->num_recs_tot = p_lptr->num_blocks;
+                       fprintf(stderr, "%s %d %d %d %d %d\n", __func__,
+                                       level,
+                                       lptr->num_blocks,
+                                       lptr->num_recs_pb,
+                                       lptr->modulo,
+                                       lptr->num_recs_tot);
@@ -546,8 +568,7 @@ calculate_freespace_cursor(xfs_mount_t *mp, xfs_agnumber_t 
                                lptr = &btree_curs->level[level];
                                p_lptr = &btree_curs->level[level-1];
                                lptr->num_blocks = howmany(p_lptr->num_blocks,
-                                               XR_ALLOC_BLOCK_MAXRECS(mp,
-                                                               level));
+                                       XR_ALLOC_BLOCK_MAXRECS(mp, level));
                                lptr->modulo = p_lptr->num_blocks
                                                % lptr->num_blocks;
                                lptr->num_recs_pb = p_lptr->num_blocks
@@ -1434,6 +1455,7 @@ build_agf_agfl(xfs_mount_t        *mp,
                xfs_alloc_arg_t args;
                xfs_trans_t     *tp;
                struct xfs_trans_res tres = {0};
+               int             error;
                memset(&args, 0, sizeof(args));
                args.tp = tp = libxfs_trans_alloc(mp, 0);
@@ -1442,8 +1464,12 @@ build_agf_agfl(xfs_mount_t       *mp,
                args.alignment = 1;
                args.pag = xfs_perag_get(mp,agno);
                libxfs_trans_reserve(tp, &tres, XFS_MIN_FREELIST(agf, mp), 0);
-               libxfs_alloc_fix_freelist(&args, 0);
+               error = libxfs_alloc_fix_freelist(&args, 0);
+               if (error) {
+                       do_error(_("failed to fix AGFL on AG %d, error %d\n"),
+                                       agno, error);
+               }
                libxfs_trans_commit(tp, 0);

<Prev in Thread] Current Thread [Next in Thread>