xfs
[Top] [All Lists]

[RFC PATCH] xfs: borrow indirect blocks from freed extent when available

To: xfs@xxxxxxxxxxx
Subject: [RFC PATCH] xfs: borrow indirect blocks from freed extent when available
From: Brian Foster <bfoster@xxxxxxxxxx>
Date: Tue, 23 Sep 2014 15:28:58 -0400
Delivered-to: xfs@xxxxxxxxxxx
xfs_bmap_del_extent() handles extent removal from the in-core and
on-disk extent lists. When removing a delalloc range, it updates the
indirect block reservation appropriately based on the removal. It
currently enforces that the new indirect block reservation is less than
or equal to the original. This is normally the case in all situations
except for when the removed range creates a hole in a single delalloc
extent, thus splitting a single delalloc extent in two.

The indirect block reservation is divided evenly between the two new
extents in this scenario. However, it is possible with small enough
extents to split an indlen==1 extent into two such slightly smaller
extents. This leaves one extent with 0 indirect blocks and leads to
assert failures in other areas (e.g., xfs_bunmapi() if the extent
happens to be removed).

Refactor xfs_bunmapi() to make the updates that must be consistent
against the inode (e.g., delalloc block counter, quota reservation)
right before the extent is deleted. Move the sb block counter update
after the extent is deleted and update xfs_bmap_del_extent() to steal
some blocks from the freed extent if a larger overall indirect
reservation is required by the extent removal.

Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx>
---

Hi all,

I'm seeing the following assert more frequently with fsx and the recent
xfs_free_file_space() changes (at least on 1k fsb fs'):

XFS: Assertion failed: startblockval(del.br_startblock) > 0, file: 
fs/xfs/libxfs/xfs_bmap.c, line: 5281

This occurs for the reason described in the commit log description. This
is a quick experiment I wanted to test to verify the problem goes away
(so far, so good). Very lightly tested so far.

I'm not too fond of changing br_blockcount like this. It seems like a
potential landmine. Alternative approaches could be to kill the assert
if we think indlen==0 extents is not a huge problem in this scenario,
update the counters independently in xfs_bmap_del_extent() to get the
needed blocks or pass a separate output parameter rather than messing
with br_blockcount (e.g., '*ofreedblks'). The latter might mean we could
just move the entire hunk that updates the inode/quota and whatnot
rather than splitting it up.

I wanted to put this on the list for comments. Thoughts? Other ideas?
Thanks.

Brian

 fs/xfs/libxfs/xfs_bmap.c | 64 ++++++++++++++++++++++++++++++------------------
 1 file changed, 40 insertions(+), 24 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 86df952..1858e6b 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -4969,6 +4969,11 @@ xfs_bmap_del_extent(
                        temp2 = xfs_bmap_worst_indlen(ip, temp2);
                        new.br_startblock = nullstartblock((int)temp2);
                        da_new = temp + temp2;
+                       /* pull blocks from extent to make up the difference */
+                       while (da_new > da_old && del->br_blockcount) {
+                               del->br_blockcount--;
+                               da_new--;
+                       }
                        while (da_new > da_old) {
                                if (temp) {
                                        temp--;
@@ -5277,9 +5282,37 @@ xfs_bunmapi(
                                goto nodelete;
                        }
                }
+
+               /*
+                * If it's the case where the directory code is running
+                * with no block reservation, and the deleted block is in
+                * the middle of its extent, and the resulting insert
+                * of an extent would cause transformation to btree format,
+                * then reject it.  The calling code will then swap
+                * blocks around instead.
+                * We have to do this now, rather than waiting for the
+                * conversion to btree format, since the transaction
+                * will be dirty.
+                */
+               if (!wasdel && xfs_trans_get_block_res(tp) == 0 &&
+                   XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_EXTENTS &&
+                   XFS_IFORK_NEXTENTS(ip, whichfork) >= /* Note the >= */
+                       XFS_IFORK_MAXEXT(ip, whichfork) &&
+                   del.br_startoff > got.br_startoff &&
+                   del.br_startoff + del.br_blockcount <
+                   got.br_startoff + got.br_blockcount) {
+                       error = -ENOSPC;
+                       goto error0;
+               }
+
+               /*
+                * Unreserve quota and update realtime free space, if
+                * appropriate. If delayed allocation, update the inode delalloc
+                * counter now and wait to update the sb counters as
+                * xfs_bmap_del_extent() might need to borrow some blocks.
+                */
                if (wasdel) {
                        ASSERT(startblockval(del.br_startblock) > 0);
-                       /* Update realtime/data freespace, unreserve quota */
                        if (isrt) {
                                xfs_filblks_t rtexts;
 
@@ -5291,8 +5324,6 @@ xfs_bunmapi(
                                        ip, -((long)del.br_blockcount), 0,
                                        XFS_QMOPT_RES_RTBLKS);
                        } else {
-                               xfs_icsb_modify_counters(mp, XFS_SBS_FDBLOCKS,
-                                               (int64_t)del.br_blockcount, 0);
                                (void)xfs_trans_reserve_quota_nblks(NULL,
                                        ip, -((long)del.br_blockcount), 0,
                                        XFS_QMOPT_RES_REGBLKS);
@@ -5303,32 +5334,17 @@ xfs_bunmapi(
                                        XFS_BTCUR_BPRV_WASDEL;
                } else if (cur)
                        cur->bc_private.b.flags &= ~XFS_BTCUR_BPRV_WASDEL;
-               /*
-                * If it's the case where the directory code is running
-                * with no block reservation, and the deleted block is in
-                * the middle of its extent, and the resulting insert
-                * of an extent would cause transformation to btree format,
-                * then reject it.  The calling code will then swap
-                * blocks around instead.
-                * We have to do this now, rather than waiting for the
-                * conversion to btree format, since the transaction
-                * will be dirty.
-                */
-               if (!wasdel && xfs_trans_get_block_res(tp) == 0 &&
-                   XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_EXTENTS &&
-                   XFS_IFORK_NEXTENTS(ip, whichfork) >= /* Note the >= */
-                       XFS_IFORK_MAXEXT(ip, whichfork) &&
-                   del.br_startoff > got.br_startoff &&
-                   del.br_startoff + del.br_blockcount <
-                   got.br_startoff + got.br_blockcount) {
-                       error = -ENOSPC;
-                       goto error0;
-               }
+
                error = xfs_bmap_del_extent(ip, tp, &lastx, flist, cur, &del,
                                &tmp_logflags, whichfork);
                logflags |= tmp_logflags;
                if (error)
                        goto error0;
+
+               if (!isrt && wasdel)
+                       xfs_icsb_modify_counters(mp, XFS_SBS_FDBLOCKS,
+                               (int64_t) del.br_blockcount, 0);
+
                bno = del.br_startoff - 1;
 nodelete:
                /*
-- 
1.8.3.1

<Prev in Thread] Current Thread [Next in Thread>