xfs
[Top] [All Lists]

Re: [RFC PATCH] xfs: borrow indirect blocks from freed extent when avail

To: Brian Foster <bfoster@xxxxxxxxxx>
Subject: Re: [RFC PATCH] xfs: borrow indirect blocks from freed extent when available
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Wed, 24 Sep 2014 07:58:16 +1000
Cc: xfs@xxxxxxxxxxx
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <1411500538-6831-1-git-send-email-bfoster@xxxxxxxxxx>
References: <1411500538-6831-1-git-send-email-bfoster@xxxxxxxxxx>
User-agent: Mutt/1.5.21 (2010-09-15)
On Tue, Sep 23, 2014 at 03:28:58PM -0400, Brian Foster wrote:
> xfs_bmap_del_extent() handles extent removal from the in-core and
> on-disk extent lists. When removing a delalloc range, it updates the
> indirect block reservation appropriately based on the removal. It
> currently enforces that the new indirect block reservation is less than
> or equal to the original. This is normally the case in all situations
> except for when the removed range creates a hole in a single delalloc
> extent, thus splitting a single delalloc extent in two.
> 
> The indirect block reservation is divided evenly between the two new
> extents in this scenario. However, it is possible with small enough
> extents to split an indlen==1 extent into two such slightly smaller
> extents. This leaves one extent with 0 indirect blocks and leads to
> assert failures in other areas (e.g., xfs_bunmapi() if the extent
> happens to be removed).

I had long suspected we had an issue in this code, but was never
able to nail down a reproducer that triggered it. Do you have a
reproducer, or did you find this by reading/tracing the code?

> Refactor xfs_bunmapi() to make the updates that must be consistent
> against the inode (e.g., delalloc block counter, quota reservation)
> right before the extent is deleted. Move the sb block counter update
> after the extent is deleted and update xfs_bmap_del_extent() to steal
> some blocks from the freed extent if a larger overall indirect
> reservation is required by the extent removal.
> 
> Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx>
> ---
> 
> Hi all,
> 
> I'm seeing the following assert more frequently with fsx and the recent
> xfs_free_file_space() changes (at least on 1k fsb fs'):
> 
> XFS: Assertion failed: startblockval(del.br_startblock) > 0, file: 
> fs/xfs/libxfs/xfs_bmap.c, line: 5281
> 
> This occurs for the reason described in the commit log description. This
> is a quick experiment I wanted to test to verify the problem goes away
> (so far, so good). Very lightly tested so far.

I suspect it's also the cause of these occasional assert failures
that I see:

XFS: Assertion failed: tp->t_blk_res_used <= tp->t_blk_res, file: 
fs/xfs/xfs_trans.c, line: 327

during delalloc conversion because there wasn't a space reservation
for the blocks allocated (i.e. indlen was zero) and so we overrun
the transaction block reservation.

> I'm not too fond of changing br_blockcount like this. It seems like a
> potential landmine. Alternative approaches could be to kill the assert
> if we think indlen==0 extents is not a huge problem in this scenario,

A delalloc extent always needs a reservation....

> update the counters independently in xfs_bmap_del_extent() to get the
> needed blocks or pass a separate output parameter rather than messing
> with br_blockcount (e.g., '*ofreedblks'). The latter might mean we could
> just move the entire hunk that updates the inode/quota and whatnot
> rather than splitting it up.
> 
> I wanted to put this on the list for comments. Thoughts? Other ideas?
> Thanks.
> 
> Brian
> 
>  fs/xfs/libxfs/xfs_bmap.c | 64 
> ++++++++++++++++++++++++++++++------------------
>  1 file changed, 40 insertions(+), 24 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index 86df952..1858e6b 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -4969,6 +4969,11 @@ xfs_bmap_del_extent(
>                       temp2 = xfs_bmap_worst_indlen(ip, temp2);
>                       new.br_startblock = nullstartblock((int)temp2);
>                       da_new = temp + temp2;
> +                     /* pull blocks from extent to make up the difference */
> +                     while (da_new > da_old && del->br_blockcount) {
> +                             del->br_blockcount--;
> +                             da_new--;
> +                     }
>                       while (da_new > da_old) {
>                               if (temp) {
>                                       temp--;

So this is the crux of the fix, right? The block stealing from the
deleted extent? I have no objections to doing this given that we
already munge the indirect reservations inteh loop below, but
I think we should make this cleaner and more understandable. i.e.
refactor the indlen adjustment code into a helper, and integrate the
block stealing into the existing adjustment loop?

> @@ -5277,9 +5282,37 @@ xfs_bunmapi(
>                               goto nodelete;
>                       }
>               }
> +
> +             /*
> +              * If it's the case where the directory code is running
> +              * with no block reservation, and the deleted block is in
> +              * the middle of its extent, and the resulting insert
> +              * of an extent would cause transformation to btree format,
> +              * then reject it.  The calling code will then swap
> +              * blocks around instead.
> +              * We have to do this now, rather than waiting for the
> +              * conversion to btree format, since the transaction
> +              * will be dirty.
> +              */
> +             if (!wasdel && xfs_trans_get_block_res(tp) == 0 &&
> +                 XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_EXTENTS &&
> +                 XFS_IFORK_NEXTENTS(ip, whichfork) >= /* Note the >= */
> +                     XFS_IFORK_MAXEXT(ip, whichfork) &&
> +                 del.br_startoff > got.br_startoff &&
> +                 del.br_startoff + del.br_blockcount <
> +                 got.br_startoff + got.br_blockcount) {
> +                     error = -ENOSPC;
> +                     goto error0;
> +             }
> +
> +             /*
> +              * Unreserve quota and update realtime free space, if
> +              * appropriate. If delayed allocation, update the inode delalloc
> +              * counter now and wait to update the sb counters as
> +              * xfs_bmap_del_extent() might need to borrow some blocks.
> +              */
>               if (wasdel) {
>                       ASSERT(startblockval(del.br_startblock) > 0);
> -                     /* Update realtime/data freespace, unreserve quota */
>                       if (isrt) {
>                               xfs_filblks_t rtexts;
>  

Perhaps this should be separated into it's own patch that is applied
first? It doesn't seem to be directly related to the accounting fix...

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx

<Prev in Thread] Current Thread [Next in Thread>