xfs
[Top] [All Lists]

Re: [PATCH] xfs: refactor xfs_reserve_blocks() to handle ENOSPC correctl

To: Brian Foster <bfoster@xxxxxxxxxx>, xfs@xxxxxxxxxxx
Subject: Re: [PATCH] xfs: refactor xfs_reserve_blocks() to handle ENOSPC correctly
From: Paulo Alcantara <pcacjr@xxxxxxxxx>
Date: Tue, 14 Jun 2016 10:32:57 -0300
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <1465909139-36329-1-git-send-email-bfoster@xxxxxxxxxx>
References: <1465909139-36329-1-git-send-email-bfoster@xxxxxxxxxx>
User-agent: K-9 Mail for Android

On June 14, 2016 9:58:59 AM GMT-03:00, Brian Foster <bfoster@xxxxxxxxxx> wrote:
>xfs_reserve_blocks() is responsible to update the XFS reserved block
>pool count at mount time or based on user request. When the caller
>requests to increase the reserve pool, blocks must be allocated from
>the
>global counters such that they are no longer available for general
>purpose use. If the requested reserve pool size is too large, XFS
>reserves what blocks are available. The implementation requires looking
>at the percpu counters and making an educated guess as to how many
>blocks to try and allocate from xfs_mod_fdblocks(), which can return
>-ENOSPC if the guess was not accurate due to counters being modified in
>parallel.
>
>xfs_reserve_blocks() retries the guess in this scenario until the
>allocation succeeds or it is determined that there is no space
>available
>in the fs. While not easily reproducible in the current form, the retry
>code doesn't actually work correctly if xfs_mod_fdblocks() actually
>fails. The problem is that the percpu calculations use the m_resblks
>counter to determine how many blocks to allocate, but unconditionally
>update m_resblks before the block allocation has actually succeeded.
>Therefore, if xfs_mod_fdblocks() fails, the code jumps to the retry
>label and uses the already updated m_resblks value to determine how
>many
>blocks to try and allocate. If the percpu counters previously suggested
>that the entire request was available, fdblocks_delta could end up set
>to 0. In that case, m_resblks is updated to the requested value, yet no
>blocks have been reserved at all.
>
>Refactor xfs_reserve_blocks() to use an explicit loop and make the code
>easier to follow. Since we have to drop the spinlock across the
>xfs_mod_fdblocks() call, use a delta value for m_resblks as well and
>only apply the delta once allocation succeeds.
>
>Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx>
>---
>
>This is something I had laying around from the thin block device
>reservation stuff. That work introduced a more common
>xfs_mod_fdblocks()
>failure scenario that isn't as much of a problem with the current code,
>but the current xfs_reserve_blocks() retry code is clearly broken and
>so
>should probably be fixed up.
>
>Brian
>
>fs/xfs/xfs_fsops.c | 105
>++++++++++++++++++++++++++++++-----------------------
> 1 file changed, 60 insertions(+), 45 deletions(-)
>
>diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
>index b4d7582..003d180 100644
>--- a/fs/xfs/xfs_fsops.c
>+++ b/fs/xfs/xfs_fsops.c
>@@ -667,8 +667,11 @@ xfs_reserve_blocks(
>       __uint64_t              *inval,
>       xfs_fsop_resblks_t      *outval)
> {
>-      __int64_t               lcounter, delta, fdblks_delta;
>+      __int64_t               lcounter, delta;
>+      __int64_t               fdblks_delta = 0;
>       __uint64_t              request;
>+      __int64_t               free;
>+      int                     error = 0;
> 
>       /* If inval is null, report current values and return */
>       if (inval == (__uint64_t *)NULL) {
>@@ -682,24 +685,23 @@ xfs_reserve_blocks(
>       request = *inval;
> 
>       /*
>-       * With per-cpu counters, this becomes an interesting
>-       * problem. we needto work out if we are freeing or allocation
>-       * blocks first, then we can do the modification as necessary.
>+       * With per-cpu counters, this becomes an interesting problem. we
>need
>+       * to work out if we are freeing or allocation blocks first, then we
>can
>+       * do the modification as necessary.
>        *
>-       * We do this under the m_sb_lock so that if we are near
>-       * ENOSPC, we will hold out any changes while we work out
>-       * what to do. This means that the amount of free space can
>-       * change while we do this, so we need to retry if we end up
>-       * trying to reserve more space than is available.
>+       * We do this under the m_sb_lock so that if we are near ENOSPC, we
>will
>+       * hold out any changes while we work out what to do. This means that
>+       * the amount of free space can change while we do this, so we need
>to
>+       * retry if we end up trying to reserve more space than is available.
>        */
>-retry:
>       spin_lock(&mp->m_sb_lock);
> 
>       /*
>        * If our previous reservation was larger than the current value,
>-       * then move any unused blocks back to the free pool.
>+       * then move any unused blocks back to the free pool. Modify the
>resblks
>+       * counters directly since we shouldn't have any problems unreserving
>+       * space.
>        */
>-      fdblks_delta = 0;
>       if (mp->m_resblks > request) {
>               lcounter = mp->m_resblks_avail - request;
>               if (lcounter  > 0) {            /* release unused blocks */
>@@ -707,54 +709,67 @@ retry:
>                       mp->m_resblks_avail -= lcounter;
>               }
>               mp->m_resblks = request;
>-      } else {
>-              __int64_t       free;
>+              if (fdblks_delta) {
>+                      spin_unlock(&mp->m_sb_lock);
>+                      error = xfs_mod_fdblocks(mp, fdblks_delta, 0);
>+                      spin_lock(&mp->m_sb_lock);
>+              }
>+
>+              goto out;
>+      }
> 
>+      /*
>+       * If the request is larger than the current reservation, reserve the
>+       * blocks before we update the reserve counters. Sample m_fdblocks
>and
>+       * perform a partial reservation if the request exceeds free space.
>+       */
>+      error = -ENOSPC;
>+      while (error == -ENOSPC) {

Why don't you make this a "do { } while (error == -ENOSPC)"? xfs_mod_fdblocks() 
will already set the error at the end of that loop.

Paulo

-- 
Paulo Alcantara, HP
Speaking for myself only.

<Prev in Thread] Current Thread [Next in Thread>