xfs
[Top] [All Lists]

Re: [PATCH 1/5] xfs: track AGs with reclaimable inodes in per-ag radix t

To: Dave Chinner <david@xxxxxxxxxxxxx>
Subject: Re: [PATCH 1/5] xfs: track AGs with reclaimable inodes in per-ag radix tree
From: Alex Elder <aelder@xxxxxxx>
Date: Thu, 15 Jul 2010 13:01:40 -0500
Cc: XFS Mailing List <xfs@xxxxxxxxxxx>
In-reply-to: <1279154300-2018-2-git-send-email-david@xxxxxxxxxxxxx>
References: <1279154300-2018-1-git-send-email-david@xxxxxxxxxxxxx> <1279154300-2018-2-git-send-email-david@xxxxxxxxxxxxx>
Reply-to: aelder@xxxxxxx
On Thu, 2010-07-15 at 10:38 +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@xxxxxxxxxx>
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=16348
> 
> When the filesystem grows to a large number of allocation groups,
> the summing of recalimable inodes gets expensive. In many cases,
> most AGs won't have any reclaimable inodes and so we are wasting CPU
> time aggregating over these AGs. This is particularly important for
> the inode shrinker that gets called frequently under memory
> pressure.
> 
> To avoid the overhead, track AGs with reclaimable inodes in the
> per-ag radix tree so that we can find all the AGs with reclaimable
> inodes via a simple gang tag lookup. This involves setting the tag
> when the first reclaimable inode is tracked in the AG, and removing
> the tag when the last reclaimable inode is removed from the tree.
> Then the summation process becomes a loop walking the radix tree
> summing AGs with the reclaim tag set.
> 
> This significantly reduces the overhead of scanning - a 6400 AG
> filesystea now only uses about 25% of a cpu in kswapd while slab reclaim
> progresses instead of being permanently stuck at 100% CPU and making little
> progress. Clean filesystems filesystems will see no overhead and the
> overhead only increases linearly with the number of dirty AGs.

Using the same tag represent a perag with reclaimable
inodes as the tag representing an inode is reclaimable
is nicely consistent...

I have a few comments below for your consideration
but they are minor (and don't even require a response).

Overall this looks good.

Reviewed-by: Alex Elder <aelder@xxxxxxx>

> Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
> ---
>  fs/xfs/linux-2.6/xfs_sync.c  |   71 +++++++++++++++++++++++++++++++++++++----
>  fs/xfs/linux-2.6/xfs_trace.h |    3 ++
>  2 files changed, 67 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/xfs/linux-2.6/xfs_sync.c b/fs/xfs/linux-2.6/xfs_sync.c
> index 56fed91..51da595 100644
> --- a/fs/xfs/linux-2.6/xfs_sync.c
> +++ b/fs/xfs/linux-2.6/xfs_sync.c
> @@ -133,6 +133,41 @@ restart:
>       return last_error;
>  }
>  
> +/*
> + * Select the next per-ag structure to iterate during the walk. The reclaim
> + * walk is optimised only to walk AGs with reclaimable inodes in them.
> + */
> +static struct xfs_perag *
> +xfs_inode_ag_iter_next_pag(
> +     struct xfs_mount        *mp,
> +     xfs_agnumber_t          *first,
> +     int                     tag)
> +{
> +     struct xfs_perag        *pag = NULL;
> +
> +     if (tag == XFS_ICI_RECLAIM_TAG) {
> +             int found;
> +             int ref;
> +
> +             spin_lock(&mp->m_perag_lock);
> +             found = radix_tree_gang_lookup_tag(&mp->m_perag_tree,
> +                             (void **)&pag, *first, 1, tag);
> +             if (found <= 0) {
> +                     spin_unlock(&mp->m_perag_lock);
> +                     return NULL;
> +             }
> +             *first = pag->pag_agno + 1;

Maybe move this below, just before the return.  I.e.:
        if (pag)
                *first = pag->pag_agno + 1;

To me it's slightly clearer that we're just setting
up to search next time with the perag following the
one returned.

> +             /* open coded pag reference increment */

By open-coding here you miss the assertions in xfs_perag_get().
(Though a common inline encapsulating them would have to be
in a header because the two functions reside in different files.)

> +             ref = atomic_inc_return(&pag->pag_ref);
> +             spin_unlock(&mp->m_perag_lock);
> +             trace_xfs_perag_get_reclaim(mp, pag->pag_agno, ref, _RET_IP_);
> +     } else {
> +             pag = xfs_perag_get(mp, *first);
> +             (*first)++;
> +     }
> +     return pag;
> +}
> +
>  int
>  xfs_inode_ag_iterator(
>       struct xfs_mount        *mp,

. . .

<Prev in Thread] Current Thread [Next in Thread>