
To: Dave Chinner <david@xxxxxxxxxxxxx>
Subject: Re: [PATCH 2/2] xfs: don't truncate prealloc from frequently accessed inodes
From: Alex Elder <aelder@xxxxxxx>
Date: Thu, 14 Oct 2010 12:22:50 -0500
Cc: xfs@xxxxxxxxxxx
In-reply-to: <1286187236-16682-3-git-send-email-david@xxxxxxxxxxxxx>
References: <1286187236-16682-1-git-send-email-david@xxxxxxxxxxxxx> <1286187236-16682-3-git-send-email-david@xxxxxxxxxxxxx>
Reply-to: aelder@xxxxxxx
On Mon, 2010-10-04 at 21:13 +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@xxxxxxxxxx>
> 
> A long-standing problem for streaming writes through the NFS server
> has been that the NFS server opens and closes file descriptors on an
> inode for every write. As a result, ->release() is called on every
> close, and that causes XFS to truncate speculative preallocation
> beyond the EOF.  This has
> an adverse effect on file layout when multiple files are being
> written at the same time - they interleave their extents and can
> result in severe fragmentation.
> 
> To avoid this problem, keep a count of the number of ->release calls
> made on an inode. In most cases, an inode is only going to be opened
> once for writing and then closed again during its lifetime in
> cache. Hence if there are multiple ->release calls, there is a good
> chance that the inode is being accessed by the NFS server. So
> count up every time ->release is called while there are delalloc
> blocks still outstanding on the inode.
> 
> If this count is non-zero when ->release is next called, then do not
> truncate away the speculative preallocation - leave it there so that
> subsequent writes do not need to reallocate the delalloc space. This
> will prevent interleaving of extents of different inodes written
> concurrently to the same AG.
> 
> If we get this wrong, it is not a big deal as we truncate
> speculative allocation beyond EOF anyway in xfs_inactive() when the
> inode is thrown out of the cache.
> 
> The new counter in the struct xfs_inode fits into a hole in the
> structure on 64-bit machines, so it does not grow the size of the
> inode at all.

This seems reasonable, and I have no real objection to
it.  However, I have a question and a comment related
to the affected code (and not your specific change).

> Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
> ---
>  fs/xfs/xfs_inode.h    |    1 +
>  fs/xfs/xfs_vnodeops.c |   15 ++++++++++++++-
>  2 files changed, 15 insertions(+), 1 deletions(-)
> 
> diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
> index 1594190..82aad5e 100644
> --- a/fs/xfs/xfs_inode.h
> +++ b/fs/xfs/xfs_inode.h
> @@ -261,6 +261,7 @@ typedef struct xfs_inode {
>       xfs_fsize_t             i_size;         /* in-memory size */
>       xfs_fsize_t             i_new_size;     /* size when write completes */
>       atomic_t                i_iocount;      /* outstanding I/O count */
> +     int                     i_dirty_releases; /* dirty ->release calls */
>  
>       /* VFS inode */
>       struct inode            i_vnode;        /* embedded VFS inode */
> diff --git a/fs/xfs/xfs_vnodeops.c b/fs/xfs/xfs_vnodeops.c
> index b7bdc43..0c8eeba 100644
> --- a/fs/xfs/xfs_vnodeops.c
> +++ b/fs/xfs/xfs_vnodeops.c

OK, this comment is unrelated to your exact change.  But just above
the next hunk there's a big nasty condition, which appears to
be *almost* duplicated in xfs_inactive() (twice!).  It would be
very nice if, while you're modifying this nearby code, you
could encapsulate that condition in a macro that has a meaningful
name.
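
Just to sketch the kind of thing I mean (the helper name below is
hypothetical, and the condition is simplified: I've left out the
VN_CACHED() check that xfs_release() has, and I haven't verified it
against the copies in xfs_inactive(), so treat this as illustration
rather than a drop-in replacement):

static inline int
xfs_free_eofblocks_ok(
	struct xfs_inode	*ip)
{
	/* Speculative prealloc beyond EOF only exists on regular files. */
	if (!S_ISREG(ip->i_d.di_mode))
		return 0;

	/* Nothing to trim if there is no data and no delalloc space. */
	if (ip->i_size == 0 && ip->i_delayed_blks == 0)
		return 0;

	/* The in-core extent list must be read in before we look at it. */
	if (!(ip->i_df.if_flags & XFS_IFEXTENTS))
		return 0;

	/* Preallocated and append-only inodes keep their blocks. */
	if (ip->i_d.di_flags & (XFS_DIFLAG_PREALLOC | XFS_DIFLAG_APPEND))
		return 0;

	return 1;
}

Even just giving the condition a name like that would make the intent
much more obvious than the open-coded version.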

> @@ -979,14 +979,27 @@ xfs_release(
>                        * chance to drop them once the last reference to
>                        * the inode is dropped, so we'll never leak blocks
>                        * permanently.

I'm curious what the effect is if we simply don't do the truncate
*except* when the inode becomes inactive.  It means we hang onto
the stuff for a while longer, and maybe it makes things messier
in the event of a crash.  Can you tell me why we do the truncate
here as well as in xfs_inactive() (or what the problem is with
*not* doing it here)?

> +                      *
> +                      * Further, count the number of times we get here in
> +                      * the life of this inode. If the inode is being
> +                      * opened, written and closed frequently and we have
> +                      * delayed allocation blocks outstanding (e.g. streaming
> +                      * writes from the NFS server), truncating the
> +                      * blocks past EOF will cause fragmentation to occur.
> +                      * In this case don't do the truncation, either.
>                        */
> +                     if (ip->i_delayed_blks)
> +                             ip->i_dirty_releases++;
> +                     if (ip->i_dirty_releases > 1)
> +                             goto out;
> +
>                       error = xfs_free_eofblocks(mp, ip,
>                                                  XFS_FREE_EOF_TRYLOCK);
>                       if (error)
>                               return error;
>               }
>       }
> -
> +out:
>       return 0;
>  }
>  
