On Mon, 2010-10-04 at 21:13 +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@xxxxxxxxxx>
>
> A long standing problem for streaming writeѕ through the NFS server
> has been that the NFS server opens and closes file descriptors on an
> inode for every write. The result of this behaviour is that the
> ->release() function is called on every close and that results in
> XFS truncating speculative preallocation beyond the EOF. This has
> an adverse effect on file layout when multiple files are being
> written at the same time - they interleave their extents and can
> result in severe fragmentation.
>
> To avoid this problem, keep a count of the number of ->release calls
> made on an inode. For most cases, an inode is only going to be opened
> once for writing and then closed again during it's lifetime in
> cache. Hence if there are multiple ->release calls, there is a good
> chance that the inode is being accessed by the NFS server. Hence
> count up every time ->release is called while there are delalloc
> blocks still outstanding on the inode.
>
> If this count is non-zero when ->release is next called, then do no
> truncate away the speculative preallocation - leave it there so that
> subsequent writes do not need to reallocate the delalloc space. This
> will prevent interleaving of extents of different inodes written
> concurrently to the same AG.
>
> If we get this wrong, it is not a big deal as we truncate
> speculative allocation beyond EOF anyway in xfs_inactive() when the
> inode is thrown out of the cache.
>
> The new counter in the struct xfs_inode fits into a hole in the
> structure on 64 bit machines, so does not grow the size of the inode
> at all.
This seems reasonable, and I have no real objection to
it. However, I have a question and a comment related
to the affected code (and not your specific change).
> Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
> ---
> fs/xfs/xfs_inode.h | 1 +
> fs/xfs/xfs_vnodeops.c | 15 ++++++++++++++-
> 2 files changed, 15 insertions(+), 1 deletions(-)
>
> diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
> index 1594190..82aad5e 100644
> --- a/fs/xfs/xfs_inode.h
> +++ b/fs/xfs/xfs_inode.h
> @@ -261,6 +261,7 @@ typedef struct xfs_inode {
> xfs_fsize_t i_size; /* in-memory size */
> xfs_fsize_t i_new_size; /* size when write completes */
> atomic_t i_iocount; /* outstanding I/O count */
> + int i_dirty_releases; /* dirty ->release calls */
>
> /* VFS inode */
> struct inode i_vnode; /* embedded VFS inode */
> diff --git a/fs/xfs/xfs_vnodeops.c b/fs/xfs/xfs_vnodeops.c
> index b7bdc43..0c8eeba 100644
> --- a/fs/xfs/xfs_vnodeops.c
> +++ b/fs/xfs/xfs_vnodeops.c
OK, this comment is unrelated to your exact change. But just above
the next hunk there's a big nasty condition, which appears to
be *almost* duplicated in xfs_inactive() (twice!). It would be
very nice if, while you're at modifying this nearby code, you
could encapsulate that condition in a macro that has a meaningful
name.
> @@ -979,14 +979,27 @@ xfs_release(
> * chance to drop them once the last reference to
> * the inode is dropped, so we'll never leak blocks
> * permanently.
I'm curious what the effect is if we simply don't do the truncate
*except* when the inode becomes inactive. It means we hang onto
the stuff for a while longer, and maybe it makes things messier
in the event of a crash. Can you tell me why we do the truncate
here as well as in xfs_inactive() (or what the problem is of
*not* doing it here)?
> + *
> + * Further, count the number of times we get here in
> + * the life of this inode. If the inode is being
> + * opened, written and closed frequently and we have
> + * delayed allocation blocks oustanding (e.g. streaming
> + * writes from the NFS server), truncating the
> + * blocks past EOF will cause fragmentation to occur.
> + * In this case don't do the truncation, either.
> */
> + if (ip->i_delayed_blks)
> + ip->i_dirty_releases++;
> + if (ip->i_dirty_releases > 1)
> + goto out;
> +
> error = xfs_free_eofblocks(mp, ip,
> XFS_FREE_EOF_TRYLOCK);
> if (error)
> return error;
> }
> }
> -
> +out:
> return 0;
> }
>
|