xfs

Re: [PATCH 04/60] xfs: don't use speculative prealloc for small files

To: Dave Chinner <david@xxxxxxxxxxxxx>
Subject: Re: [PATCH 04/60] xfs: don't use speculative prealloc for small files
From: Brian Foster <bfoster@xxxxxxxxxx>
Date: Wed, 19 Jun 2013 08:59:58 -0400
Cc: xfs@xxxxxxxxxxx
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <1371617468-32559-5-git-send-email-david@xxxxxxxxxxxxx>
References: <1371617468-32559-1-git-send-email-david@xxxxxxxxxxxxx> <1371617468-32559-5-git-send-email-david@xxxxxxxxxxxxx>
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130514 Thunderbird/17.0.6
On 06/19/2013 12:50 AM, Dave Chinner wrote:
> From: Dave Chinner <dchinner@xxxxxxxxxx>
> 
> Dedicated small file workloads have been seeing significant free
> space fragmentation causing premature inode allocation failure
> when large inode sizes are in use. A particular test case showed
> that a workload that runs to a real ENOSPC on 256 byte inodes would
> fail inode allocation with ENOSPC at about 80% full with 512 byte
> inodes, and at about 50% full with 1024 byte inodes.
> 
> The same workload, when run with -o allocsize=4096 on 1024 byte
> inodes, would run to 100% full before returning ENOSPC. That is,
> there was no freespace fragmentation at all.
> 
> The issue was caused by the specific IO pattern the application used:
> the framework it was built on did not support direct IO, so it
> emulated it by using fadvise(DONT_NEED). As a result, the data was
> written back before the speculative prealloc had been trimmed from
> memory by the close(), so small single block files were allocated
> with 2 blocks and then had one truncated away. This left lots of
> small 4k free space extents, and each new 8k allocation would take
> another 8k from contiguous free space and turn it into 4k of
> allocated space and 4k of free space.
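
To make the mechanism above concrete, here is a toy model in C. It is not XFS code: the 64-block free map, the first-fit search and the two-phase batch (every file in a batch gets its 2 blocks at writeback time before any close() trims the second) are assumptions standing in for writeback racing ahead of close(). All names (NBLOCKS, find_free_run(), write_batch(), etc.) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model, not XFS code: NBLOCKS 4k blocks tracked as a
 * free map (true = free, false = allocated). */
enum { NBLOCKS = 64 };

/* First fit: start of the first run of `len` free blocks, or -1. */
static int find_free_run(const bool free_map[], int len)
{
    int run = 0;

    assert(len > 0);
    for (int i = 0; i < NBLOCKS; i++) {
        run = free_map[i] ? run + 1 : 0;
        if (run == len)
            return i - len + 1;
    }
    return -1;
}

/* Length of the longest contiguous free run, in blocks. */
static int longest_free_run(const bool free_map[])
{
    int run = 0, best = 0;

    for (int i = 0; i < NBLOCKS; i++) {
        run = free_map[i] ? run + 1 : 0;
        if (run > best)
            best = run;
    }
    return best;
}

/* Write a batch of single-block files. Assumption: writeback allocates
 * 2 blocks (data + speculative prealloc) for every file in the batch
 * before close() trims the unused second block of each.
 * Returns the number of files created. */
static int write_batch(bool free_map[], int nfiles)
{
    int starts[NBLOCKS];
    int created = 0;

    /* Phase 1: writeback takes 8k (2 blocks) of contiguous space per file. */
    for (int f = 0; f < nfiles; f++) {
        int s = find_free_run(free_map, 2);
        if (s < 0)
            break;
        free_map[s] = free_map[s + 1] = false;
        starts[created++] = s;
    }
    /* Phase 2: close() truncates the post-EOF block, leaving a 4k hole. */
    for (int f = 0; f < created; f++)
        free_map[starts[f] + 1] = true;
    return created;
}
```

Starting from an empty map, two batches of 16 files leave it exactly half full, yet no free run is longer than 2 blocks, because each 4k hole sits between allocated blocks and can never satisfy another 8k request.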
> 
> Hence inode allocation, which requires a contiguous, aligned
> allocation of 16k (256 byte inodes), 32k (512 byte inodes) or 64k
> (1024 byte inodes), can fail to find a sufficiently large free
> extent while there is still lots of free space available.
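
The 16k/32k/64k figures follow from XFS allocating inodes in chunks of 64 (XFS_INODES_PER_CHUNK). The sketch below is a simplified check, not the real allocator: it assumes the required alignment equals the chunk size, and struct free_extent and can_alloc_inode_chunk() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* XFS allocates inodes 64 at a time (XFS_INODES_PER_CHUNK). */
enum { INODES_PER_CHUNK = 64 };

/* Hypothetical free extent record: byte offset and length. */
struct free_extent {
    uint64_t start;
    uint64_t len;
};

/* Bytes in one inode chunk: 16k, 32k or 64k for 256/512/1024 byte inodes. */
static uint64_t inode_chunk_bytes(uint64_t inode_size)
{
    return INODES_PER_CHUNK * inode_size;
}

/* Does any free extent contain a whole chunk starting on a multiple of
 * the chunk size? Simplification: alignment == chunk size. */
static bool can_alloc_inode_chunk(const struct free_extent *ext, size_t n,
                                  uint64_t inode_size)
{
    uint64_t chunk = inode_chunk_bytes(inode_size);

    assert(chunk > 0);
    for (size_t i = 0; i < n; i++) {
        /* Round the extent start up to the next aligned boundary. */
        uint64_t aligned = (ext[i].start + chunk - 1) / chunk * chunk;

        if (aligned + chunk <= ext[i].start + ext[i].len)
            return true;
    }
    return false;
}
```

With a lone 32k free extent at offset 32k, a chunk of 256 or 512 byte inodes fits but a 64k chunk of 1024 byte inodes does not, and a 64k extent starting at 4k also fails because it straddles the alignment boundary. A pile of 4k holes satisfies none of them.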
> 
> There's a simple fix for this, and one that has precedent in the
> allocator code already: don't do speculative allocation unless the
> file is larger than a certain size. In this case, that
> size is the minimum default preallocation size:
> mp->m_writeio_blocks. And to keep with the concept of being nice to
> people when the files are still relatively small, cap the prealloc
> to mp->m_writeio_blocks until the file grows beyond a stripe unit in
> size, at which point we'll fall back to the current behaviour based
> on the last extent size.
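
The resulting policy can be summarised as a single decision function. This is a simplified model under stated assumptions, not the kernel code: struct prealloc_params and eof_prealloc_blocks() are hypothetical, the fields only stand in for XFS_MOUNT_DFLT_IOSIZE, mp->m_writeio_blocks and mp->m_dalign, "last extent size" is modelled as a plain block count, and the real code's other checks and caps are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the mount state the patch consults. */
struct prealloc_params {
    bool     fixed_iosize;    /* like XFS_MOUNT_DFLT_IOSIZE (allocsize=) */
    uint64_t block_size;      /* bytes per filesystem block */
    uint64_t writeio_blocks;  /* like mp->m_writeio_blocks */
    uint64_t dalign_blocks;   /* like mp->m_dalign; 0 = no stripe unit */
};

/* Speculative EOF prealloc, in blocks, for a file of isize bytes whose
 * last extent is last_extent_blocks long. Simplified model only. */
static uint64_t eof_prealloc_blocks(const struct prealloc_params *p,
                                    uint64_t isize,
                                    uint64_t last_extent_blocks)
{
    assert(p->block_size > 0);

    /* allocsize= mount option: always the fixed minimum size. */
    if (p->fixed_iosize)
        return p->writeio_blocks;

    /* Smaller than the minimum prealloc: no speculative prealloc. */
    if (isize < p->writeio_blocks * p->block_size)
        return 0;

    /* Smaller than a stripe unit: cap at the minimum prealloc.
     * With no stripe unit (dalign 0) this check never fires. */
    if (isize < p->dalign_blocks * p->block_size)
        return p->writeio_blocks;

    /* Otherwise size off the last extent, but never below the minimum. */
    return last_extent_blocks > p->writeio_blocks ? last_extent_blocks
                                                  : p->writeio_blocks;
}
```

With example values of 4k blocks, 16 block m_writeio_blocks and a 64 block stripe unit, a 4k file gets no prealloc, a 64k file gets 16 blocks, and a 1 MiB file follows its last extent size. Without a stripe unit the middle step disappears, which is why the second hunk compares against m_dalign rather than a fixed size.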
> 
> This will effectively turn off speculative prealloc for very small
> files, keep preallocation low for small files, and behave as it
> currently does for any file larger than a stripe unit. This
> completely avoids the freespace fragmentation problem this
> particular IO pattern was causing.
> 
> Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
> ---

Looks good.

Reviewed-by: Brian Foster <bfoster@xxxxxxxxxx>

>  fs/xfs/xfs_iomap.c |   13 +++++++++++++
>  1 file changed, 13 insertions(+)
> 
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index 8f8aaee..6a70964 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -284,6 +284,15 @@ xfs_iomap_eof_want_preallocate(
>               return 0;
>  
>       /*
> +      * If the file is smaller than the minimum prealloc and we are using
> +      * dynamic preallocation, don't do any preallocation at all as it is
> +      * likely this is the only write to the file that is going to be done.
> +      */
> +     if (!(mp->m_flags & XFS_MOUNT_DFLT_IOSIZE) &&
> +         XFS_ISIZE(ip) < XFS_FSB_TO_B(mp, mp->m_writeio_blocks))
> +             return 0;
> +
> +     /*
>        * If there are any real blocks past eof, then don't
>        * do any speculative allocation.
>        */
> @@ -345,6 +354,10 @@ xfs_iomap_eof_prealloc_initial_size(
>       if (mp->m_flags & XFS_MOUNT_DFLT_IOSIZE)
>               return 0;
>  
> +     /* If the file is small, then use the minimum prealloc */
> +     if (XFS_ISIZE(ip) < XFS_FSB_TO_B(mp, mp->m_dalign))
> +             return 0;
> +
>       /*
>        * As we write multiple pages, the offset will always align to the
>        * start of a page and hence point to a hole at EOF. i.e. if the size is
> 
