On Mon, Jun 11, 2012 at 02:29:48PM -0500, Ben Myers wrote:
> Testing has shown that there is a regression when using extent size
> hints with NFS. There are many more extents created than expected.
> This is the workload, from a single NFS client:
> # for e in `seq 1 8`;
> mkdir -p $e; (cd $e; dd if=/dev/zero of=file bs=4096k count=100) &
> # xfs_io -c extsize .
>  .
> Here is the extent count for each file without this patch:
> 385 387 387 387 400 398 379 334
So it is a 400MB file with roughly 400 1MB extents? IOWs, you're
getting one extent per 1MB NFS write over the wire?
Why aren't the extents being allocated adjacently? The allocator is
supposed to taka the last block of the previous adjacent extent as
the hint for the location of the new extent. This result implies
that either the allocator is broken or you are crossing the streams.
How many AGs do you have in your filesystem?
I just ran that test with a filesystem that has more than 8 AGs, and
the result it a single extent per file. When I have less than 8 AGs,
there is contention at the AG level for allocation, and the result
is allocation interleaving giving the above fragmentation levels.
So the allocator is working just fine, it's just that your
configuration is causing the streams to interleave, just like
delalloc used to before the speculative prealloc beyond EOF changes
in 2.6.38. (i.e. the I_DIRTY_RELEASE flag)
> In a comparison of traces, one with the above workload run locally and
> the other with the above workload over NFS, there were occasional calls
> to xfs_free_eofblocks in NFS but not on local workloads. These turned
> out to be from xfs_release.
The reason that xfs_release() is doing this for NFS is that the
I_DIRTY_RELEASE mitigation code triggers off delayed allocation
blocks on the inode, and preallocation does not result in delalloc
> Not calling xfs_free_eofblocks when using the extent size hint resolves this
> issue when using extsize hints and NFS. I suspect that delayed allocation
> mitigated the effect of xfs_free_eofblocks until commit aff3a9ed.
No, the I_DIRTY_RELEASE code is what mitigated the effect for
delayed allocation. Using preallocation for extsize hints does not
result in delalloc being present, hence this stopped working for
So this could be fixed simply by checking for dirty pages whenteh
extentsize hint is set after eof truncation, just like we do for
if (ip->i_delayed_blks ||
((ip->i_d.di_flags & XFS_DIFLAG_EXTSIZE) &&
Alternatively, (and I think a better solution) is to treat files
with extent size hints on them exactly the same as files with
preallocation on them. i.e. in xfs_set_diflags() we set
XFS_DIFLAG_PREALLOC at the same time we set the XFS_DIFLAG_EXTSIZE,
and then all our truncate logic just works as expected for
physical preallocation beyond EOF on extsize hint files without
changing anything else....