On 4/7/14, 10:39 AM, Brian Foster wrote:
> Hi all,
> This is v2 of the speculative preallocation FAQ bits. The initial
> proposal was here:
> This version includes some updates based on review from arekm and
> dchinner. Most notably, the content has been broken down into a few more
> questions. Unless there are further major changes required, I'll plan to
> post something along these lines to the wiki when my account is
> approved. Thanks for the feedback!
> Q: Why do files on XFS use more data blocks than expected?
> The XFS speculative preallocation algorithm allocates extra blocks
> beyond end of file (EOF) to minimise file fragmentation during buffered
> write workloads. Workloads that benefit from this behaviour include
> slowly growing files, concurrent writers and mixed reader/writer
> workloads. It also provides fragmentation resistance in situations where
> memory pressure prevents adequate buffering of dirty data to allow
> formation of large contiguous regions of data in memory.
> This post-EOF block allocation is accounted identically to blocks within
> EOF. It is visible in 'st_blocks' counts via stat() system calls,
> accounted as globally allocated space and against quotas that apply to
> the associated file. The space is reported by various userspace
> utilities (stat, du, df, ls) and thus provides a common source of
> confusion for administrators. Post-EOF blocks are temporary in most
> situations and are usually reclaimed via several possible mechanisms in
"usually reclaimed" - is it ever "never" reclaimed, then?
> See the FAQ entry on speculative preallocation for details.
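It might help to show admins how to see the discrepancy for themselves. A rough sketch in Python, assuming Linux semantics where st_blocks is counted in 512-byte units; the function name is made up for illustration. Note that any excess it reports is not necessarily speculative preallocation: block-size rounding and explicit preallocation show up the same way.

```python
import os

def allocation_report(path):
    """Return (apparent_bytes, allocated_bytes, excess_bytes) for path.

    Hypothetical helper, not part of any XFS tooling.
    """
    st = os.stat(path)
    # st_blocks is reported in 512-byte units, independent of the
    # filesystem block size.
    allocated = st.st_blocks * 512
    # Sparse files can have fewer allocated bytes than st_size, so clamp at 0.
    excess = max(0, allocated - st.st_size)
    return st.st_size, allocated, excess
```

Running this on a file that is still being appended to, and again some time after the writer has closed it, should show whether the extra blocks were only temporary.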
> Q: What is speculative preallocation?
> XFS speculatively preallocates post-EOF blocks on file extending writes
> in anticipation of future extending writes. The size of a preallocation
> is dynamic and depends on the runtime state of the file and fs.
> Generally speaking, preallocation is disabled for very small files and
> preallocation sizes grow as files grow larger.
> Preallocations are capped to the maximum extent size supported by the
> filesystem. Preallocation size is throttled automatically as the
> filesystem approaches low free space conditions or other allocation
> limits on a file (such as a quota).
> In most cases, speculative preallocation is automatically reclaimed when
> a file is closed. Preallocation may also persist beyond the lifecycle of
> the file descriptor. Certain application behaviors that are known to
> cause fragmentation, such as file server workloads, slowly growing
> files, etc., benefit from this and delay the removal of preallocated
> blocks beyond fd close.
This is a little hand-wavy: "It's reclaimed when it's closed, except
when it's not?" Can we say something more informative here?
> Q: How can I speed up or avoid delayed removal of speculative
> preallocation?
> Remove the inode from the VFS cache or unmount the filesystem to remove
> speculative preallocations associated with an inode.
How does a user remove an inode from the VFS cache? ;)
So far the answer to this question sounds like "no."
We can't remove a single inode; drop_caches is way too heavyweight,
and unmount isn't really viable in most cases.
> Linux 3.8 (and later) includes a scanner to perform background trimming
> of files with lingering post-EOF preallocations. The scanner bypasses
> dirty files to avoid interference with ongoing writes. A 5 minute scan
> interval is used by default and can be adjusted via the following file
> (value in seconds):
> Q: Is speculative preallocation permanent?
> Although speculative preallocation can lead to reports of excess space
> usage, the preallocated space is not permanent unless explicitly made so
> via fallocate or a similar interface. Preallocated space can also be
> encoded permanently in situations where file size is extended beyond a
> range of post-EOF blocks (i.e., via truncate). Otherwise, preallocated
(maybe "an extending truncate")
> blocks are reclaimed on file close, inode reclaim, unmount or in the
> background once file write activity subsides.
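For contrast, here is a minimal sketch of the "explicitly made so" case, using Python's os.posix_fallocate as one of the "similar interfaces". The helper name is hypothetical. One caveat: posix_fallocate also extends the file size, so the blocks end up within EOF rather than beyond it, which is different from a size-preserving fallocate.

```python
import os

def preallocate(path, nbytes):
    """Explicitly allocate nbytes starting at offset 0; return the fstat result.

    Hypothetical helper for illustration only.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Asks the filesystem to reserve the range; this allocation is
        # requested by the application, so it is not trimmed as speculative.
        os.posix_fallocate(fd, 0, nbytes)
        return os.fstat(fd)
    finally:
        os.close(fd)
```

Comparing st_size and st_blocks on the returned value, before and after a close, should show that space allocated this way stays put.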
> Q: My workload has known characteristics - can I tune speculative
> preallocation to an optimal fixed size?
> The 'allocsize=' mount option configures the XFS block allocation
> algorithm to use a fixed allocation size. Speculative preallocation is
> not dynamically resized when the allocsize mount option is set and thus
> the potential for fragmentation is increased. XFS historically set
> allocsize to 64k by default.
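Since people will want to know whether a given mount already has this set, a small sketch of checking the mount options could be useful. This assumes the option is listed in /proc/mounts-style text when it has been set, which is worth verifying on the kernel in question; the function name and the "/data" mount point below are hypothetical.

```python
def xfs_allocsize(mounts_text, mount_point):
    """Return the allocsize value for an XFS mount point, or None if not set."""
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip malformed or empty lines
        _device, where, fstype, options = fields[:4]
        if where == mount_point and fstype == "xfs":
            for opt in options.split(","):
                if opt.startswith("allocsize="):
                    return opt.split("=", 1)[1]
            return None
    return None
```

Typical use would be something like reading /proc/mounts and calling xfs_allocsize(text, "/data"). A None result means no allocsize option was found for that mount, so the dynamic behaviour described above would apply.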