
[FAQ v2] XFS speculative preallocation

To: xfs@xxxxxxxxxxx
Subject: [FAQ v2] XFS speculative preallocation
From: Brian Foster <bfoster@xxxxxxxxxx>
Date: Mon, 7 Apr 2014 11:39:06 -0400
Delivered-to: xfs@xxxxxxxxxxx
User-agent: Mutt/1.5.21 (2010-09-15)
Hi all,

This is v2 of the speculative preallocation FAQ bits. The initial
proposal was here:


This version includes some updates based on review from arekm and
dchinner. Most notably, the content has been broken down into a few more
questions. Unless there are further major changes required, I'll plan to
post something along these lines to the wiki when my account is
approved. Thanks for the feedback!



Q: Why do files on XFS use more data blocks than expected?


The XFS speculative preallocation algorithm allocates extra blocks
beyond the end of file (EOF) to minimise file fragmentation during
buffered write workloads. Workloads that benefit from this behaviour
include slowly growing files, concurrent writers and mixed reader/writer
workloads. It also provides fragmentation resistance in situations where
memory pressure prevents enough dirty data from being buffered to form
large contiguous regions of data in memory.
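The fragmentation argument for concurrent writers can be made concrete with a simplified simulation. This is not XFS code. The single free-space cursor, the fixed chunk size and the function name 'simulate_appends' are assumptions chosen for illustration. Two files are appended to in lockstep. Without preallocation their blocks interleave on disk, while a post-EOF reservation gives each file a contiguous run.

```python
def simulate_appends(num_files, appends_per_file, prealloc_blocks):
    """Toy allocator (hypothetical, not XFS): files append one block at a
    time, round-robin. When a file runs out of reserved space it receives
    a fresh chunk of `prealloc_blocks` contiguous blocks from one shared
    free-space cursor. Returns the extent (contiguous run) count per file.
    """
    next_free = 0                             # next unallocated block on the "disk"
    extents = [[] for _ in range(num_files)]  # per-file list of [start, length]
    reserve = [(0, 0)] * num_files            # per-file (next block, blocks left)

    for _ in range(appends_per_file):
        for f in range(num_files):
            start, left = reserve[f]
            if left == 0:
                # Reservation exhausted: allocate a new chunk past EOF.
                chunk = max(1, prealloc_blocks)
                start, left = next_free, chunk
                next_free += chunk
            block = start
            reserve[f] = (start + 1, left - 1)
            runs = extents[f]
            if runs and runs[-1][0] + runs[-1][1] == block:
                runs[-1][1] += 1              # contiguous with the previous block
            else:
                runs.append([block, 1])       # a new extent, i.e. a fragment
    return [len(runs) for runs in extents]


print(simulate_appends(2, 8, prealloc_blocks=1))  # no preallocation -> [8, 8]
print(simulate_appends(2, 8, prealloc_blocks=8))  # 8-block reservation -> [1, 1]
```

Real allocators are far more sophisticated. Still, the effect is the same in kind: reserving space ahead of the writer keeps the blocks of a concurrently growing file together.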

This post-EOF block allocation is accounted identically to blocks within
EOF. It is visible in 'st_blocks' counts via stat() system calls,
accounted as globally allocated space and against quotas that apply to
the associated file. The space is reported by various userspace
utilities (stat, du, df, ls) and is thus a common source of confusion
for administrators. Post-EOF blocks are temporary in most situations and
are usually reclaimed via one of several mechanisms.
See the FAQ entry on speculative preallocation for details.
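You can see the effect by comparing a file's apparent size with the space the kernel reports as allocated to it, much as 'du --apparent-size' differs from plain 'du'. The sketch below assumes Linux, where 'st_blocks' is counted in 512-byte units. The helper name 'allocation_report' is hypothetical. For files on an XFS mount, the difference can include post-EOF preallocation in addition to ordinary block-size rounding.

```python
import os
import sys


def allocation_report(path):
    """Return (apparent size, allocated bytes, allocated - size) for path.

    Assumes Linux, where st_blocks counts 512-byte units. The difference
    mixes block-size rounding with any post-EOF preallocation, and holes
    in sparse files can make it negative.
    """
    st = os.stat(path)
    allocated = st.st_blocks * 512
    return st.st_size, allocated, allocated - st.st_size


if __name__ == "__main__":
    # Report on every path given on the command line.
    for p in sys.argv[1:]:
        size, allocated, extra = allocation_report(p)
        print(f"{p}: size={size} allocated={allocated} extra={extra}")
```

Saved as, say, report.py and run against a slowly growing log file on XFS, the 'extra' column may shrink back once the preallocation is reclaimed.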

Q: What is speculative preallocation?


XFS speculatively preallocates post-EOF blocks on file-extending writes
in anticipation of future extending writes. The size of a preallocation
is dynamic and depends on the runtime state of the file and filesystem.
Generally speaking, preallocation is disabled for very small files and
preallocation sizes grow as files grow larger.

Preallocations are capped to the maximum extent size supported by the
filesystem. Preallocation size is throttled automatically as the
filesystem approaches low free space conditions or other allocation
limits on a file (such as a quota).

In most cases, speculative preallocation is automatically reclaimed when
a file is closed. Preallocation may also persist beyond the lifecycle of
the file descriptor: certain application behaviours that are known to
cause fragmentation, such as file server workloads and slowly growing
files, benefit from keeping it, so removal of their preallocated blocks
is delayed beyond fd close.
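The sizing rules above can be illustrated with a toy model. This is not the kernel's algorithm. The small-file cutoff, the cap, the "halve per low-space threshold" throttle and the function name 'prealloc_size' are all assumptions. They are chosen only to show the shape of the behaviour: growth with file size, a hard cap, and shrinking as free space or quota runs low.

```python
# All constants below are illustrative assumptions, not values taken from XFS.
SMALL_FILE_LIMIT = 64 * 1024             # below this size, no speculative preallocation
MAX_PREALLOC = 8 * 1024 ** 3             # stand-in for the maximum extent size
LOW_SPACE_THRESHOLDS = (5, 4, 3, 2, 1)   # free-space percentages that trigger throttling


def prealloc_size(file_size, free_pct, quota_left=None):
    """Toy model of a dynamic post-EOF preallocation size, in bytes."""
    if file_size < SMALL_FILE_LIMIT:
        return 0                         # very small files: preallocation disabled
    size = min(file_size, MAX_PREALLOC)  # grows with the file, capped
    for threshold in LOW_SPACE_THRESHOLDS:
        if free_pct < threshold:
            size >>= 1                   # halve for each low-space threshold crossed
    if quota_left is not None:
        size = min(size, quota_left)     # never exceed the remaining quota
    return size


for free in (50.0, 4.5, 0.5):
    print(free, prealloc_size(1 << 20, free))
```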

Q: How can I speed up or avoid delayed removal of speculative preallocation?


Remove the inode from the VFS cache or unmount the filesystem to remove
speculative preallocations associated with an inode.
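On Linux, one blunt way to push clean inodes out of the VFS cache without unmounting is the 'drop_caches' sysctl. Writing '2' to it asks the kernel to free reclaimable dentries and inodes. The sketch below assumes root privileges and a kernel that exposes /proc/sys/vm/drop_caches. The function name is hypothetical. The operation is system-wide and discards useful cache too, so it suits testing better than routine use.

```python
import os

DROP_CACHES = "/proc/sys/vm/drop_caches"


def evict_clean_inodes(control_path=DROP_CACHES):
    """Ask the kernel to free reclaimable dentries and inodes (needs root).

    drop_caches does not free dirty objects, so dirty data is written
    back first with os.sync(). Inodes still held open are not evicted.
    """
    os.sync()                    # write back dirty data so more objects are clean
    with open(control_path, "w") as f:
        f.write("2\n")           # 2 = free reclaimable slab objects (dentries, inodes)
```

Run as root, 'evict_clean_inodes()' is equivalent to 'sync; echo 2 > /proc/sys/vm/drop_caches'. Files that an application still holds open keep their preallocation.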

Linux 3.8 (and later) includes a scanner that performs background
trimming of files with lingering post-EOF preallocations. The scanner
skips dirty files to avoid interfering with ongoing writes. A 5-minute
scan interval is used by default and can be adjusted via the following
file (value in seconds):

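The interval tunable holds a single integer, so adjusting it comes down to reading and writing one number. In this sketch the function names are hypothetical and the path is passed in rather than hard-coded. The positive-integer check is an assumption of the sketch, not a documented kernel limit. Writing to a real sysctl file requires root.

```python
def read_scan_interval(tunable_path):
    """Return the current background scan interval, in seconds."""
    with open(tunable_path) as f:
        return int(f.read().strip())


def write_scan_interval(tunable_path, seconds):
    """Set the background scan interval, in seconds."""
    # Reject bools, non-integers and non-positive values (sketch assumption).
    if isinstance(seconds, bool) or not isinstance(seconds, int) or seconds <= 0:
        raise ValueError("interval must be a positive integer number of seconds")
    with open(tunable_path, "w") as f:
        f.write(f"{seconds}\n")
```

Lowering the interval means lingering preallocations are trimmed sooner, at the cost of more frequent background scans.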

Q: Is speculative preallocation permanent?


Although speculative preallocation can lead to reports of excess space
usage, the preallocated space is not permanent unless explicitly made so
via fallocate or a similar interface. Preallocated space can also become
permanent when the file size is extended beyond a range of post-EOF
blocks (for example, via truncate). Otherwise, preallocated
blocks are reclaimed on file close, inode reclaim, unmount or in the
background once file write activity subsides.
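For contrast with speculative preallocation, the sketch below reserves space explicitly with 'os.posix_fallocate()', Python's wrapper around the POSIX call. When the range ends past the current size, the file is extended. The blocks therefore sit within EOF and persist like any other file data. Keeping explicitly reserved blocks beyond EOF would instead need Linux fallocate(2) with FALLOC_FL_KEEP_SIZE, which this sketch does not use. The helper name 'reserve_space' is hypothetical.

```python
import os


def reserve_space(path, nbytes):
    """Explicitly allocate the first nbytes of path, extending it if needed.

    Unlike speculative preallocation, this space lies within EOF and is
    not trimmed on file close, inode reclaim or unmount.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # Extends st_size to nbytes if the file is currently shorter.
        os.posix_fallocate(fd, 0, nbytes)
        st = os.fstat(fd)
    finally:
        os.close(fd)
    return st.st_size, st.st_blocks * 512   # st_blocks: 512-byte units on Linux
```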

Q: My workload has known characteristics - can I tune speculative
preallocation to an optimal fixed size?


The 'allocsize=' mount option configures the XFS block allocation
algorithm to use a fixed allocation size. Speculative preallocation is
not dynamically resized when the allocsize mount option is set, which
increases the potential for fragmentation. XFS historically set
allocsize to 64k by default.
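The option is given at mount time, for example 'mount -o allocsize=64k <device> <mountpoint>' or the equivalent options field in /etc/fstab. To check which XFS mounts use a fixed size, the sketch below scans text in the /proc/mounts format: device, mount point, type, comma-separated options, and so on. The function name is hypothetical. It reports None when no 'allocsize=' option is listed, which this sketch treats as dynamic preallocation.

```python
def xfs_allocsize_by_mount(mounts_text):
    """Map each XFS mount point to its allocsize= value, or None if absent."""
    result = {}
    for line in mounts_text.splitlines():
        # Fields: device, mount point, fs type, options, dump, pass.
        # Spaces inside paths are escaped (e.g. as \040), so split() is safe.
        fields = line.split()
        if len(fields) < 4 or fields[2] != "xfs":
            continue
        allocsize = None
        for opt in fields[3].split(","):
            if opt.startswith("allocsize="):
                allocsize = opt.split("=", 1)[1]
        result[fields[1]] = allocsize
    return result
```

Passing in the contents of /proc/mounts yields, for each XFS mount point, either its fixed allocsize value or None.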
