[Top] [All Lists]

Re: [PATCH 05/18] xfs: Use preallocation for inodes with extsz hints

To: Dave Chinner <david@xxxxxxxxxxxxx>
Subject: Re: [PATCH 05/18] xfs: Use preallocation for inodes with extsz hints
From: Mark Tinguely <tinguely@xxxxxxx>
Date: Fri, 13 Apr 2012 11:45:40 -0500
Cc: xfs@xxxxxxxxxxx
In-reply-to: <1334319061-12968-6-git-send-email-david@xxxxxxxxxxxxx>
References: <1334319061-12968-1-git-send-email-david@xxxxxxxxxxxxx> <1334319061-12968-6-git-send-email-david@xxxxxxxxxxxxx>
User-agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:9.0) Gecko/20120122 Thunderbird/9.0
On 04/13/12 07:10, Dave Chinner wrote:
From: Dave Chinner<dchinner@xxxxxxxxxx>

xfstest 229 exposes a problem with buffered IO, delayed allocation
and extent size hints. That is when we do delayed allocation during
buffered IO, we reserve space for the extent size hint alignment and
allocate the physical space to align the extent, but we do not zero
the regions of the extent that aren't written by the write(2)
syscall. The result is that we expose stale data in unwritten
regions of the extent size hints.

There are two ways to fix this. The first is to detect that we are
doing unaligned writes, check if there is already a mapping or data
over the extent size hint range, and if not zero the page cache
first before then doing the real write. This can be very expensive
for large extent size hints, especially if the subsequent writes
fill then entire extent size before the data is written to disk.

The second, and simpler way, is simply to turn off delayed
allocation when the extent size hint is set and use preallocation
instead. This results in unwritten extents being laid down on disk
and so only the written portions will be converted. This matches the
behaviour for direct IO, and will also work for the real time
device. The disadvantage of this approach is that for small extent
size hints we can get file fragmentation, but in general extent size
hints are fairly large (e.g. stripe width sized) so this isn't a big

Implement the second approach as it is simple and effective.

Signed-off-by: Dave Chinner<dchinner@xxxxxxxxxx>
  fs/xfs/xfs_aops.c |    2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 2fc12db..19ce2e2 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -1175,7 +1175,7 @@ __xfs_get_blocks(
            (!nimaps ||
             (imap.br_startblock == HOLESTARTBLOCK ||
              imap.br_startblock == DELAYSTARTBLOCK))) {
-               if (direct) {
+               if (direct || xfs_get_extsz_hint(ip)) {
                        xfs_iunlock(ip, lockmode);

                        error = xfs_iomap_write_direct(ip, offset, size,


Christoph had reposted the ilock series. This file does not apply cleanly to the new post because he added a new comment before the xfs_iunlock().

Thank-you for reposting the patches.

-Mark Tinguely <tinguely@xxxxxxx>

<Prev in Thread] Current Thread [Next in Thread>