xfs

Re: Strange hole creation behavior

To: Pádraig Brady <P@xxxxxxxxxxxxxx>
Subject: Re: Strange hole creation behavior
From: Brian Foster <bfoster@xxxxxxxxxx>
Date: Fri, 11 Apr 2014 16:43:39 -0400
Cc: xfs-oss <xfs@xxxxxxxxxxx>, Ondřej Vašík <ovasik@xxxxxxxxxx>
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <534822D7.7090803@xxxxxxxxxxxxxx>
References: <534822D7.7090803@xxxxxxxxxxxxxx>
User-agent: Mutt/1.5.21 (2010-09-15)
On Fri, Apr 11, 2014 at 06:13:59PM +0100, Pádraig Brady wrote:
> So this coreutils test is failing on XFS:
> http://git.sv.gnu.org/gitweb/?p=coreutils.git;a=blob;f=tests/dd/sparse.sh;h=06efc7017
> Specifically the last hole check on line 66.
> 
> In summary what's happening is that a write(1MiB), lseek(1MiB), write(1MiB)
> creates only a 64KiB hole. Is that expected?
> 

This is expected behavior due to speculative preallocation. An FAQ with
regard to this behavior is pending, but see here for reference:

http://oss.sgi.com/archives/xfs/2014-04/msg00083.html

In that particular write(1MiB), lseek(+1MiB), write(1MiB) workload, each
write preallocates some extra space beyond the current EOF. The seek
then moves past the space left by the first write, but that space
doesn't go away. The second write extends EOF beyond it, so the
previously preallocated space now resides in the middle of the file and
can't be trimmed away when the file is closed (only space beyond EOF is
trimmed).
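
To reproduce the pattern outside dd, here is a minimal Python sketch (not
part of coreutils or the XFS tools) that issues the same write/lseek/write
sequence on a single descriptor, matching the strace output in the
original report. The helper name write_seek_write and the path file.out
are illustrative assumptions. On the XFS setup described in the report,
the equivalent dd run left du reporting 3008 KiB instead of 2048 KiB;
other filesystems, or XFS mounted with a small allocsize, may behave
differently.

```python
import os

MIB = 1024 * 1024


def write_seek_write(path: str) -> os.stat_result:
    """Mimic dd conv=sparse on one descriptor (hypothetical helper):
    write(1 MiB), lseek(+1 MiB, SEEK_CUR), write(1 MiB)."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.write(fd, os.urandom(MIB))   # EOF -> 1 MiB; XFS may preallocate past it
        os.lseek(fd, MIB, os.SEEK_CUR)  # skip the all-zero MiB without writing
        os.write(fd, os.urandom(MIB))   # EOF -> 3 MiB; earlier prealloc is now mid-file
    finally:
        os.close(fd)  # only preallocation beyond EOF can be trimmed at close
    return os.stat(path)


if __name__ == "__main__":
    st = write_seek_write("file.out")
    # st_blocks is counted in 512-byte units on Linux
    print(f"size={st.st_size} bytes, allocated={st.st_blocks * 512 // 1024} KiB")
```

Comparing the "allocated" figure with du -k on the same file is a quick
sanity check; the exact number depends on the filesystem's allocation
policy.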

> Now a 1MiB hole is supported using truncate:
>   dd if=/dev/urandom of=file.in bs=1M count=1 iflag=fullblock
>   truncate -s+1M file.in
>   dd if=/dev/urandom of=file.in bs=1M count=1 iflag=fullblock conv=notrunc oflag=append
>   $ du -k file.in
>   2048  file.in
> 

This works simply because it is broken into multiple commands. When the
first dd exits, its file descriptor is closed and the excess preallocated
space is trimmed off. The subsequent truncate then extends the file size
without any extra space getting caught between the old and new EOF.
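
The same command sequence can be sketched at the system-call level. In
the hypothetical helper below, each step opens and closes its own
descriptor, as the separate dd, truncate and dd processes do, so any
post-EOF preallocation left by the first write can be trimmed at close
before ftruncate() extends the size. It mirrors the quoted commands under
those assumptions; it is not how dd or truncate are actually implemented.

```python
import os

MIB = 1024 * 1024


def write_close_truncate_append(path: str) -> os.stat_result:
    """Mirror: dd (1 MiB), truncate -s+1M, dd conv=notrunc oflag=append (1 MiB)."""
    # Step 1: like the first dd, write 1 MiB, then close the descriptor.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.write(fd, os.urandom(MIB))
    finally:
        os.close(fd)  # XFS can trim excess post-EOF space here

    # Step 2: like "truncate -s+1M", grow the size by 1 MiB without writing data.
    fd = os.open(path, os.O_WRONLY)
    try:
        os.ftruncate(fd, os.fstat(fd).st_size + MIB)
    finally:
        os.close(fd)

    # Step 3: like "dd ... conv=notrunc oflag=append", append 1 MiB at the new EOF.
    fd = os.open(path, os.O_WRONLY | os.O_APPEND)
    try:
        os.write(fd, os.urandom(MIB))
    finally:
        os.close(fd)
    return os.stat(path)


if __name__ == "__main__":
    st = write_close_truncate_append("file.in")
    print(f"size={st.st_size} bytes, allocated={st.st_blocks * 512 // 1024} KiB")
```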

You can confirm this by mounting the XFS filesystem with the
'allocsize=4k' mount option. If you want something more generic for
testing the coreutils functionality, you could also set the size of
file.out in advance. For example, with preallocation in effect:

# dd if=file.in of=file.out bs=1M conv=sparse
# xfs_bmap -v file.out 
file.out:
 EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL
   0: [0..3967]:       9773944..9777911  1 (9080..13047)     3968
   1: [3968..4095]:    hole                                   128
   2: [4096..6143]:    9778040..9780087  1 (13176..15223)    2048

... and then prevent preallocation by ensuring writes do not extend the
file:

# rm -f file.out 
# truncate --size=3M file.out
# dd if=file.in of=file.out bs=1M conv=sparse,notrunc
# xfs_bmap -v file.out 
file.out:
 EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL
   0: [0..2047]:       9773944..9775991  1 (9080..11127)     2048
   1: [2048..4095]:    hole                                  2048
   2: [4096..6143]:    9778040..9780087  1 (13176..15223)    2048
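
The pre-sizing approach can also be sketched in Python, together with a
filesystem-independent way to check the result. In the sketch below
(hypothetical helper names; assumes Linux and Python 3.3+ for os.pwrite,
os.SEEK_DATA and os.SEEK_HOLE), the final size is set with ftruncate()
first so neither write extends EOF, and the file is then walked with
lseek(SEEK_DATA)/lseek(SEEK_HOLE) instead of xfs_bmap or filefrag.
Filesystems without hole reporting may present the whole file as a
single data range, so the printed ranges depend on the filesystem.

```python
import errno
import os
from typing import List, Tuple

MIB = 1024 * 1024


def presized_sparse_copy(path: str) -> None:
    """Set the final size first, then write only the non-zero MiBs, so
    neither write extends EOF (hypothetical helper)."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.ftruncate(fd, 3 * MIB)                # like "truncate --size=3M"
        os.pwrite(fd, os.urandom(MIB), 0)        # first MiB
        os.pwrite(fd, os.urandom(MIB), 2 * MIB)  # last MiB; middle MiB never written
    finally:
        os.close(fd)


def data_ranges(path: str) -> List[Tuple[int, int]]:
    """Return [start, end) byte ranges reported as data via SEEK_DATA/SEEK_HOLE."""
    ranges = []
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        offset = 0
        while offset < size:
            try:
                start = os.lseek(fd, offset, os.SEEK_DATA)
            except OSError as e:
                if e.errno == errno.ENXIO:  # only a hole remains up to EOF
                    break
                raise
            end = os.lseek(fd, start, os.SEEK_HOLE)  # EOF counts as a hole
            ranges.append((start, end))
            offset = end
    finally:
        os.close(fd)
    return ranges


if __name__ == "__main__":
    presized_sparse_copy("file.out")
    for start, end in data_ranges("file.out"):
        print(f"data: [{start}..{end})  ({(end - start) // 1024} KiB)")
```

On a hole-aware filesystem one would expect two data ranges, [0, 1 MiB)
and [2 MiB, 3 MiB), which corresponds to the second xfs_bmap listing
above (expressed there in 512-byte units).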

Hope that helps.

Brian

> But when trying to create the 1MiB hole with dd (lseek) it fails?
> 
>   # Create 3MiB input file
>   $ dd if=/dev/urandom of=file.in bs=1M count=3 iflag=fullblock
>   $ dd if=/dev/zero    of=file.in bs=1M count=1 seek=1 conv=notrunc
>   $ du -k file.in
>   3072  file.in
> 
>   # Convert to 1MiB hole doesn't work :(
>   $ dd if=file.in of=file.out bs=1M conv=sparse
>   $ du -k file.out
>   3008  file.out
> 
>   # Again with syscall details:
>   $ strace -e write,lseek dd if=file.in of=file.out bs=1M conv=sparse
>   write(1, "...", 1048576) = 1048576
>   lseek(1, 1048576, SEEK_CUR)             = 2097152
>   write(1, "...", 1048576) = 1048576
> 
> So it seems that the lseeks are treated differently to the truncate
> that was done in the first example, which is surprising.
> If we look at the file layout we can see the hole is
> only at the last 64KiB of the middle 1MiB of zeros,
> rather than for the whole middle 1MiB as in the first example??
> 
>   $ filefrag -v file.out
>   Filesystem type is: 58465342
>   File size of file.out is 3145728 (768 blocks of 4096 bytes)
>    ext:     logical_offset:        physical_offset: length:   expected: flags:
>      0:        0..     495:      31271..     31766:    496:
>      1:      512..     767:      31783..     32038:    256:      31767: eof
> 
> thanks,
> Pádraig.
> 
> Versions etc. in case useful
> 
> $ uname -a
> Linux tp2 3.12.6-300.fc20.x86_64 #1 SMP Mon Dec 23 16:44:31 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
> 
> $ xfs_info .
> meta-data=/dev/loop2             isize=256    agcount=4, agsize=65536 blks
>          =                       sectsz=512   attr=2
> data     =                       bsize=4096   blocks=262144, imaxpct=25
>          =                       sunit=0      swidth=0 blks
> naming   =version 2              bsize=4096   ascii-ci=0
> log      =internal               bsize=4096   blocks=2560, version=2
>          =                       sectsz=512   sunit=0 blks, lazy-count=1
> realtime =none                   extsz=4096   blocks=0, rtextents=0
