Re: Strange hole creation behavior

To: Pádraig Brady <P@xxxxxxxxxxxxxx>, Brian Foster <bfoster@xxxxxxxxxx>
Subject: Re: Strange hole creation behavior
From: Eric Sandeen <sandeen@xxxxxxxxxxx>
Date: Fri, 11 Apr 2014 18:05:49 -0500
Cc: Coreutils <coreutils@xxxxxxx>, Ondřej Vašík <ovasik@xxxxxxxxxx>, xfs-oss <xfs@xxxxxxxxxxx>
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <53487383.6040001@xxxxxxxxxxxxxx>
References: <534822D7.7090803@xxxxxxxxxxxxxx> <20140411204338.GA38024@xxxxxxxxxxxxxxx> <53487383.6040001@xxxxxxxxxxxxxx>
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:24.0) Gecko/20100101 Thunderbird/24.4.0
On 4/11/14, 5:58 PM, Pádraig Brady wrote:
> On 04/11/2014 09:43 PM, Brian Foster wrote:
>> On Fri, Apr 11, 2014 at 06:13:59PM +0100, Pádraig Brady wrote:
>>> So this coreutils test is failing on XFS:
>>> http://git.sv.gnu.org/gitweb/?p=coreutils.git;a=blob;f=tests/dd/sparse.sh;h=06efc7017
>>> Specifically the last hole check on line 66.
>>>
>>> In summary what's happening is that a write(1MiB), lseek(1MiB), write(1MiB)
>>> creates only a 64KiB hole. Is that expected?
>>>
>>
>> This is expected behavior due to speculative preallocation. An FAQ with
>> regard to this behavior is pending, but see here for reference:
>>
>> http://oss.sgi.com/archives/xfs/2014-04/msg00083.html
>>
>> In that particular write(1MB), lseek(+1MB), write(1MB) workload, each
>> write is preallocating some extra space beyond the current EOF. The seek
>> then moves past that space, but the space doesn't go away. The
>> subsequent writes will extend EOF. The previously preallocated space now
>> resides in the middle of the file and can't be trimmed away when the
>> file is closed.
>>
>>> Now a 1MiB hole is supported using truncate:
>>>   dd if=/dev/urandom of=file.in bs=1M count=1 iflag=fullblock
>>>   truncate -s+1M file.in
>>>   dd if=/dev/urandom of=file.in bs=1M count=1 iflag=fullblock conv=notrunc oflag=append
>>>   $ du -k file.in
>>>   2048  file.in
>>>
>>
>> This works simply because it is broken into multiple commands. When the
>> first dd exits, the excess space is trimmed off (the file descriptor is
>> closed). The subsequent truncate extends the file size without any
>> extra space getting caught between the old and new EOF.
>>
>> You can confirm this by using the 'allocsize=4k' mount option to the XFS
>> mount. If you wanted something more generic for the purpose of testing
>> the coreutils functionality, you could also set the size of file.out in
>> advance. E.g., with preallocation in effect:
>>
>> # dd if=file.in of=file.out bs=1M conv=sparse
>> # xfs_bmap -v file.out 
>> file.out:
>>  EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL
>>    0: [0..3967]:       9773944..9777911  1 (9080..13047)     3968
>>    1: [3968..4095]:    hole                                   128
>>    2: [4096..6143]:    9778040..9780087  1 (13176..15223)    2048
>>
>> ... and then prevent preallocation by ensuring writes do not extend the
>> file:
>>
>> # rm -f file.out 
>> # truncate --size=3M file.out
>> # dd if=file.in of=file.out bs=1M conv=sparse,notrunc
>> # xfs_bmap -v file.out 
>> file.out:
>>  EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL
>>    0: [0..2047]:       9773944..9775991  1 (9080..11127)     2048
>>    1: [2048..4095]:    hole                                  2048
>>    2: [4096..6143]:    9778040..9780087  1 (13176..15223)    2048
>>
>> Hope that helps.
> 
> Excellent info thanks.
> With that I can adjust the test so it passes (patch attached).
> 
> So for reference this means that cp can no longer recreate holes
> <= 1MiB from source to dest (with the default XFS allocation size):

Well, the allocation size changes based on the filesize; there's a
heuristic involved.  So I fear that if you hard-code it into your
tests, you risk failing again in the future...
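
A rough sketch of a check that avoids hard-coding a hole size, assuming
GNU dd (conv=sparse) and GNU stat. The file names and the "allocated is
smaller than apparent size" criterion are illustrative and are not taken
from sparse.sh; the script only reports the layout and verifies contents,
since the filesystem is free to allocate the zero range anyway:

```shell
#!/bin/sh
# Sketch only: assumes GNU dd (conv=sparse) and GNU stat (-c).
# Directory and file names are illustrative.
dir=$(mktemp -d) || exit 1
in=$dir/file.in
out=$dir/file.out
mib=1048576

# Build the input: 1 MiB of data, 1 MiB of zeros, 1 MiB of data.
{
  head -c "$mib" /dev/zero | tr '\0' 'x'
  head -c "$mib" /dev/zero
  head -c "$mib" /dev/zero | tr '\0' 'x'
} > "$in"

# Copy in a single dd process; conv=sparse seeks over the zero block
# instead of writing it.
dd if="$in" of="$out" bs=1M conv=sparse 2>/dev/null

apparent=$(stat -c %s "$out")
allocated=$(( $(stat -c %b "$out") * $(stat -c %B "$out") ))

# The contents must match whatever layout the filesystem picked.
if cmp -s "$in" "$out"; then same=yes; else same=no; fi

echo "apparent=$apparent allocated=$allocated same=$same"
if [ "$allocated" -lt "$apparent" ]; then
  echo "some hole was created"
else
  echo "no hole visible; the filesystem chose to allocate the range"
fi

rm -rf "$dir"
```

The point of the design is that only the apparent size and the contents
are treated as hard requirements; the allocated size is informational.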

> We could I suppose use FALLOC_FL_PUNCH_HOLE where available
> to cater for this case. I'll see whether this is worth adding.

That might make sense.
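
To see what punching would do from the command line, here is a minimal
sketch using fallocate(1) from util-linux, which issues the same
FALLOC_FL_PUNCH_HOLE request. This is only an illustration of the
operation, not a proposal for how cp should do it; the file names are
made up, and the script tolerates filesystems that do not support
punching:

```shell
#!/bin/sh
# Sketch only: assumes util-linux fallocate(1) and GNU stat; names are illustrative.
dir=$(mktemp -d) || exit 1
f=$dir/file.out
ref=$dir/file.ref
mib=1048576

# Data, zeros, data, written densely so the zero range is really allocated.
{
  head -c "$mib" /dev/zero | tr '\0' 'x'
  head -c "$mib" /dev/zero
  head -c "$mib" /dev/zero | tr '\0' 'x'
} > "$f"
cp "$f" "$ref"   # reference copy for the content comparison below

before=$(stat -c %b "$f")

# Deallocate the zero-filled middle MiB; the file size is left unchanged.
if command -v fallocate >/dev/null 2>&1 &&
   fallocate --punch-hole --offset "$mib" --length "$mib" "$f" 2>/dev/null
then
  punched=yes
else
  punched=no   # tool missing, or the filesystem does not support punching
fi

after=$(stat -c %b "$f")
size=$(stat -c %s "$f")

# A punched range reads back as zeros, so the contents should be unchanged.
if cmp -s "$f" "$ref"; then same=yes; else same=no; fi

echo "punched=$punched blocks_before=$before blocks_after=$after size=$size same=$same"
rm -rf "$dir"
```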

But filesystems get to pick their layout; even ext4 may opportunistically
fill in holes, etc - so I think you need to be pretty careful with these
sorts of tests...

-Eric