
Re: XFS Preallocate using ALLOCSP

To: Eric Sandeen <sandeen@xxxxxxxxxxx>, Felix Blyakher <felixb@xxxxxxx>
Subject: Re: XFS Preallocate using ALLOCSP
From: Smit Shah <getsmit@xxxxxxxxx>
Date: Tue, 16 Jun 2009 10:28:41 -0700
Cc: linux-xfs@xxxxxxxxxxx
In-reply-to: <0B774481-16A5-42FC-89C3-91096E59E861@xxxxxxx>
References: <24042506.post@xxxxxxxxxxxxxxx> <4A3712BF.7030101@xxxxxxxxxxx> <8770d98c0906152344p185533a9rc144a5667d13d2de@xxxxxxxxxxxxxx> <4A37B744.9030301@xxxxxxxxxxx> <0B774481-16A5-42FC-89C3-91096E59E861@xxxxxxx>
On 6/16/09, Felix Blyakher <felixb@xxxxxxx> wrote:
>
> On Jun 16, 2009, at 10:16 AM, Eric Sandeen wrote:
>
>> Smit Shah wrote:
>>
>>> Even the man page of fallocate says that it allocates and initializes
>>> to zero the disk space allocated
>>
>> Bleah, so it does:
>>
>>       FALLOC_FL_KEEP_SIZE
>>              This flag allocates and initializes to zero the disk space
>>
>> well, that's misleading and/or wrong.
>>
>>> but when I looked at the code I found that it does not zero it out,
>>> hence I was kind of confused. So posix_fallocate, when fallocate is
>>> not supported by the underlying filesystem, is similar to ALLOCSP:
>>> it ftruncates the file and zeros it out. So all of them try to
>>> allocate contiguous blocks, but the only difference is that when we
>>> use fallocate on ext4/xfs it does not zero out the preallocated
>>> space. Am I right?
>>
>> fallocate / sys_fallocate marks the region as uninitialized so that
>> you get back 0s when you read.  It's implemented on xfs, ext4, ocfs2,
>> and btrfs.
>>
>> posix_fallocate manages to reach sys_fallocate when all the stars
>> align: kernel, glibc, and filesystem.  Otherwise it writes 0s.
>>
>>> But when I fallocate on ext4 I can see the write performance
>>> improvement, but not on xfs
>>
>> Testing how?

I use IOmeter to test it.

>>
>>> and the reason, as I found out in one of your previous comments, is
>>> the unwritten flag set in xfs. So how do we see whether the unwritten
>>> flag is set or not? I did use xfs_info but it didn't show any such
>>> information.
>>
>> ext4 & xfs are doing the same basic thing, they must maintain the
>> unwritten state on the preallocated extents, and manage that as it
>> changes when portions are written with real data.
>
> Well, the difference in managing the unwritten state can
> theoretically result in different performance. Not that I'd
> expect ext4 being better than xfs in this respect.
> More data is needed here.

For example, when I do preallocation I see the throughput for sequential
writes stay the same on ext4, but on xfs it goes down by roughly 10 MB/s.

>
>> xfs_bmap -v -v -p on a file will show you extent state for xfs.
>>
Thanks a lot.
>>> I guess I am not right here: ftruncate simply does an lseek and writes
>>
>> ftruncate simply sets i_size, it does no data IO.
>
> ... and no block reservation/allocation either.
>
Right, what I meant to say was that posix_fallocate uses ftruncate,
which in turn just updates i_size, and then posix_fallocate zeros
out the whole thing.

>>> to it, which might not be contiguous, whereas fallocate tries to
>>> allocate contiguous blocks so as to reduce fragmentation
>>
>> Actually fallocate's only official job is to reserve blocks so you
>> don't get ENOSPC later.  Because the request comes in all at once, you
>> are very likely to get an optimal allocation, and that's a nice side
>> effect, but it's not actually required by the interface.
>>
>>> and hence I
>>> thought that to reduce fragmentation and for security reasons
>>
>> None of these normal interfaces poses any security risk.  If you build
>> xfs without the unwritten extent feature
>
> I don't think it's possible. Not in any configurable way,
> at least.
>
>> you could allocate w/o flagging
>> uninitialized and expose stale data, but that's not a normal mode of
>> operation.
>
> That was possible with the mount option unwritten=0, but
> AFAIK, it's recently been completely removed from the code.
>
>>
>>
>>> it's better
>>> to use ALLOCSP rather than something like ftruncate/posix_fallocate
>>> or RESVSP, which kind of performs badly for writes with the unwritten
>>> flag set, now that there is no direct way to disable unwritten when
>>> creating the fs.
>>
>> In the end, there are only 2 ways to preallocate blocks: explicitly
>> write 0s, or flag regions as unwritten (as xfs/ext4/... can do).
>
> Exactly.
> That's a trade-off between spending time at setup or at
> write time. And if explicit zeros are desirable for the former
> approach, it can be driven from user space (after
> preallocation) rather than from the kernel, with exactly the
> same outcome.
>
> Just restating the same what Eric already said :)
>
> Felix
>
>>  (Ok,
>> or a 3rd sorta-way, which is to reserve w/o flagging; maybe that's
>> what you're looking for, but that's deprecated or not really available
>> at this point).
Yes, that is what I was looking for :) but I guess it's no longer
available through mkfs.xfs, and I will have to do it with xfs_db as
stated in one of Eric's replies in the previous posts.

>>
>> Maybe I should ask what the end goal is here.  :)
>>
Just to see if preallocation using fallocate helps reduce
fragmentation and increase throughput.  I guess it will help
reduce fragmentation, but the write performance is going to suffer.

>> -Eric
>>
>>> Thanks, Smit
>>
>> _______________________________________________
>> xfs mailing list
>> xfs@xxxxxxxxxxx
>> http://oss.sgi.com/mailman/listinfo/xfs
>
>
