
Re: file preallocation without unwritten flag being set

To: Eric Sandeen <sandeen@xxxxxxxxxxx>
Subject: Re: file preallocation without unwritten flag being set
From: p v <pvlogin@xxxxxxxxx>
Date: Wed, 13 May 2009 14:05:16 -0700 (PDT)
Cc: xfs@xxxxxxxxxxx
In-reply-to: <4A0A55E0.4010202@xxxxxxxxxxx>
References: <283244.29270.qm@xxxxxxxxxxxxxxxxxxxxxxxxxxx> <4A0A0E76.6000701@xxxxxxxxxxx> <618437.93111.qm@xxxxxxxxxxxxxxxxxxxxxxxxxxx> <4A0A55E0.4010202@xxxxxxxxxxx>
Doesn't seem to work. I tried to clear EXTFLG in the superblock versionnum
(in every superblock copy as well), but the flag is still set on all
extents.

xfs_db> version
versionnum [0xb4a4+0x8] = V4,NLINK,ALIGN,DIRV2,LOGV2,EXTFLG,MOREBITS,ATTR2
xfs_db> version 0xa4a4 0x8
versionnum [0xa4a4+0x8] = V4,NLINK,ALIGN,DIRV2,LOGV2,MOREBITS,ATTR2

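# read agcount from the primary superblock, then rewrite every AG's copy
typeset -i agcount=$(xfs_db -r -c "sb 0" -c "print agcount" /dev/sda | awk '{print $3}')
typeset -i i=0
while (( i < agcount )); do
        xfs_db -x -c "sb $i" -c "write versionnum 0xa4a4" /dev/sda
        (( i += 1 ))
done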

And once I create the file, xfs_repair complains and sets the superblock
flag again. My guess is that the extent allocation path hardcodes this for
version 4: any extent allocated beyond the file size will get the flag...
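For reference, a bash sketch of where that per-extent flag lives, assuming the on-disk v4 block-map record layout (xfs_bmbt_rec: two 64-bit words, l0 and l1, packing a 1-bit extent flag, a 54-bit file offset, a 52-bit start block and a 21-bit block count). decode_bmbt_rec and encode_bmbt_rec are hypothetical helpers that take the words as numbers already converted from on-disk big-endian; they rely on bash's 64-bit arithmetic.

```shell
#!/usr/bin/env bash
# Sketch of an XFS v4 block-map extent record, assuming this layout:
#   l0: bit 63 = extent flag (1 = unwritten), bits 62..9 = file offset,
#       bits 8..0 = high 9 bits of the start block
#   l1: bits 63..21 = low 43 bits of the start block, bits 20..0 = length
# bash arithmetic is signed 64-bit, so every field is masked after shifting.

decode_bmbt_rec() {
    local l0=$(( $1 )) l1=$(( $2 ))
    local flag=$(( (l0 >> 63) & 1 ))
    local startoff=$(( (l0 >> 9) & ((1 << 54) - 1) ))
    local startblock=$(( ((l0 & 0x1ff) << 43) | ((l1 >> 21) & ((1 << 43) - 1)) ))
    local blockcount=$(( l1 & 0x1fffff ))   # 21 bits: at most 2097151 blocks
    echo "flag=$flag startoff=$startoff startblock=$startblock blockcount=$blockcount"
}

# Inverse of decode_bmbt_rec: pack the four fields into two hex words.
encode_bmbt_rec() {
    local flag=$1 startoff=$2 startblock=$3 blockcount=$4
    local l0=$(( (flag << 63) | (startoff << 9) | (startblock >> 43) ))
    local l1=$(( ((startblock & ((1 << 43) - 1)) << 21) | blockcount ))
    printf '0x%016x 0x%016x\n' "$l0" "$l1"
}
```

For example, `decode_bmbt_rec 0x8000000000000000 0x0000000000200010` describes an unwritten 16-block extent at file offset 0, start block 1. Because the length field is 21 bits wide, a single record can describe at most 2^21 - 1 = 2,097,151 blocks.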

Also, two questions:

1) What is inode64, and where can I find out about all of the undocumented
mkfs/mount options? (It's unfortunate that such a good fs doesn't have
correspondingly good documentation.)

2) Why is the largest extent size limited to xxx blocks? (I can't find out
the number.) Also, when does the inode finally get flushed? ls -i reports
19 as the inode number, but even after unmounting, inode 19 in xfs_db
still shows as a free inode. Is it still only in the log? I had assumed
that xfs_bmap gives me the correct number of extents, but looking at the
inode with xfs_db it's now obvious that xfs_bmap reports contiguous ranges
rather than the actual extents in the block map tree.
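To make the distinction between records and ranges concrete, a hypothetical bash sketch: merge_extents coalesces block-map records that continue each other both in file offset and on disk, so several records can print as one range. It only models the behaviour observed above; it is not xfs_bmap's code, and the input format is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: coalesce block-map records into contiguous ranges.
# Input lines: "startoff startblock blockcount" (file offset and disk
# address, both in filesystem blocks). A record joins the current range
# only if it continues that range both in the file and on disk.

merge_extents() {
    local off blk cnt
    local have=0 r_off=0 r_blk=0 r_cnt=0
    while read -r off blk cnt; do
        if (( have && off == r_off + r_cnt && blk == r_blk + r_cnt )); then
            r_cnt=$(( r_cnt + cnt ))          # extend the current range
        else
            if (( have )); then
                echo "$r_off $r_blk $r_cnt"   # emit the finished range
            fi
            r_off=$off r_blk=$blk r_cnt=$cnt have=1
        fi
    done
    if (( have )); then
        echo "$r_off $r_blk $r_cnt"           # emit the last range
    fi
}
```

For example, a first record of 2,097,151 blocks (a full 21-bit length) followed by a 10-block record that starts right after it in the file and on disk comes out as a single 2,097,161-block range.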


Peter Vajgel

----- Original Message ----
From: Eric Sandeen <sandeen@xxxxxxxxxxx>
To: p v <pvlogin@xxxxxxxxx>
Cc: xfs@xxxxxxxxxxx
Sent: Tuesday, May 12, 2009 10:08:48 PM
Subject: Re: file preallocation without unwritten flag being set

p v wrote:
> I want to avoid any metadata modifications while doing O_DIRECT reads
> (the fs is mounted with noatime). Right now I am doing it mostly for
> testing. I am seeing a performance degradation going from the raw
> device to xfs on a 10TB filesystem, probably due to my application,
> but I am trying to narrow it down, so I am starting by running the
> randomio benchmark on raw, then on one 10TB file, then 10 1TB files,
> then 100 100GB files, ...

you may want to try the inode64 mount option so the allocator is free to
roam your whole 10T ...

> But in general certain applications can definitely take care of the
> preallocated space (db, FB haystack, ...). 

Ok, so it sounds like you do understand the implications and you want to
be able to write into prealloc space without any metadata updates as
they are converted to initialized extents... :)

> What they require is
> minimal fragmentation, so they would prefer to preallocate the space
> (fill the whole fs with contiguous files) and then maintain in-file,
> app-specific metadata (such as valid offsets of initialized data,
> ...). What I would really like is a vxfs equivalent of the setext
> options:
> setext -r <reservation> -f chgsize
> And on top of that, what I would really love to have is a vxfs-style
> "nomtime" mount option. Then with O_DIRECT I would have raw-like
> performance.
> With the unwritten mkfs option I could get the setext semantics. So
> what's the trick (before I dive into the xfs layout)? I am guessing
> that there is no equivalent for nomtime option?

well, the unwritten=0 option did get removed:

TBH I'm not entirely sure why.

The unwritten flag is per-filesystem, not per-file; you can still clear
that feature bit:


by using xfs_db in -x expert mode to rewrite every superblock's
"versionnum" without that bit set.

The xfs_db "version" command will give you a more textual representation
of what is actually set before & after.
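A minimal bash sketch of what that textual decode involves, assuming the XFS v4 superblock bit values from xfs_sb.h (XFS_SB_VERSION_EXTFLGBIT = 0x1000 and so on, with the low nibble holding the version number). decode_versionnum and clear_extflg are hypothetical helper names; names are printed in bit order rather than xfs_db's order, and features2 bits such as ATTR2 (the "+0x8") are not decoded.

```shell
#!/usr/bin/env bash
# Sketch: decode an XFS v4 superblock versionnum into feature names.
# Bit values are assumed to mirror XFS_SB_VERSION_*BIT in xfs_sb.h.

decode_versionnum() {
    local v=$(( $1 ))                  # accepts hex such as 0xb4a4
    local out="V$(( v & 0xf ))"        # low nibble is the version number
    local -a names=(ATTR NLINK QUOTA ALIGN DALIGN SHARED LOGV2 SECTOR EXTFLG DIRV2 BORG MOREBITS)
    local -a bits=(0x0010 0x0020 0x0040 0x0080 0x0100 0x0200 0x0400 0x0800 0x1000 0x2000 0x4000 0x8000)
    local i
    for i in "${!bits[@]}"; do
        if (( v & bits[i] )); then
            out+=",${names[i]}"
        fi
    done
    echo "$out"
}

# Clear only the EXTFLG bit (0x1000), leaving every other feature intact.
clear_extflg() {
    printf '0x%x\n' $(( $1 & ~0x1000 ))
}
```

With these values, `clear_extflg 0xb4a4` prints `0xa4a4`, which is the same change made by hand in the xfs_db session above.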

You could script the sb rewrites...

For what it's worth, your xfs_db tricks below to preallocate seem a bit
... tricky.

This should suffice:

xfs_io -f /hay/foo
xfs_io> resvsp 0 1024g
xfs_io> truncate 1024g
xfs_io> quit
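The same two commands can also be given on the command line with xfs_io's -c option, so filling a filesystem with preallocated files can be scripted. A minimal sketch; prealloc_files, the fileN naming and the DRY_RUN switch are hypothetical, and the directory, count and size are left to the caller.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: preallocate COUNT files of SIZE under DIR using the
# xfs_io resvsp + truncate sequence. With DRY_RUN set, only print commands.

prealloc_files() {
    local dir=$1 count=$2 size=$3 i path
    for (( i = 0; i < count; i++ )); do
        path="$dir/file$i"
        if [[ -n ${DRY_RUN:-} ]]; then
            # Show the equivalent command line instead of running it.
            printf 'xfs_io -f -c "resvsp 0 %s" -c "truncate %s" %s\n' "$size" "$size" "$path"
        else
            # resvsp reserves the space; truncate then sets the file size.
            xfs_io -f -c "resvsp 0 $size" -c "truncate $size" "$path" || return 1
        fi
    done
}
```

For example, `DRY_RUN=1 prealloc_files /hay 10 1024g` prints the ten xfs_io invocations without touching the filesystem.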

Oh and you're right, there's no "nomtime" option AFAIK.


> Thanks
> Peter Vajgel

