On Tue, Jun 05, 2001 at 02:59:38PM +1000, Ivan Rayner wrote:
> On Mon, 4 Jun 2001, Steve Lord wrote:
> >
> > Finally, the amount of space to be used is only an estimate, I do not know
> > how accurate it normally is on Irix, but a factor of 2 looks a bit large.
>
> The size estimate is based on the blocksize multiplied by the number of
> blocks used for each file. The problem here is that there is a huge
> number (500,000) of small files, and given that the estimate is off by
> about 1k per file, I'd say the difference is just blocksize vs. filesize.
>
>
> Ivan
>
> > > Also if you look at the above xfsdump report, it says that the filesystem
> > > was about 1.4G and the resultant backup was 860M. When I did the restore,
> > > it was back to about the correct original 1.4G, can anyone comment on why
> > > xfsdump is able to get such good compression?
> > >
So, reiterating what Ivan said, the "compression" is likely because
we do not dump the empty data in the data blocks - and for a lot of
small files this can add up.
I presume from your statement above that you weren't actually querying
the accuracy of the dump estimate - it was just that the dump size was
surprisingly small.
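
You can see the gap on an individual file by comparing its nominal size
against its allocated blocks. A minimal sketch using plain stat(2) (my own
illustration, not xfsdump code):

    #include <stdio.h>
    #include <sys/stat.h>

    int
    main(int argc, char **argv)
    {
            struct stat sb;

            if (argc != 2 || stat(argv[1], &sb) < 0) {
                    fprintf(stderr, "usage: %s file\n", argv[0]);
                    return 1;
            }
            /* st_size is the nominal byte count the dump has to write;
             * st_blocks counts 512-byte units actually allocated.  Over
             * 500,000 small files, the difference between the two is
             * roughly the estimate-vs-dump-size gap discussed above. */
            printf("nominal:   %llu bytes\n",
                   (unsigned long long)sb.st_size);
            printf("allocated: %llu bytes\n",
                   (unsigned long long)sb.st_blocks * 512);
            return 0;
    }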
FYI, some notes on the estimate of the dump size are below.
--Tim
--------------------------------------------------------------------
How does xfsdump compute the estimated dump size?
A dump consists of media files (only one in the case of a dump to a file,
and usually many when dumping to tape, depending on the device type).
A media file consists of:
global header
inode map (inode# + state(e.g.dump or not?) )
directories
non-directory files
A directory consists of a header, directory-entry-headers for its entries,
and an extended-attribute header and attributes.
A non-directory file consists of a file header, extent-headers (one per
extent), file data, and an extended-attribute header and attributes.
Some types of files don't have extent headers or data.
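
To make the per-file cost concrete, here is that record layout expressed as
a small C helper; the function and its parameters are my own invention for
illustration, not xfsdump internals (the real FILEHDR_SZ/EXTENTHDR_SZ values
live in the xfsdump headers, so sizes are passed in):

    #include <stdio.h>
    #include <inttypes.h>

    /* Cost of one non-directory file record in a media file: a file
     * header, one extent header per extent, the data itself, plus any
     * extended-attribute header and attributes. */
    static uint64_t
    file_record_cost(uint64_t filehdr_sz, uint64_t extenthdr_sz,
                     uint64_t nextents, uint64_t databytes,
                     uint64_t ea_bytes)
    {
            return filehdr_sz + nextents * extenthdr_sz
                   + databytes + ea_bytes;
    }

    int
    main(void)
    {
            /* a 1-byte file in a single extent, no EAs, with made-up
             * header sizes of 256 and 32 bytes */
            printf("%" PRIu64 " bytes\n",
                   file_record_cost(256, 32, 1, 1, 0));
            return 0;
    }

Note that the estimate below charges each file exactly one such extent header.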
The xfsdump code says:

    size_estimate = GLOBAL_HDR_SZ
                  + inomap_getsz( )
                  + inocnt * ( u_int64_t )( FILEHDR_SZ + EXTENTHDR_SZ )
                  + inocnt * ( u_int64_t )( DIRENTHDR_SZ + 8 )
                  + datasz;
So this accounts for the:
global header
inode map
all the files
all the directory entries (the "+8" is presumably to account for the
average file name length: 8 chars are already included in the header,
and since this structure is padded to the next 8-byte boundary, the
extra 8 bytes cover names of 8-15 chars; see the sketch after this list)
data
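
The 8-byte rounding for names works out like this; a sketch based on my
reading of the layout above (assuming the header's 8 name bytes include a
terminating NUL), not the literal xfsdump code:

    #include <stdio.h>

    /* Bytes a directory entry needs beyond its fixed header: the
     * header has room for 8 name bytes (including a NUL in this
     * sketch); anything longer is rounded up to the next 8-byte
     * boundary. */
    static unsigned int
    dirent_extra(unsigned int namelen)
    {
            unsigned int over = namelen + 1;    /* name + NUL */

            if (over <= 8)
                    return 0;                   /* fits in the header */
            return (over - 8 + 7) & ~7u;        /* round up to 8 */
    }

    int
    main(void)
    {
            printf("%u\n", dirent_extra(7));    /* 0:  fits */
            printf("%u\n", dirent_extra(8));    /* 8:  first length needing a pad */
            printf("%u\n", dirent_extra(15));   /* 8:  still one 8-byte chunk */
            printf("%u\n", dirent_extra(16));   /* 16: two chunks */
            return 0;
    }

So the "+8" in the estimate is exact for names of 8-15 chars and drifts for
anything shorter or longer.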
What the estimate doesn't seem to account for (that I can think of):
extended attributes
multiple extents per file (it assumes each file has only one extent)
tape block headers (for tape media)
"Datasz" is calculated by adding up for every regular inode file, its
(number of data blocks) * (block size). However, if "-a"
is used, then instead of doing this, if the file is dualstate/offline then
the file's data won't be dumped and it adds zero for it.
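
In sketch form, the per-inode contribution looks something like the
following; "opt_a" and "offline" are stand-ins I've named myself for the
real -a / dualstate handling, and the rounding reflects my reading of
"blocks * blocksize":

    #include <stdio.h>
    #include <string.h>
    #include <stdint.h>
    #include <sys/stat.h>

    /* Add one inode's contribution to datasz.  st_blocks is in
     * 512-byte units, so blocks * blocksize becomes st_blocks * 512
     * rounded up to whole filesystem blocks. */
    static void
    add_datasz(uint64_t *datasz, const struct stat *st,
               uint64_t blocksize, int opt_a, int offline)
    {
            uint64_t alloc;

            if (!S_ISREG(st->st_mode))
                    return;                 /* only regular files count */
            if (opt_a && offline)
                    return;                 /* -a: data not dumped, add 0 */

            alloc = (uint64_t)st->st_blocks * 512;
            *datasz += (alloc + blocksize - 1) / blocksize * blocksize;
    }

    int
    main(void)
    {
            struct stat st;
            uint64_t datasz = 0;

            memset(&st, 0, sizeof(st));
            st.st_mode = S_IFREG;
            st.st_blocks = 2;       /* 1k allocated for a tiny file */

            add_datasz(&datasz, &st, 4096, 0, 0);
            /* rounds up to one whole 4k block -- the per-file
             * blocksize-vs-filesize overestimate Ivan described */
            printf("datasz: %llu\n", (unsigned long long)datasz);
            return 0;
    }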