
Re: Backing up a "live" file system.

To: justin@xxxxxxxxxx
Subject: Re: Backing up a "live" file system.
From: Timothy Shimmin <tes@xxxxxxxxxxxxxxxxxxxxxxx>
Date: Tue, 5 Jun 2001 15:46:41 +1000
Cc: linux-xfs@xxxxxxxxxxx
In-reply-to: <Pine.SGI.4.32.0106051444400.176409-100000@xxxxxxxxxxxxxxxxxxxxxx>; from ivanr@xxxxxxxxxxxxxxxxx on Tue, Jun 05, 2001 at 02:59:38PM +1000
References: <200106042046.f54Kk4k01040@xxxxxxxxxxxxxxxxxxxx> <Pine.SGI.4.32.0106051444400.176409-100000@xxxxxxxxxxxxxxxxxxxxxx>
Sender: owner-linux-xfs@xxxxxxxxxxx
On Tue, Jun 05, 2001 at 02:59:38PM +1000, Ivan Rayner wrote:
> On Mon, 4 Jun 2001, Steve Lord wrote:
> >
> > Finally, the amount of space to be used is only an estimate, I do not know
> > how accurate it normally is on Irix, but a factor of 2 looks a bit large.
> 
> The size estimate is based on the blocksize multiplied by the number of
> blocks used for each file.  The problem here is that there is a huge
> number (500,000) of small files, and given that the estimate is off by
> about 1k per file, I'd say the difference is just blocksize vs. filesize.
> 
> 
> Ivan
> 
> > > Also if you look at the above xfsdump report, it says that the filesystem
> > > was about 1.4G and the resultant backup was 860M.  When I did the restore,
> > > it was back to about the correct original 1.4G, can anyone comment on why
> > > xfsdump is able to get such good compression?
> > >
So, reiterating what Ivan said: the "compression" is most likely because we
do not dump the unused space in the data blocks - and across a lot of small
files that slack adds up.
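
As a rough sanity check (the ~1 KiB of slack per file is an assumption
consistent with Ivan's figure, not a measurement; the other numbers are from
your report), something like the following shows the per-file slack alone is
in the same ballpark as the gap between the ~1.4G filesystem and the ~860M
dump:

    #include <stdio.h>

    int main(void)
    {
        double nfiles         = 500000.0;                  /* small files in the filesystem */
        double slack_per_file = 1024.0;                    /* assumed unused block space per file */
        double fs_used        = 1.4 * 1024 * 1024 * 1024;  /* ~1.4G filesystem usage reported */
        double dump_size      = 860.0 * 1024 * 1024;       /* ~860M dump produced */

        printf("total slack       : ~%.0f MB\n",
               nfiles * slack_per_file / (1024.0 * 1024.0));
        printf("filesystem - dump : ~%.0f MB\n",
               (fs_used - dump_size) / (1024.0 * 1024.0));
        return 0;
    }

Both figures land in the same few-hundred-megabyte range, which is why the
dump looks "compressed" even though no actual compression is done.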

I presume from your statement above that you weren't actually questioning
the accuracy of the dump estimate - it was just that the dump size was
surprisingly small.

FYI, some notes on how the dump size estimate is computed are below.

--Tim

--------------------------------------------------------------------
How does xfsdump compute the estimated dump size?

    A dump consists of media files (only one in the case of a dump to a file,
    and usually many when dumping to tape, depending on the device type).
    A media file consists of: 
        global header 
        inode map (inode# + state, e.g. dumped or not) 
        directories 
        non-directory files 

    A directory consists of a header, directory-entry headers for its entries,
    and an extended-attribute header and attributes. 

    A non-directory file consists of a file header, extent headers (one per
    extent), the file data, and an extended-attribute header and attributes.
    Some types of files don't have extent headers or data. 

    The xfsdump code says: 

            size_estimate = GLOBAL_HDR_SZ
                            +
                            inomap_getsz( )
                            +
                            inocnt * ( u_int64_t )( FILEHDR_SZ + EXTENTHDR_SZ )
                            +
                            inocnt * ( u_int64_t )( DIRENTHDR_SZ + 8 )
                            +
                            datasz;

    So this accounts for the: 
        global header 
        inode map 
        all the files 
        all the directory entries (the "+8" presumably allows for average file
        name length: 8 chars are already included in the header, and since the
        structure is padded to the next 8-byte boundary it covers names of
        roughly 8-15 chars - see the sketch below) 
        data 
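
    To make the "+8" reasoning concrete, here is a small sketch; the
    DIRENTHDR_SZ value and the exact header layout are assumptions for
    illustration only, not the real xfsdump on-media format:

        #include <stdio.h>

        #define DIRENTHDR_SZ 24  /* placeholder; assume the header already holds 8 name chars */

        /* Padded size of one directory-entry record for a name of 'namelen' chars. */
        static unsigned long dirent_rec_size(unsigned long namelen)
        {
            unsigned long extra = (namelen > 8) ? namelen - 8 : 0; /* chars beyond the 8 in the header */
            unsigned long sz = DIRENTHDR_SZ + extra;
            return (sz + 7) & ~7UL;                                /* round up to an 8-byte boundary */
        }

        int main(void)
        {
            unsigned long n;
            for (n = 1; n <= 20; n++)
                printf("namelen %2lu -> %lu bytes\n", n, dirent_rec_size(n));
            return 0;
        }

    With this assumed layout, any name of up to 16 characters still fits in
    DIRENTHDR_SZ + 8 after padding; the exact cut-off (15 vs. 16 chars)
    depends on details such as whether a terminating NUL is stored, but the
    rounding behaviour is the point.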

    What the estimate doesn't seem to account for (that I can think of): 
        extended attributes 
        files with more than one extent (it assumes one extent per file) 
        tape block headers (for tape media) 

    "Datasz" is calculated by adding up for every regular inode file, its 
(number of data blocks) * (block size). However, if "-a"
    is used, then instead of doing this, if the file is dualstate/offline then 
the file's data won't be dumped and it adds zero for it.
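
    A minimal sketch of that calculation (the struct, field names, and the 4k
    block size below are made up for illustration - this is not the actual
    xfsdump source):

        #include <stdbool.h>
        #include <stdint.h>
        #include <stdio.h>

        struct inode_info {
            bool     is_regular;   /* regular file (directories etc. carry no data here) */
            bool     is_offline;   /* dualstate/offline file */
            uint64_t nblocks;      /* data blocks allocated to the file */
        };

        /* Accumulate datasz as described above: blocks * blocksize per regular
         * file, except that with "-a" an offline file's data is not dumped,
         * so it adds zero. */
        static uint64_t
        estimate_datasz(const struct inode_info *ino, uint64_t ninodes,
                        uint64_t blocksize, bool opt_a)
        {
            uint64_t datasz = 0;
            uint64_t i;

            for (i = 0; i < ninodes; i++) {
                if (!ino[i].is_regular)
                    continue;
                if (opt_a && ino[i].is_offline)
                    continue;
                datasz += ino[i].nblocks * blocksize;
            }
            return datasz;
        }

        int main(void)
        {
            struct inode_info sample[] = {
                { true,  false, 1 },   /* small regular file: one block */
                { true,  true,  100 }, /* offline file: skipped when -a is given */
                { false, false, 0 },   /* directory: contributes nothing here */
            };
            printf("datasz (no -a): %llu\n",
                   (unsigned long long)estimate_datasz(sample, 3, 4096, false));
            printf("datasz (-a)   : %llu\n",
                   (unsigned long long)estimate_datasz(sample, 3, 4096, true));
            return 0;
        }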

