
Re: Insane file system overhead on large volume

To: Martin Steigerwald <Martin@xxxxxxxxxxxx>
Subject: Re: Insane file system overhead on large volume
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Mon, 30 Jan 2012 09:18:03 +1100
Cc: Eric Sandeen <sandeen@xxxxxxxxxxx>, Manny <dermaniac@xxxxxxxxx>, xfs@xxxxxxxxxxx
In-reply-to: <201201281723.42786.Martin@xxxxxxxxxxxx>
References: <CAEBWcAT2zfDskgDjFr0KcnfsT2A65r04AM1cv2-TfnNJTB1__Q@xxxxxxxxxxxxxx> <201201281555.22179.Martin@xxxxxxxxxxxx> <4F2415B2.3080605@xxxxxxxxxxx> <201201281723.42786.Martin@xxxxxxxxxxxx>
User-agent: Mutt/1.5.21 (2010-09-15)
On Sat, Jan 28, 2012 at 05:23:42PM +0100, Martin Steigerwald wrote:
> On Saturday, 28 January 2012, Eric Sandeen wrote:
> > On 1/28/12 8:55 AM, Martin Steigerwald wrote:
> For the gory details:
> 
> > > Why is it that I get
> […]
> > > merkaba:/tmp> LANG=C df -hT /mnt/zeit
> > > Filesystem     Type  Size  Used Avail Use% Mounted on
> > > /dev/loop0     xfs    30T   33M   30T   1% /mnt/zeit
> > > 
> > > 
> > > 33MiB used on first mount instead of 5?
> > 
> > Not sure offhand; perhaps differences in mkfs defaults between
> > xfsprogs versions.
> 
> Okay, that's fine with me. I was just curious. It doesn't matter much.

More likely the kernel. Older kernels only use 1024 blocks for
the reserve block pool, while more recent ones use 8192 blocks.

$ gl -n 1 8babd8a
commit 8babd8a2e75cccff3167a61176c2a3e977e13799
Author: Dave Chinner <david@xxxxxxxxxxxxx>
Date:   Thu Mar 4 01:46:25 2010 +0000

    xfs: Increase the default size of the reserved blocks pool

    The current default size of the reserved blocks pool is easy to deplete
    with certain workloads, in particular workloads that do lots of concurrent
    delayed allocation extent conversions.  If enough transactions are running
    in parallel and the entire pool is consumed then subsequent calls to
    xfs_trans_reserve() will fail with ENOSPC.  Also add a rate limited
    warning so we know if this starts happening again.

    This is an updated version of an old patch from Lachlan McIlroy.

    Signed-off-by: Dave Chinner <david@xxxxxxxxxxxxx>
    Signed-off-by: Alex Elder <aelder@xxxxxxx>
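
As a rough check of the reserve pool numbers above, the two pool sizes can be converted to bytes. This is only a sketch: the 4 KiB block size is an assumption (it is the usual mkfs.xfs default, but the thread does not state it), and the function name is made up for illustration.

```python
# Assumption: 4 KiB filesystem blocks (common mkfs.xfs default; not stated in the thread).
BLOCK_SIZE = 4096


def reserve_pool_mib(blocks, block_size=BLOCK_SIZE):
    """Size in MiB of a reserve pool made of `blocks` filesystem blocks."""
    return blocks * block_size / 2**20


old = reserve_pool_mib(1024)  # older kernels
new = reserve_pool_mib(8192)  # kernels with the larger default pool
print(f"old pool: {old:.0f} MiB, new pool: {new:.0f} MiB, "
      f"difference: {new - old:.0f} MiB")
```

Under that assumption the pools come to 4 MiB and 32 MiB, a difference of 28 MiB, which lines up with df showing 33M used rather than 5M.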

> But when I review it, creating a 30TB XFS filesystem should involve writing
> some metadata at different places of the file.
> 
> I get:
> 
> merkaba:/mnt/zeit> LANG=C xfs_bmap fsfile
> fsfile:
>         0: [0..255]: 96..351
>         1: [256..2147483639]: hole
>         2: [2147483640..2147483671]: 3400032..3400063
>         3: [2147483672..4294967279]: hole
>         4: [4294967280..4294967311]: 3400064..3400095
>         5: [4294967312..6442450919]: hole
>         6: [6442450920..6442450951]: 3400096..3400127
>         7: [6442450952..8589934559]: hole

.....

Yeah, that's all the AG headers.
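
To see the spacing, the quoted xfs_bmap output can be parsed and the distance between the written extents computed. A minimal sketch: the helper name is hypothetical, the input is just the lines quoted above, and it relies on xfs_bmap reporting offsets in 512-byte units.

```python
import re

SECTOR = 512  # xfs_bmap reports offsets in units of 512-byte blocks

# The extents quoted above, with the leading "> " and indentation removed.
BMAP = """\
0: [0..255]: 96..351
1: [256..2147483639]: hole
2: [2147483640..2147483671]: 3400032..3400063
3: [2147483672..4294967279]: hole
4: [4294967280..4294967311]: 3400064..3400095
5: [4294967312..6442450919]: hole
6: [6442450920..6442450951]: 3400096..3400127
"""


def data_extents(text):
    """Return (start, end) file offsets, in sectors, of the non-hole extents."""
    found = []
    for line in text.splitlines():
        m = re.match(r"\s*\d+: \[(\d+)\.\.(\d+)\]: (\S+)", line)
        if m and m.group(3) != "hole":
            found.append((int(m.group(1)), int(m.group(2))))
    return found


extents = data_extents(BMAP)
starts = [start for start, _ in extents]
# Distance between consecutive written extents, in sectors.
strides = [b - a for a, b in zip(starts, starts[1:])]
stride_bytes = strides[0] * SECTOR
header_bytes = (extents[1][1] - extents[1][0] + 1) * SECTOR

print("stride in sectors:", strides)
print("stride in bytes:", stride_bytes, "= 1 TiB minus", 2**40 - stride_bytes)
print("bytes written per header extent:", header_bytes)
```

The written extents repeat every 2147483640 sectors, i.e. 4096 bytes short of 1 TiB, and each one after the first is 16 KiB long, which fits one small header region at the start of each allocation group.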

> Okay, it needed to write 2 GB:
> 
> merkaba:/mnt/zeit> du -h fsfile 
> 2,0G    fsfile
> merkaba:/mnt/zeit> du --apparent-size -h fsfile
> 30T     fsfile
> merkaba:/mnt/zeit>
> 
> I didn't expect mkfs.xfs to write 2 GB, but when thinking through it
> for a 30 TB filesystem I find this reasonable.

It zeroed the log, which will be just under 2GB in size for a
filesystem that large. Zeroing the log accounts for >99% of the IO
that mkfs does for most normal cases.
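
A back-of-the-envelope check of that share for this particular file, using only figures visible in the thread. Everything here is an estimate: the AG count of 30 is an assumption (30T divided by the roughly 1 TiB extent spacing in the xfs_bmap output), the header extent sizes are read off that output, and the 2.0G reported by du is taken as the total written.

```python
KIB = 1024
GIB = 1024**3

# Assumption: about 30 allocation groups (30T file, roughly 1 TiB spacing).
AG_COUNT = 30

first_extent = 256 * 512          # extent 0 in the xfs_bmap output: 128 KiB
per_ag_header = 32 * 512          # each later written extent: 16 KiB
header_bytes = first_extent + (AG_COUNT - 1) * per_ag_header

total_written = 2.0 * GIB         # what du reported for fsfile (rounded)

header_share = header_bytes / total_written
print(f"AG header writes: {header_bytes / KIB:.0f} KiB")
print(f"share of total:   {header_share:.4%}")
print(f"everything else:  {1 - header_share:.4%}")
```

With these assumptions the AG headers add up to well under 1 MiB, a tiny fraction of a percent of the 2 GB, so nearly everything du counts is the zeroed log.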

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
