Re: fallocate bug?

To: Dave Chinner <david@xxxxxxxxxxxxx>
Subject: Re: fallocate bug?
From: Zhu Han <schumi.han@xxxxxxxxx>
Date: Wed, 9 May 2012 09:43:47 +0800
Cc: xfs@xxxxxxxxxxx
Thank you so much for your kind help.

best regards,
韩竹(Zhu Han)


On Wed, May 9, 2012 at 6:31 AM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
On Tue, May 08, 2012 at 11:25:05PM +0800, Zhu Han wrote:
> On Tue, May 8, 2012 at 1:47 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>
> > On Tue, May 08, 2012 at 01:10:55PM +0800, Zhu Han wrote:
> > > On Tue, May 8, 2012 at 12:40 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> > > > On Tue, May 08, 2012 at 11:24:52AM +0800, Zhu Han wrote:
> > > > > On Tue, May 8, 2012 at 7:59 AM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> > > > > > And so now you've triggered the speculative delayed allocation
> > > > > > beyond EOF, which is normal behaviour. Hence there are currently
> > > > > > unused blocks beyond EOF which will get removed either when the
> > > > > > next close(fd) occurs on the file or the inode is removed from
> > > > > > the cache.
> > > > > >
> > > > >
> > > > > close(fd) should be invoked before dd exits. So why are the extra
> > > > > blocks beyond EOF not freed?
> > > >
> > > > The removal is conditional on how many times the fd has been closed
> > > > with dirty data on the inode.
> > > >
> > > > > The only way I found to remove the extra blocks is to truncate the
> > > > > file to its real size.
> > > >
> > > > If the close() didn't remove them, they will be removed when the
> > > > inode ages out of the cache. Why do you even care about them?
> > >
> > > Our distributed system depends on the real length of files to account
> > > for space usage.
> >
> > That's ..... naive. It's never been valid to assume that the file
> > size is an accurate reflection of space usage, especially as it will
> > *always* be wrong for sparse files. In the same light, you also
> > cannot assume that it is an accurate reflection for non-sparse files
> > because we can do both explicit and speculative allocation beyond
> > EOF which only du will show. Not to mention that metadata is not
> > accounted in the file length, and that can consume a significant
> > amount of space, too.
> >
> > > This behavior makes the accounting inaccurate.
> >
> > The block usage reported by XFS is both accurate and correct. The
> > file size reported by XFS is both accurate and correct. Your
> > "account inaccuracy" is assuming that they are the same. Perhaps you
> > should be using quotas for accurate space usage accounting?
> >
> > Anyway, if you really want to stop speculative delayed allocation
> > beyond EOF, then use the allocsize mount option to control it.
> >
>
>
> Thanks for help.
>
> I can control the size of the pre-allocation so that no data is written
> beyond the pre-allocated block range, and therefore no speculative
> allocation is triggered. Besides that, our system can periodically sync
> the accurate space usage of the mount point.
>
> Can you give any hints about the most lightweight approach to get the
> accurate block usage of the whole file system?

If you are just after the whole filesystem, then statfs(2) will give
you blocks used and free. If you are after a finer breakdown, then
quotas are probably what you want - they can be used for accounting
separately to the space limiting enforcement. Hence you get
accurate, up-to-date per user, group or project space accounting
without actually limiting space usage at all.
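
As a rough illustration of the statfs(2) suggestion, here is a minimal Python sketch. It assumes Linux and uses os.statvfs(), which wraps statvfs(3) rather than calling statfs(2) directly but reports the same block counts. The helper names filesystem_usage and file_usage are made up for this example. The second helper shows the distinction discussed earlier in the thread: the file size (st_size) and the space actually allocated (st_blocks) are separate numbers and need not agree.

```python
import os


def filesystem_usage(path):
    """Return (used_bytes, free_bytes, total_bytes) for the filesystem
    that contains `path`, as reported by statvfs."""
    st = os.statvfs(path)
    unit = st.f_frsize            # unit in which f_blocks and f_bfree are counted
    total = st.f_blocks * unit
    free = st.f_bfree * unit      # includes blocks reserved for root;
                                  # f_bavail is the unprivileged figure
    return total - free, free, total


def file_usage(path):
    """Return (apparent_size, allocated_bytes) for a single file.

    Assumption: st_blocks is counted in 512-byte units, which holds on
    Linux but is not guaranteed by POSIX.
    """
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512


if __name__ == "__main__":
    used, free, total = filesystem_usage(".")
    print("filesystem: used=%d free=%d total=%d" % (used, free, total))
    size, allocated = file_usage(__file__)
    print("this file: size=%d allocated=%d" % (size, allocated))
```

A single statvfs call per mount point is cheap, so polling it periodically is a plausible fit for whole-filesystem accounting. Per-file numbers from file_usage can legitimately differ from the file size in either direction (sparse files, preallocation beyond EOF), so summing file sizes is not a substitute.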

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx
