Re: fallocate bug?

To: Zhu Han <schumi.han@xxxxxxxxx>
Subject: Re: fallocate bug?
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Tue, 8 May 2012 15:47:03 +1000
Cc: xfs@xxxxxxxxxxx
In-reply-to: <CAF7KpS906EpxfAMjzW1zx8fSdGacoxkg+Fz=P0Sb3yONMhe7gw@xxxxxxxxxxxxxx>
References: <CAF7KpS-r4zRXZxBU3U8ohxA85-rEvbAzCewYZDr44MNdP+YmFg@xxxxxxxxxxxxxx> <20120507235955.GE5091@dastard> <CAF7KpS_02NjAzY1wOQ9U0kwjKH+SzA0O_3VqSfjJgv0P6Hjk=g@xxxxxxxxxxxxxx> <20120508044039.GF5091@dastard> <CAF7KpS906EpxfAMjzW1zx8fSdGacoxkg+Fz=P0Sb3yONMhe7gw@xxxxxxxxxxxxxx>
User-agent: Mutt/1.5.21 (2010-09-15)
On Tue, May 08, 2012 at 01:10:55PM +0800, Zhu Han wrote:
> On Tue, May 8, 2012 at 12:40 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> > On Tue, May 08, 2012 at 11:24:52AM +0800, Zhu Han wrote:
> > > On Tue, May 8, 2012 at 7:59 AM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> > > > And so now you've triggered the speculative delayed allocation
> > > > beyond EOF, which is normal behaviour. Hence there are currently
> > > > unused blocks beyond EOF which will get removed either when the next
> > > > close(fd) occurs on the file or the inode is removed from the cache.
> > > >
> > >
> > > Close(fd) should be invoked before dd quits. But why the extra blocks
> > > beyond EOF are not freed?
> >
> > The removal is conditional on how many times the fd has been closed
> > with dirty data on the inode.
> >
> > > The only way I found to remove the extra blocks is truncate the file to
> > > its real size.
> >
> > If the close() didn't remove them, they will be removed when the
> > inode ages out of the cache. Why do you even care about them?
> Our distributed system depends on the real length of files to account the
> space usage.

That's ..... naive. It's never been valid to assume that the file
size is an accurate reflection of space usage, especially as it will
*always* be wrong for sparse files. In the same light, you also
cannot assume that it is an accurate reflection for non-sparse files
because we can do both explicit and speculative allocation beyond
EOF which only du will show. Not to mention that metadata is not
accounted in the file length, and that can consume a significant
amount of space, too.
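
To make the distinction concrete, here is a minimal sketch in Python (an
illustration only, not anything XFS-specific; the helper name
size_vs_allocated is made up for this example). It reads both numbers
from stat(2): st_size is the file length, while st_blocks counts the
blocks actually allocated, in 512-byte units on Linux. That block count
is what du reports.

```python
import os
import tempfile


def size_vs_allocated(path):
    """Return (file length in bytes, allocated bytes) for path."""
    st = os.stat(path)
    # On Linux, st_blocks is counted in 512-byte units, independent of
    # the filesystem block size.
    return st.st_size, st.st_blocks * 512


if __name__ == "__main__":
    # Extend an empty file to 1 MiB without writing any data. On most
    # filesystems this yields a sparse file: large length, few blocks.
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.truncate(1 << 20)
        path = f.name
    try:
        size, allocated = size_vs_allocated(path)
        print(f"length: {size} bytes, allocated: {allocated} bytes")
    finally:
        os.unlink(path)
```

The two values can differ in either direction: a sparse file has fewer
allocated bytes than its length, and a file with allocation beyond EOF
has more.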

> This behavior make the account inaccurate.

The block usage reported by XFS is both accurate and correct. The
file size reported by XFS is both accurate and correct. Your
"account inaccuracy" comes from assuming that they are the same. Perhaps you
should be using quotas for accurate space usage accounting?

Anyway, if you really want to stop speculative delayed allocation
beyond EOF, then use the allocsize mount option to control it.


Dave Chinner
