
Re: df bigger than ls?

To: Brian Candler <B.Candler@xxxxxxxxx>
Subject: Re: df bigger than ls?
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Thu, 8 Mar 2012 21:22:30 +1100
Cc: Eric Sandeen <sandeen@xxxxxxxxxxx>, xfs@xxxxxxxxxxx
In-reply-to: <20120308095932.GA24187@xxxxxxxx>
References: <20120307155439.GA23360@xxxxxxxx> <20120307171619.GA23557@xxxxxxxx> <4F57A32A.5010704@xxxxxxxxxxx> <20120308085035.GA23992@xxxxxxxx> <20120308095932.GA24187@xxxxxxxx>
User-agent: Mutt/1.5.21 (2010-09-15)
On Thu, Mar 08, 2012 at 09:59:32AM +0000, Brian Candler wrote:
> On Thu, Mar 08, 2012 at 08:50:35AM +0000, Brian Candler wrote:
> > Aha.  This may well be what is screwing up gluster's disk usage on a striped
> > volume - I believe XFS is preallocating space which is actually going to end
> > up being a hole!

How is the filesystem supposed to know that? All it sees is
extending writes, which is what triggers speculative preallocation.

> Here is a standalone testcase.
> $ for i in {0..19}; do dd if=/dev/zero of=testfile bs=128k count=1 seek=$[$i * 12]; done

Yup, that's behaving exactly as expected there. When you seek past
the existing EOF, and there is a speculative preallocation between
the old EOF and the new EOF, it writes zeros to that range because
the assumption is that you are going to fill it with data.
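
If you want to see the effect in numbers, something like the following
rough sketch compares the apparent size (what ls reports) with the
allocated size (what du and df account for). The scratch directory from
mktemp is an assumption; point it at the filesystem you are testing,
since the allocated figure depends on the filesystem and its
preallocation behaviour.

```shell
# Sketch only: recreate the sparse-write pattern and compare sizes.
# The scratch directory is an assumption; use a directory on the
# filesystem under test to see that filesystem's behaviour.
dir=$(mktemp -d)
f="$dir/testfile"

# Write 128k at every 12 * 128k offset, as in the quoted test case.
for i in $(seq 0 19); do
    dd if=/dev/zero of="$f" bs=128k count=1 seek=$((i * 12)) 2>/dev/null
done

# Apparent size in bytes (what ls -l shows).
apparent=$(stat -c %s "$f")
# Allocated size in bytes: block count times the unit stat reports for it.
allocated=$(( $(stat -c %b "$f") * $(stat -c %B "$f") ))

echo "apparent=$apparent allocated=$allocated"
```

Only 20 * 128k of real data is written, so an allocated figure well
above that points at preallocated (and zeroed) space between the
writes.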

There are applications that do this to trigger that exact
preallocation - Samba is a classic case because Windows clients will
write one byte 128k beyond the current EOF to get NTFS to trigger
large preallocation, then seek back and write the real data to the
server. In cases like these, you want the hole allocated and filled
with zeros before the real writes come in just in case the server
crashes between the single byte write and the real data being
written.
FWIW, XFS on Irix had special code to detect out of order NFS
writes and do a similar hole filling trick to avoid fragmentation.

There's no one correct behaviour when dealing with writes of this
sort. In some cases the current behaviour is perfect (and samba on
XFS is widely used), in other cases it won't be exactly what you
want.
Indeed, you can avoid this problem by using ftruncate() to extend
the file before writing, writing the regions in reverse order, using
fallocate() to allocate the exact blocks you want before writing, or
using the allocsize mount option to turn off the dynamic behaviour.
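
As a rough sketch of the first two options using the same dd pattern
(not tested against your gluster setup): truncate(1) extends the file
via ftruncate(), and conv=notrunc stops dd from cutting the file back to
the seek offset on each run. The file names are assumptions, and the
fallocate and mount lines are left as comments because the device and
mount point shown there are placeholders.

```shell
# Sketch only: two ways to write the same sparse pattern without
# extending writes that land beyond a preallocated range.
dir=$(mktemp -d)                           # assumed scratch directory
final=$(( (19 * 12 + 1) * 128 * 1024 ))    # final file size in bytes

# Option 1: set the final size first, so no later write extends EOF.
f1="$dir/truncate_first"
truncate -s "$final" "$f1"
for i in $(seq 0 19); do
    dd if=/dev/zero of="$f1" bs=128k count=1 seek=$((i * 12)) \
        conv=notrunc 2>/dev/null
done

# Option 2: write the regions in reverse order, highest offset first.
f2="$dir/reverse_order"
for i in $(seq 19 -1 0); do
    dd if=/dev/zero of="$f2" bs=128k count=1 seek=$((i * 12)) \
        conv=notrunc 2>/dev/null
done

# Option 3 (not run here): allocate each region explicitly first, e.g.
#   fallocate -o $((i * 12 * 128 * 1024)) -l $((128 * 1024)) "$f"
# Option 4 (not run here): fix the preallocation size at mount time.
# The device and mount point below are placeholders:
#   mount -o allocsize=64k /dev/sdX /mnt/point

echo "size1=$(stat -c %s "$f1") size2=$(stat -c %s "$f2")"
```

Both files end up the same apparent size as the original test case;
how many blocks each one actually uses is what you would then compare
with du or xfs_bmap on the real filesystem.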


Dave Chinner
