xfs
[Top] [All Lists]

Re: [PATCH 1/5] fallocate() implementation in i86, x86_64 and powerpc

To: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
Subject: Re: [PATCH 1/5] fallocate() implementation in i86, x86_64 and powerpc
From: David Chinner <dgc@xxxxxxx>
Date: Fri, 4 May 2007 17:27:42 +1000
Cc: David Chinner <dgc@xxxxxxx>, "Amit K. Arora" <aarora@xxxxxxxxxxxxxxxxxx>, torvalds@xxxxxxxxxxxxxxxxxxxx, linux-fsdevel@xxxxxxxxxxxxxxx, linux-kernel@xxxxxxxxxxxxxxx, linux-ext4@xxxxxxxxxxxxxxx, xfs@xxxxxxxxxxx, suparna@xxxxxxxxxx, cmm@xxxxxxxxxx
In-reply-to: <20070503232815.2f62a75e.akpm@xxxxxxxxxxxxxxxxxxxx>
References: <20070417125514.GA7574@xxxxxxxxxxxxxxxxxxxx> <20070418130600.GW5967@xxxxxxxxxxxxxxxxxxxx> <20070420135146.GA21352@xxxxxxxxxxxxxxxxxxxx> <20070420145918.GY355@xxxxxxxxxxxxxxxxxxxxxxxx> <20070424121632.GA10136@xxxxxxxxxxxxxxxxxxxx> <20070426175056.GA25321@xxxxxxxxxxxxxxxxxxxx> <20070426180332.GA7209@xxxxxxxxxxxxxxxxxxxx> <20070503212955.b1b6443c.akpm@xxxxxxxxxxxxxxxxxxxx> <20070504060731.GJ32602149@xxxxxxxxxxxxxxxxx> <20070503232815.2f62a75e.akpm@xxxxxxxxxxxxxxxxxxxx>
Sender: xfs-bounce@xxxxxxxxxxx
User-agent: Mutt/1.4.2.1i
On Thu, May 03, 2007 at 11:28:15PM -0700, Andrew Morton wrote:
> On Fri, 4 May 2007 16:07:31 +1000 David Chinner <dgc@xxxxxxx> wrote:
> > On Thu, May 03, 2007 at 09:29:55PM -0700, Andrew Morton wrote:
> > > On Thu, 26 Apr 2007 23:33:32 +0530 "Amit K. Arora" 
> > > <aarora@xxxxxxxxxxxxxxxxxx> wrote:
> > > 
> > > > This patch implements the fallocate() system call and adds support for
> > > > i386, x86_64 and powerpc.
> > > > 
> > > > ...
> > > > +{
> > > > +       struct file *file;
> > > > +       struct inode *inode;
> > > > +       long ret = -EINVAL;
> > > > +
> > > > +       if (len == 0 || offset < 0)
> > > > +               goto out;
> > > 
> > > The posix spec implies that negative `len' is permitted - presumably 
> > > "allocate
> > > ahead of `offset'".  How peculiar.
> > 
> > I just checked the man page for posix_fallocate() and it says:
> > 
> >       EINVAL  offset or len was less than zero.
> > 
> > We should probably follow this lead.
> 
> Yes, I think so.  I'm suspecting that
> http://www.opengroup.org/onlinepubs/009695399/functions/posix_fallocate.html
> is just buggy.  Or I can't read.
> 
> I mean, if we're going to support negative `len' then is the byte at
> `offset' inside or outside the segment?  Head spins.

I don't think we should care. If we provide a syscall with the
semantics of "allocate from offset to offset+len" then glibc's
implementation can turn negative length into two separate
fallocate syscalls....

> > > > +       ret = -ENODEV;
> > > > +       if (!S_ISREG(inode->i_mode))
> > > > +               goto out_fput;
> > > 
> > > So we return ENODEV against an S_ISBLK fd, as per the posix spec.  That
> > > seems a bit silly of them.
> > 
> > Hmmmm - I thought that the intention of sys_fallocate() was to
> > be generic enough to eventually allow preallocation on directories.
> > If that is the case, then this check will prevent that....
> 
> The above opengroup page only permits S_ISREG.  Preallocating directories
> sounds quite useful to me, although it's something which would be pretty
> hard to emulate if the FS doesn't support it.  And there's a decent case to
> be made for emulating it - run-anywhere reasons.  Does glibc emulation support
> directories?  Quite unlikely.
> 
> But yes, sounds like a desirable thing.  Would XFS support it easily if the 
> above
> check was relaxed?

No - right now empty blocks are pruned from the directory immediately so I
don't think we really have a concept of empty blocks in the btree structure.
dir2 is bloody complex, so adding preallocation is probably not going to
be simple to do.

It's not high on my list to add, either, because we can typically avoid the
worst case directory fragmentation by using larger directory block sizes
(e.g. 16k instead of the default 4k on a 4k block size fs).

IIRC directory preallocation has been talked about more for ext3/4....

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


<Prev in Thread] Current Thread [Next in Thread>