
Re: XFS slows down on used partitions with bonnie++

To: Thor Lancelot Simon <tls@xxxxxxxxxxxx>
Subject: Re: XFS slows down on used partitions with bonnie++
From: Steve Lord <lord@xxxxxxx>
Date: 24 Apr 2002 13:31:00 -0500
Cc: linux-xfs@xxxxxxxxxxx
In-reply-to: <20020424142912.A24837@xxxxxxxxxxxxxxxxxxxxx>
References: <3C94F14E.7DE5A62D@xxxxxxxxxxx> <3CC67369.935DF3ED@xxxxxxxx> <1019659451.27989.44.camel@xxxxxxxxxxxxxxxxxxxx> <20020424142912.A24837@xxxxxxxxxxxxxxxxxxxxx>
Sender: owner-linux-xfs@xxxxxxxxxxx
On Wed, 2002-04-24 at 13:29, Thor Lancelot Simon wrote:
> On Wed, Apr 24, 2002 at 09:44:11AM -0500, Steve Lord wrote:
> > Very interesting, I will take a look at this some more. One initial
> > comment is that optimizing for bonnie is not necessarily the correct
> > thing to do - not many real-world loads create thousands of files and
> > then immediately delete them. Plus, once you are on a RAID device,
> > logical and physical closeness on the volume no longer mean very much.
> > 
> > Having said that, we need to think some more about the underlying
> > allocation policy of inodes vs file data here.
> 
> There's been some amount of academic research on this with LFS (both the
> BSD and Sprite variants), which suffers particularly badly from this
> problem because inode rewrites can cause inodes to migrate away from their
> related data blocks in the log over time.  One interesting result is that
> the original Sprite policy, with all inodes stored at the head of the disk,
> isn't nearly as bad as you'd think; it keeps the seek time down and the
> inodes end up pretty much pinned in the cache anyway.  This isn't too far
> off "allocate inodes from the front of the allocation group", I think.  To
> allocate them at opposite ends, as the average percentage of free space in 
> filesystems with large numbers of inodes grows (as I believe research would
> currently show to be the case, due to increasing per-spindle capacity esp.
> when compared to per-spindle seek performance) probably is about as bad as
> you can get; allocation groups will *not* fill up, so this really does
> maximize seek time in the case in which the inode is not in the cache.  On
> the other hand, if you assume that allocation groups *will* fill up, and
> are willing to make the additional assumption that data written last is
> most likely to be read (questionable, I think, in general, but true for some
> database workloads) then as the group fills up, the data blocks you read
> most frequently will turn out to be closest to the inodes you need to get at
> them.  However, in this case the inode is almost sure to be in cache, no?
> 
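To make the seek-distance argument concrete, here is a toy model in C -- not
anything from the XFS tree, with the group size and fill fractions made up
purely for illustration -- of the expected inode-to-data distance inside one
allocation group, comparing inodes clustered at the front against inodes
placed at the opposite end from the data:

/* Toy model, not XFS code: expected inode-to-data seek distance inside one
 * allocation group.  Data fills the group from block 0 upward, reads are
 * assumed uniform over the used region, and AG_BLOCKS is an arbitrary size
 * chosen only for illustration. */
#include <stdio.h>

#define AG_BLOCKS 1000000.0

int main(void)
{
    for (int pct = 10; pct <= 90; pct += 20) {
        double used = (pct / 100.0) * AG_BLOCKS;
        double front = used / 2.0;                 /* inodes near block 0   */
        double opposite = AG_BLOCKS - used / 2.0;  /* inodes at the far end */

        printf("%2d%% full: front ~ %8.0f blocks, opposite ~ %8.0f blocks\n",
               pct, front, opposite);
    }
    return 0;
}

As long as the group stays well below full, the opposite-end placement keeps
the distance close to the full width of the group, which is the effect
described above.
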
> Another interesting take on it is to think about locality of reference
> from an LFS-like temporal point of view.  If inode blocks were simply
> allocated with no constraint other than that they be in the same allocation
> group as the first data block of the file, in the absence of fragmentation
> inode and data blocks that were written at about the same time would tend
> to be in about the same part of the disk -- indeed, since inodes in XFS
> are allocated in 64K extents (aren't they?) you could turn the allocation
> policy on its head and say "put the data block as close as possible to
> the extent with the inode in it".  This would produce the effect that in a
> filesystem with many small files written at the same time, you'd get 
> temporal locality of reference on reads, even across multiple files -- which 
> the LFS work shows to be quite good: data written at the same time generally 
> is read at the same time.  I believe NetApp's WAFL does this, as well: pick
> where the metadata goes, then use that to place the data.  Of course, there 
> are pathological cases for this kind of filesystem structure, too, but the
> existence of the allocation groups, and the presence of read caching, would
> at least reduce them.
> 
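A minimal sketch of that inverted policy -- invented names and a toy
free-block bitmap, not the actual XFS allocator -- would just scan outward
from the block holding the new file's inode chunk and take the nearest free
block:

/* Sketch of a "place data near the inode chunk" policy, not the real XFS
 * allocator: scan a toy free-block bitmap outward from the block holding
 * the inode and take the closest free block. */
#include <stdbool.h>
#include <stdio.h>

#define NBLOCKS 1024

/* Return the free block nearest to 'target', or -1 if none is free. */
static long alloc_near(const bool free_map[NBLOCKS], long target)
{
    for (long d = 0; d < NBLOCKS; d++) {
        if (target - d >= 0 && free_map[target - d])
            return target - d;
        if (target + d < NBLOCKS && free_map[target + d])
            return target + d;
    }
    return -1;
}

int main(void)
{
    bool free_map[NBLOCKS];

    for (long b = 0; b < NBLOCKS; b++)
        free_map[b] = (b > 100);   /* pretend the first 101 blocks are used */

    /* If the inode chunk for a new small file lives at block 64, its first
     * data block lands at 101, the nearest free block. */
    printf("data block -> %ld\n", alloc_near(free_map, 64));
    return 0;
}

Whether this wins depends on reads actually following writes in time, which
is exactly the LFS/WAFL assumption being discussed.
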
> I'm not sure how clear the above was (and maybe anything of sense in it
> was already obvious to you) but it seemed like it might be worth pointing
> out.
> 
> Thor

I am not ignoring this thread; I just have a few dozen other things I
need to get done first. XFS is supposed to put file data near the inode
(inodes come in chunks of 2 filesystem blocks, 8K on Linux, or 32 inodes
by default). We need to study that code path and make sure it is behaving
as designed.
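
For reference, the arithmetic behind that default, assuming the usual
256-byte on-disk inode and 4K filesystem blocks (both assumptions, stated in
the comments):

/* Arithmetic behind "chunks of 2 filesystem blocks, 8K on Linux, or 32
 * inodes": with a 256-byte on-disk inode and 4K blocks (both assumed here,
 * as the common defaults), 32 inodes fill exactly two blocks. */
#include <stdio.h>

int main(void)
{
    const int inode_size = 256;       /* assumed on-disk inode size */
    const int block_size = 4096;      /* assumed filesystem block size */
    const int inodes_per_chunk = 32;

    int chunk_bytes = inode_size * inodes_per_chunk;
    printf("%d inodes * %d bytes = %d bytes = %d blocks\n",
           inodes_per_chunk, inode_size, chunk_bytes,
           chunk_bytes / block_size);
    return 0;
}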

Steve


-- 

Steve Lord                                      voice: +1-651-683-3511
Principal Engineer, Filesystem Software         email: lord@xxxxxxx

