On Fri, Jul 21, 2006 at 01:00:44PM -0400, Ming Zhang wrote:
> what u mean overlay fs over small fs? like a unionfs?
sorta not really, it's userspace libraries which create a virtual
filesystem over real filesystems with some database (Berkeley DB).
it sorta evolved from an attempt to unify several filesystems spread
over cheap PCs into something that pretended to be one larger fs
> but other than fsr. there is no better way for this right?
not publicly, you could patch fsr or nag me for my patches if that
helps
> of course, preallocate is always good. but i do not have control
> over applications.
well, in some cases you could use LD_PRELOAD and influence things, it
depends on the application and what you need from it
fwiw, most modern p2p applications have terrible access patterns which
cause horrible fragmentation (on all fs's, not just XFS)
> sounds like a useful patch. :P will it be merged into fsr code?
no, because it's ugly and i don't think i ever decoupled it from other
changes and posted it
> what kind of assistance you mean?
[WARNING: lots of hand waving ahead, plenty of minor, but important,
details ignored]
if you wanted much smarter defragmentation semantics, it would
probably make sense to
* bulkstat the entire volume, this will give you the inode cluster
  locations and enough information to start building a tree of where
  all the files are (XFS_IOC_FSGEOMETRY details obviously)
* opendir/read to build a full directory tree
* use XFS_IOC_GETBMAP & XFS_IOC_GETBMAPA to figure out which blocks
  are occupied by which files
you would now have a pretty good idea of what is using what parts of
the disk, except of course it could be constantly changing underneath
you to make things harder
also, doing this using the existing interfaces is (when i tried it)
really really painfully slow if you have a large filesystem with a lot
of small files (even when you try to optimize your accesses to
minimize seeking by sorting by inode number and submitting several
requests in parallel to try and help the elevator merge accesses)
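
to make the "sort by inode number" part concrete, here is a rough
sketch in python. it is not what fsr or my patches do, it only uses
plain readdir/lstat instead of bulkstat, and the function name
scan_sorted_by_inode is made up for illustration. the parallel
submission part is left out.

```python
import os

def scan_sorted_by_inode(root):
    """Walk root, then lstat every entry in ascending inode order.

    Hypothetical helper: inode order is used as a cheap stand-in for
    on-disk order so the stat pass seeks less.
    """
    entries = []  # (inode number, path)
    stack = [root]
    while stack:
        current = stack.pop()
        with os.scandir(current) as it:
            for entry in it:
                # inode() comes from the directory read itself
                entries.append((entry.inode(), entry.path))
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)

    entries.sort()  # ascending inode number

    results = []
    for ino, path in entries:
        try:
            st = os.lstat(path)
        except FileNotFoundError:
            continue  # the tree can change underneath us
        results.append((ino, path, st.st_size))
    return results
```

the first pass only reads directories, the second pass touches the
inodes in sorted order. on a big fs with lots of small files this is
still slow, just less slow.
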
once you have some overall picture of the disk, you can decide what you
want to move to achieve your goal, typically this would be to reduce
the fragmentation of the largest files, and this would mean
relocating some or all of those blocks to another place
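
as a hand-wavy illustration of "decide what you want to move", assume
you already turned the bmap output into a dict of path -> list of
(file_offset, disk_block, length) tuples in filesystem blocks. that
layout and the names count_fragments and rank_defrag_candidates are
made up for this sketch, they are not any real XFS interface.

```python
def count_fragments(extents):
    """Count physically discontiguous runs in a file's extent list.

    extents: iterable of (file_offset, disk_block, length), in blocks.
    Two extents count as one fragment when the second starts on disk
    exactly where the first one ends. Holes are ignored here.
    """
    fragments = 0
    prev_end = None
    for _, disk_block, length in sorted(extents):
        if disk_block != prev_end:
            fragments += 1
        prev_end = disk_block + length
    return fragments

def rank_defrag_candidates(files, min_fragments=2):
    """Return (path, size_in_blocks, fragments), largest files first."""
    candidates = []
    for path, extents in files.items():
        fragments = count_fragments(extents)
        if fragments < min_fragments:
            continue  # already contiguous enough, leave it alone
        size = sum(length for _, _, length in extents)
        candidates.append((size, fragments, path))
    # biggest files first, ties broken by fragment count
    candidates.sort(reverse=True)
    return [(path, size, fragments) for size, fragments, path in candidates]
```

a real tool would want a smarter cost function than "largest first",
this only shows the shape of the decision.
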
if you want to allocate space in a given AG, you open/creat a
temporary file in a directory in that AG (create multiple dirs as
needed to ensure you have one or more of these), and preallocate the
space --- then you can copy the file over
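
a minimal sketch of the temp-file-plus-preallocate step, in python.
assumptions: target_dir is a directory you already created in the AG
you care about, and os.posix_fallocate is used as a portable stand-in
for the XFS reservation ioctls. the function name
copy_into_preallocated is made up, and the final step of swapping the
extents into the original file is not shown.

```python
import os
import tempfile

def copy_into_preallocated(src_path, target_dir, chunk_size=1 << 20):
    """Create a temp file in target_dir, preallocate it, copy src into it.

    Returns the path of the temporary copy.
    """
    size = os.path.getsize(src_path)
    fd, tmp_path = tempfile.mkstemp(dir=target_dir)
    try:
        if size > 0:
            try:
                # ask for all the space up front so the allocator has a
                # chance to hand back one large extent
                os.posix_fallocate(fd, 0, size)
            except OSError:
                pass  # fs cannot preallocate, fall back to a plain copy
        with open(src_path, "rb") as src:
            while True:
                chunk = src.read(chunk_size)
                if not chunk:
                    break
                view = memoryview(chunk)
                while len(view) > 0:
                    written = os.write(fd, view)  # may be a short write
                    view = view[written:]
        os.fsync(fd)
    finally:
        os.close(fd)
    return tmp_path
```

usage would be something like
copy_into_preallocated("/mnt/xfs/bigfile", "/mnt/xfs/.defrag/ag3"),
where both paths are only examples.
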
we could also add ioctls to further bias XFS's allocation strategies,
like telling it to never allocate in some AGs (needed for an online
shrink if someone wanted to make such a thing) or simply bias strongly
away from some places, then add other ioctls to allow you to
specifically allocate space in those AGs so you can bias what is
allocated where
another useful ioctl would be a variation of XFS_IOC_SWAPEXT which
would swap only some extents. there is no internal support for this
now except we do have code for XFS_IOC_UNRESVSP64 and XFS_IOC_RESVSP64
so perhaps the idea would be to swap some (but not all) blocks of a
file by creating a function that does the equivalent of 'punch a hole'
where we want to replace the blocks, and then 'allocate new blocks
given some i already have elsewhere' (however, making that all work as
one transaction might be very very difficult)
it's a lot of effort for what for many people would only be
marginal gains