On Fri, 2006-07-21 at 11:07 -0700, Chris Wedgwood wrote:
> On Fri, Jul 21, 2006 at 01:00:44PM -0400, Ming Zhang wrote:
>
> > what u mean overlay fs over small fs? like a unionfs?
>
> sorta not really, it's userspace libraries which create a virtual
> filesystem over real filesystems with some database (Berkeley DB).
> it sorta evolved from an attempt to unify several filesystems spread
> over cheap PCs into something that pretended to be one larger fs
fancy word for this is NAS virtualization i guess.
>
> > but other than fsr. there is no better way for this right?
>
> not publicly, you could patch fsr or nag me for my patches if that
> helps
i will run some tests about fsr and see if i need to bug you about
patches.
>
> > of course, preallocate is always good. but i do not have control
> > over applications.
>
> well, in some cases you could use LD_PRELOAD and influence things, it
> depends on the application and what you need from it
>
> fwiw, most modern p2p applications have terrible access patterns which
> cause horrible fragmentation (on all fs's, not just XFS)
>
> > sounds like a useful patch. :P will it be merged into fsr code?
>
> no, because it's ugly and i don't think i ever decoupled it from other
> changes and posted it
>
> > what kind of assistance you mean?
>
> [WARNING: lots of hand waving ahead, plenty of minor, but important,
> details ignored]
>
i read through this and feel it will be VERY hard to build, especially
considering the transaction issue.
could this be easier?
* analyze the fs to find out which file(s) to defrag;
* create a temp file and begin to copy, preallocating the space so it is
contiguous;
* after the first round of copy, keep a trace table of changed blocks and
do a second round on those blocks;
* lock and switch the old file with the new file.
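to make the steps above concrete, here is a rough userspace sketch in
python. it is only an illustration under several assumptions: the name
copy_and_switch and the 64 KiB block size are made up, the "trace table"
is replaced by simply re-reading and comparing blocks, there is no
locking at all, and the final switch is a plain rename rather than
anything like XFS_IOC_SWAPEXT.

```python
import os
import tempfile

BLOCK = 64 * 1024  # copy granularity; an arbitrary choice for this sketch


def copy_and_switch(path):
    """Copy path into a preallocated temp file, re-copy blocks that changed
    in the meantime, then rename the copy over the original.
    Returns the number of blocks rewritten in the second round."""
    size = os.path.getsize(path)
    dirname = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=dirname)
    try:
        with open(path, "rb") as src, os.fdopen(fd, "r+b") as dst:
            if size and hasattr(os, "posix_fallocate"):
                # Reserve the space up front so the allocator has a chance
                # to hand out one contiguous range.
                os.posix_fallocate(dst.fileno(), 0, size)
            # First round: plain sequential copy.
            while True:
                buf = src.read(BLOCK)
                if not buf:
                    break
                dst.write(buf)
            # Second round: compare block by block, rewrite what differs.
            recopied = 0
            offset = 0
            src.seek(0)
            while True:
                buf = src.read(BLOCK)
                if not buf:
                    break
                dst.seek(offset)
                if dst.read(len(buf)) != buf:
                    dst.seek(offset)
                    dst.write(buf)
                    recopied += 1
                offset += len(buf)
            dst.truncate(offset)  # in case the source shrank
            dst.flush()
            os.fsync(dst.fileno())
        os.replace(tmp, path)  # the "switch"; NOT safe against writers
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
    return recopied
```

the rename gives the file a new inode and the temp file's mode and
ownership, and a writer can still change the source between the second
round and the rename, so this only shows the shape of the idea.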
> if you wanted much smarter defragmentation semantics, it would
> probably make sense to
>
> * bulkstat the entire volume, this will give you the inode cluster
> locations and enough information to start building a tree of where
> all the files are (XFS_IOC_FSGEOMETRY details obviously)
>
> * opendir/read to build a full directory tree
>
> * use XFS_IOC_GETBMAP & XFS_IOC_GETBMAPA to figure out which blocks
> are occupied by which files
>
> you would now have a pretty good idea of what is using what parts of
> the disk, except of course it could be constantly changing underneath
> you to make things harder
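once the block maps are collected, counting fragments per file is the
easy part. a small python sketch, assuming each mapping is reduced to an
(offset, block, length) tuple loosely modelled on what
XFS_IOC_GETBMAP reports; the tuple layout, the HOLE marker and the
function name are assumptions for illustration, and the ioctl call
itself is not shown.

```python
HOLE = -1  # assumed marker for a range with no blocks behind it


def count_fragments(extents):
    """Count physically discontiguous runs in a file's extent list.

    extents: iterable of (file_offset, start_block, length) tuples,
    all in the same unit.
    """
    fragments = 0
    prev_end = None
    for _offset, block, length in sorted(extents):
        if block == HOLE:
            continue  # holes occupy no disk space
        if prev_end is None or block != prev_end:
            fragments += 1  # this extent does not continue the previous one
        prev_end = block + length
    return fragments
```

for example, count_fragments([(0, 100, 8), (8, 108, 8), (16, 500, 8)])
gives 2, since the first two extents are adjacent on disk.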
>
> also, doing this using the existing interfaces is (when i tried it)
> really really painfully slow if you have a large filesystem with a lot
> of small files (even when you try to optimize your accesses to
> minimize seeking by sorting by inode number and submitting several
> requests in parallel to try and help the elevator merge accesses)
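the inode-sorting part is simple to show on its own. a minimal python
sketch, assuming one directory at a time and using plain lstat instead
of bulkstat; stat_in_inode_order is a made-up name, and the parallel
submission mentioned above is left out.

```python
import os


def stat_in_inode_order(directory):
    """Return (inode, name, size) for each entry, stat'ing in inode order.

    os.scandir() already yields the inode number from the directory
    entry, so sorting happens before any per-file stat call.
    """
    with os.scandir(directory) as it:
        entries = sorted(it, key=lambda e: e.inode())
    return [(e.inode(), e.name, os.lstat(e.path).st_size) for e in entries]
```

whether this helps depends on how closely inode numbers track on-disk
location, which is the assumption the sorting relies on.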
>
>
> once you have some overall picture of the disk, you can decide what you
> want to move to achieve your goal, typically this would be to reduce
> the fragmentation of the largest files, and this would mean
> relocating some or all of those blocks to another place
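the selection step could be as small as this python sketch. the input
format, the threshold and the name pick_candidates are all assumptions;
it only shows "largest fragmented files first", not any real fsr
policy.

```python
def pick_candidates(files, min_fragments=2, limit=10):
    """Pick the largest fragmented files as defrag candidates.

    files: iterable of (path, size_bytes, fragments) tuples.
    Files with fewer than min_fragments fragments are ignored.
    """
    fragmented = [f for f in files if f[2] >= min_fragments]
    # Largest first; among equal sizes, the more fragmented file first.
    fragmented.sort(key=lambda f: (f[1], f[2]), reverse=True)
    return fragmented[:limit]
```

a real tool would probably also weigh free space and how hot each file
is, which this ignores.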
>
> if you want to allocate space in a given AG, you open/creat a
> temporary file in a directory in that AG (create multiple dirs as
> needed to ensure you have one or more of these), and preallocate the
> space --- there you can copy the file over
>
> we could also add ioctls to further bias XFS's allocation strategies,
> like telling it to never allocate in some AGs (needed for an online
> shrink if someone wanted to make such a thing) or simply bias strongly
> away from some places, then add other ioctls to allow you to
> specifically allocate space in those AGs so you can bias what is
> allocated where
>
> another useful ioctl would be a variation of XFS_IOC_SWAPEXT which
> would swap only some extents. there is no internal support for this
> now except we do have code for XFS_IOC_UNRESVSP64 and XFS_IOC_RESVSP64
> so perhaps the idea would be to swap some (but not all) blocks of a
> file by creating a function that does the equivalent of 'punch a hole'
> where we want to replace the blocks, and then 'allocate new blocks
> given some i already have elsewhere' (however, making that all work as
> one transaction might be very very difficult)
>
> it's a lot of effort for what, for many people, would only be
> marginal gains