To: linux-xfs@xxxxxxxxxxx
Subject: Re: file system defragmentation
From: "Linda A. Walsh" <xfs@xxxxxxxxx>
Date: Wed, 21 Sep 2005 01:10:15 -0700
Cc: Austin Gonyou <austin@xxxxxxxxxxxxxxx>
In-reply-to: <4312913F.6040205@coremetrics.com>
References: <43128F82.4010004@tlinx.org> <4312913F.6040205@coremetrics.com>
Sender: linux-xfs-bounce@xxxxxxxxxxx
User-agent: Mozilla Thunderbird 1.0.6 (Windows/20050716)
Sorry for the long delay in answering, I must have missed this coming in, but yes, you are right: if you have one gigantic file that grows more fragmented over time (while not regularly running fsr), you can end up with a highly fragmented file even though the percentage of fragmented files would read "low". (I think that is what you were saying, in addition to mentioning that many small individual files could also raise a fragmentation number.)

However, if one were to run 'fsr' daily, and if there were a large enough contiguous free area on the disk to hold the multi-Gig file, fsr would still unfold it; in fact, that was why it was created. Given XFS's delayed allocation (even slightly more so on its native IRIX, _I believe_), the file system was originally designed without "fsr" because it was thought it wouldn't really be likely to need it.

The file system was designed to handle large recording sessions and large streaming-media files such as live audio and video feeds -- which used to be one of SGI's target markets. This is second hand, so I may have some details off, but the way I heard it was that one customer came up with an unplanned scenario: recording different parts of weekly shows, interwoven throughout a day and throughout a week. Even with delayed writes, one can only delay so long, and, while producing multiple programs, there would still be a limit to how much system memory could buffer multiple feeds before a write to disk was forced. I think this was back in the mid 90's (?), so bandwidth and main memory were more limited as well. Thus 'xfs_fsr' was born, to re-arrange the data portions of files and re-optimize streaming performance.

Now admittedly, I don't know what happens with DBMS files that are 'locked'. Since "xfs_fsr" can be run by any user that has read/write access to the specified file, it wouldn't appear to need or use any extra privileges, so DBMS files that remain locked 24 hours/day but grow slowly over time could, theoretically, become highly internally fragmented. I'm not sure how one would easily work around that problem -- especially in the case of locked records of a database that could be getting updated in real time.

It would likely be "best" to allocate/create the DBMS files at the maximum size they will be allowed to grow before putting them into use, make sure they are defragmented before using them, then use them in place (a rough sketch of that kind of preallocation is below). If you can't predict the maximum file size, then one may have to periodically checkpoint the database, make a copy of it, defragment the new copy (since xfs_fsr allows defragmenting by file), then temporarily lock the database and update the new copy with all the changes made since the copy was taken (perhaps some sort of database journal can be started after the checkpoint and replayed on the new copy? Guess it depends on the app).

As for *many* small files: XFS may have some mixed performance characteristics on those. Some small amount of file data or directory information, I believe, may be stored inside the inode itself, but that is very limited and I don't know when XFS makes use of that feature -- it might be symlink data or stuff like that only. Dunno. I have seen (and experienced) bad performance for "removing" large numbers of files on XFS. It is probably better than when XFS was first ported to Linux, but I believe it's still a noticeable slow spot.
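(To make the preallocation suggestion a bit more concrete, here is a minimal sketch of what I have in mind, assuming a reasonably current Linux where posix_fallocate() reaches the filesystem's allocator. The path and the 8 GiB ceiling are made-up placeholders, not anything from a real DBMS setup, and error handling is kept to a minimum.)

```c
/*
 * Rough sketch only (my guess at "create at max size up front"), not from
 * any DBMS or XFS documentation.  Preallocate a database file to the
 * largest size it will ever be allowed to reach, so the space comes from
 * a few large contiguous extents instead of being grown (and fragmented)
 * a little at a time.  The path and the 8 GiB ceiling are hypothetical.
 */
#define _FILE_OFFSET_BITS 64

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    const char *path = "/srv/db/tablespace.dat";      /* hypothetical file  */
    off_t max_size   = 8LL * 1024 * 1024 * 1024;      /* hypothetical limit */

    int fd = open(path, O_CREAT | O_WRONLY, 0600);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* posix_fallocate() returns 0 on success, or an errno value. */
    int err = posix_fallocate(fd, 0, max_size);
    if (err != 0) {
        fprintf(stderr, "posix_fallocate: %s\n", strerror(err));
        close(fd);
        return 1;
    }

    close(fd);
    return 0;
}
```

(Anyway -- back to the question of why removing lots of files is slow.)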
It _could_ be (a guess) that when a file is deleted and its space is released, the file subsystem attempts to merge the freed space with adjacent blocks of free space -- doing this recursively over each released block so that the resulting free space ends up in the fewest possible "extents" (variable-sized allocation units). While this creates a performance penalty for file deletes, it has the effect of automatically coalescing free space into large segments that can be allocated quickly when needed for streaming 'write' performance.

With XFS's delayed-write feature, the file subsystem can choose a more optimally sized free "extent" at write time, rather than allocating single blocks on a first-free, first-allocated basis as is common in other file systems. I.e., if I have already buffered 256K in memory for a file that is still being written to, XFS could choose a 512K extent that may be followed by other large, contiguous free extents; whereas if I have written a file and closed it for writing (say using cp) and it was a 23.5K file, it would know that an extent between 16K and 31K might be an exact fit, or it could split a 32+K extent and leave an 8K remainder. Either way, it would quickly find a contiguous free block of space for the file and not use the fragmentation-prone approach of allocating the first free blocks it finds.

I guess I've been pretty impressed with the XFS design strategy -- especially considering it's well over 10 years old: keeping fragmentation low in _most_ cases w/o needing "fsr" (the stock IRIX release was configured to run it *weekly*), and being one of the first file systems I was exposed to that provided journaling and eliminated my long fsck waits. It also reduced format time on a multi-Gig volume from a large fraction of an hour to some number of seconds. It was designed with extended attributes (allowing both system- and user-level attribute space per file), special "real-time" recording zones that can allow faster access than going through the normal file system hierarchy and allocation mechanism, support for detailed layout specification to optimally tune RAID performance on generic hardware, and other features we take for granted in a modern file system. I think it's sorta neat that things that might normally need a data block under other file systems, like "symlink" data, I believe, are actually stored in the inode. That's gotta cut down on the large numbers of single, "allocation-unit"-sized chunks that would be necessary in other filesystems.

While their IRIX systems were up to 1024 CPUs/node (one OS image) about 3-4 years ago, they had to play catch-up with the Intel Itanium architecture, but they seem to have gotten up to speed there as well, delivering a 10,240-processor system (I'm guessing a 10-OS-node cluster, but dunno -- maybe they were able to do some custom config; never can tell what some of the SGI engineers might come up with). There are still some damn smart engineers there, even though many were lost over the past several years to layoffs and attrition as new management was brought in to 'control' costs by increased scrutiny and critical review/control of the creative process...er... <voice, character=Hagrid>I shouldn't have said that last part. Nope...Forget I said anything.</voice> Bad habit I have of saying a bit too much sometimes.

On that note...time to shut up...would like to minimize foot-in-mouth disease...:-)

Linda
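P.S. To put the extent-allocation hand-waving above into something more concrete: you can also ask XFS for large allocation chunks explicitly with an extent-size hint. The sketch below is my own illustration, not anything from the XFS docs -- it assumes a newer kernel where the FS_IOC_FSGETXATTR/FS_IOC_FSSETXATTR ioctls and struct fsxattr are available via <linux/fs.h> (the XFS-specific ioctls did the same job before that), and the path and the 16 MiB hint are hypothetical. I believe the hint has to be set while the file is still empty.

```c
/*
 * My own rough sketch, not from any XFS documentation: set an extent-size
 * allocation hint on a file so that XFS hands out space in big aligned
 * chunks, which works together with the delayed-allocation behaviour
 * described above.  Assumes a kernel new enough to expose these ioctls
 * in <linux/fs.h>.  Path and hint size are hypothetical.
 */
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>

int main(void)
{
    const char *path = "/srv/media/capture.raw";   /* hypothetical file */

    /* Create the file empty; I believe the hint can't be changed once
     * the file already has extents allocated. */
    int fd = open(path, O_RDWR | O_CREAT | O_EXCL, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    struct fsxattr fsx;
    if (ioctl(fd, FS_IOC_FSGETXATTR, &fsx) < 0) {
        perror("FS_IOC_FSGETXATTR");
        return 1;
    }

    fsx.fsx_xflags |= FS_XFLAG_EXTSIZE;       /* honour the extent size hint */
    fsx.fsx_extsize = 16 * 1024 * 1024;       /* 16 MiB hint, in bytes       */

    if (ioctl(fd, FS_IOC_FSSETXATTR, &fsx) < 0) {
        perror("FS_IOC_FSSETXATTR");
        return 1;
    }

    close(fd);
    return 0;
}
```

Same basic idea as what the allocator tries to do on its own with delayed writes, just made explicit for a file you already know will be one big streaming blob.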
I believe the FAQ / man page for xfs_fsr mentions that it only works on file data, and that the reported fragmentation percentage is a function of the sizes of files, the number of inodes, and the number of files.
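P.P.S. For a per-file view of the same thing, here is one way I can think of to count how many extents back a single file -- again my own illustration, assuming a kernel new enough to have the generic FIEMAP ioctl (xfs_bmap reports roughly the same information from the command line).

```c
/*
 * Rough sketch (mine, not from the xfs_fsr man page): count how many
 * extents back a given file, using the generic FIEMAP ioctl.  A large
 * extent count on a file that should be one long run is the kind of
 * per-file fragmentation xfs_fsr would try to clean up.
 */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>
#include <linux/fiemap.h>

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <file>\n", argv[0]);
        return 1;
    }

    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    struct fiemap fm;
    memset(&fm, 0, sizeof(fm));
    fm.fm_start        = 0;
    fm.fm_length       = FIEMAP_MAX_OFFSET;  /* map the whole file            */
    fm.fm_flags        = FIEMAP_FLAG_SYNC;   /* flush delayed allocations     */
    fm.fm_extent_count = 0;                  /* 0 => just report the count    */

    if (ioctl(fd, FS_IOC_FIEMAP, &fm) < 0) {
        perror("FIEMAP");
        close(fd);
        return 1;
    }

    printf("%s: %u extent(s)\n", argv[1], fm.fm_mapped_extents);
    close(fd);
    return 0;
}
```

If there is a big enough run of contiguous free space to move the file into, running xfs_fsr on that one file should bring the count back down toward one.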