On Wed, Oct 09, 2013 at 06:02:41AM -0700, Christoph Hellwig wrote:
> Currently libxfs has a cache for xfs_inode structures. Unlike in kernelspace
> where the inode cache, and the associated page cache for file data is used
> for all filesystem operations the libxfs inode cache is only used in few
> - the libxfs init code reads the root and realtime inodes when called from
> xfs_db using a special flag, but these inode structure are never referenced
> - mkfs uses namespace and bmap routines that take the xfs_inode structure
> to create the root and realtime inodes, as well as any additional files
> specified in the proto file
> - the xfs_db attr code uses xfs_inode-based attr routines in the attrset
> and attrget commands
> - phase6 of xfs_repair uses xfs_inode-based routines for rebuilding
> directories and moving files to the lost+found directory.
> - phase7 of xfs_repair uses struct xfs_inode to modify the nlink count
> of inodes.
> So except in repair we never ever reuse a cached inode, and even in repair
> the logical inode caching doesn't help:
> - in phase 6a we iterate over each inode in the incore inode tree,
> and if it's a directory check/rebuild it
> - phase6b then updates the "." and ".." entries for directories
> that need, which means we require the backing buffers.
> - phase6c moves disconnected inodes to lost_found, which again needs
> the backing buffer to actually do anything.
> - phase7 then only touches inodes for which we need to reset i_nlink,
> which always involves reading, modifying and writing the physical
> which always involves modifying the . and .. entries.
> Given these facts stop caching the inodes to reduce memory usage
> especially in xfs_repair, where this makes a different for large inode
> count inodes. On the upper end this allows repair to complete for
> filesystem / amount of memory combinations that previously wouldn't.
This all sounds good and the code looks fine, but there's one
lingering question I have - what's the impact on performance for
repair? Does it slow down phase 6/7 at all?
> With this we probably could increase the memory available to the buffer
> cache in xfs_repair, but trying to do so I got a bit lost - the current
> formula seems to magic to me to make any sense, and simply doubling the
> buffer cache size causes us to run out of memory given that the data cached
> in the buffer cache (typically lots of 8k inode buffers and few 4k other
> metadata buffers) are much bigger than the inodes cached in the inode
> cache. We probably need a sizing scheme that takes the actual amount
> of memory allocated to the buffer cache into account to solve this better.
IIRC, the size of the buffer cache is currently set at 75% of RAM
so doubling it would cause OOM issues regardless of the presence of
the inode cache....