
Re: [PATCH RFC 00/18] xfs: sparse inode chunks

To: xfs@xxxxxxxxxxx
Subject: Re: [PATCH RFC 00/18] xfs: sparse inode chunks
From: Brian Foster <bfoster@xxxxxxxxxx>
Date: Thu, 24 Jul 2014 12:28:16 -0400
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <1406211788-63206-1-git-send-email-bfoster@xxxxxxxxxx>
References: <1406211788-63206-1-git-send-email-bfoster@xxxxxxxxxx>
User-agent: Mutt/1.5.23 (2014-03-12)
On Thu, Jul 24, 2014 at 10:22:50AM -0400, Brian Foster wrote:
> Hi all,
> This is a first pass at sparse inode chunk support for XFS. Some
> background on this work is available here:
> http://oss.sgi.com/archives/xfs/2013-08/msg00346.html
> The basic idea is to allow the partial allocation of inode chunks into
> fragmented regions of free space. This is accomplished through addition
> of a holemask field into the inobt record that defines what portion(s)
> of an inode chunk are invalid (i.e., holes in the chunk). This work is
> not quite complete, but is at a point where I'd like to start getting
> feedback on the design and what direction to take for some of the known
> gaps.
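
To make the holemask idea concrete, here is a minimal sketch of how a
holemask could be expanded into a per-inode allocation bitmap. It assumes
a 16-bit holemask in which each bit covers 4 inodes of a 64-inode chunk
and a set bit marks a hole; those widths, and every name below, are
illustrative assumptions rather than the definitions used in the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed geometry: not taken from the patches. */
#define INODES_PER_CHUNK	64	/* inodes tracked by one inobt record */
#define HOLEMASK_BITS		16	/* assumed width of the holemask field */
#define INODES_PER_HOLEBIT	(INODES_PER_CHUNK / HOLEMASK_BITS)

/* Expand a holemask into a 64-bit bitmap where 1 = inode physically present. */
static uint64_t holemask_to_allocmask(uint16_t holemask)
{
	/* Bits covering the inodes represented by a single holemask bit. */
	const uint64_t group = (1ULL << INODES_PER_HOLEBIT) - 1;
	uint64_t allocmask = 0;
	int bit;

	for (bit = 0; bit < HOLEMASK_BITS; bit++) {
		if (holemask & (1U << bit))
			continue;	/* this region of the chunk is a hole */
		allocmask |= group << (bit * INODES_PER_HOLEBIT);
	}
	return allocmask;
}

/* Return true if the inode at the given chunk offset falls inside a hole. */
static bool inode_is_hole(uint16_t holemask, unsigned int index)
{
	assert(index < INODES_PER_CHUNK);
	return (holemask >> (index / INODES_PER_HOLEBIT)) & 1;
}

/* Count the inodes that are physically present in the chunk. */
static unsigned int allocmask_count(uint64_t allocmask)
{
	unsigned int count = 0;

	while (allocmask) {
		allocmask &= allocmask - 1;	/* clear the lowest set bit */
		count++;
	}
	return count;
}
```

Under these assumptions, a holemask of 0xFF00 describes a chunk whose
upper half is a hole, so only the first 32 inodes can be handed out.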

I've attached a tarball to this message with a couple of userspace
patches and an xfstests patch to facilitate experimentation. The
userspace patches update the inobt record data structure and add the
holemask field to xfs_db to make poking around easier. Note that the
rest of userspace is untouched at this point (repair, for example, is
broken), so I don't recommend use beyond xfs_db.

The xfstests test case fragments free space, allocates inodes until
ENOSPC and expects to consume most of the free space available in the
fs. The "fragmentation factor" is currently dynamic and based on the
cluster size due to the cluster size scaling behavior documented below.
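
As a rough illustration of why the cluster size matters here, the sketch
below scales a cluster size with the inode size and then computes how
much of a free extent is usable when allocation happens in whole
clusters. The base values (8192-byte cluster for 256-byte inodes) and
the function names are assumptions for illustration only, and alignment
constraints are ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed base values: illustrative, not quoted from the series. */
#define BASE_CLUSTER_BYTES	8192u	/* cluster size at the base inode size */
#define BASE_INODE_BYTES	256u	/* base inode size */

/* Scale the cluster size by the factor increase in the inode size. */
static uint32_t scaled_cluster_bytes(uint32_t inode_bytes)
{
	assert(inode_bytes >= BASE_INODE_BYTES);
	return BASE_CLUSTER_BYTES * (inode_bytes / BASE_INODE_BYTES);
}

/*
 * Blocks of a free extent that can hold inodes when the minimum
 * allocation unit is one cluster. Alignment is ignored in this sketch.
 */
static uint32_t usable_blocks(uint32_t extent_blocks, uint32_t cluster_blocks)
{
	assert(cluster_blocks > 0);
	return (extent_blocks / cluster_blocks) * cluster_blocks;
}
```

With 4k blocks and 512-byte inodes, these assumptions give a 4-block
cluster, so free space fragmented into 3-block extents would be unusable
for inodes while 4-block extents could be consumed completely.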

Finally, sparse inode chunks are only enabled for v5 superblocks, so a
CRC-enabled fs is required to test.


> The basic breakdown of functionality in this set is as follows:
> - Patches 1-2 - A couple generic cleanups that are dependencies for later
>   patches in the series.
> - Patches 3-5 - Basic data structure update, feature bit and minor
>   helper introduction.
> - Patches 6-7 - Update v5 icreate logging and recovery to handle sparse
>   inode records.
> - Patches 8-13 - Allocation support for sparse inode records. Physical
>   chunk allocation and individual inode allocation.
> - Patches 14-16 - Deallocation support for sparse inode chunks. Physical
>   chunk deallocation, individual inode free and cluster free.
> - Patch 17 - Fixes for bulkstat/inumbers.
> - Patch 18 - Activate support for sparse chunk allocation and
>   processing.
> This work is lightly tested for regression (some xfstests failures due
> to repair) and basic functionality. I have a new xfstests test I'll
> forward along for demonstration purposes.
> Some notes on gaps in the design:
> - Sparse inode chunk allocation granularity:
> The current minimum sparse chunk allocation granularity is the cluster
> size. My initial attempts at this work tried to redefine the minimum
> chunk length based on the holemask granularity (a la the stale macro I
> seemingly left in this series ;), but this involves tweaking the
> codepaths that use the cluster size (i.e., imap), which proved rather
> hairy. This also means we need a solution where an imap can change if an
> inode was initially mapped as a sparse chunk and said chunk is
> subsequently made full. E.g., we'd perhaps need to invalidate the inode
> buffers for sparse chunks at the time where they are made full. Given
> that, I landed on using the cluster size and leaving those codepaths as
> is for the time being.
> There is a tradeoff here for v5 superblocks because we've recently made
> a change to scale the cluster size based on the factor increase in the
> inode size from the default (see xfsprogs commit 7b5f9801). This means
> that the effectiveness of sparse chunks is tied to whether the level of
> free space fragmentation matches the cluster size. By that I mean
> effectiveness is good (near 100% utilization possible) if free space
> fragmentation leaves free extents around that at least match the
> cluster size. If fragmentation is worse than the cluster size,
> effectiveness is reduced.
> This can also be demonstrated with the forthcoming xfstests test.
> - On-disk lifecycle of the sparse inode chunks feature bit:
> We set an incompatible feature bit once a sparse inode chunk is
> allocated because older revisions of code will misinterpret the non-zero
> holemask bits as the higher order bytes of the record freecount. The
> feature bit must be removed once all sparse inode chunks are eliminated
> one way or another. This series does not currently remove the feature
> bit once set simply because I hadn't thought through the mechanism quite
> yet. For the next version, I'm thinking about adding an inobt walk
> mechanism that can be conditionally invoked (i.e., feature bit is
> currently set and a sparse inode chunk has been eliminated) either via
> workqueue on an interval or during unmount if necessary. Thoughts or
> alternative suggestions on that appreciated.
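
The feature bit lifecycle described above could be sketched roughly as
follows. This is only a sketch under stated assumptions: an in-memory
array stands in for the inobt walk, and the record layout, the flag
value and all names are hypothetical rather than taken from the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical incompat feature flag for sparse inode chunks. */
#define FEAT_SPINODES_SKETCH	(1u << 1)

/* Hypothetical, simplified stand-in for an inobt record. */
struct inobt_rec_sketch {
	uint32_t startino;	/* first inode number of the chunk */
	uint16_t holemask;	/* non-zero means the chunk is sparse */
};

/* Walk the records and report whether any sparse chunk remains. */
static bool any_sparse_records(const struct inobt_rec_sketch *recs,
			       size_t nrecs)
{
	size_t i;

	assert(recs != NULL || nrecs == 0);
	for (i = 0; i < nrecs; i++) {
		if (recs[i].holemask != 0)
			return true;
	}
	return false;
}

/*
 * Return the feature mask to store after a sparse chunk was eliminated:
 * the bit is cleared only if it was set and no sparse record is left.
 */
static uint32_t update_features(uint32_t features,
				const struct inobt_rec_sketch *recs,
				size_t nrecs)
{
	if (!(features & FEAT_SPINODES_SKETCH))
		return features;	/* nothing to clear, skip the walk */
	if (any_sparse_records(recs, nrecs))
		return features;	/* still needed */
	return features & ~FEAT_SPINODES_SKETCH;
}
```

The early return when the bit is not set corresponds to invoking the
walk conditionally; whether it runs from a workqueue or at unmount is
left open here, as in the text.
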
> That's about it for now. Thoughts, reviews, flames appreciated. Thanks.
> Brian
> Brian Foster (18):
>   xfs: refactor xfs_inobt_insert() to eliminate loop and support
>     variable count
>   xfs: pass xfs_mount directly to xfs_ialloc_cluster_alignment()
>   xfs: define sparse inode chunks v5 sb feature bit and helper function
>   xfs: introduce inode record hole mask for sparse inode chunks
>   xfs: create macros/helpers for dealing with sparse inode chunks
>   xfs: pass inode count through ordered icreate log item
>   xfs: handle sparse inode chunks in icreate log recovery
>   xfs: create helper to manage record overlap for sparse inode chunks
>   xfs: allocate sparse inode chunks on full chunk allocation failure
>   xfs: set sparse inodes feature bit when a sparse chunk is allocated
>   xfs: reduce min. inode allocation space requirement for sparse inode
>     chunks
>   xfs: helper to convert inobt record holemask to inode alloc. bitmap
>   xfs: filter out sparse regions from individual inode allocation
>   xfs: update free inode record logic to support sparse inode records
>   xfs: only free allocated regions of inode chunks
>   xfs: skip unallocated regions of inode chunks in xfs_ifree_cluster()
>   xfs: use actual inode count for sparse records in bulkstat/inumbers
>   xfs: enable sparse inode chunks for v5 superblocks
>  fs/xfs/libxfs/xfs_format.h       |  17 +-
>  fs/xfs/libxfs/xfs_ialloc.c       | 441 +++++++++++++++++++++++++++++++++------
>  fs/xfs/libxfs/xfs_ialloc.h       |  17 +-
>  fs/xfs/libxfs/xfs_ialloc_btree.c |   4 +-
>  fs/xfs/libxfs/xfs_sb.h           |   9 +-
>  fs/xfs/xfs_inode.c               |  28 ++-
>  fs/xfs/xfs_itable.c              |  12 +-
>  fs/xfs/xfs_log_recover.c         |  23 +-
>  8 files changed, 460 insertions(+), 91 deletions(-)

Attachment: xfs_spinodes_user.tar.bz2
Description: BZip2 compressed data
