On Mon, Dec 01, 2014 at 01:40:00PM -0500, Brian Foster wrote:
> On Mon, Dec 01, 2014 at 11:32:08AM +1100, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@xxxxxxxxxx>
> > After growing a filesystem, XFS can fail to allocate inodes even
> > though there is a large amount of space available in the filesystem
> > for inodes. The issue is caused by a nearly full allocation group
> > having enough free space in it to be considered for inode
> > allocation, but not enough contiguous free space to actually
> > allocation inodes. This situation results in successful selection
> > of the AG for allocation, then failure of the allocation resulting
> > in ENOSPC being reported to the caller.
> > It is caused by two possible issues. Firstly, we only consider the
> > lognest free extent and whether it would fit an inode chunk. If the
> > extent is not correctly aligned, then we can't allocate an inode
> > chunk in it regardless of the fact that it is large enough. This
> > tends to be a permanent error until space in the AG is freed.
> > The second issue is that we don't actually lock the AGI or AGF when
> > we are doing these checks, and so by the time we get to actually
> > allocating the inode chunk the space we thought we had in the AG may
> > have been allocated. This tends to be a spurious error as it
> > requires a race to trigger. Hence this case is ignored in this patch
> > as the reported problem is for permanent errors.
> > The first issue could be addressed by simply taking into account the
> > alignment when checking the longest extent. This, however, would
> > prevent allocation in AGs that have aligned, exact sized extents
> > free. However, this case should be fairly rare compared to the
> > number of allocations that occur near ENOSPC that would trigger this
> > condition.
> I think the allocation of aligned and exact size extents is already
> prevented in the inode chunk allocation code (where it sets the
> alignment requirement of the alloc.) and the associated checks in
> xfs_alloc_fix_freelist() (in particular, the couple bits of logic there
> that consider args->alignment and args->minalignslop).
> This is an interesting observation made from early tests of the sparse
> inode chunk work. At the point where an AG no longer satisfies inode
> chunk allocations, the sparse chunk mechanism actually leads to behavior
> where full inode chunks are allocated a sparse chunk at a time in
> regions of free space that were technically capable of supporting inode
> chunks before resorting to sparse allocs, but that the pre-allocation
> checks were not granular enough to allow to proceed. Note that this
> behavior assumes a workload that sequentially allocates inodes so as to
> not compete with allocations for other purposes, of course.
Yeah, sparse inode chunks change this logic because we can do single
block allocation for a partial inode chunk. However, for all the
existing filesystems that can't do partial inode chunks we still
need to handle this problem case.
> This patch seems to make the higher level AG selection more consistent
> with the lower level allocation attempt, which seems reasonable to me
> because it shouldn't introduce any allocation failures that aren't going
> to end up as failures anyways.
That's pretty much the conclusion I came to - I thought about trying
to combine the selection and allocation loops, but that's a lot more
work than is needed to fix the extent size requirements between the
> > Hence, when selecting the inode AG, take into account the inode
> > cluster alignment when checking the lognest free extent in the AG.
> > If we can't find any AGs with a contiguous free space large
> > enough to be aligned, drop the alignment addition and just try for
> > an AG that has enough contiguous free space available for an inode
> > chunk. This won't prevent issues from occurring, but should avoid
> > situations where other AGs have lots of free space but the selected
> > AG can't allocate due to alignment constraints.
> > Reported-by: Arkadiusz Mi¿kiewicz <arekm@xxxxxxxx>
> > Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
> > ---
> Reviewed-by: Brian Foster <bfoster@xxxxxxxxxx>
> (minor comment nit in the last hunk)