On Tue, Mar 11, 2008 at 09:08:31AM +0100, Christian Røsnes wrote:
> On Mon, Mar 10, 2008 at 11:21 PM, David Chinner <dgc@xxxxxxx> wrote:
> > You've got a filesystem with stripe alignment set. In xfs_ialloc_ag_alloc()
> > we attempt inode allocation by the following rules:
> >   1. a) If we haven't previously allocated inodes, fall through to 2.
> >      b) If we have previously allocated inodes, attempt to allocate
> >         next to the last inode chunk.
> >   2. If we do not have an extent now:
> >      a) if we have stripe alignment, try with stripe alignment
> >      b) if we don't have stripe alignment, try cluster alignment
> >   3. If we do not have an extent now:
> >      a) if we have stripe alignment, try with cluster alignment
> >      b) no stripe alignment, turn off alignment.
> >   4. If we do not have an extent now: FAIL.
> > Note the case missing from the stripe alignment fallback path - it does not
> > try without alignment at all. That means if all those extents large enough
> > that we found above are not correctly aligned, then we will still fail
> > to allocate an inode chunk. If all the AGs are like this, then we'll
> > fail to allocate at all and fall out of xfs_dialloc() through the last
> > fragment I quoted above.
> > As to the shutdown that this triggers - the attempt to allocate dirties
> > the AGFL and the AGF by moving free blocks into the free list for btree
> > splits, and cancelling a dirty transaction results in a shutdown.
> > Now, to test this theory. ;) Luckily, it's easy to test. Mount the
> > filesystem with the mount option "noalign" and rerun the mkdir test.
> > If it is an alignment problem, then setting noalign will prevent
> > this ENOSPC and shutdown as the filesystem will be able to allocate
> > more inodes.
> > Can you test this for me, Christian?
> Thanks. Unfortunately noalign didn't solve my problem:
Ok, reading the code a bit further, I've mixed up m_sinoalign,
m_sinoalignmt and the noalign mount option. The noalign mount option
turns off m_sinoalign, but it does not turn off inode cluster
alignment, hence we can't fall back to an unaligned allocation.
So the above theory still holds, just the test case was broken.
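
For reference, here is a rough sketch of the fallback order exactly as
listed in the quoted rules. It is not the kernel code: the enum and
function names are made up for illustration, and it only models which
alignment each attempt uses once the "next to the last chunk" attempt
has failed. It shows the point of the theory, namely that the stripe
alignment path never reaches an unaligned attempt.

```c
#include <stdbool.h>
#include <stddef.h>

/* Alignment used by one allocation attempt (hypothetical names). */
enum align_mode { ALIGN_STRIPE, ALIGN_CLUSTER, ALIGN_NONE };

/*
 * Fill 'out' with the alignment modes tried, in order, following
 * rules 2 and 3 of the quoted list. Returns the number of attempts.
 */
static size_t fallback_attempts(bool stripe_aligned, enum align_mode out[2])
{
    if (stripe_aligned) {
        out[0] = ALIGN_STRIPE;   /* rule 2a */
        out[1] = ALIGN_CLUSTER;  /* rule 3a */
    } else {
        out[0] = ALIGN_CLUSTER;  /* rule 2b */
        out[1] = ALIGN_NONE;     /* rule 3b */
    }
    return 2;
}

/* True if any attempt in the sequence drops alignment entirely. */
static bool tries_unaligned(bool stripe_aligned)
{
    enum align_mode seq[2];
    size_t n = fallback_attempts(stripe_aligned, seq);

    for (size_t i = 0; i < n; i++)
        if (seq[i] == ALIGN_NONE)
            return true;
    return false;
}
```

With stripe alignment set, tries_unaligned() is false, which is the
missing case: if no suitably aligned extent exists, rule 4 is reached
and the allocation fails.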
Unfortunately, further investigation indicates that inodes are
always allocated aligned; I expect that I could count on one hand
the number of Linux XFS filesystems not using inode allocation
alignment, because mkfs.xfs has set this as the default since it
was added in mid-1996.
The problem with unaligned inode allocation is the lookup case
(xfs_dilocate()) in that it requires btree lookups to convert the
inode number to a block number as you don't know where in the chunk
the inode exists just by looking at the inode number. With aligned
allocations, the block number can be derived directly from the inode
number because we know how the inode chunks are aligned.
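
To illustrate the aligned case, here is a minimal sketch of the
arithmetic. It assumes an AG-relative inode number laid out as (block
number << log2(inodes per block)) | offset, and a chunk alignment in
blocks that is a power of two. The function and parameter names are
hypothetical, not the ones used in the XFS source.

```c
#include <stdint.h>

/* Block that holds the inode: drop the inode-within-block offset bits. */
static uint32_t agbno_from_ino(uint32_t agino, unsigned inodes_per_block_log2)
{
    return agino >> inodes_per_block_log2;
}

/*
 * With aligned chunks, the first block of the chunk is found by
 * masking off the low bits. 'align_blocks' must be a power of two.
 */
static uint32_t chunk_start_aligned(uint32_t agbno, uint32_t align_blocks)
{
    return agbno & ~(align_blocks - 1);
}
```

For example, with 16 inodes per block and chunks aligned to 4 blocks,
inode 1000 lives in block 62 and its chunk starts at block 60. Without
the alignment guarantee there is no such mask to apply, so the chunk
start has to come from the inode btree.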
IOWs, if we allow an unaligned inode chunk allocation to occur, we
have to strip the "aligned inode allocation" feature bit from the
filesystem and the related state and use the slow, btree based
lookup path forever more. That involves I/O instead of a simple
Hence I'm inclined to leave the allocation alignment as it stands
and work out how to prevent the shutdown (a difficult issue in
> I'll try to add some printk statements to the codepaths you mentioned,
> and see where it leads.
Definitely worth confirming this is where the error is coming from.
SGI Australian Software Group