On Wed, Jul 13, 2016 at 02:32:17PM -0400, Brian Foster wrote:
> On Wed, Jul 13, 2016 at 09:50:08AM -0700, Darrick J. Wong wrote:
> > On Fri, Jul 08, 2016 at 09:21:55AM -0400, Brian Foster wrote:
> > > > /*
> > > > + * In order to avoid ENOSPC-related deadlock caused by out-of-order
> > > > locking of
> > > > + * AGF buffer (PV 947395), we place constraints on the relationship
> > > > among
> > > > + * actual allocations for data blocks, freelist blocks, and potential
> > > > file data
> > > > + * bmap btree blocks. However, these restrictions may result in no
> > > > actual space
> > > > + * allocated for a delayed extent, for example, a data block in a
> > > > certain AG is
> > > > + * allocated but there is no additional block for the additional bmap
> > > > btree
> > > > + * block due to a split of the bmap btree of the file. The result of
> > > > this may
> > > > + * lead to an infinite loop when the file gets flushed to disk and all
> > > > delayed
> > > > + * extents need to be actually allocated. To get around this, we
> > > > explicitly set
> > > > + * aside a few blocks which will not be reserved in delayed allocation.
> > > > + *
> > > > + * The minimum number of needed freelist blocks is 4 fsbs _per AG_
> > > > when we are
> > > > + * not using rmap btrees a potential split of file's bmap btree
> > > > requires 1 fsb,
> > > > + * so we set the number of set-aside blocks to 4 + 4*agcount when not
> > > > using
> > > > + * rmap btrees.
> > > > + *
> > >
> > > That's a bit wordy.
> >
> > Yikes, that whole thing is a single sentence!
> >
> > One thing I'm not really sure about is how "a potential split of file's bmap
> > btree requires 1 fsb" seems to translate to 4 in the actual formula. I'd
> > have thought it would be m_bm_maxlevels or something... not just 4.
> >
>
> I'm not sure about that either, tbh.
So, a trip down memory lane.
<wavy line dissolve>
Back in 2006, I fixed a bug that changed XFS_ALLOC_SET_ASIDE from
a fixed value of 8 blocks to 4 blocks + 4 AGFL blocks per AG in
commit 4be536de ("[XFS] Prevent free space
oversubscription and xfssyncd looping."). The original value of
8 was for 4 blocks for the bmbt split, and 4 blocks from the current
AG for the AGFL (commit message explains the reason this was a
problem (Yay for writing good commit messages 10 years ago!)). The
oringinal comment text was:
- * reserved in delayed allocation. Considering the minimum number of
- * needed freelist blocks is 4 fsbs, a potential split of file's bmap
- * btree requires 1 fsb, so we set the number of set-aside blocks to 8.
-*/
So we need to go back further. We have an obvious git log search
target in the comment (PV#947395), and that points to:
commit d210a28cd851082cec9b282443f8cc0e6fc09830
Author: Yingping Lu <yingping@xxxxxxx>
Date: Fri Jun 9 14:55:18 2006 +1000
[XFS] In actual allocation of file system blocks and freeing extents, the
transaction within each such operation may involve multiple locking of AGF
buffer. While the freeing extent function has sorted the extents based on
AGF number before entering into transaction, however, when the file system
space is very limited, the allocation of space would try every AGF to get
space allocated, this could potentially cause out-of-order locking, thus
deadlock could happen. This fix mitigates the scarce space for allocation
by setting aside a few blocks without reservation, and avoid deadlock by
maintaining ascending order of AGF locking.
SGI-PV: 947395
SGI-Modid: xfs-linux-melb:xfs-kern:210801a
Signed-off-by: Yingping Lu <yingping@xxxxxxx>
Signed-off-by: Nathan Scott <nathans@xxxxxxx>
Which tells us nothing about why 1 fsb for the bmbt split was
actually reserved as 4fsbs. IIRC, I ended up having to find and fix
the problem because Yingping left SGI soon after this fix was made,
and at the time nobody understood or could work out why that was
done. It worked, however, so we left it that way, and just fixed the
per-ag reservation problem this bug fix had.
> > /*
> > * When rmap is disabled, we need to reserve 4 fsbs _per AG_ for the
> > freelist
> > * and 4 more to handle a potential split of the file's bmap btree.
As such, I'm not sure that is any more correct than the original
comment.
Looking back on this now with 10 years more time working on XFS, my
suspicion is that a single level bmap btree split requires 1
block to be allocated, but that allocation will call
xfs_alloc_fix_freelist() to refill the freelist to the minimum
(which is 4 blocks), and so we need at least 4 blocks for the
allocation to succeed (4 blocks for the freelist fill, and if we are
at ENOSPC then the bmap btree block will be allocated from the
AGFL).
Whether the value of 4 is correct or not for this purpose is just a
guess based on the per-ag AGFL requirements, so my only comment
right now is: it's worked for 10 years, so let's not change it until
there's at least some evidence that is it wrong.
Cheers,
Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx
|