[PATCH 5/6] xfs: splitting delalloc extents can run out of reservation
Dave Chinner
david at fromorbit.com
Fri Apr 4 16:31:52 CDT 2014
On Fri, Apr 04, 2014 at 09:37:01AM -0400, Brian Foster wrote:
> On Fri, Mar 21, 2014 at 09:11:49PM +1100, Dave Chinner wrote:
> > From: Dave Chinner <dchinner at redhat.com>
> >
> > When we punch a hole in a delalloc extent, we split the indirect
> > block reservation between the two new extents. If we repeatedly
> > punch holes in a large delalloc extent, that reservation will
> > eventually run out and we'll assert fail in xfs_bunmapi() because
> > the indirect block reservation for the delalloc extent is zero. This
> > is caused by doing a large delalloc write, then zeroing multiple
> > ranges of that write using fallocate to punch lots of holes in the
> > delayed allocation range.
> >
> > To avoid this problem, if we split the reservation and require more
> > indirect blocks for the two new extents than we had for the old
> > reservation, steal the additional blocks from the hole we punched in
> > the extent. In most cases we only need a single extra block, so even
> > if we punch only single block holes we can still retain sufficient
> > indirect block reservations to avoid problems.
> >
> > In doing this "stealing", we need to change where we account for the
> > delalloc blocks being freed. The block count held on the inode does
> > not take into account the indirect block reservation, so we still
> > need to do that before we free the extent. However, the accounting
> > ofr free space in the superblock need to be done after we've stolen
> > the blocks fro the freed extent so that they are accounted for
> > correctly.
> >
> > Signed-off-by: Dave Chinner <dchinner at redhat.com>
> > ---
> > fs/xfs/xfs_bmap.c | 65 ++++++++++++++++++++++++++++++++++++++++---------------
> > 1 file changed, 47 insertions(+), 18 deletions(-)
> >
> > diff --git a/fs/xfs/xfs_bmap.c b/fs/xfs/xfs_bmap.c
> > index 5b6092e..4bf6a0e 100644
> > --- a/fs/xfs/xfs_bmap.c
> > +++ b/fs/xfs/xfs_bmap.c
> > @@ -4945,7 +4945,27 @@ xfs_bmap_del_extent(
> > temp2 = xfs_bmap_worst_indlen(ip, temp2);
> > new.br_startblock = nullstartblock((int)temp2);
> > da_new = temp + temp2;
> > +
> > + /*
> > + * Note: if we have an odd number of blocks reserved,
> > + * then if we keep splitting the delalloc extent like
> > + * this we end up with a delalloc indlen reservation of
> > + * zero for one of the two extents. Hence if we end
> > + * up with the new indlen reservations being larger than
> > + * the old one, steal blocks from the data reservation
> > + * we just punched out. Otherwise, just reduce the
> > + * remaining indlen reservations alternately and hope
> > + * next time we come here the range getting removed is
> > + * large enough to fix this all up.
> > + */
> > while (da_new > da_old) {
> > + if (del->br_blockcount) {
> > + /* steal a block */
> > + da_new--;
> > + del->br_blockcount--;
> > + continue;
> > + }
> > +
> > if (temp) {
> > temp--;
> > da_new--;
> > @@ -5255,24 +5275,6 @@ xfs_bunmapi(
> > }
> > if (wasdel) {
> > ASSERT(startblockval(del.br_startblock) > 0);
> > - /* Update realtime/data freespace, unreserve quota */
> > - if (isrt) {
> > - xfs_filblks_t rtexts;
> > -
> > - rtexts = XFS_FSB_TO_B(mp, del.br_blockcount);
> > - do_div(rtexts, mp->m_sb.sb_rextsize);
> > - xfs_mod_incore_sb(mp, XFS_SBS_FREXTENTS,
> > - (int64_t)rtexts, 0);
> > - (void)xfs_trans_reserve_quota_nblks(NULL,
> > - ip, -((long)del.br_blockcount), 0,
> > - XFS_QMOPT_RES_RTBLKS);
> > - } else {
> > - xfs_icsb_modify_counters(mp, XFS_SBS_FDBLOCKS,
> > - (int64_t)del.br_blockcount, 0);
> > - (void)xfs_trans_reserve_quota_nblks(NULL,
> > - ip, -((long)del.br_blockcount), 0,
> > - XFS_QMOPT_RES_REGBLKS);
> > - }
> > ip->i_delayed_blks -= del.br_blockcount;
> > if (cur)
> > cur->bc_private.b.flags |=
> > @@ -5302,6 +5304,33 @@ xfs_bunmapi(
> > }
> > error = xfs_bmap_del_extent(ip, tp, &lastx, flist, cur, &del,
> > &tmp_logflags, whichfork);
> > + /*
> > + * xfs_bmap_del_extent may hand delayed alloc blocks back to the
> > + * indirect block reservations to keep extent split reservations
> > + * sane. Hence we should only decrement the delayed block count
> > + * on the inode once we know exactly the amount of delalloc
> > + * space we actually removed from the inode.
> > + */
> > + if (wasdel && del.br_blockcount) {
> > + /* Update realtime/data freespace, unreserve quota */
> > + if (isrt) {
> > + xfs_filblks_t rtexts;
> > +
> > + rtexts = XFS_FSB_TO_B(mp, del.br_blockcount);
> > + do_div(rtexts, mp->m_sb.sb_rextsize);
> > + xfs_mod_incore_sb(mp, XFS_SBS_FREXTENTS,
> > + (int64_t)rtexts, 0);
> > + (void)xfs_trans_reserve_quota_nblks(NULL,
> > + ip, -((long)del.br_blockcount), 0,
> > + XFS_QMOPT_RES_RTBLKS);
> > + } else {
> > + xfs_icsb_modify_counters(mp, XFS_SBS_FDBLOCKS,
> > + (int64_t)del.br_blockcount, 0);
> > + (void)xfs_trans_reserve_quota_nblks(NULL,
> > + ip, -((long)del.br_blockcount), 0,
> > + XFS_QMOPT_RES_REGBLKS);
> > + }
> > + }
>
> In xfs_bmapi_reserve_delalloc(), we account alen against the quota,
> calculate indlen, account both against the sb counters, add alen to
> i_delayed_blks and continue on...
*nod*
> In xfs_bunmap(), we remove br_blockcount from the sb counters and
> unreserve from the quota then call into xfs_bmap_del_extent(). The
> latter takes care of removing the indlen blocks from the sb counters if
> necessary.
*nod*
> With these changes, we move the accounting after the del_extent() call
> and allow the latter to steal from br_blockcount for indlen. This seems
> like it works for the sb counters.
*nod*
> We also adjust i_delayed_blks against
> the original extent length before it can be modified. The quota
> reservation looks like it could become inconsistent, however. E.g., some
> blocks previously accounted against the quota (alen) are now moved over
> to indlen.
Yes, you are right - it will cause inconsistency. That's a lesser
evil than not having a reservation at all, but I can probably fix
that up.
> If those indlen blocks happen to be cleaned up through
> del_extent(), they'd only be replenished to the sb counters (near the
> end of del_extent()). Perhaps we should leave the quota unreserve prior
> to the del_extent() call or sample the original br_blockcount and use it
> post del_extent()..?
That's a possibility. I really only looked at fixing the reservation
issue and keeping the sb counters correct, not the quota aspect of
it. I'll go back and see what I can come up with that solves this
problem, too.
Thanks!
Cheers,
Dave.
--
Dave Chinner
david at fromorbit.com
More information about the xfs
mailing list