On Thu, Feb 11, 2016 at 09:09:37AM -0500, Brian Foster wrote:
> On Thu, Feb 11, 2016 at 08:40:58AM +1100, Dave Chinner wrote:
> > On Wed, Feb 10, 2016 at 11:07:38AM -0800, Christoph Hellwig wrote:
> > > On Wed, Feb 10, 2016 at 01:50:10AM -0800, Darrick J. Wong wrote:
> > > > That's odd... I'd have thought that the AG reservation would always be
> > > > able to handle a refcount btree expansion, since it calculates how many
> > > > blocks are needed to handle the worst case of 1 record per extent. There's
> > > > also a bug where we undercount the number of blocks already used, so it
> > > > should have an extra big reservation.
> > > >
> > > > OTOH I've seen occasional ENOSPCs in generic/186 and generic/168 too, so I
> > > > guess something's going wrong. Maybe the xfs_ag_resv* tracepoints can
> > > > help?
> > >
> > > I'm not seeing an ENOSPC, I run into:
> > >
> > > [ 640.924891] XFS: Assertion failed: tp->t_blk_res_used <= tp->t_blk_res, file: fs/xfs/xfs_trans.c, line: 315
> >
> > I run into that from time to time (maybe once a month) on a vanilla
> > kernel.
> >
>
> Any idea which test reproduces it? I see that generic/033 resulted from the
> discussion below on the RFC. I don't currently reproduce it with that test,
> however. The test mentions it uses fzero because zero range doesn't do
> writeback (comments ftw :) and thus allows splitting of delalloc
> extents, but it looks like that might no longer be the case in the
> kernel (since zero range was simplified to reuse punch/alloc).

It's usually one of the fsstress tests that triggers it. For some
reason generic/233 sticks in my mind, but it's a pretty rare failure
these days...

> > IIRC, the problem is the delayed allocation extent split runs out of
> > its reserved block count if you split it enough times. The case
> > I've seen is that the indlen calculated in xfs_bmap_worst_indlen()
> > ends up too small for a subsequent allocation after we've called
> > xfs_bmap_del_extent() to delete the middle of a delalloc extent too
> > many times.
> >
> > Brian had some patches that attempted to solve it - we may have
> > simply dropped the ball on this (again).
> >
> > http://oss.sgi.com/archives/xfs/2014-09/msg00337.html
> >
>
> I recall working on this, but not quite where it left off. If I dig back
> to my old tree from before the oss.sgi.com->vger switchover, I have a v1
> branch for this work that was posted here:
>
> http://oss.sgi.com/archives/xfs/2014-10/msg00294.html
>
> It looks like we just never got it reviewed and I've since lost track of
> it. I can resurrect it if warranted. I would like to nail down a current
> reproducer though...

*nod*. Not sure what we can use to trigger it, though.
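
To illustrate the delalloc split problem described above: each delalloc
extent carries a worst-case reservation of indirect (bmbt) blocks, and in
the simplified model below that worst case has a per-extent floor, so the
two pieces left after punching out the middle of an extent can need more
than the original extent reserved. This is a userspace sketch, not the
kernel code: worst_indlen() and split_shortfall() are hypothetical helpers
only loosely shaped like xfs_bmap_worst_indlen(), and the parameters used
below (100 records per block, at most 5 levels) are made-up values.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical, simplified model of a worst-case indirect block count for
 * a delalloc extent of 'len' blocks, loosely shaped like
 * xfs_bmap_worst_indlen(). Assumes one bmbt record per block at the leaf
 * level, 'leaf_recs' records per leaf block, 'node_recs' records per
 * interior block and at most 'maxlevels' levels. Not the kernel code.
 */
static uint64_t worst_indlen(uint64_t len, uint64_t leaf_recs,
			     uint64_t node_recs, int maxlevels)
{
	uint64_t maxrecs = leaf_recs;
	uint64_t rval = 0;

	assert(len > 0 && leaf_recs > 1 && node_recs > 1 && maxlevels > 0);
	for (int level = 0; level < maxlevels; level++) {
		/* blocks needed at this level: ceil(len / maxrecs) */
		len = (len + maxrecs - 1) / maxrecs;
		rval += len;
		/* one block covers everything: add a block per remaining level */
		if (len == 1)
			return rval + (uint64_t)(maxlevels - level - 1);
		if (level == 0)
			maxrecs = node_recs;
	}
	return rval;
}

/*
 * Hypothetical helper: punch 'hole' blocks out of the middle of a delalloc
 * extent of left + hole + right blocks. Returns how many indirect blocks
 * the two remaining pieces need beyond what the original extent reserved,
 * or 0 if the original reservation still covers both.
 */
static uint64_t split_shortfall(uint64_t left, uint64_t hole, uint64_t right,
				uint64_t leaf_recs, uint64_t node_recs,
				int maxlevels)
{
	uint64_t had = worst_indlen(left + hole + right, leaf_recs, node_recs,
				    maxlevels);
	uint64_t need = worst_indlen(left, leaf_recs, node_recs, maxlevels) +
			worst_indlen(right, leaf_recs, node_recs, maxlevels);

	return need > had ? need - had : 0;
}
```

With those made-up parameters, a 3-block extent reserves 5 blocks, but
punching out its middle block leaves two 1-block pieces that each want 5,
so split_shortfall(1, 1, 1, 100, 100, 5) returns 5. Each further middle
punch reopens a gap like this, which fits the pattern of the reservation
running short after enough splits.
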
Cheers,
Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx