Hey Dave,
On Fri, Jan 18, 2013 at 10:39:22AM +1100, Dave Chinner wrote:
> On Thu, Jan 17, 2013 at 03:41:08PM -0600, Ben Myers wrote:
> > Hey Brian,
> >
> > On Thu, Jan 17, 2013 at 01:11:29PM -0500, Brian Foster wrote:
> > > The stack_switch check currently occurs in __xfs_bmapi_allocate,
> > > which means the stack switch only occurs when xfs_bmapi_allocate()
> > > is called in a loop. Pull the check up before the loop in
> > > xfs_bmapi_write() such that the first iteration of the loop has
> > > consistent behavior.
> > >
> > > Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx>
> > > ---
> > >
> > > I was reading through this code and confused myself over whether the stack
> > > switch ever actually occurs. Eric and Ben pointed out on irc
> > > (simultaneously,
> > > I might add) the surrounding loop that I had missed, but it wasn't clear
> > > whether
> > > the behavior to enable the stack switch after the first iteration was
> > > intentional or not. I'm throwing this out there to either fix the issue
> > > or seek
> > > out an explanation for the existing behavior. Thanks!
> >
> > To me this looks to be the correct behavior. It might be better to
> > just get rid of the XFS_BMAPI_STACK_SWITCH flag entirely. Nice find.
>
> Which would take it back to the original logic which always switched
> stacks and we know that caused significant metadata performance
> degradation in various workloads.
>
> If we want to remove XFS_BMAPI_STACK_SWITCH, then we either need to
> solve either the stack overrun problem (not possible, AFAICT) or the
> metadata performance degradation as a result of always pushing
> allocation off into workqueues. So, unfortunately, until we have some
> other resolution, we stuck with it.... :/
Yeah, I went off the rails there. I meant to suggest something more along the
lines of getting rid of the stack_switch member of the args structure, since
xfs_bmapi_write is the only caller of xfs_bmapi_allocate. But it didn't come
out that way... Anyway, what we have is just fine.
Regards,
Ben
|