On Thu, Jan 17, 2013 at 03:41:08PM -0600, Ben Myers wrote:
> Hey Brian,
> On Thu, Jan 17, 2013 at 01:11:29PM -0500, Brian Foster wrote:
> > The stack_switch check currently occurs in __xfs_bmapi_allocate,
> > which means the stack switch only occurs when xfs_bmapi_allocate()
> > is called in a loop. Pull the check up before the loop in
> > xfs_bmapi_write() such that the first iteration of the loop has
> > consistent behavior.
> > Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx>
> > ---
> > I was reading through this code and confused myself over whether the stack
> > switch ever actually occurs. Eric and Ben pointed out on irc
> > (simultaneously,
> > I might add) the surrounding loop that I had missed, but it wasn't clear
> > whether
> > the behavior to enable the stack switch after the first iteration was
> > intentional or not. I'm throwing this out there to either fix the issue or
> > seek
> > out an explanation for the existing behavior. Thanks!
> To me this looks to be the correct behavior. It might be better to
> just get rid of the XFS_BMAPI_STACK_SWITCH flag entirely. Nice find.
Which would take it back to the original logic which always switched
stacks and we know that caused significant metadata performance
degradation in various workloads.
If we want to remove XFS_BMAPI_STACK_SWITCH, then we either need to
solve either the stack overrun problem (not possible, AFAICT) or the
metadata performance degradation as a result of always pushing
allocation off into workqueues. So, unfortunately, until we have some
other resolution, we stuck with it.... :/