On Wed, Feb 12, 2014 at 09:25:38AM -0500, Dave Jones wrote:
> On Wed, Feb 12, 2014 at 05:10:38PM +1100, Dave Chinner wrote:
> > On Wed, Feb 12, 2014 at 12:50:27AM -0500, Dave Jones wrote:
> > > On Wed, Feb 12, 2014 at 04:40:43PM +1100, Dave Chinner wrote:
> > >
> > > > None of the XFS code disables interrupts in that path, nor does it
> > > > call outside XFS except to dispatch IO. The stack is pretty deep at
> > > > this point and I know that the standard (non stacked) IO stack can
> > > > consume >3kb of stack space when it gets down to having to do memory
> > > > reclaim during GFP_NOIO allocation at the lowest level of SCSI
> > > > drivers. Stack overruns typically show up with symptoms like we are
> > > > seeing.
> > > > ..
> > > >
> > > > Dave, before chasing ghosts, can you (like Eric originally asked)
> > > > turn on stack overrun detection?
> > >
> > > CONFIG_DEBUG_STACKOVERFLOW ? Already turned on.
> > That only checks stack usage when an interrupt is taken. If no
> > interrupts are taken when stack usage is within 128 bytes of
> > overflow, then it doesn't catch it.
> > I tend to use CONFIG_DEBUG_STACK_USAGE=y as it records the maximum
> > stack usage of a process via canary overwrites and it records it in
> > do_exit().
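
For readers unfamiliar with the high-water-mark idea behind
CONFIG_DEBUG_STACK_USAGE, here is a minimal userspace sketch. It is not
the kernel's code. It assumes a stack region that starts out zero-filled
and grows downward from the high end; the names sim_use_stack() and
sim_stack_not_used() are hypothetical and exist only for this example.

```c
#include <stddef.h>

/*
 * Simulate stack consumption: dirty the top `bytes` of a downward-growing
 * stack region. `base` points at the lowest address, `words` is its size
 * in unsigned longs. (Hypothetical helper for illustration only.)
 */
static void sim_use_stack(unsigned long *base, size_t words, size_t bytes)
{
    size_t used = bytes / sizeof(unsigned long);
    size_t i;

    if (used > words)
        used = words;
    for (i = words - used; i < words; i++)
        base[i] = 0xdeadbeefUL;   /* any non-zero value marks the word as touched */
}

/*
 * Report how many bytes at the low end of the region were never written.
 * Scans upward from the lowest address until the first non-zero word.
 * Because dirtied words are never re-zeroed, this reflects the deepest
 * usage seen so far, not the current depth.
 */
static size_t sim_stack_not_used(const unsigned long *base, size_t words)
{
    size_t i = 0;

    while (i < words && base[i] == 0)
        i++;
    return i * sizeof(unsigned long);
}
```

Calling sim_stack_not_used() once at the end of a run gives a "greatest
stack depth: N bytes left" style number. Two limits follow from the
design: the result is only known when someone scans, and a frame whose
deepest words happen to be zero is under-counted.
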
> I had that on too. The only message from it came from quite a while
> before the trace that happened overnight..
Right, it won't capture an overrun at the point in time an overrun
occurs, either, because it only checks when the process exits. But
it does tell you what stack usage is being seen, as this:
> [ 3415.655125] trinity-c0 (4383) used greatest stack depth: 992 bytes left
> [12900.804230] BUG: sleeping function called from invalid context at
is a pretty good indication that trinity is at risk of stack