On 2/12/14, 12:10 AM, Dave Chinner wrote:
> On Wed, Feb 12, 2014 at 12:50:27AM -0500, Dave Jones wrote:
>> On Wed, Feb 12, 2014 at 04:40:43PM +1100, Dave Chinner wrote:
>> > None of the XFS code disables interrupts in that path, nor does it
>> > call outside XFS except to dispatch IO. The stack is pretty deep at
>> > this point and I know that the standard (non stacked) IO stack can
>> > consume >3kb of stack space when it gets down to having to do memory
>> > reclaim during GFP_NOIO allocation at the lowest level of SCSI
>> > drivers. Stack overruns typically show up with symptoms like we are
>> > seeing.
>> > ..
>> > Dave, before chasing ghosts, can you (like Eric originally asked)
>> > turn on stack overrun detection?
>> CONFIG_DEBUG_STACKOVERFLOW ? Already turned on.
> That only checks stack usage when an interrupt is taken. If no
> interrupts are taken when stack usage is within 128 bytes of
> overflow, then it doesn't catch it.
> I tend to use CONFIG_DEBUG_STACK_USAGE=y as it records the maximum
> stack usage of a process via canary overwrites and it records it in
> do_exit(). I also use the stack tracer to record the largest stack
> usage seen so I know exactly what code paths are approaching stack
I'm not sure if I'm off base here, but maybe this would make sense: check
for a corrupted stack in __might_sleep. Compile tested only,
possibly inelegant, and/or completely wrong, but:
From: Eric Sandeen <sandeen@xxxxxxxxxx>
sched: Test for corrupted task_struct in __might_sleep
If a thread overruns the stack, it may corrupt the task_struct,
leading to false positives on tests like irqs_disabled().
Warn if this seems to be the case.
Signed-off-by: Eric Sandeen <sandeen@xxxxxxxxxx>
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index b46131e..6920c3c 100644
@@ -6934,6 +6934,8 @@ static inline int preempt_count_equals(int preempt_offset)
 void __might_sleep(const char *file, int line, int preempt_offset)
+	struct task_struct *tsk = current;
+	unsigned long *stackend;
 	static unsigned long prev_jiffy;	/* ratelimiting */
 	rcu_sleep_check(); /* WARN_ON_ONCE() by default, no rate limit reqd. */
@@ -6952,6 +6954,11 @@ void __might_sleep(const char *file, int line, int
+	/* A corrupted stack can cause a false positive on irqs_disabled etc */
+	stackend = end_of_stack(tsk);
+	if (tsk != &init_task && *stackend != STACK_END_MAGIC)
+		printk(KERN_EMERG "Thread overran stack, or stack corrupted\n");
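
For anyone following along without a kernel tree handy, here is a rough
userspace sketch of the idea behind the check above, and of the
canary-style high-water mark Dave describes. It is not kernel code:
struct fake_stack, FAKE_STACK_WORDS and the fake_stack_* helpers are
made up for illustration, and the "stack" is just an array that is
assumed to grow downward. Only the STACK_END_MAGIC value is taken from
the kernel (include/uapi/linux/magic.h).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Same value the kernel defines in include/uapi/linux/magic.h. */
#define STACK_END_MAGIC 0x57AC6E9DUL

/* Hypothetical size, chosen only to keep the example small. */
#define FAKE_STACK_WORDS 64

/*
 * Hypothetical stand-in for a thread stack that grows downward:
 * words[FAKE_STACK_WORDS - 1] is the top of the stack and words[0]
 * is the last word before overflow, where the magic value lives.
 */
struct fake_stack {
	unsigned long words[FAKE_STACK_WORDS];
};

/* Zero the stack and plant the magic value at its far end. */
static void fake_stack_init(struct fake_stack *s)
{
	memset(s->words, 0, sizeof(s->words));
	s->words[0] = STACK_END_MAGIC;
}

/* Simulate a call chain that dirties nwords words from the top down. */
static void fake_stack_use(struct fake_stack *s, size_t nwords)
{
	assert(nwords <= FAKE_STACK_WORDS);
	for (size_t i = 0; i < nwords; i++)
		s->words[FAKE_STACK_WORDS - 1 - i] = 0xA5A5A5A5UL;
}

/* The check the patch adds: is the magic value still in place? */
static bool fake_stack_intact(const struct fake_stack *s)
{
	return s->words[0] == STACK_END_MAGIC;
}

/*
 * High-water mark: count the still-zero words just above the magic
 * value. Fewer untouched words means deeper stack usage was seen.
 */
static size_t fake_stack_unused_words(const struct fake_stack *s)
{
	size_t i = 1;

	while (i < FAKE_STACK_WORDS && s->words[i] == 0)
		i++;
	return i - 1;
}
```

Calling fake_stack_init(), then fake_stack_use() with a small count,
leaves fake_stack_intact() true and shrinks the unused count. Using all
FAKE_STACK_WORDS words clobbers the magic value, which is the condition
the printk in the patch is meant to flag.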