netdev

Re: do_IRQ: stack overflow: 872..

To: David Woodhouse <dwmw2@xxxxxxxxxxxxx>, Bart De Schuymer <bdschuym@xxxxxxxxxx>
Subject: Re: do_IRQ: stack overflow: 872..
From: Stephen Hemminger <shemminger@xxxxxxxx>
Date: Fri, 7 Jan 2005 10:00:17 -0800
Cc: Andi Kleen <ak@xxxxxxx>, Crazy AMD K7 <snort2004@xxxxxxx>, bridge@xxxxxxxx, netdev@xxxxxxxxxxx
In-reply-to: <1105117559.11753.34.camel@xxxxxxxxxxxxxxxxxxxxxxx>
Organization: Open Source Development Lab
References: <1131604877.20041218092730@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx> <p73zn0ccaee.fsf@xxxxxxxxxxxxx> <1105117559.11753.34.camel@xxxxxxxxxxxxxxxxxxxxxxx>
Sender: netdev-bounce@xxxxxxxxxxx
On Fri, 07 Jan 2005 17:05:59 +0000
David Woodhouse <dwmw2@xxxxxxxxxxxxx> wrote:

> On Sat, 2004-12-18 at 08:50 +0100, Andi Kleen wrote:
> > It's not really an oops, just a warning that stack space got quite
> > tight.
> > 
> > The problem seems to be that the br netfilter code is nesting far too
> > deeply and recursing several times. Looks like a design bug to me,
> > it shouldn't do that.
> 
> I don't think it's recursing -- I think the stack trace is just a bit
> noisy. The problem is that the bridge code, especially with br_netfilter
> in the equation, is implicated in code paths which are just _too_ deep.
> This happens when you're bridging packets received in an interrupt while
> you were deep in journalling code, and it's also been seen with a call
> trace something like nfs->sunrpc->ip->bridge->br_netfilter.

Sounds like an argument for interrupt stacks.

> One option might be to make br_dev_xmit() just queue the packet rather
> than trying to deliver it to all the slave devices immediately. Then the
> actual retransmission can be handled from a context where we're _not_
> short of stack; perhaps from a dedicated kernel thread. 

Probably the solution would be to handle it in the filter code. That
way, if we are not filtering, we can use the interrupt path, but if we
are filtering, we just defer to a safer context (like soft irq).
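
A rough userspace sketch of that idea, to make the control flow concrete.
This is not kernel code: struct pkt, struct defer_queue, bridge_xmit(),
drain_deferred() and DEFER_QUEUE_LEN are all made-up names for illustration,
and deliver() merely stands in for the deep filter + forward path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEFER_QUEUE_LEN 8          /* hypothetical queue depth */

struct pkt { int id; };            /* stand-in for a real packet buffer */

struct defer_queue {
    struct pkt slots[DEFER_QUEUE_LEN];
    size_t head;                   /* next slot to drain */
    size_t count;                  /* packets currently queued */
};

static int delivered_direct;       /* packets sent on the fast path */
static int delivered_deferred;     /* packets sent from the safer context */

/* Stand-in for the deep path (filter hooks + forwarding). */
static void deliver(const struct pkt *p, bool deferred)
{
    (void)p;
    if (deferred)
        delivered_deferred++;
    else
        delivered_direct++;
}

/*
 * Not filtering: deliver immediately in the caller's context.
 * Filtering: only queue the packet; the deep path runs later.
 * Returns false if the queue is full and the packet is dropped.
 */
static bool bridge_xmit(struct defer_queue *q, const struct pkt *p,
                        bool filtering)
{
    assert(q != NULL && p != NULL);
    if (!filtering) {
        deliver(p, false);
        return true;
    }
    if (q->count == DEFER_QUEUE_LEN)
        return false;
    q->slots[(q->head + q->count) % DEFER_QUEUE_LEN] = *p;
    q->count++;
    return true;
}

/* Run from the safer context; returns the number of packets drained. */
static size_t drain_deferred(struct defer_queue *q)
{
    size_t n = 0;

    assert(q != NULL);
    while (q->count > 0) {
        deliver(&q->slots[q->head], true);
        q->head = (q->head + 1) % DEFER_QUEUE_LEN;
        q->count--;
        n++;
    }
    return n;
}
```

The point is only that the unfiltered case pays no extra latency, while
the filtered case costs a queue insert on the tight stack and does the
deep work later. A real version would also need locking and a policy for
a full queue, which this sketch simply drops on.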

> Unfortunately that approach would introduce a lot of latency on all
> packets we pass. Another option would be to have all architectures
> provide a stack_available() function and for br_dev_xmit() to queue the
> packet only if we're short of stack, while still sending most packets
> immediately. 

NO, that looks like a testability and portability nightmare.
