netdev

Re: Kernel crash in 2.6.0-test9-mm3

To: Reuben Farrelly <reuben-linux@xxxxxxxx>
Subject: Re: Kernel crash in 2.6.0-test9-mm3
From: Krishna Kumar <kumarkr@xxxxxxxxxx>
Date: Tue, 18 Nov 2003 18:22:42 -0800
Cc: Andrew Morton <akpm@xxxxxxxx>, "David S. Miller" <davem@xxxxxxxxxx>, netdev@xxxxxxxxxxx
Sender: netdev-bounce@xxxxxxxxxxx



Could this be happening only on SMP systems? If so, the e100intr routine
services the tx queues (e100_tx_srv) without holding a lock. Can't multiple
rx interrupts be scheduled on different CPUs at the same time, and each
execute dev_kfree_skb_irq(), which would decrement the ref count too many
times? But the softirq handler (net_tx_action) seems to clean up the skb
only once, since the dec_test returns 1 only when the count reaches zero,
so I don't see where the dst ref is being decremented wrongly in this case.

Can someone explain why the interrupt handler doesn't need locks to stop
interrupts on other CPUs from going through the same device's memory at
the same time?
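A minimal userspace sketch of the refcount argument above, assuming C11 atomics and POSIX threads stand in for the kernel's atomic_t and for interrupts on different CPUs. The names `struct buf`, `buf_put()` and `run_concurrent_puts()` are hypothetical; this is not e100 or skb code. Because `atomic_fetch_sub()` returns the value before the decrement, exactly one caller sees 1 and frees the object, however the puts interleave:

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical refcounted object standing in for an skb (not kernel code). */
struct buf {
    atomic_int users;
};

static atomic_int frees = 0;   /* how many times buf_put() actually freed */

/* Drop one reference. atomic_fetch_sub() returns the OLD value, so only
 * the caller that takes the count from 1 to 0 performs the free. */
static void buf_put(struct buf *b)
{
    if (atomic_fetch_sub(&b->users, 1) == 1) {
        atomic_fetch_add(&frees, 1);
        free(b);
    }
}

static void *worker(void *arg)
{
    buf_put(arg);   /* each thread drops the one reference it "owns" */
    return NULL;
}

/* Take nthreads references up front, then drop them concurrently.
 * Returns the number of frees observed, or -1 on bad input. */
static int run_concurrent_puts(int nthreads)
{
    pthread_t tid[64];
    if (nthreads <= 0 || nthreads > 64)
        return -1;
    struct buf *b = malloc(sizeof *b);
    assert(b != NULL);
    atomic_init(&b->users, nthreads);
    atomic_store(&frees, 0);
    for (int i = 0; i < nthreads; i++)
        pthread_create(&tid[i], NULL, worker, b);
    for (int i = 0; i < nthreads; i++)
        pthread_join(tid[i], NULL);
    return atomic_load(&frees);
}
```

run_concurrent_puts(8) takes eight references and drops them from eight threads; the free counter always ends at 1. A double free would need one unbalanced extra put somewhere, not merely concurrent ones.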

Thanks,

- KK



Reuben Farrelly <reuben-linux@reub.net>
Sent by: netdev-bounce@oss.sgi.com
11/18/2003 05:22 PM

To:       "David S. Miller" <davem@xxxxxxxxxx>, Andrew Morton <akpm@xxxxxxxx>
cc:       netdev@xxxxxxxxxxx
Subject:  Re: Kernel crash in 2.6.0-test9-mm3




FWIW I'm compiling with:

[root@tornado log]# gcc -v
Reading specs from /usr/lib/gcc-lib/i386-redhat-linux/3.3.2/specs
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man
--infodir=/usr/share/info --enable-shared --enable-threads=posix
--disable-checking --with-system-zlib --enable-__cxa_atexit
--host=i386-redhat-linux
Thread model: posix
gcc version 3.3.2 20031107 (Red Hat Linux 3.3.2-2)
[root@tornado log]#

Reuben


At 13:49 19/11/2003, David S. Miller wrote:
>On Tue, 18 Nov 2003 11:01:39 -0800
>Andrew Morton <akpm@xxxxxxxx> wrote:
>
> > It's one for the networking guys.
> >
> > The mm kernels have a patch which detects when atomic_dec_and_test
> > takes an atomic_t negative - it is assumed that this is a bug so
> > a warning is generated.
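The patch itself is not quoted in this thread. As a rough, hypothetical userspace analogue of the check Andrew describes (the name `dec_and_test_checked()` and the `underflow_warnings` counter are made up, and C11 atomics replace the kernel's x86 atomic_t code), a checked decrement can compute the new value once from a single atomic operation and test it for both zero and sign:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical analogue of a "checked" atomic_dec_and_test();
 * the real -mm debugging patch may be implemented differently. */
static int underflow_warnings = 0;

static bool dec_and_test_checked(atomic_int *v)
{
    assert(v != NULL);
    /* One atomic read-modify-write; derive the new value from its return
     * so the zero test and the sign test see the same result. */
    int newval = atomic_fetch_sub(v, 1) - 1;
    if (newval < 0) {
        underflow_warnings++;
        fprintf(stderr, "dec_and_test_checked: count went negative (%d)\n",
                newval);
    }
    return newval == 0;   /* true only for the put that reaches zero */
}
```

Dropping a count of 2 three times returns false, then true, then false plus one warning: the third call is the "one too many" put.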
>
>Andrew, I've analyzed this a bit.  There is incredible evidence in
>these dumps that either there is a bug in Linus's atomic_dec_and_test()
>debugging hack or GCC is miscompiling it in certain cases with certain
>versions of the compiler.
>
>Look at this:
>
> > > Nov 18 23:09:00 tornado kernel:  [<c029203c>] skb_release_data+0x14c/0x160
> > > Nov 18 23:09:00 tornado kernel:  [<c0292063>] kfree_skbmem+0x13/0x30
> > > Nov 18 23:09:00 tornado kernel:  [<c0292138>] __kfree_skb+0xb8/0x1b0
> > > Nov 18 23:09:00 tornado kernel:  [<c0218815>] e100intr+0x1e5/0x290
>
>Ok, releasing an SKB data area twice.
>
> > > Nov 18 23:09:00 tornado kernel: BUG: dst underflow 0: c02921ef
>
>Freeing a 'dst' entry one too many times.
>
> > > Nov 18 23:09:00 tornado kernel: Attempt to release alive inet socket dfd4c780
>
>A socket refcount dropping to zero too early, before it's marked dead.
>
>These last two problems are very serious errors, and would have
>printed out debugging messages before the atomic_dec_and_test() patch.
>If these last two messages don't show up without the
>atomic_dec_and_test() debugging patch applied, well there you
>go... :-)
>
>In that debugging patch, I'm wondering something about x86.
>When one goes "sete %reg; sets %reg", does the first 'sete' modify
>the condition codes by chance?  Probably not...





