
Re: [Fwd: Re: possible bug x86 2.4.2 SMP in IP receive stack]

To: Andrew Morton <andrewm@xxxxxxxxxx>
Subject: Re: [Fwd: Re: possible bug x86 2.4.2 SMP in IP receive stack]
From: Bob Felderman <feldy@xxxxxxxx>
Date: Mon, 5 Mar 2001 22:18:52 -0800 (PST)
Cc: Bob Felderman <feldy@xxxxxxxx>, jamal <hadi@xxxxxxxxxx>, kuznet@xxxxxxxxxxxxx, netdev@xxxxxxxxxxx
In-reply-to: <3AA4307F.21698B6C@xxxxxxxxxx>
Sender: owner-netdev@xxxxxxxxxxx

On Tue, 6 Mar 2001, Andrew Morton wrote:

> jamal wrote:
> > 
> > Now this is the problem with proprietary drivers that nobody sees the code
> > for (or maybe hardware that nobody sees specs for)
> 
>    * Permission to use, copy, modify and distribute this software and its   *
>    * documentation in source and binary forms for non-commercial purposes   *
>    * and without fee is hereby granted, provided that the modified software *
>    * is returned to Myricom, Inc. for redistribution.
> 
> So it's not *too* sinful :)
> 
> > You caused people all the pain of trying to decode what your problem is
> > only to find you are making some basic mistakes.
> 
> I don't know if that's proven yet.
> 

Our code has always been "open source" since we started the
company nearly 7 years ago. I feel pretty strongly about that,
and we get lots of help from customers because of it.
One reason we don't GPL the code is that the Commerce Dept.
classifies our hardware as "munitions" (I think), so we have
some export restrictions, and they are happier if we pretend
not to give our code away.


OK, I've added spin locking to serialize the interrupt
routine and any transmits. I've tried it both my way and with
a patch from Andrew. The result is the same either way: no
change in the basic behavior.
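
For anyone following along, the kind of serialization described above
can be sketched in userspace roughly as follows. This is not the gm
driver code: fake_dev, fake_intr, fake_xmit and run_both_paths are
hypothetical names, and a pthread spinlock stands in for the kernel's
spinlock, on the assumption that both paths touch the same per-device
state.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for per-device driver state (not the gm driver). */
struct fake_dev {
    pthread_spinlock_t lock; /* plays the role of a kernel spinlock */
    long ring_ops;           /* shared state touched by both paths */
};

/* Stand-in for the interrupt routine: touch shared state under the lock. */
static void fake_intr(struct fake_dev *dev)
{
    pthread_spin_lock(&dev->lock);
    dev->ring_ops++;
    pthread_spin_unlock(&dev->lock);
}

/* Stand-in for the transmit path: it takes the same lock. */
static void fake_xmit(struct fake_dev *dev)
{
    pthread_spin_lock(&dev->lock);
    dev->ring_ops++;
    pthread_spin_unlock(&dev->lock);
}

struct worker_arg {
    struct fake_dev *dev;
    void (*path)(struct fake_dev *);
    long iterations;
};

/* Call one of the two paths repeatedly from its own thread. */
static void *worker(void *p)
{
    struct worker_arg *arg = p;
    long i;

    for (i = 0; i < arg->iterations; i++)
        arg->path(arg->dev);
    return NULL;
}

/* Run both paths on separate threads; return the final shared counter. */
static long run_both_paths(long iterations)
{
    struct fake_dev dev;
    struct worker_arg intr_arg = { &dev, fake_intr, iterations };
    struct worker_arg xmit_arg = { &dev, fake_xmit, iterations };
    pthread_t intr_thread, xmit_thread;
    int rc;

    dev.ring_ops = 0;
    rc = pthread_spin_init(&dev.lock, PTHREAD_PROCESS_PRIVATE);
    assert(rc == 0);
    rc = pthread_create(&intr_thread, NULL, worker, &intr_arg);
    assert(rc == 0);
    rc = pthread_create(&xmit_thread, NULL, worker, &xmit_arg);
    assert(rc == 0);
    pthread_join(intr_thread, NULL);
    pthread_join(xmit_thread, NULL);
    pthread_spin_destroy(&dev.lock);
    (void)rc;
    return dev.ring_ops;
}
```

With the lock held on both paths, run_both_paths(n) should return
exactly 2*n, since no increment can be lost. In a real driver the
transmit side would normally use spin_lock_irqsave() and
spin_unlock_irqrestore() so that the interrupt routine cannot run
on the same CPU while the lock is held.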

Here's my most recent crash. It looks like both processors
are panicking in the same place?

ksymoops 0.7c on i686 2.4.2.  Options used
     -V (default)
     -k /proc/ksyms (default)
     -l /proc/modules (default)
     -o /lib/modules/2.4.2/ (default)
     -m /usr/src/linux/System.map (default)

Warning: You did not tell me where to find symbol information.  I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc.  ksymoops -h explains the options.

Warning (compare_maps): mismatch on symbol __module_author  , gm says d0888ee0, sbin/gm says d088ab60.  Ignoring sbin/gm entry
Warning (compare_maps): mismatch on symbol __module_description  , gm says d0888eff, sbin/gm says d088ab7f.  Ignoring sbin/gm entry
Warning (compare_maps): mismatch on symbol __module_parm_gm_net_copy_threshold  , gm says d0888f5c, sbin/gm says d088abdc.  Ignoring sbin/gm entry
Warning (compare_maps): mismatch on symbol __module_parm_gmip_hw_checksum  , gm says d0888f44, sbin/gm says d088abc4.  Ignoring sbin/gm entry
invalid operand: 0000
CPU:    0
EIP:    0010:[<c01eb629>]
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010286
eax: 0000001c   ebx: c63fb280   ecx: c150c000   edx: 00000001
esi: cff07420   edi: c9639da0   ebp: 00000fe7   esp: c029fea0
ds: 0018   es: 0018   ss: 0018
Process swapper (pid: 0, stackpage=c029f000)
Stack: c026dde5 c026df00 00000120 c9639da0 c01f612c c9510d20 cff07420 00000000 
       c01f6b4f cff07420 c9639da0 c9639da0 c9639da0 c9639da0 c92b0040 3782c9c7 
       c029e000 cc042a00 c01f5c50 c9639da0 c9639da0 c92b0040 c9639da0 c9639da0 
Call Trace: [<c01f612c>] [<c01f6b4f>] [<c01f5c50>] [<c01f6011>] [<d08ef8b0>] 
[<c01ed73e>] [<c01193 
       [<c010a99a>] [<c01071c0>] [<c01071c0>] [<c010909c>] [<c01071c0>] 
[<c01071c0>] [<c0100018>]  
       [<c0107252>] [<c0105000>] [<c01001cf>] 
Code: 0f 0b 83 c4 0c e9 bc 00 00 00 8b 4a 28 85 c9 74 08 f0 ff 49 

>>EIP; c01eb629 <__kfree_skb+31/fc>   <=====
Trace; c01f612c <ip_frag_destroy+ac/d0>
Trace; c01f6b4f <ip_defrag+123/184>
Trace; c01f5c50 <ip_local_deliver+1c/114>
Trace; c01f6011 <ip_rcv+2c9/338>
Trace; d08ef8b0 <END_OF_CODE+613bd/????>
Trace; c01ed73e <net_rx_action+17e/278>
Trace; c010a99a <do_IRQ+da/ec>
Trace; c01071c0 <default_idle+0/34>
Trace; c01071c0 <default_idle+0/34>
Trace; c010909c <ret_from_intr+0/20>
Trace; c01071c0 <default_idle+0/34>
Trace; c01071c0 <default_idle+0/34>
Trace; c0100018 <startup_32+18/cb>
Trace; c0107252 <cpu_idle+3e/54>
Trace; c0105000 <empty_bad_page+0/1000>
Trace; c01001cf <L6+0/2>
Code;  c01eb629 <__kfree_skb+31/fc>
00000000 <_EIP>:
Code;  c01eb629 <__kfree_skb+31/fc>   <=====
   0:   0f 0b                     ud2a      <=====
Code;  c01eb62b <__kfree_skb+33/fc>
   2:   83 c4 0c                  add    $0xc,%esp
Code;  c01eb62e <__kfree_skb+36/fc>
   5:   e9 bc 00 00 00            jmp    c6 <_EIP+0xc6> c01eb6ef <__kfree_skb+f7/fc>
Code;  c01eb633 <__kfree_skb+3b/fc>
   a:   8b 4a 28                  mov    0x28(%edx),%ecx
Code;  c01eb636 <__kfree_skb+3e/fc>
   d:   85 c9                     test   %ecx,%ecx
Code;  c01eb638 <__kfree_skb+40/fc>
   f:   74 08                     je     19 <_EIP+0x19> c01eb642 <__kfree_skb+4a/fc>
Code;  c01eb63a <__kfree_skb+42/fc>
  11:   f0 ff 49 00               lock decl 0x0(%ecx)

invalid operand: 0000
Kernel panic: Aiee, killing interrupt handler!
CPU:    1
EIP:    0010:[<c01eb629>]
EFLAGS: 00010292
eax: 0000001c   ebx: c63fb280   ecx: c150c000   edx: 00000001
esi: cff07420   edi: c1575300   ebp: 00000fe7   esp: c1449e74
ds: 0018   es: 0018   ss: 0018
Process swapper (pid: 0, stackpage=c1449000)
Stack: c026dde5 c026df00 00000120 c1575300 c01f612c c9510d20 cff07420 00000000 
       c01f6b4f cff07420 c1575300 c1575300 c1575300 c1575300 c8b54040 3782c9c7 
       c1448000 cc042a00 c01f5c50 c1575300 c1575300 c8b54040 c1575300 c1575300 
Call Trace: [<c01f612c>] [<c01f6b4f>] [<c01f5c50>] [<c01f6011>] [<d08ef8b0>] 
[<c01ed73e>] [<c01193 
       [<c010a99a>] [<c01071c0>] [<c01071c0>] [<c010909c>] [<c01071c0>] 
[<c01071c0>] [<c0100018>]  
       [<c0107252>] [<c01193aa>] [<c010a99a>] 
Code: 0f 0b 83 c4 0c e9 bc 00 00 00 8b 4a 28 85 c9 74 08 f0 ff 49 

>>EIP; c01eb629 <__kfree_skb+31/fc>   <=====
Trace; c01f612c <ip_frag_destroy+ac/d0>
Trace; c01f6b4f <ip_defrag+123/184>
Trace; c01f5c50 <ip_local_deliver+1c/114>
Trace; c01f6011 <ip_rcv+2c9/338>
Trace; d08ef8b0 <END_OF_CODE+613bd/????>
Trace; c01ed73e <net_rx_action+17e/278>
Trace; c010a99a <do_IRQ+da/ec>
Trace; c01071c0 <default_idle+0/34>
Trace; c01071c0 <default_idle+0/34>
Trace; c010909c <ret_from_intr+0/20>
Trace; c01071c0 <default_idle+0/34>
Trace; c01071c0 <default_idle+0/34>
Trace; c0100018 <startup_32+18/cb>
Trace; c0107252 <cpu_idle+3e/54>
Trace; c01193aa <do_softirq+5a/88>
Trace; c010a99a <do_IRQ+da/ec>
Code;  c01eb629 <__kfree_skb+31/fc>
00000000 <_EIP>:
Code;  c01eb629 <__kfree_skb+31/fc>   <=====
   0:   0f 0b                     ud2a      <=====
Code;  c01eb62b <__kfree_skb+33/fc>
   2:   83 c4 0c                  add    $0xc,%esp
Code;  c01eb62e <__kfree_skb+36/fc>
   5:   e9 bc 00 00 00            jmp    c6 <_EIP+0xc6> c01eb6ef <__kfree_skb+f7/fc>
Code;  c01eb633 <__kfree_skb+3b/fc>
   a:   8b 4a 28                  mov    0x28(%edx),%ecx
Code;  c01eb636 <__kfree_skb+3e/fc>
   d:   85 c9                     test   %ecx,%ecx
Code;  c01eb638 <__kfree_skb+40/fc>
   f:   74 08                     je     19 <_EIP+0x19> c01eb642 <__kfree_skb+4a/fc>
Code;  c01eb63a <__kfree_skb+42/fc>
  11:   f0 ff 49 00               lock decl 0x0(%ecx)


5 warnings issued.  Results may not be reliable.
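
To check whether two dumps like the ones above really fault at the
same place, the EIP lines and the ksymoops "symbol+offset/length"
annotations can be compared mechanically. The helpers below are a
small sketch for that; eip_of, symbol_base and same_fault_site are
hypothetical names, and the parsing assumes exactly the line formats
shown in this output.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Extract the address from an oops line such as
 * "EIP:    0010:[<c01eb629>]". Returns 0 if no address is found.
 */
static unsigned long eip_of(const char *line)
{
    unsigned long eip = 0;
    const char *p;

    assert(line != NULL);
    p = strstr(line, "[<");
    if (p == NULL || sscanf(p, "[<%lx", &eip) != 1)
        return 0;
    return eip;
}

/*
 * Given a ksymoops line such as
 * ">>EIP; c01eb629 <__kfree_skb+31/fc>", return the start address of
 * the symbol (address minus offset). Returns 0 on a parse failure.
 */
static unsigned long symbol_base(const char *line)
{
    unsigned long addr = 0, off = 0;
    char sym[64];
    const char *p;

    assert(line != NULL);
    p = strchr(line, ';');
    if (p == NULL)
        return 0;
    if (sscanf(p, "; %lx <%63[^+]+%lx", &addr, sym, &off) != 3)
        return 0;
    return addr - off;
}

/* Return 1 if both EIP lines parse and name the same address. */
static int same_fault_site(const char *eip_line_a, const char *eip_line_b)
{
    unsigned long a = eip_of(eip_line_a);
    unsigned long b = eip_of(eip_line_b);

    return a != 0 && a == b;
}
```

Fed the EIP lines from the CPU 0 and CPU 1 dumps, same_fault_site()
reports a match, since both read c01eb629. symbol_base() on the
">>EIP;" line gives c01eb5f8 as the start of __kfree_skb, that is
c01eb629 minus the 0x31 offset.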

