Oops in arp_rcv, patch

To: netdev@xxxxxxxxxxx
Subject: Oops in arp_rcv, patch
From: Jacek Konieczny <jajcus@xxxxxxx>
Date: Wed, 4 Jul 2001 16:46:54 +0200
Cc: pld-devel-en@xxxxxxxxxx
Mail-followup-to: netdev@xxxxxxxxxxx, pld-devel-en@xxxxxxxxxx
Sender: owner-netdev@xxxxxxxxxxx
User-agent: Mutt/1.3.18i

One of my routers has rebooted a lot over the last few days. I couldn't find
the reason, as the oopses were not logged and during most of the crashes there
was no one at the console. But finally I caught the oops on a serial console.
After decoding the oops and examining the kernel sources I found the problem:
a call to neigh_release() was failing. Everywhere else in
the code its argument is protected against being NULL, but not in this
one place. Here is my patch:
===== cut ====
--- linux/net/ipv4/arp.c.orig   Thu Jun 28 17:29:10 2001
+++ linux/net/ipv4/arp.c        Tue Jul  3 19:37:25 2001
@@ -738,7 +738,7 @@
                            (addr_type == RTN_UNICAST  && rt-> != dev 
                             (IN_DEV_PROXY_ARP(in_dev) || pneigh_lookup(&arp_tbl, &tip, dev, 0)))) {
                                n = neigh_event_ns(&arp_tbl, sha, &sip, dev);
-                               neigh_release(n);
+                               if (n) neigh_release(n);
                                if (skb->stamp.tv_sec == 0 ||
                                    skb->pkt_type == PACKET_HOST ||

The bug shows up when proxy-arp is configured. It seems the number of
entries in the ARP table matters too (on my host "ip neighb show|wc" gives
more than 1000), or it may be the number of ethernet ports (I have 10).

The buggy code seems unchanged in the 2.4.5 kernel.
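
To illustrate the idea behind the one-line guard in the patch above, here is a
minimal user-space sketch. It is not kernel code: struct toy_neigh,
toy_lookup(), toy_release() and toy_release_checked() are hypothetical
stand-ins invented for this example, assuming only that the lookup can return
NULL and that the release decrements a reference count through the pointer.

```c
#include <stddef.h>

/* Hypothetical stand-in for a neighbour entry: just a reference count. */
struct toy_neigh {
    int refcnt;
};

/* Hypothetical lookup that takes a reference, or returns NULL on failure. */
struct toy_neigh *toy_lookup(struct toy_neigh *entry, int fail)
{
    if (fail)
        return NULL;
    entry->refcnt++;
    return entry;
}

/* Unguarded release: dereferences n unconditionally, so n must not be NULL.
 * Returns 1 when the last reference was dropped, 0 otherwise. */
int toy_release(struct toy_neigh *n)
{
    return --n->refcnt == 0;
}

/* Guarded release, the same shape as "if (n) neigh_release(n);". */
int toy_release_checked(struct toy_neigh *n)
{
    if (n)
        return toy_release(n);
    return 0; /* nothing was held, nothing to drop */
}
```

With this sketch, toy_release_checked(toy_lookup(&e, 1)) is a harmless no-op,
whereas passing the same NULL result to toy_release() would dereference a NULL
pointer.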

Here is the decoded oops:
ksymoops 2.4.1 on i686 2.2.19.  Options used
     -V (default)
     -k /proc/ksyms (default)
     -l /lib/modules/2.2.19-16/ (specified)
     -o /lib/modules/2.2.19/ (default)
     -m /boot/ (specified)

Error (expand_objects): cannot stat(/lib/ext2.o) for ext2
Error (expand_objects): cannot stat(/lib/ide-disk.o) for ide-disk
Error (expand_objects): cannot stat(/lib/ide-probe-mod.o) for ide-probe-mod
Error (expand_objects): cannot stat(/lib/ide-mod.o) for ide-mod
Error (regular_file): read_lsmod /lib/modules/2.2.19-16/ is not a regular file, 
Warning (map_ksym_to_module): cannot match loaded module ext2 to a unique module object.  Trace may not be reliable.
Oops: 0002
CPU:    0
EIP:    0010:[<c016d089>]
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010286
eax: 00000000   ebx: 00000000   ecx: 5343e3d5   edx: 00000401
esi: c62a7430   edi: ca4d71f0   ebp: c62a7438   esp: c0211f04
ds: 0018   es: 0018   ss: 0018
Process swapper (pid: 0, process nr: 0, stackpage=c0211000)
Stack: c0200494 c0210608 5343e3d5 c0211f20 c0211f24 cb8f0750 00017b80 5343e3d5
       5143e3d5 c0148038 c7bf6640 ca4d71f0 c0200494 00000001 c023d8e4 0003c15b
       c0211f60 c7bf6640 0003c15b c011a269 00000000 c0210000 c010b56a 00001000
Call Trace: [<c0148038>] [<c011a269>] [<c010b56a>] [<c010b230>] [<c01088dd>]
           [<c0106000>] [<c010890
       [<c010a008>] [<c0106000>] [<c0106077>] [<c0106000>] [<c0100175>]
Code: ff 4b 2c 0f 94 c0 84 c0 74 0f 83 7b 04 00 75 09 53 e8 a9 cd

>>EIP; c016d089 <arp_rcv+2c9/3d4>   <=====
Trace; c0148038 <net_bh+1a0/200>
Trace; c011a269 <do_bottom_half+49/70>
Trace; c010b56a <do_IRQ+3a/3c>
Trace; c010b230 <common_interrupt+18/20>
Trace; c01088dd <cpu_idle+5d/6c>
Trace; c0106000 <get_options+0/70>
Trace; c010a008 <system_call+34/38>
Trace; c0106000 <get_options+0/70>
Trace; c0106077 <cpu_idle+7/18>
Trace; c0106000 <get_options+0/70>
Trace; c0100175 <L6+0/2>
Code;  c016d089 <arp_rcv+2c9/3d4>
00000000 <_EIP>:
Code;  c016d089 <arp_rcv+2c9/3d4>   <=====
   0:   ff 4b 2c                  decl   0x2c(%ebx)   <=====
Code;  c016d08c <arp_rcv+2cc/3d4>
   3:   0f 94 c0                  sete   %al
Code;  c016d08f <arp_rcv+2cf/3d4>
   6:   84 c0                     test   %al,%al
Code;  c016d091 <arp_rcv+2d1/3d4>
   8:   74 0f                     je     19 <_EIP+0x19> c016d0a2 
Code;  c016d093 <arp_rcv+2d3/3d4>
   a:   83 7b 04 00               cmpl   $0x0,0x4(%ebx)
Code;  c016d097 <arp_rcv+2d7/3d4>
   e:   75 09                     jne    19 <_EIP+0x19> c016d0a2 
Code;  c016d099 <arp_rcv+2d9/3d4>
  10:   53                        push   %ebx
Code;  c016d09a <arp_rcv+2da/3d4>
  11:   e8 a9 cd 00 00            call   cdbf <_EIP+0xcdbf> c0179e48 

Aiee, killing interrupt handler
Kernel panic: Attempted to kill the idle task!
In swapper task - not syncing

1 warning and 5 errors issued.  Results may not be reliable.
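
For reading the "Oops: 0002" line above, here is a small sketch that decodes
the i386 page-fault error code into its three low bits. The struct and
function names are made up for this example. The value 0002 means a write, in
kernel mode, to a page that is not present. That is consistent with the
faulting instruction "decl 0x2c(%ebx)" executing with ebx = 00000000, i.e. a
write to address 0x2c through a NULL pointer.

```c
/* Hypothetical helper type: the three low bits of an i386 page-fault
 * error code, as printed on the "Oops:" line. */
struct fault_bits {
    int present; /* bit 0: 0 = page not present, 1 = protection fault */
    int write;   /* bit 1: 0 = read access,      1 = write access     */
    int user;    /* bit 2: 0 = kernel mode,      1 = user mode        */
};

/* Split the error code into its bits. */
struct fault_bits decode_fault(unsigned long err)
{
    struct fault_bits f;
    f.present = (err & 0x1) != 0;
    f.write   = (err & 0x2) != 0;
    f.user    = (err & 0x4) != 0;
    return f;
}

/* Effective address of a "disp(%reg)" operand, e.g. 0x2c(%ebx). */
unsigned long effective_address(unsigned long reg, unsigned long disp)
{
    return reg + disp;
}
```

Calling decode_fault(0x0002) gives present = 0, write = 1, user = 0, and
effective_address(0x00000000, 0x2c) gives 0x2c, matching the register dump.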
