Hi,
today the kernel on one of our systems crashed with a kernel panic.
Hardware
--------
IBM Netfinity 7000 M10
Two Intel Pentium III Xeon processors 550 MHz, 512KB Cache each
1 GB RAM
IBM ServeRAID SCSI controller
Olympic Token Ring card
Software
--------
RedHat Linux 6.2 distribution
Kernel 2.2.14-5.0 (RedHat package name)
Applied the following changes to the kernel source:
Edited include/asm/shmparam.h. Changed _SHM_ID_BITS to 9.
Edited include/linux/msg.h. Changed MSGMNI to 1024.
Edited include/linux/sem.h. Changed SEMMNI to 1024 and SEMMSL to 512.
Edited include/net/tcp.h. Changed TCP_SYN_RETRIES to 3.
The kernel panic occurred roughly 80 minutes after system start with
the following message (written down from the console by hand, so it
might not be 100% complete):
Warning: kfree_skb passed an skb still on a list (from c0096245)
current->tss.cr3 = 00101000, %cr3 = 00101000
*pde = 00000000
Oops: 0002
CPU: 3
EIP: 0010:[<80151e27>]
EFLAGS: 00010286
eax: 00000fd0 ebx: b3f769c0 ecx: 00020400 edx: 00000000
esi: b3f769c0 edi: 80235d84 ebp: b3f769c0 esp: 8024bf4c
ds: 0018 es: 0018 ss: 0018
Process swapper (pid: 0, process nr: 0, stackpage=8024b000)
Code: f0 29 42 40 c3 53 8b 5c 24 08 83 7c 24 10 00 75 08 8b 43 50
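As a cross-check, the error code on the "Oops:" line can be decoded bit by
bit. The following is only a minimal sketch, assuming the usual i386
page-fault error code layout (bit 0: not-present vs. protection fault,
bit 1: read vs. write, bit 2: kernel vs. user mode). The names fault_info,
decode_fault and print_fault are made up for illustration and are not
kernel functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper type: the three low bits of an i386 page-fault
 * error code, as printed after "Oops:" in the console message. */
struct fault_info {
    int protection; /* bit 0: 0 = page not present, 1 = protection fault */
    int write;      /* bit 1: 0 = read access,      1 = write access     */
    int user;       /* bit 2: 0 = kernel mode,      1 = user mode        */
};

/* Split the error code into its three flag bits. */
void decode_fault(unsigned long error_code, struct fault_info *out)
{
    assert(out != NULL);
    out->protection = (int)(error_code & 1UL);
    out->write      = (int)((error_code >> 1) & 1UL);
    out->user       = (int)((error_code >> 2) & 1UL);
}

/* Print a one-line, human-readable summary of the error code. */
void print_fault(unsigned long error_code)
{
    struct fault_info f;

    decode_fault(error_code, &f);
    printf("Oops %04lx: %s, %s access, %s mode\n",
           error_code,
           f.protection ? "protection fault" : "page not present",
           f.write ? "write" : "read",
           f.user ? "user" : "kernel");
}
```

Under that assumption, calling print_fault(0x0002) describes the fault as a
write access in kernel mode to a page that is not present.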
The system did quite a lot of socket operations in its 80 minutes
of uptime. It port scanned a range of IP addresses.
I looked up the symbol table of the running kernel as described in
Documentation/oops-tracing.txt:
80151d4e T sk_alloc
80151d89 T sk_free
80151e02 T sock_wfree
80151e1d T sock_rfree
80151e2c T sock_wmalloc
80151e74 T sock_rmalloc
80151ebc T sock_kmalloc
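The lookup itself is mechanical: take the last symbol whose address is not
greater than the EIP. A minimal sketch of that step follows, using only the
table excerpt quoted above. The names ksym, ksyms and find_symbol are
hypothetical and chosen for illustration; the table is assumed to be sorted
by ascending address.

```c
#include <assert.h>
#include <stddef.h>

/* One line of the symbol table: start address and symbol name. */
struct ksym {
    unsigned long addr;
    const char *name;
};

/* Excerpt of the symbol table from the report, in ascending order. */
const struct ksym ksyms[] = {
    { 0x80151d4eUL, "sk_alloc"     },
    { 0x80151d89UL, "sk_free"      },
    { 0x80151e02UL, "sock_wfree"   },
    { 0x80151e1dUL, "sock_rfree"   },
    { 0x80151e2cUL, "sock_wmalloc" },
    { 0x80151e74UL, "sock_rmalloc" },
    { 0x80151ebcUL, "sock_kmalloc" },
};

#define KSYMS_COUNT (sizeof ksyms / sizeof ksyms[0])

/* Return the last symbol with addr <= eip, or NULL if eip lies below
 * the first entry. The table must be sorted by ascending address. */
const struct ksym *find_symbol(const struct ksym *tab, size_t n,
                               unsigned long eip)
{
    const struct ksym *best = NULL;
    size_t i;

    assert(tab != NULL);
    for (i = 0; i < n; i++) {
        if (tab[i].addr > eip)
            break;
        best = &tab[i];
    }
    return best;
}
```

Called as find_symbol(ksyms, KSYMS_COUNT, 0x80151e27), this returns the
sock_rfree entry, and the difference between the EIP and the symbol address
gives the offset 0xa into the function.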
The function that caused the panic seems to be sock_rfree, since the
EIP 80151e27 lies between sock_rfree (80151e1d) and sock_wmalloc
(80151e2c). Following is the disassembled function taken from the
kernel:
0x80151e1d : mov 0x4(%esp,1),%eax
0x80151e21 : mov 0xc(%eax),%edx
0x80151e24 : mov 0x78(%eax),%eax
0x80151e27 : lock sub %eax,0x40(%edx)
0x80151e2b : ret
The function is implemented in net/core/sock.c and looks as follows:
void sock_rfree(struct sk_buff *skb)
{
	struct sock *sk = skb->sk;

	atomic_sub(skb->truesize, &sk->rmem_alloc);
}
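Putting the register dump and the disassembly together, the faulting
instruction at 80151e27 writes to the address in %edx plus 0x40, which
would correspond to &sk->rmem_alloc if %edx holds skb->sk. The sketch below
only redoes that arithmetic. It assumes the register values were copied
correctly from the console. The names oops_regs, sub_target and
in_first_page are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define I386_PAGE_SIZE  4096UL /* page size on i386 */
#define RMEM_ALLOC_DISP 0x40UL /* displacement in "lock sub %eax,0x40(%edx)" */

/* The two registers used by the faulting instruction. */
struct oops_regs {
    unsigned long eax; /* value being subtracted (skb->truesize) */
    unsigned long edx; /* base pointer (presumably skb->sk)      */
};

/* Address written by "lock sub %eax,0x40(%edx)". */
unsigned long sub_target(const struct oops_regs *r)
{
    assert(r != NULL);
    return r->edx + RMEM_ALLOC_DISP;
}

/* Heuristic only: an address inside the first page usually means that
 * a field was accessed through a NULL base pointer. */
int in_first_page(unsigned long addr)
{
    return addr < I386_PAGE_SIZE;
}
```

With eax = 00000fd0 and edx = 00000000 from the dump, sub_target() yields
0x40, which lies in the first page. If the dump is accurate, this suggests
that skb->sk was NULL when sock_rfree was called, although it does not
explain how the skb got into that state.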
Now I am not sure how to proceed.
Could this be some kind of race condition that occurs on multiprocessor
systems?
Is this error already known?
---
MfG/kind regards, Stefan Steinert
IBM R&D Germany
Tel.: (49) +49 7031 16 2173
Fax: (49) +49 7031 16 3328
e-mail: stefan.steinert@xxxxxxxxxx