
Kernel Panic for FTP with FreeBSD

To: <netdev@xxxxxxxxxxx>
Subject: Kernel Panic for FTP with FreeBSD
From: "Gregory Parrott" <gparrott@xxxxxxxxxx>
Date: Mon, 9 Apr 2001 17:09:19 -0400
Sender: owner-netdev@xxxxxxxxxxx
Thread-index: AcC4ix6lOUkN/tU8SNKH6uWkDYYATQIqVNbQ
Thread-topic: Linux Hardware/Software IP Stack Integration for a TOE
I was surprised that I only received a few replies to my original
note...  oh, well.

WARNING: long post follows.  I apologize, but I could use some help.

I am running mostly vanilla 2.2.14 (I say mostly because I have modified
a few routines to be able to printk some useful data.)  The scenario is
that I am running my TOE in "dumb" packet mode and passing it IP data
that it then sends on the network.  When we do Linux-to-Linux FTP
sessions, everything works great.  However, when we try
FreeBSD-to-Linux, the Linux machine panics.  At the packet layer,
FreeBSD has sent a SYN and Linux attempts to send a SYN/ACK back.  Once
the transmit complete interrupt kicks in, we crash trying to free the
skb.  The difference between the Linux SYN and the FreeBSD SYN is
essentially that FreeBSD does not send the SACK, timestamp, window
scale, and null TCP options that Linux sends.
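
To make the option difference concrete, here is a minimal sketch that
walks the TCP options of one segment and records which option kinds are
present.  The bytes are the 24-byte TCP header of the outgoing SYN/ACK,
copied from the WaveNICSendPacket dump in the sample further down.  The
helper name tcp_option_kinds is hypothetical and is not part of the
driver or the kernel; the code runs in user space only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* TCP header of the SYN/ACK, copied from the WaveNICSendPacket dump. */
static const uint8_t synack_tcp[24] = {
    0x00, 0x15, 0x04, 0x00,  /* source port 21, destination port 1024 */
    0x19, 0xae, 0x29, 0x89,  /* sequence number */
    0x72, 0xe8, 0x1a, 0xdb,  /* acknowledgment number */
    0x60, 0x12, 0x7d, 0x78,  /* data offset 6 words, flags SYN|ACK, window */
    0x1d, 0x1c, 0x00, 0x00,  /* checksum, urgent pointer */
    0x02, 0x04, 0x05, 0xb4   /* option kind 2 (MSS), length 4, value 0x05b4 */
};

/*
 * Hypothetical helper: walk the option bytes of a TCP header and set
 * seen[k] to 1 for every option kind k that occurs.
 * Returns 0 on success, -1 if the header or an option is malformed.
 */
static int tcp_option_kinds(const uint8_t *tcp, size_t len, uint8_t seen[256])
{
    size_t hdr_len, i;

    assert(tcp != NULL && seen != NULL);
    for (i = 0; i < 256; i++)
        seen[i] = 0;
    if (len < 20)
        return -1;
    hdr_len = (size_t)(tcp[12] >> 4) * 4;   /* data offset is in 32-bit words */
    if (hdr_len < 20 || hdr_len > len)
        return -1;

    i = 20;                                 /* options start after the fixed header */
    while (i < hdr_len) {
        uint8_t kind = tcp[i];
        uint8_t optlen;

        if (kind == 0) {                    /* end of option list */
            seen[0] = 1;
            break;
        }
        if (kind == 1) {                    /* no-operation, a single byte */
            seen[1] = 1;
            i++;
            continue;
        }
        if (i + 1 >= hdr_len)
            return -1;
        optlen = tcp[i + 1];                /* length covers kind and length bytes */
        if (optlen < 2 || i + optlen > hdr_len)
            return -1;
        seen[kind] = 1;
        i += optlen;
    }
    return 0;
}
```

For the SYN/ACK bytes above, the walk finds a single option of kind 2
(maximum segment size) and none of kinds 3 (window scale), 4 (SACK
permitted), or 8 (timestamps).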

Here is a sample of what I get on the serial console...

WaveNICHandleRawReceives: TotalPacketLength=60 DataLength=46
WaveNICHandleRawReceives: EnetHdr: 00 00 1d 31 da b8 00 60 1d 09 00 01 08 00
WaveNICHandleRawReceives:   0: 45 00 00 2e 40 15 40 00 40 06 d2 42 0a 0a 0a 1a
WaveNICHandleRawReceives:  16: 0a 0a 0a 45 04 00 00 15 72 e8 1a da 00 00 00 00
WaveNIC: Entering wnic_DevStartTransmit line 857
wnic_DevStartTransmit: Socket Buffer Address cea8c370
        Next Socket buffer 00000000
        Previous Socket buffer 00000000
        Socket buffer list 00000000
        Socket cf710f70
        Pointer to Device ce8a5830
        Data length 58
        Pointer to Head ce8a5c70
        Pointer to Tail ce8a5d20
        Pointer to Data ce8a5ce6
        Pointer to End ce8a5d20
        Protocol 0x8
        Data=00-00-1d-31-da-c6-00-00-1d-31-da-b8
        Data=08-00-45-00-00-2c-00-b9-40-00-40-06
        Data=11-a1-0a-0a-0a-45-0a-0a-0a-1a-00-15
WaveNICSendPackets: Packet from IP, Length=58 VirtualAddress=ce8a5ce6 SegmentLength=176
WaveNICSendPacket: Type = RAW_IP
WaveNICSendPacket: FragmentCount=1, Fragment1 VirtualAddress=ce8a5cf0
WaveNICSendPacket: Packet = cea8c370.
WaveNICSendPacket: 45 00 00 2c 00 b9 40 00 40 06 11 a1 0a 0a 0a 45
WaveNICSendPacket: 0a 0a 0a 1a 00 15 04 00 19 ae 29 89 72 e8 1a db
WaveNICSendPacket: 60 12 7d 78 1d 1c 00 00 02 04 05 b4 01 00 00 00
WaveNICSendPacket: 00 00 00 00 00 00 00 00 4c 00 08 00 3c 00 02 00
WaveNICSendPacket: 63 1d 00 04 10 00 2e 00 00 00 00 08 01 00 00 00
WaveNICCompleteRawSendData: PacketType=RAW_IP
OsFreePacket: Socket Buffer Address cea8c370
        Next Socket buffer 00000000
        Previous Socket buffer 00000000
        Socket buffer list 00000000
        Socket cf710f70
        Pointer to Device ce8a5830
        Data length 58
        Pointer to Head ce8a5c70
        Pointer to Tail ce8a5d20
        Pointer to Data ce8a5ce6
        Pointer to End ce8a5d20
        Protocol 0x8
        Data=00-00-1d-31-da-c6-00-00-1d-31-45-00
        Data=00-2c-00-b9-40-00-40-06-11-a1-0a-0a
        Data=0a-45-0a-0a-0a-1a-00-15-04-00-19-ae
__kfree_skb_wgp: Entering __kfree_skb().
__kfree_skb_wgp: Calling dst_release(cd1e1400).
__kfree_skb_wgp: Calling skb->destructor(cea8c370).
__kfree_skb_wgp: Calling skb_headerinit(cea8c370, NULL, 0).
__kfree_skb_wgp: Calling kfree_skbmem_wgp(cea8c370).
Entering kfree_skbmem_wgp.
kfree_skbmem_wgp: calling kfree(ce8a5c70).
Unable to handle kernel paging request at virtual address 030a0114

This is meant to illustrate the SYN coming in, the SYN/ACK going out,
followed by the attempted kfree_skb.  You will notice that the data
looks a bit different when I print the skb structure the second time.
This is because I have to align the IP data on an 8-byte boundary prior
to asking the adapter to move the data over.  We do not change any
fields in the skb for this.  This is not a problem for Linux-to-Linux
FTP (or, more accurately, TCP session establishment).  Why does FreeBSD
give us fits?
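
The alignment step can be sketched in user space as follows.  This is
only an illustration under stated assumptions, not the driver's actual
code: the helper name align_ip_down8 is hypothetical, the address of
the buffer is passed in separately so that the value from the log can
be reused, and the sample bytes are the first 36 bytes of the frame
printed by wnic_DevStartTransmit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_HDR_LEN 14   /* length of the Ethernet header in front of the IP data */

/* First 36 bytes of the frame as printed by wnic_DevStartTransmit. */
static const uint8_t sample_frame[36] = {
    0x00, 0x00, 0x1d, 0x31, 0xda, 0xc6, 0x00, 0x00, 0x1d, 0x31, 0xda, 0xb8,
    0x08, 0x00, 0x45, 0x00, 0x00, 0x2c, 0x00, 0xb9, 0x40, 0x00, 0x40, 0x06,
    0x11, 0xa1, 0x0a, 0x0a, 0x0a, 0x45, 0x0a, 0x0a, 0x0a, 0x1a, 0x00, 0x15
};

/*
 * Hypothetical helper: move ip_len bytes of IP data, which start
 * ETH_HDR_LEN bytes into buf, down to the nearest 8-byte boundary.
 * data_addr is the address of buf[0] as the adapter would see it.
 * Returns the number of bytes the IP data was shifted (0 to 7).
 * The tail of the Ethernet header is overwritten by the move.
 */
static size_t align_ip_down8(uint8_t *buf, uintptr_t data_addr, size_t ip_len)
{
    uintptr_t ip_addr = data_addr + ETH_HDR_LEN;
    size_t shift = (size_t)(ip_addr & 7u);

    assert(buf != NULL);
    /* memmove is used because source and destination overlap. */
    memmove(buf + ETH_HDR_LEN - shift, buf + ETH_HDR_LEN, ip_len);
    return shift;
}
```

With the "Pointer to Data" value from the log (ce8a5ce6), the IP header
starts at ce8a5cf4, so the shift is 4 bytes and the aligned address is
ce8a5cf0, which matches the Fragment1 VirtualAddress printed by
WaveNICSendPacket.  After the move the IP header byte 45 sits at offset
10 of the frame, which matches the second skb printout.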

Finally, I'll include the ksymoops output.  I have been looking at this
for several days now and have not made any real progress (although I
have learned an awful lot about setting up a serial console and making
kernel routines visible to modules!)  Obviously, ecx has a bad value in
it.  My next steps are to continue digging deeper into kfree(), but I
was hoping someone could possibly shed some light to help speed things
up if they have previously seen this kind of error.

[root@gusgus AdapterDriver]# ksymoops -k symbols.txt ftpwgp2.oops
ksymoops 0.7c on i686 2.2.14wgp.  Options used
     -V (default)
     -k symbols.txt (specified)
     -l /proc/modules (default)
     -o /lib/modules/2.2.14wgp/ (default)
     -m /usr/src/linux/System.map (default)

Unable to handle kernel paging request at virtual address 030a0114
current->tss.cr3 = 00101000, %cr3 = 00101000
*pde = 00000000
Oops: 0000
CPU:    0
EIP:    0010:[<c0122770>]
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010082
eax: 0000010c   ebx: cffff1a0   ecx: 030a010c   edx: ce8a5d7c
esi: ce8a5c70   edi: 00000002   ebp: c023bdb0   esp: c023bd60
ds: 0018   es: 0018   ss: 0018
Process swapper (pid: 0, process nr: 0, stackpage=c023b000)
Stack: cea8c3cc c023bdb0 ce8a5d7c c024ef83 c014f1e4 ce8a5c70 c01eb480 ce8a5c70
       cea8c370 c014f3ae cea8c370 c01eb680 cea8c370 00000024 00000003 d093fcd0
       cea8c370 00000001 cea8c370 cc622000 c023bdc8 d0945a3a cea8c370 ca8bc000
Call Trace: [<c014f1e4>] [<c01eb480>] [<c014f3ae>] [<c01eb680>] [<d093fcd0>] [<d0945a3a>] [<d094be88>]
       [<d094b077>] [<d0994000>] [<c016c45d>] [<d0945997>] [<d0994000>] [<d093f56f>] [<c010b1dd>] [<c010afa2>]
       [<c010b303>] [<c010afe0>] [<c016c701>] [<c0151210>] [<c01198b9>] [<c010b31a>] [<c010afe0>] [<c01086cd>]
       [<c0106000>] [<c01086f0>] [<c010a1a8>] [<c0106000>] [<c0106077>] [<c0106000>] [<c0100175>]
Code: 8b 69 08 81 fd 2b 2f c3 a5 0f 85 d1 00 00 00 8b 69 0c 85 ed 

>>EIP; c0122770 <kfree+7c/1ac>   <=====
Trace; c014f1e4 <kfree_skbmem_wgp+3c/78>
Trace; c01eb480 <cprt+4ba0/5de0>
Trace; c014f3ae <__kfree_skb_wgp+e6/f8>
Trace; c01eb680 <cprt+4da0/5de0>
Trace; d093fcd0 <[wnp1_d]OsFreePacket+160/180>
Trace; d0945a3a <[wnp1_d]WaveNICCompleteRawSendData+6a/90>
Trace; d094be88 <[wnp1_d]HandleBufReadDone+38/a0>
Trace; d094b077 <[wnp1_d]TA1000DequeueOds+2d7/540>
Trace; d0994000 <.bss.end+31a1/????>
Trace; c016c45d <tcp_v4_do_rcv+141/16c>
Trace; d0945997 <[wnp1_d]WaveNICISR+87/c0>
Trace; d0994000 <.bss.end+31a1/????>
Trace; d093f56f <[wnp1_d]wnic_DevInterrupt+f/20>
Trace; c010b1dd <handle_IRQ_event+3d/74>
Trace; c010afa2 <do_8259A_IRQ+72/98>
Trace; c010b303 <do_IRQ+23/3c>
Trace; c010afe0 <common_interrupt+18/20>
Trace; c016c701 <tcp_v4_rcv+279/378>
Trace; c0151210 <net_bh+168/1c0>
Trace; c01198b9 <do_bottom_half+49/70>
Trace; c010b31a <do_IRQ+3a/3c>
Trace; c010afe0 <common_interrupt+18/20>
Trace; c01086cd <cpu_idle+55/64>
Trace; c0106000 <get_options+0/70>
Trace; c01086f0 <sys_idle+14/20>
Trace; c010a1a8 <system_call+34/38>
Trace; c0106000 <get_options+0/70>
Trace; c0106077 <cpu_idle+7/18>
Trace; c0106000 <get_options+0/70>
Trace; c0100175 <L6+0/2>
Code;  c0122770 <kfree+7c/1ac>
00000000 <_EIP>:
Code;  c0122770 <kfree+7c/1ac>   <=====
   0:   8b 69 08                  mov    0x8(%ecx),%ebp   <=====
Code;  c0122773 <kfree+7f/1ac>
   3:   81 fd 2b 2f c3 a5         cmp    $0xa5c32f2b,%ebp
Code;  c0122779 <kfree+85/1ac>
   9:   0f 85 d1 00 00 00         jne    e0 <_EIP+0xe0> c0122850 <kfree+15c/1ac>
Code;  c012277f <kfree+8b/1ac>
   f:   8b 69 0c                  mov    0xc(%ecx),%ebp
Code;  c0122782 <kfree+8e/1ac>
  12:   85 ed                     test   %ebp,%ebp

Aiee, killing interrupt handler
Kernel panic: Attempted to kill the idle task!
In swapper task - not syncing
[root@gusgus AdapterDriver]# 

-----Original Message-----
From: Gregory Parrott 
Sent: Thursday, March 29, 2001 3:02 PM
To: netdev@xxxxxxxxxxx
Subject: Linux Hardware/Software IP Stack Integration for a TOE


Hello,

I have been a lurker on this netdev list for the last 6 months and have
learned a lot by simply watching the e-mails go by.  I am quite
impressed with the knowledge floating around out there.  Now I need to
ask for some advice on how to approach my current problem.

I am developing a device driver that supports an adapter that has its
own TCP/UDP/ICMP/IP stack onboard in silicon (some may refer to this as
a transport offload engine).  My challenge is to have the hardware
stack(s) co-exist with the Linux software stack which will have to be
used to support "conventional" NICs.  Since we want this to be
transparent to users at the socket layer (the interface to the chip is
actual socket calls - socket, bind, listen, etc. along with support
functions and methods for transmitting raw packets), I believe this to
be a non-trivial endeavour.  Creating my own protocol family is the easy
way out, but not very useful to the tons of existing socket software.

I did some research into the 2.2.14 kernel some time back and now
need to move forward with hooking into the kernel to support my
"hardware" stack.  The driver work that I have done so far in supporting
"conventional" NIC operation has been completed for 2.2.14.  You are
probably asking why I have not graduated to 2.4 kernels.  My reasons
are 1) a lack of time, since I am still trying to get something working
as it is, and 2) a decision to stick with commercially available
distributions when I started this project.

With this background information, my questions are as follows:

1) Has anyone looked into the problem of supporting multiple stacks in
Linux (the existing software stack with one or more hardware stacks)?
If so, are the research results available?
2) Is now the time to switch from 2.2.14 to 2.4.x to simplify my life?
This will involve converting the existing framework that I have for
supporting "conventional" mode.
3) Where is the best place to hook in?  I could intercept sys_* calls or
I could hook in at the specific protocol (tcp, udp, raw).  My feeling is
it will be a combination of the two.

Multiple stacks introduce interesting problems.  When a user app opens a
socket, a socket has to be opened on all stacks and simultaneous ops
have to be done to each until the socket is bound.  Once bound, the
others need to be cleaned up.  I am sure this is just the tip of the
iceberg.
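
The open-everywhere, bind-then-clean-up idea can be sketched in user
space like this.  Everything here is hypothetical: the stack_ops table,
the multi_sock structure, and the two toy stacks are invented for
illustration and do not correspond to kernel interfaces.  The sketch
also assumes a bind to one specific local address; a wildcard bind,
where no single stack owns the address, is not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_STACKS 4

/* Hypothetical per-stack operations; none of these names come from the kernel. */
struct stack_ops {
    int  (*open)(void);               /* returns a handle >= 0, or -1 on failure */
    int  (*owns_addr)(uint32_t addr); /* nonzero if addr is local to this stack */
    void (*close)(int handle);
};

/* One user-visible socket, shadowed by a handle on every stack until bind. */
struct multi_sock {
    const struct stack_ops *stacks[MAX_STACKS];
    int    handles[MAX_STACKS];       /* -1 means no open handle on that stack */
    size_t nstacks;
    int    bound;                     /* index of the owning stack, -1 before bind */
};

/* Open a shadow socket on every stack. */
static void multi_open(struct multi_sock *ms,
                       const struct stack_ops *const *stacks, size_t n)
{
    size_t i;

    assert(ms != NULL && stacks != NULL && n <= MAX_STACKS);
    ms->nstacks = n;
    ms->bound = -1;
    for (i = 0; i < n; i++) {
        ms->stacks[i] = stacks[i];
        ms->handles[i] = stacks[i]->open();
    }
}

/*
 * Bind: keep the first stack that owns local_addr and close the rest.
 * Returns the index of the chosen stack, or -1 if no stack owns it.
 */
static int multi_bind(struct multi_sock *ms, uint32_t local_addr)
{
    size_t i;

    assert(ms != NULL);
    for (i = 0; i < ms->nstacks && ms->bound < 0; i++)
        if (ms->handles[i] >= 0 && ms->stacks[i]->owns_addr(local_addr))
            ms->bound = (int)i;
    if (ms->bound < 0)
        return -1;
    for (i = 0; i < ms->nstacks; i++) {
        if ((int)i != ms->bound && ms->handles[i] >= 0) {
            ms->stacks[i]->close(ms->handles[i]);
            ms->handles[i] = -1;
        }
    }
    return ms->bound;
}

/* Two toy stacks; the addresses are documentation addresses, not real hosts. */
static int sw_live, hw_live;          /* open handles per toy stack */
static int  sw_open(void)       { return sw_live++; }
static int  sw_owns(uint32_t a) { return a == 0xc0000201u; }  /* 192.0.2.1 */
static void sw_close(int h)     { (void)h; sw_live--; }
static int  hw_open(void)       { return hw_live++; }
static int  hw_owns(uint32_t a) { return a == 0xc0000202u; }  /* 192.0.2.2 */
static void hw_close(int h)     { (void)h; hw_live--; }

static const struct stack_ops sw_stack = { sw_open, sw_owns, sw_close };
static const struct stack_ops hw_stack = { hw_open, hw_owns, hw_close };
```

Opening with both toy stacks leaves one handle live on each; binding to
192.0.2.2 selects the hardware stack and closes the software handle.
Operations issued between open and bind (socket options, for example)
would have to be replayed on every stack, which this sketch leaves out.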

Any comments and suggestions would be greatly appreciated.

Greg Parrott
Optical Area Networking
Lucent Technologies
919-838-6095
http://www.lucent-optical.com/oan

