=>
=> I am looking now. Probably, it is some silly misprint in ip_fragment.c.
=>
=>
=> The problem with Bob's original report was that in the first lines
=> he reported an illegal kfree_skb with skb->list != NULL, called
=> from ip_rcv(). That can only be a bug in the driver, nothing more.
=>
=> Actually, Bob, if you tell me that you have found out why this happened,
=> my enthusiasm for reauditing ip_fragment.c will grow fantastically. 8)
I have added spinlocks to the interrupt routine and the transmit side.
I don't see any more stability, and the oopses I sent yesterday were
produced with the spinlocked code. If you don't see the skb->list
problem anymore, it is probably because of the spinlocks, but I'm
certainly still seeing bad behavior.
I generate the problem with a UDP test from netperf (http://www.netperf.org).
I've attached my udp_range script. Run it as:

  udp_range <dest_host>
I also do this:
echo "1048576" > /proc/sys/net/core/rmem_max
echo "1048576" > /proc/sys/net/core/wmem_max
echo "1048576" > /proc/sys/net/core/wmem_default
echo "1048576" > /proc/sys/net/core/rmem_default
echo "1048576" > /proc/sys/net/core/optmem_max
Our network is really fast. When the machine is stable I can sustain a
1.5 Gbit/sec UDP stream. It might be possible to reproduce this using
100 Mbit ethernet, but it may require 1 Gbit ethernet with jumbo
frames. I'm currently using a 9000-byte MTU. Our interrupt routine
delivers only a single ethernet packet to the higher levels per
interrupt, so maybe that also stresses the IP fragmentation code.
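
Roughly, the setup on each box looks like this (eth0 here is just a
placeholder for the real interface name):

  # set the jumbo-frame MTU (after the rmem/wmem settings above),
  # then drive traffic at the receiver
  ifconfig eth0 mtu 9000
  udp_range <dest_host>
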
I don't see this problem on a linux-2.2 box, and I don't see it when I
remove one of the processors from the receiver in my setup.

#!/bin/sh
#
# udp_range
#
# generate a whole lot of numbers from netperf to see the effects
# of send size on throughput
#
#
# usage : udp_range hostname
#
if [ $# -gt 1 ]; then
    echo "try again, correctly -> udp_range hostname"
    exit 1
fi
#
# some params
#
if [ $# -eq 1 ]; then
    REMHOST=$1
else
    echo "try again, correctly -> udp_range hostname"
    exit 1
fi
# where is netperf
NETHOME=.
BUFSIZE="-s 2062144 -S 2062144"
#BUFSIZE="-s 2147484 -S 2147484"
#BUFSIZE="-s 1048576 -S 1048576"
#BUFSIZE="-s 524288 -S 524288"
#BUFSIZE="-s 262144 -S 262144"
#BUFSIZE="-s 131072 -S 131072"
#BUFSIZE="-s 65535 -S 65535"
#BUFSIZE="-s 49152 -S 49152"
#BUFSIZE="-s 49152 -S 131072"
#BUFSIZE="-S 65536"
TIME="10"
#
# some stuff for the arithmetic
#
# we start at START, then add ADD and divide by DIV each time around.
# By changing these numbers we can halve the message size each pass,
# or decrease it by a fixed amount
#
START=32768
END=4
DIV=2
ADD=0
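# with START=32768, DIV=2 and ADD=0, the loop below tests message sizes
# 32768, 16384, 8192, ... , 8, 4 (halving each pass until the size drops below END)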
# Do we wish to measure CPU utilization?
LOC_CPU=""
REM_CPU=""
#LOC_CPU="-c"
#REM_CPU="-C"
# If we are measuring CPU utilization, then we can save beaucoup
# time by saving the results of the CPU calibration and passing
# them in during the real tests. So, we execute the new CPU "tests"
# of netperf and put the values into shell vars.
case $LOC_CPU in
    -c) LOC_RATE=`$NETHOME/netperf -t LOC_CPU`;;
    *)  LOC_RATE="";;
esac
case $REM_CPU in
    -C) REM_RATE=`$NETHOME/netperf -t REM_CPU -H $REMHOST`;;
    *)  REM_RATE="";;
esac
# after the first datapoint, we don't want more headers
# but we want one for the first one
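# (NO_HDR becomes "-P 0" after the first pass; netperf's -P 0 option
#  suppresses the test banner on subsequent runs)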
NO_HDR=""
MESSAGE=$START
while [ $MESSAGE -ge $END ]; do
    $NETHOME/netperf -p 9100 -l $TIME -H $REMHOST -t UDP_STREAM \
        $LOC_CPU $LOC_RATE $REM_CPU $REM_RATE $NO_HDR -- \
        -m $MESSAGE $BUFSIZE
    NO_HDR="-P 0"
    MESSAGE=`expr $MESSAGE + $ADD`
    MESSAGE=`expr $MESSAGE / $DIV`
done