
Re: [Fwd: Re: possible bug x86 2.4.2 SMP in IP receive stack]

To: andrewm@xxxxxxxxxx, kuznet@xxxxxxxxxxxxx
Subject: Re: [Fwd: Re: possible bug x86 2.4.2 SMP in IP receive stack]
From: Bob Felderman <feldy@xxxxxxxx>
Date: Tue, 6 Mar 2001 10:54:22 -0800 (PST)
Cc: feldy@xxxxxxxx, netdev@xxxxxxxxxxx
Sender: owner-netdev@xxxxxxxxxxx
=> 
=> I am looking now. Probably, it is some silly misprint in ip_fragment.c.
=> 
=> 
=> The problem with Bob's original report was that in the first lines 
=> he reported an illegal kfree_skb with skb->list!=NULL, called
=> from ip_rcv(). This can only be a bug in the driver, nothing more.
=> 
=> Actually, Bob, if you tell me that you found why this happened,
=> my enthusiasm for reauditing ip_fragment.c will grow just fantastically. 8)

I have added spinlocks for the interrupt routine and the transmit side.
Stability has not improved, and the oopses I sent yesterday came from the
spinlocked code.

If you don't see the skb->list problem anymore, it is probably because of
the spinlocks, but I'm certainly still seeing bad behavior.

I generate the problem with a UDP test from netperf (http://www.netperf.org).
I've attached my
        udp_range 
script. Run it as
        udp_range  <dest_host>


I also do this

echo "1048576" > /proc/sys/net/core/rmem_max
echo "1048576" > /proc/sys/net/core/wmem_max
echo "1048576" > /proc/sys/net/core/wmem_default
echo "1048576" > /proc/sys/net/core/rmem_default
echo "1048576" > /proc/sys/net/core/optmem_max

Our network is really fast. When the machine is stable I can sustain
a 1.5 Gigabit/sec UDP stream. It might be possible to reproduce this
using 100 Mbit Ethernet, but it might require 1 Gbit Ethernet with jumbo
frames. I'm currently using a 9000-byte MTU. Our interrupt routine
delivers only a single Ethernet packet to the higher levels per
interrupt, so maybe that also stresses the IP fragmentation code.
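As a rough sanity check (my own arithmetic, not taken from the kernel source), here is a sketch of how many fragments one large udp_range message turns into at a 9000-byte MTU, assuming a 20-byte IP header, an 8-byte UDP header, and non-final fragment payloads rounded down to a multiple of 8 bytes:

```shell
#!/bin/sh
# Sketch: estimate IP fragment count for one UDP datagram.
# Assumptions: 20-byte IP header, 8-byte UDP header, and every
# fragment except the last carries a payload that is a multiple of 8.
MTU=9000
MSG=32768                                  # largest message udp_range sends
PAYLOAD=`expr \( $MTU - 20 \) / 8 \* 8`    # 8976 usable bytes per fragment
TOTAL=`expr $MSG + 8`                      # datagram including UDP header
FRAGS=`expr \( $TOTAL + $PAYLOAD - 1 \) / $PAYLOAD`
echo "$MSG-byte message -> $FRAGS fragments"
# prints: 32768-byte message -> 4 fragments
```

So each large message exercises the reassembly path with several fragments per datagram, many thousands of times per second at these rates.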

I don't see this problem on a linux-2.2 box and I don't see it when
I remove one of the processors from the receiver on my setup.




#!/bin/sh
#
# udp_range
#
# generate a whole lot of numbers from netperf to see the effects
# of send size on throughput
#

#
# usage : udp_range hostname
#

if [ $# -ne 1 ]; then
        echo "try again, correctly -> udp_range hostname"
        exit 1
fi

#
# some params
#
REMHOST=$1

# where is netperf
NETHOME=.

BUFSIZE="-s 2062144 -S 2062144"
#BUFSIZE="-s  2147484 -S 2147484"
#BUFSIZE="-s 1048576 -S 1048576"
#BUFSIZE="-s 524288 -S 524288"
#BUFSIZE="-s 262144 -S 262144"
#BUFSIZE="-s 131072 -S 131072"
#BUFSIZE="-s 65535 -S 65535"
#BUFSIZE="-s 49152 -S 49152"
#BUFSIZE="-s 49152 -S 131072"
#BUFSIZE="-S 65536"


TIME="10"
#
# some stuff for the arithmetic
#
# we start at START, then add ADD and divide by DIV. by changing
# these numbers, we can halve each time, or decrease by a fixed amount
#
START=32768
END=4

DIV=2

ADD=0

# Do we wish to measure CPU utilization?
LOC_CPU=""
REM_CPU=""
#LOC_CPU="-c"
#REM_CPU="-C"

# If we are measuring CPU utilization, then we can save beaucoup
# time by saving the results of the CPU calibration and passing
# them in during the real tests. So, we execute the new CPU "tests"
# of netperf and put the values into shell vars.
case $LOC_CPU in
\-c) LOC_RATE=`$NETHOME/netperf -t LOC_CPU`;;
*) LOC_RATE=""
esac

case $REM_CPU in
\-C) REM_RATE=`$NETHOME/netperf -t REM_CPU -H $REMHOST`;;
*) REM_RATE=""
esac


# after the first datapoint, we don't want more headers
# but we want one for the first one
NO_HDR=""


MESSAGE=$START
while [ $MESSAGE -ge $END ]; do
        $NETHOME/netperf -p 9100 -l $TIME -H $REMHOST -t UDP_STREAM\
          $LOC_CPU $LOC_RATE $REM_CPU $REM_RATE $NO_HDR --\
          -m $MESSAGE $BUFSIZE
        NO_HDR="-P 0"
        MESSAGE=`expr $MESSAGE + $ADD`
        MESSAGE=`expr $MESSAGE \/ $DIV`
done
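
For reference, with START=32768, DIV=2, ADD=0, and END=4, the loop above halves the message size each pass. A minimal sketch of the same expr arithmetic shows the sizes it walks through:

```shell
#!/bin/sh
# Sketch of the message sizes the udp_range loop iterates over,
# using the same expr arithmetic as the script above.
M=32768
SIZES=""
while [ $M -ge 4 ]; do
        SIZES="$SIZES $M"
        M=`expr $M / 2`
done
echo $SIZES
# prints: 32768 16384 8192 4096 2048 1024 512 256 128 64 32 16 8 4
```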


