Hi Val-
It sounds like your problem is related to the time-out
and retransmission timers associated with the ST stack.
The GbE NIC is a difficult thing to tune (more on that
later). What could be happening is that, due to bad
tuning, you aren't getting good performance from the NIC:
packets get delayed, the retransmission timers kick in,
and the retransmissions saturate the network, further
degrading the performance.
Now it could be a tuning problem, or you might be genuinely
losing packets on the network. Linux is notorious for quietly
discarding packets from the transmit queue if the queue
length is exceeded (tunable for the driver with the txqueuelen
ioctl), and ST without tiling in the NIC tends to generate
a lot of packets in bursts (a block tiled into multiple STUs
ends up creating a lot of packets).
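
As a rough sketch of the queue-length point above (not part of the
ST distribution), the transmit queue length can be read and changed
from user space with the SIOCGIFTXQLEN / SIOCSIFTXQLEN ioctls. The
helper names and the 1024-byte STU payload used in the usage note are
assumptions made up for illustration; the real STU size depends on
your setup.

```python
import fcntl
import socket
import struct

# ioctl request numbers from <linux/sockios.h>.
SIOCGIFTXQLEN = 0x8942  # read the transmit queue length
SIOCSIFTXQLEN = 0x8943  # set the transmit queue length (needs root)

# sizeof(struct ifreq) on 64-bit Linux. It is 32 on 32-bit systems,
# where the extra zero padding is simply ignored by the kernel.
IFREQ_SIZE = 40


def pack_ifreq(ifname, qlen=0):
    """Build a struct ifreq: 16-byte interface name, then ifr_qlen."""
    name = ifname.encode()
    if len(name) > 15:
        raise ValueError("interface name too long")
    return struct.pack("16si", name, qlen).ljust(IFREQ_SIZE, b"\0")


def unpack_qlen(ifreq):
    """Extract ifr_qlen (an int at offset 16) from a struct ifreq."""
    return struct.unpack_from("i", ifreq, 16)[0]


def get_txqueuelen(ifname):
    """Return the current transmit queue length of ifname."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        reply = fcntl.ioctl(s.fileno(), SIOCGIFTXQLEN, pack_ifreq(ifname))
    return unpack_qlen(reply)


def set_txqueuelen(ifname, qlen):
    """Set the transmit queue length of ifname (requires root)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        fcntl.ioctl(s.fileno(), SIOCSIFTXQLEN, pack_ifreq(ifname, qlen))


def stus_per_block(block_bytes, stu_payload_bytes):
    """Packets in one burst if a block is tiled into fixed-size STUs."""
    return -(-block_bytes // stu_payload_bytes)  # ceiling division
```

For example, compare get_txqueuelen("eth1") against
stus_per_block(65536, 1024) times the number of blocks in flight. If
the burst is larger than the queue, raising the queue length with
set_txqueuelen("eth1", ...) is worth a try.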
The other parameters to tune are those related to interrupt
coalescing on the NIC (there are 4 related parameters there;
the driver code talks about them).
Finally the tx vs. rx buffer sizes on the NIC are also tunable.
Unfortunately there are no algorithms to determine timer
values for STP (as there are in TCP); they are static for now.
You can try changing those values in stp/core/stp_timers.h.
Of course, something else entirely could be dropping the
packets, but a great many packets would have to be dropped for
the connection to time out. You can build the STP module with
debugging turned on; loading the module with the appropriate
debug flag parameters will print enough information on the
console to help you understand the problem better.
The ST stack puts some information about timeouts and related
events into /proc/net/stp and /proc/net/sockstat. It is also
possible to turn on profiling and get more information about the
stack behavior (this requires compiling the kernel with the
NET_PROFILE config option turned on).
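
To make the /proc counters easier to watch during a NetPIPE run,
something like the following could snapshot /proc/net/sockstat
before and after and report what changed. This is only a sketch: it
assumes the usual "NAME: key value key value ..." layout of that
file, the function names are invented here, and it does not try to
parse /proc/net/stp, whose layout is specific to the ST patch.

```python
def parse_sockstat(text):
    """Parse 'NAME: key value key value ...' lines into nested dicts."""
    stats = {}
    for line in text.splitlines():
        proto, sep, rest = line.partition(":")
        if not sep:
            continue  # skip anything that is not a 'NAME: ...' line
        fields = rest.split()
        # Fields alternate between counter name and integer value.
        stats[proto.strip()] = {
            key: int(value) for key, value in zip(fields[0::2], fields[1::2])
        }
    return stats


def read_sockstat(path="/proc/net/sockstat"):
    """Take a snapshot of the counters currently exported by the kernel."""
    with open(path) as f:
        return parse_sockstat(f.read())


def changed_counters(before, after):
    """Return {(proto, key): (old, new)} for counters that differ."""
    changes = {}
    for proto, counters in after.items():
        for key, value in counters.items():
            old = before.get(proto, {}).get(key)
            if old != value:
                changes[(proto, key)] = (old, value)
    return changes
```

Call read_sockstat() once before starting the sender and once after
the timeout, then pass both snapshots to changed_counters() to see
which counters moved.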
Val Henson wrote:
>
> Hello,
>
> I work for Essential, that HiPPI network company. Brad Allen asked me
> to benchmark your ST for Linux stuff. I got everything working with
> Linux 2.3.99-pre2 and modified NetPIPE 2.3
> (http://www.scl.ameslab.gov/netpipe/) slightly to use STP instead of
> TCP. Unfortunately, I'm having a hard time benchmarking because I
> keep getting "Connection timed out" errors on my reads. Is this an
> inherent limitation of ST or can it be fixed? It usually bombs out
> around 16K packet sizes, but it can time out anywhere from 8K to 64K.
>
> The perror() message is:
>
> NetPIPE: Connection timed out
>
> I'm attaching a tar file which you can use to reproduce this
> problem. Just
>
> 1. Untar/gunzip it somewhere
> 2. Change this line in the Makefile to have your receiving hostname
>
> ./NPtcp -P -t -h _hostname_of_receiver__change_me_
>
> 3. On the receiving host:
>
> make receiver
>
> 4. On the sending host:
>
> make sender
>
> I turned on the debugging in ST and was no wiser:
>
> stvd_input: discarding duplicate STU 0 for B_num 3316453428
> stvd_input: discarding duplicate STU 0 for B_num 3308970036
> stvd_input: discarding duplicate STU 0 for B_num 3311700020
> stvd_input: discarding duplicate STU 0 for B_num 3308976180
> stvd_input: discarding duplicate STU 0 for B_num 3311702068
> stvd_input: discarding duplicate STU 0 for B_num 3308972084
> stvd_input: discarding duplicate STU 0 for B_num 3348811828
> stvd_input: discarding duplicate STU 0 for B_num 3311704116
> stvd_input: discarding duplicate STU 0 for B_num 3311697972
> st_do_input: discarding DATA on bad R_id 0x12a2
>
> A few details about my setup, let me know if you need more:
>
> 2 SMP i586 hosts connected back-to-back
> Linux 2.3.99-pre2 with these patches applied:
> patch_stp-0.2a_lk-2.3.99-pre2.gz
> patch_stp-0.2a-1
> Alteon Acenic with following bootup messages from the driver:
>
> acenic.c: v0.42 03/02/2000 Jes Sorensen, linux-acenic@xxxxxxxxxxxxxx
> http://home.cern.ch/~jes/gige/acenic.html
> eth1: Alteon AceNIC Gigabit Ethernet at 0xfe100000, irq 18
> Tigon II (Rev. 6), Firmware: 12.4.5, MAC: 00:60:cf:20:38:f6
> PCI bus width: 32 bits, speed: 33MHz, latency: 64 clks
> Disabling PCI memory write and invalidate
> Enabling PCI Fast Back to Back
> eth1: Firmware up and running
> stp_device_attach(eth1): attaching ST support
> eth1: Optical link UP
>
> -VAL
>
> ------------------------------------------------------------------------
> Name: netpipe.tar.gz
> netpipe.tar.gz Type: Unix Tape Archive (application/x-tar)
> Encoding: base64