From owner-stp@oss.sgi.com Wed Jun 28 14:17:19 2000
Received: by oss.sgi.com id ; Wed, 28 Jun 2000 14:17:09 -0700
Received: from deliverator.sgi.com ([204.94.214.10]:27956 "EHLO deliverator.sgi.com") by oss.sgi.com with ESMTP id ; Wed, 28 Jun 2000 14:16:43 -0700
Received: from lhotse.engr.sgi.com (lhotse.engr.sgi.com [163.154.35.41]) by deliverator.sgi.com (980309.SGI.8.8.8-aspam-6.2/980310.SGI-aspam) via ESMTP id OAA17281 for ; Wed, 28 Jun 2000 14:11:54 -0700 (PDT) mail_from (aman@cthulhu.engr.sgi.com)
Received: from engr.sgi.com (localhost [127.0.0.1]) by lhotse.engr.sgi.com (980427.SGI.8.8.8/980728.SGI.AUTOCF) via ESMTP id OAA64089; Wed, 28 Jun 2000 14:16:22 -0700 (PDT)
Message-ID: <395A6B26.17301FD9@engr.sgi.com>
Date: Wed, 28 Jun 2000 14:16:22 -0700
From: Aman Singla
Organization: SGI
X-Mailer: Mozilla 4.74b2C-SGI [en] (X11; I; IRIX 6.5 IP32)
X-Accept-Language: en
MIME-Version: 1.0
To: Val Henson
CC: stp@oss.sgi.com
Subject: Re: Linux STP - connection timed out?
References: <20000628135427.V2335@esscom.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-stp@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;stp-outgoing
Content-Length: 4540
Lines: 111

Hi Val-

It sounds like your problem is related to the time-out and
retransmission timers associated with the ST stack. The GbE NIC is a
difficult thing to tune (more on that later). What could be happening
is this: because of bad tuning you aren't getting good performance from
the NIC, so packets get delayed, the retransmission timers kick in and
saturate the network, and performance degrades further. So it could be
a tuning problem, or you might be genuinely losing packets on the
network.
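As a rough illustration of the feedback loop described above, here is a toy model in Python. It is not the STP implementation: the function name, its parameters and the retry rule are assumptions made up for this sketch. It counts how many copies of one block a sender with a static retransmission timer puts on the wire when the acknowledgement is delayed.

```python
import math


def copies_sent(ack_delay_ms, timeout_ms, max_retries):
    """Return (copies_on_wire, timed_out) for one block.

    Hypothetical model: the sender transmits at t=0 and retransmits every
    timeout_ms until the ack arrives at ack_delay_ms. It gives up when the
    timer fires again after max_retries retransmissions.
    """
    if ack_delay_ms <= timeout_ms:
        return 1, False  # ack arrives before the timer does any harm
    # Number of timer expiries strictly before the ack arrives.
    wanted = math.ceil(ack_delay_ms / timeout_ms) - 1
    if wanted > max_retries:
        return 1 + max_retries, True  # retries exhausted: connection times out
    return 1 + wanted, False
```

With a 10 ms timer and up to 5 retries, `copies_sent(25, 10, 5)` gives `(3, False)`: a delay of only 2.5 timer periods already triples the traffic for that block, and every extra copy is a duplicate at the receiver. `copies_sent(100, 10, 5)` gives `(6, True)`, the toy equivalent of a timed-out connection.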

Linux is notorious for quietly discarding packets from the transmit
queue if the queue length is exceeded (tunable for the driver with the
txqueuelen ioctl), and ST without tiling in the NIC tends to generate a
lot of packets in bursts (a block tiled into multiple STUs ends up
creating a lot of packets). The other parameters to tune are those
related to interrupt coalescing on the NIC (there are 4 related
parameters; the driver code talks about them). Finally, the tx vs. rx
buffer sizes on the NIC are also tunable.

Unfortunately there are no algorithms to determine timer values for STP
(like in TCP); they are static for now. You can try changing those
values in stp/core/stp_timers.h.

Of course, it could be something else entirely dropping the packets,
but way too many packets need to be dropped for the connection to time
out.

You can build the STP module with debugging turned on; loading the
module with the appropriate debug flag parameters will print enough
information on the console to help you understand the problem better.
The ST stack puts some information about timeouts and so on into
/proc/net/stp and /proc/net/sockstat. It is also possible to turn on
profiling and get more information about the stack behavior (this
requires compiling the kernel with the NET_PROFILE config option turned
on).

Val Henson wrote:
>
> Hello,
>
> I work for Essential, that HiPPI network company.  Brad Allen asked me
> to benchmark your ST for Linux stuff.  I got everything working with
> Linux 2.3.99-pre2 and modified NetPIPE 2.3
> (http://www.scl.ameslab.gov/netpipe/) slightly to use STP instead of
> TCP.  Unfortunately, I'm having a hard time benchmarking because I
> keep getting "Connection timed out" errors on my reads.  Is this an
> inherent limitation of ST or can it be fixed?  It usually bombs out
> around 16K packet sizes, but it can time out anywhere from 8K to 64K.
>
> The perror() message is:
>
> NetPIPE: Connection timed out
>
> I'm attaching a tar file which you can use to reproduce this
> problem.  Just
>
> 1. Untar/gunzip it somewhere
> 2. Change this line in the Makefile to have your receiving hostname
>
>    ./NPtcp -P -t -h _hostname_of_receiver__change_me_
>
> 3. On the receiving host:
>
>    make receiver
>
> 4. On the sending host:
>
>    make sender
>
> I turned on the debugging in ST and was no wiser:
>
> stvd_input: discarding duplicate STU 0 for B_num 3316453428
> stvd_input: discarding duplicate STU 0 for B_num 3308970036
> stvd_input: discarding duplicate STU 0 for B_num 3311700020
> stvd_input: discarding duplicate STU 0 for B_num 3308976180
> stvd_input: discarding duplicate STU 0 for B_num 3311702068
> stvd_input: discarding duplicate STU 0 for B_num 3308972084
> stvd_input: discarding duplicate STU 0 for B_num 3348811828
> stvd_input: discarding duplicate STU 0 for B_num 3311704116
> stvd_input: discarding duplicate STU 0 for B_num 3311697972
> st_do_input: discarding DATA on bad R_id 0x12a2
>
> A few details about my setup, let me know if you need more:
>
> 2 SMP i586 hosts connected back-to-back
> Linux 2.3.99-pre2 with these patches applied:
>   patch_stp-0.2a_lk-2.3.99-pre2.gz
>   patch_stp-0.2a-1
> Alteon Acenic with following bootup messages from the driver:
>
> acenic.c: v0.42 03/02/2000 Jes Sorensen, linux-acenic@SunSITE.auc.dk
> http://home.cern.ch/~jes/gige/acenic.html
> eth1: Alteon AceNIC Gigabit Ethernet at 0xfe100000, irq 18
> Tigon II (Rev. 6), Firmware: 12.4.5, MAC: 00:60:cf:20:38:f6
> PCI bus width: 32 bits, speed: 33MHz, latency: 64 clks
> Disabling PCI memory write and invalidate
> Enabling PCI Fast Back to Back
> eth1: Firmware up and running
> stp_device_attach(eth1): attaching ST support
> eth1: Optical link UP
>
> -VAL
>
> ------------------------------------------------------------------------
> Name: netpipe.tar.gz
> netpipe.tar.gz Type: Unix Tape Archive (application/x-tar)
> Encoding: base64
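
For readers who want to try the txqueuelen suggestion from the reply, here is a minimal Python sketch of reading and setting the transmit queue length through the ioctl interface. The request numbers are the SIOCGIFTXQLEN and SIOCSIFTXQLEN values from linux/sockios.h. The helper names, the 40-byte ifreq layout (64-bit Linux) and the example value of 1000 are assumptions of this sketch, not anything from the STP code. Setting the value normally requires root.

```python
import fcntl
import socket
import struct

# ioctl request numbers from <linux/sockios.h>
SIOCGIFTXQLEN = 0x8942  # get transmit queue length
SIOCSIFTXQLEN = 0x8943  # set transmit queue length (needs root)

# struct ifreq: 16-byte interface name, then a union whose first int is
# ifr_qlen. Padded here to 40 bytes, the struct size on 64-bit Linux.
IFREQ = struct.Struct("16si20x")


def pack_ifreq(ifname, qlen=0):
    """Build the ifreq buffer passed to the kernel."""
    name = ifname.encode()
    if len(name) > 15:
        raise ValueError("interface name too long")
    return IFREQ.pack(name, qlen)


def unpack_qlen(ifreq):
    """Extract ifr_qlen from a buffer returned by the kernel."""
    return IFREQ.unpack(ifreq)[1]


def get_txqueuelen(ifname):
    """Return the current transmit queue length of ifname."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        reply = fcntl.ioctl(s.fileno(), SIOCGIFTXQLEN, pack_ifreq(ifname))
    return unpack_qlen(reply)


def set_txqueuelen(ifname, qlen):
    """Set the transmit queue length of ifname (requires root)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        fcntl.ioctl(s.fileno(), SIOCSIFTXQLEN, pack_ifreq(ifname, qlen))
```

For the setup quoted above, one would call `get_txqueuelen("eth1")` to see the current limit and, as root, something like `set_txqueuelen("eth1", 1000)` to raise it before rerunning the benchmark. Whether a longer queue actually helps here is something only the measurement can tell.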