From owner-stp@oss.sgi.com Wed Jan 3 00:05:41 2001
Received: by oss.sgi.com id ; Wed, 3 Jan 2001 00:05:21 -0800
Received: from mailhub.iastate.edu ([129.186.1.102]:13576 "EHLO mailhub.iastate.edu") by oss.sgi.com with ESMTP id ; Wed, 3 Jan 2001 00:05:12 -0800
Received: from Debug (webmail-10.iastate.edu [129.186.1.82]) by mailhub.iastate.edu (8.9.3/8.9.3) with SMTP id CAA21056 for ; Wed, 3 Jan 2001 02:05:11 -0600
Message-Id: <200101030805.CAA21056@mailhub.iastate.edu>
To: stp@oss.sgi.com
From: Weiyi Chen
Subject: OS bypass
Date: Wed, 3 Jan 2001 02:05:09 CST6DST
X-Mailer: Endymion MailMan Professional Edition v3.0.14
Sender: owner-stp@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;stp-outgoing

Hi,

Is there anyone working on STP OS bypass code? Its low-latency
characteristic looks more important in an MPI library.

Weiyi

From owner-stp@oss.sgi.com Tue Jan 9 10:52:06 2001
Received: by oss.sgi.com id ; Tue, 9 Jan 2001 10:51:46 -0800
Received: from mailgate.igd.fhg.de ([192.44.32.11]:43250 "EHLO mailgate.igd.fhg.de") by oss.sgi.com with ESMTP id ; Tue, 9 Jan 2001 10:51:31 -0800
Received: from HeinrichSchiff.igd.fhg.de ([153.97.150.5]) by mailgate.igd.fhg.de (Netscape Messaging Server 3.6) with ESMTP id AAA16C7 for ; Tue, 9 Jan 2001 19:50:45 +0100
Received: from hokusai.igd.fhg.de.igd.fhg.de (hokusai.igd.fhg.de [146.140.4.36]) by HeinrichSchiff.igd.fhg.de (8.9.3+Sun/8.9.3) with ESMTP id TAA08831 for ; Tue, 9 Jan 2001 19:51:24 +0100 (MET)
From: Dirk Reiners
Received: by hokusai.igd.fhg.de.igd.fhg.de (980427.SGI.8.8.8/SMI-4.0) id TAA78483; Tue, 9 Jan 2001 19:50:17 +0100 (MET)
Date: Tue, 9 Jan 2001 19:50:17 +0100 (MET)
Message-Id: <1010109195016.ZM1254496@hokusai.igd.fhg.de>
X-Face: "`A\#m^;_fF4zDC3eD@[pKCui5i.FQgNnQRYt[l7o[*M0tF5*@vI$(t1;}B+~t;s\&esfOu+<3\Lg/y"wyG]w'Z"K4j0-[u-~jw^D7{I;7BUU'hvnvF:~O1KGjjRoHO9/]5.@Y>~[v:km#3+c|+Rlk{LP"S~TunjL7MoGUMeTlJD?ciwXYP
X-Orcpt: rfc822;stp-outgoing

Hello everybody,

I have a (probably stupid) question, but I need to know anyway.

We want to build an asymmetric network: a bunch of clients with 100 MBit
connected to a switch, whose GBit uplink is connected to a master.

Now if we're using STP on this, will it benefit from the firmware in the
GBit master or not? If STP is wrapped in TCP on the 100 MBit side it
probably wouldn't, but if it creates real STP packets it might.

Any comments on this situation? Has anybody tried something similar, with
what results? We need to broadcast data from the master and collect data
from all the clients quickly, and STP seems like a good idea to reduce
overhead and latency.

Thanks

	Dirk

--
--
-- Dirk Reiners         reiners@igd.fhg.de, Dirk.Reiners@gmx.net
-- OpenSG Forum         http://www.opensg.org
-- Rundeturmstrasse 6   http://www.igd.fhg.de/~reiners
-- D-64283 Darmstadt    All standard disclaimers apply.
-- Truth is stranger than fiction because fiction has to make sense.

From owner-stp@oss.sgi.com Thu Jan 11 02:12:59 2001
Received: by oss.sgi.com id ; Thu, 11 Jan 2001 02:12:49 -0800
Received: from smtp1.cern.ch ([137.138.128.38]:29447 "EHLO smtp1.cern.ch") by oss.sgi.com with ESMTP id ; Thu, 11 Jan 2001 02:12:29 -0800
Received: from lxplus005.cern.ch (IDENT:root@lxplus005.cern.ch [137.138.161.122]) by smtp1.cern.ch (8.9.3/8.9.3) with ESMTP id LAA00509; Thu, 11 Jan 2001 11:12:21 +0100 (MET)
Received: from localhost (ppieta@localhost) by lxplus005.cern.ch (8.9.3/8.9.3) with ESMTP id LAA08464; Thu, 11 Jan 2001 11:12:20 +0100
X-Authentication-Warning: lxplus005.cern.ch: ppieta owned process doing -bs
Date: Thu, 11 Jan 2001 11:12:20 +0100 (CET)
From: Pekka Pietikainen
X-Sender: ppieta@lxplus005.cern.ch
To: Dirk Reiners
cc: stp@oss.sgi.com
Subject: Re: Mixed mode
In-Reply-To: <1010109195016.ZM1254496@hokusai.igd.fhg.de>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-stp@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;stp-outgoing

> Now if we're using STP on this, will it benefit from the firmware in the
> GBit master or not?
> If STP is wrapped in TCP on the 100 MBit side it
> probably wouldn't, but if it creates real STP packets it might.

Hi

The modified firmware only benefits receives, although STP on Linux
also does zero-copy transmits with some driver changes, which help
a bit. You can get the same for TCP too these days, and I will probably
change STP to use the same infrastructure as there's now a
"stable" release of it out at
ftp://ftp.kernel.org/pub/linux/kernel/people/davem/.

>
> Any comments on this situation? Has anybody tried something similar, with
> what results? We need to broadcast data from the master and collect data
> from all the clients quickly, and STP seems like a good idea to reduce
> overhead and latency.

If you're sending the same data to all of the clients, and there are
a lot of them, some kind of reliable multicast protocol might be a good
idea (latency might be a problem with these, though). I haven't followed
this area recently.

Otherwise STP might do the job, although without hardware assist it won't
perform that much better than TCP as far as CPU use and bandwidth are
concerned (especially on 100baseT, which is slow enough that modern
machines have no problems dealing with it).

From owner-stp@oss.sgi.com Thu Jan 11 08:24:11 2001
Received: by oss.sgi.com id ; Thu, 11 Jan 2001 08:24:02 -0800
Received: from laime.cs.uchicago.edu ([128.135.11.244]:35773 "EHLO laime.cs.uchicago.edu") by oss.sgi.com with ESMTP id ; Thu, 11 Jan 2001 08:24:02 -0800
Received: from candide.cs.uchicago.edu (candide.cs.uchicago.edu [128.135.11.62]) by laime.cs.uchicago.edu (8.10.2/8.9.3) with SMTP id f0BGO0004373 for ; Thu, 11 Jan 2001 10:24:01 -0600 (CST)
Received: by candide.cs.uchicago.edu (5.57/4.7) id AA25357; Thu, 11 Jan 01 10:22:30 -0600
Message-Id: <10101111622.AA25357@candide.cs.uchicago.edu>
To: stp@oss.sgi.com
Subject: Re: Mixed mode
In-Reply-To: Message from Pekka Pietikainen of "Thu, 11 Jan 2001 11:12:20 +0100."
References:
Date: Thu, 11 Jan 2001 10:23:48 -0600
From: Stephen Bailey
Sender: owner-stp@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;stp-outgoing

Dirk,

> Otherwise STP might do the job, although without hardware assist it won't
> perform that much better than TCP as far as CPU use and bandwidth are
> concerned (especially on 100baseT, which is slow enough that modern
> machines have no problems dealing with it).

It sounds like you are talking about a 1 Gb source and 100 Mb sinks.
In this case, you must be careful that you bound the ST block sizes
for your ST Write sequences so you don't overwhelm the available
elasticity buffering in your network infrastructure.

Put another way, if you try to send from a 1 Gb source to 10 100 Mb
sinks, and you burst (your ST block size) 4 MB at a time, and your
switch can only absorb 1 MB, you will almost certainly end up with
almost 3 MB worth of data thrown on the floor. ST will retry, but it
will fail each and every time because its retry granularity is an ST
block.

Put another way, ST doesn't do adaptive congestion avoidance.

You can solve this problem by tuning down your block sizes, but
smaller blocks mean higher CPU overhead, and this still only works
well if the network is known to be operating in a steady state
(quiescent is nice). If there is an increase in traffic flow to one
of the 100 Mb hosts, you'll end up losing again.

In this case, you will have to use an ST block size of ~100 KB ( * 10
= 1MB). To hide latency on a per flow basis, you might also want to
have two outstanding CTSs, in which case you're talking about 50 KB
blocks.

The 10 * 100 Mb sources to 1 Gb sink works better, for the obvious
reason. In general, STP only works unconditionally well with equally
sized source and sink pipes and a non-blocking fabric.

Still, for all this, if you can get your traffic balanced right, STP
will probably be two decimal orders of magnitude more efficient than TCP.
This doesn't matter for the 100 Mb hosts, but it's very
significant for the 1 Gb host.

Steph

From owner-stp@oss.sgi.com Thu Jan 11 14:08:04 2001
Received: by oss.sgi.com id ; Thu, 11 Jan 2001 14:07:54 -0800
Received: from mailgate.igd.fhg.de ([192.44.32.11]:28041 "EHLO mailgate.igd.fhg.de") by oss.sgi.com with ESMTP id ; Thu, 11 Jan 2001 14:07:25 -0800
Received: from HeinrichSchiff.igd.fhg.de ([153.97.150.5]) by mailgate.igd.fhg.de (Netscape Messaging Server 3.6) with ESMTP id AAA324A for ; Thu, 11 Jan 2001 23:06:39 +0100
Received: from hokusai.igd.fhg.de.igd.fhg.de (hokusai.igd.fhg.de [146.140.4.36]) by HeinrichSchiff.igd.fhg.de (8.9.3+Sun/8.9.3) with ESMTP id XAA22461 for ; Thu, 11 Jan 2001 23:07:17 +0100 (MET)
From: Dirk Reiners
Received: by hokusai.igd.fhg.de.igd.fhg.de (980427.SGI.8.8.8/SMI-4.0) id XAA01801; Thu, 11 Jan 2001 23:06:12 +0100 (MET)
Date: Thu, 11 Jan 2001 23:06:12 +0100 (MET)
Message-Id: <1010111230612.ZM1800@hokusai.igd.fhg.de>
In-Reply-To: Pekka Pietikainen "Re: Mixed mode" (Jan 11, 11:12am)
References: <10101111622.AA25357@candide.cs.uchicago.edu>
In-Reply-To: Stephen Bailey "Re: Mixed mode" (Jan 11, 10:23am)
X-Face: "`A\#m^;_fF4zDC3eD@[pKCui5i.FQgNnQRYt[l7o[*M0tF5*@vI$(t1;}B+~t;s\&esfOu+<3\Lg/y"wyG]w'Z"K4j0-[u-~jw^D7{I;7BUU'hvnvF:~O1KGjjRoHO9/]5.@Y>~[v:km#3+c|+Rlk{LP"S~TunjL7MoGUMeTlJD?ciwXYP
X-Orcpt: rfc822;stp-outgoing

Hi everybody,

thanks for your info so far. There are still some aspects that need
clarifying, though.

On Jan 11, 11:12am, Pekka Pietikainen wrote:
> Subject: Re: Mixed mode
>
> The modified firmware only benefits receives, although STP on Linux
> also does zero-copy transmits with some driver changes, which help
> a bit. You can get the same for TCP too these days, and I will probably
> change STP to use the same infrastructure as there's now a
> "stable" release of it out at
> ftp://ftp.kernel.org/pub/linux/kernel/people/davem/.

Hmm, can you give me some pointers on the state of it and how to use it?
Google just gave me links to kernel mailing list archives from 1999
(where it was rejected) and Sep 2000 (which says it's in there, but
kernel use only for TUX). There was a diploma thesis at ETH Zurich that
claims to get 60MB/s using z-c TCP, but with buffer alignment
limitations (sounds nice anyway).

> If you're sending the same data to all of the clients, and there's
> a lot of them some kind of reliable multicast protocol might be a good
> idea (latency might be a problem with these, though), I haven't followed
> this area recently, though.

Ok, I should have been more specific. My problem is a little different.

I have two possible setups. In the first I send a little data to all the
clients, and will probably use some sort of multicast or broadcast for it
(the network is dedicated, just me and nothing else). The clients work on
that and generate pretty large blocks of data (images, worst case ~10MB
apiece, typical ~1MB), that are sent back to the server and processed. I
just hope the latency on the multicast is not too bad, but I'm guessing
the bigger problem here is getting the data back from the clients in
time, and I hope that STP will help me there, as copying this kind of
data hurts for every single copy.

The second situation needs STP for synchronisation of the clients, so
that they can all act at the same time, as close as possible together.
This I'm pretty sure I can't do with TCP due to latency, but STP should
help here.

> Otherwise STP might do the job, although without hardware assist it won't
> perform that much better than TCP as far as CPU use and bandwidth are
> concerned (especially on 100baseT, which is slow enough that modern
> machines have no problems dealing with it).

I'm thinking of using the 3C905 for the 100MBit and 3C985 for the GBit.
Are those sensible choices? Money is not the prime concern right now, as
this is more of a feasibility study than a product.

For the switch we're looking at some Cisco with 12 or 16 100 Mbit ports
plus 1 or 2 Gbit ports (Catalyst 3500? don't remember right now), because
it cooperates best with the rest of the inhouse network. Any experience
here, be it good or bad?

>-- End of excerpt from Pekka Pietikainen

On Jan 11, 10:23am, Stephen Bailey wrote:
> Subject: Re: Mixed mode
>
> It sounds like you are talking about a 1 Gb source and 100 Mb sinks.
> In this case, you must be careful that you bound the ST block sizes
> for your ST Write sequences so you don't overwhelm the available
> elasticity buffering in your network infrastructure.

Actually, it's the other way around, see above. My fault for not being
clear the first time.

> Put another way, if you try to send from 1 Gb source to 10 100 Mb
> sinks, and you burst (your ST block size) 4 MB at a time, and your
> switch can only absorb 1 MB, you will almost certainly end up with
> almost 3 MB worth of data thrown on the floor. ST will retry, but it
> will fail each and every time because its retry granularity is an ST
> block.

Hm, I guess I can have that problem in my case, too.

> Put another way, ST doesn't do adaptive congestion avoidance.
>
> You can solve this problem by tuning down your block sizes, but
> smaller blocks mean higher CPU overhead, and this still only works
> well if the network is known to be operating in a steady state
> (quiescent is nice). If there is an increase in traffic flow to one
> of the 100 Mb hosts, you'll end up losing again.

As the network is dedicated that's ok. And as I write all the software
myself I can do some traffic shaping to prevent all the clients from
sending at the same time, but I'd prefer not having to do that.

> In this case, you will have to use an ST block size of ~100 KB ( * 10
> = 1MB). To hide latency on a per flow basis, you might also want to
> have two outstanding CTSs, in which case you're talking about 50 KB
> blocks.
>
> The 10 * 100 Mb sources to 1 Gb sink works better, for the obvious
> reason. In general, STP only works unconditionally well with equally
> sized source and sink pipes and a non-blocking fabric.

Ok, that sounds better, as that's what I have.

> Still, for all this, if you can get your traffic balanced right, STP
> will probably be two decimal orders of magnitude more efficient than
> TCP. This doesn't matter for the 100 Mb hosts, but it's very
> significant for the 1 Gb host.
>
>-- End of excerpt from Stephen Bailey

This sounds very good. The Gbit host doesn't have to do a lot of work on
the data, but at the data rates I'd like to get every little bit counts.

So it looks like it can actually work nicely using STP. Now I need to
kick my boss to actually buy all the stuff...

Thanks for your help

	Dirk

--
--
-- Dirk Reiners         reiners@igd.fhg.de, Dirk.Reiners@gmx.net
-- OpenSG Forum         http://www.opensg.org
-- Rundeturmstrasse 6   http://www.igd.fhg.de/~reiners
-- D-64283 Darmstadt    All standard disclaimers apply.
-- Truth is stranger than fiction because fiction has to make sense.
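
The block-size arithmetic from Stephen Bailey's reply in the thread above
(1 MB of switch buffering shared by 10 flows gives ~100 KB blocks, or 50 KB
with two outstanding CTSs) can be sketched as follows. This is only a rough
illustration and not part of STP or of any code posted to the list: the
function names are made up, megabytes are taken as decimal, and the overflow
estimate ignores whatever the sinks drain while the burst arrives.

```python
MB = 1_000_000  # decimal megabyte, an assumption matching "~100 KB * 10 = 1MB"


def max_block_size(switch_buffer_bytes, flows, outstanding_cts=1):
    """Largest ST block size (bytes) such that every flow bursting all of
    its outstanding blocks at once still fits in the switch buffer.

    Hypothetical helper: it just divides the buffer evenly across flows.
    """
    if switch_buffer_bytes <= 0 or flows <= 0 or outstanding_cts <= 0:
        raise ValueError("all arguments must be positive")
    return switch_buffer_bytes // (flows * outstanding_cts)


def worst_case_overflow(burst_bytes, switch_buffer_bytes):
    """Upper bound on bytes dropped when a burst exceeds the buffer.

    Ignores draining by the sinks during the burst, which is why the
    thread says "almost" 3 MB rather than exactly 3 MB.
    """
    return max(0, burst_bytes - switch_buffer_bytes)


if __name__ == "__main__":
    print(max_block_size(1 * MB, flows=10))                     # 100000
    print(max_block_size(1 * MB, flows=10, outstanding_cts=2))  # 50000
    print(worst_case_overflow(4 * MB, 1 * MB))                  # 3000000
```

The even split is the simplest possible policy; it says nothing about other
traffic on the switch, which the thread notes would break the steady-state
assumption.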

From owner-stp@oss.sgi.com Wed Jan 24 09:17:06 2001
Received: by oss.sgi.com id ; Wed, 24 Jan 2001 09:16:46 -0800
Received: from smtp1.cern.ch ([137.138.128.38]:9743 "EHLO smtp1.cern.ch") by oss.sgi.com with ESMTP id ; Wed, 24 Jan 2001 09:16:18 -0800
Received: from lxplus029.cern.ch (IDENT:root@lxplus029.cern.ch [137.138.161.106]) by smtp1.cern.ch (8.9.3/8.9.3) with ESMTP id SAA18348 for ; Wed, 24 Jan 2001 18:16:11 +0100 (MET)
Received: from localhost (ppieta@localhost) by lxplus029.cern.ch (8.9.3/8.9.3) with ESMTP id SAA05581 for ; Wed, 24 Jan 2001 18:16:11 +0100
X-Authentication-Warning: lxplus029.cern.ch: ppieta owned process doing -bs
Date: Wed, 24 Jan 2001 18:16:11 +0100 (CET)
From: Pekka Pietikainen
X-Sender: ppieta@lxplus029.cern.ch
To: stp@oss.sgi.com
Subject: new stuff
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-stp@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;stp-outgoing

I put up a version of STP based on the 2.4.x zerocopy TCP patches from
ftp.kernel.org/pub/linux/kernel/people/davem.

The diff can be found at
http://ppieta.home.cern.ch/ppieta/stpdiff-2.4.1pre8-zc.gz

This removes the need for CONFIG_STP_DIRECT completely, and thus makes
the code a lot prettier. Also the STP-accelerated acenic driver has been
split into two parts, acenic.c and acenic_egast.c (and as can be seen
from a diff drivers/net/acenic.c net/stp/drivers/acenic/acenic.c the
changes are quite small now).

The firmware included is the one that works on 512k boards.

A small caveat: I have seen the acenic driver on the receiver lock up a
few times with this version (this also has happened with the non-STP
zerocopy acenic driver, so there's still something funny there...)

Technically it should work with a kernel without the zero-copy patches
too, although I haven't tested this...