From owner-netdev@oss.sgi.com Thu Feb 1 03:20:43 2001 Received: by oss.sgi.com id ; Thu, 1 Feb 2001 03:20:33 -0800 Received: from oxmail3.ox.ac.uk ([129.67.1.180]:5763 "EHLO oxmail.ox.ac.uk") by oss.sgi.com with ESMTP id ; Thu, 1 Feb 2001 03:20:24 -0800 Received: from sable.ox.ac.uk ([163.1.2.4]) by oxmail.ox.ac.uk with esmtp (Exim 3.12 #1) id 14OHmh-0000Dl-00; Thu, 01 Feb 2001 11:20:15 +0000 Received: from mbeattie by sable.ox.ac.uk with local (Exim 3.13 #1) id 14OHmh-0004Yr-00; Thu, 01 Feb 2001 11:20:15 +0000 Date: Thu, 1 Feb 2001 11:20:15 +0000 From: Malcolm Beattie To: "David S. Miller" Cc: linux-kernel@vger.kernel.org, netdev@oss.sgi.com Subject: Re: [UPDATE] Fresh zerocopy patch on kernel.org Message-ID: <20010201112014.A27009@sable.ox.ac.uk> References: <14966.35438.429963.405587@pizda.ninka.net> <20010131152653.C13345@sable.ox.ac.uk> <14968.49462.674977.825098@pizda.ninka.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0.1i In-Reply-To: <14968.49462.674977.825098@pizda.ninka.net>; from davem@redhat.com on Wed, Jan 31, 2001 at 05:51:50PM -0800 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing David S. Miller writes: > > Malcolm Beattie writes: > > David S. Miller writes: > > > > > > At the usual place: > > > > > > ftp://ftp.kernel.org/pub/linux/kernel/people/davem/zerocopy-2.4.1-1.diff.gz > > > > Hmm, disappointing results here; maybe I've missed something. > > As discussed elsewhere there is a %10 to %15 performance hit for > normal write()'s done with the new code. > > If you do your testing using sendfile() as the data source, you'll > results ought to be wildly different and more encouraging. I did say that the ftp test used sendfile() as the data source and it dropped from 86 MB/s to 62 MB/s. Alexey has mailed me suggesting the problem may be that netfilter is turned on. 
It is indeed turned on in both the 2.4.1 config and the 2.4.1+zc config but maybe it has a far higher detrimental effect in the zerocopy case. I'm currently building new non-netfilter kernels and I'll go through the exercise again. I'm confident I'll end up being impressed with the numbers even if it takes some tweaking to get there :-) --Malcolm -- Malcolm Beattie Unix Systems Programmer Oxford University Computing Services From owner-netdev@oss.sgi.com Thu Feb 1 03:28:13 2001 Received: by oss.sgi.com id ; Thu, 1 Feb 2001 03:28:03 -0800 Received: from pizda.ninka.net ([216.101.162.242]:22937 "EHLO pizda.ninka.net") by oss.sgi.com with ESMTP id ; Thu, 1 Feb 2001 03:27:59 -0800 Received: (from davem@localhost) by pizda.ninka.net (8.9.3/8.9.3) id DAA19712; Thu, 1 Feb 2001 03:25:55 -0800 From: "David S. Miller" MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <14969.18371.677242.728885@pizda.ninka.net> Date: Thu, 1 Feb 2001 03:25:55 -0800 (PST) To: Malcolm Beattie Cc: linux-kernel@vger.kernel.org, netdev@oss.sgi.com Subject: Re: [UPDATE] Fresh zerocopy patch on kernel.org In-Reply-To: <20010201112014.A27009@sable.ox.ac.uk> References: <14966.35438.429963.405587@pizda.ninka.net> <20010131152653.C13345@sable.ox.ac.uk> <14968.49462.674977.825098@pizda.ninka.net> <20010201112014.A27009@sable.ox.ac.uk> X-Mailer: VM 6.75 under 21.1 (patch 13) "Crater Lake" XEmacs Lucid Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Malcolm Beattie writes: > Alexey has mailed me suggesting the problem may be that netfilter > is turned on. Oh yes, netfilter being enabled will cause some performance degradation, that is for sure. Later, David S. 
Miller davem@redhat.com

From owner-netdev@oss.sgi.com Thu Feb 1 21:39:11 2001 Received: by oss.sgi.com id ; Thu, 1 Feb 2001 21:38:51 -0800 Received: from pizda.ninka.net ([216.101.162.242]:17280 "EHLO pizda.ninka.net") by oss.sgi.com with ESMTP id ; Thu, 1 Feb 2001 21:38:47 -0800 Received: (from davem@localhost) by pizda.ninka.net (8.9.3/8.9.3) id VAA01434; Thu, 1 Feb 2001 21:37:20 -0800 From: "David S. Miller" MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <14970.18319.991981.729662@pizda.ninka.net> Date: Thu, 1 Feb 2001 21:37:19 -0800 (PST) To: linux-kernel@vger.kernel.org CC: netdev@oss.sgi.com Subject: [UPDATE] Zerocopy patch of the day... X-Mailer: VM 6.75 under 21.1 (patch 13) "Crater Lake" XEmacs Lucid Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing

In the usual spot:

ftp://ftp.kernel.org/pub/linux/kernel/people/davem/zerocopy-2.4.1-2.diff.gz

Changes:

1) Merge in 3c59x update from Andrew Morton. I hope Andrew won't mind if people who see problems due to these changes at least CC: him on bug reports? :-)

2) Correct receive buffer space checks during direct user copies.

3) Correct returning of errors in datagram wait_for_packet(), cures DoS discovered with AF_UNIX sockets.

And yes, before Mr. Wedgewood asks, the generic fixes (#2 and #3) will be sent to Linus separately when he returns from NYC. :-)

I will soon start keeping a real ChangeLog.zerocopy file going at the same place you get the patches from.

Later, David S.
Miller davem@redhat.com From owner-netdev@oss.sgi.com Fri Feb 2 02:05:53 2001 Received: by oss.sgi.com id ; Fri, 2 Feb 2001 02:05:34 -0800 Received: from isis.its.uow.edu.au ([130.130.68.21]:38792 "EHLO isis.its.uow.edu.au") by oss.sgi.com with ESMTP id ; Fri, 2 Feb 2001 02:05:24 -0800 Received: from uow.edu.au (wumpus.its.uow.edu.au [130.130.68.12]) by isis.its.uow.edu.au (8.9.3/8.9.3) with ESMTP id VAA04981; Fri, 2 Feb 2001 21:05:01 +1100 (EST) Message-ID: <3A7A8822.CC5D8E4E@uow.edu.au> Date: Fri, 02 Feb 2001 21:12:50 +1100 From: Andrew Morton X-Mailer: Mozilla 4.7 [en] (X11; I; Linux 2.4.0-test8 i586) X-Accept-Language: en MIME-Version: 1.0 To: "David S. Miller" CC: lkml , "netdev@oss.sgi.com" Subject: Re: sendfile+zerocopy: fairly sexy (nothing to do with ECN) References: <3A728475.34CF841@uow.edu.au>, <3A726087.764CC02E@uow.edu.au> <20010126222003.A11994@vitelus.com> <3A728475.34CF841@uow.edu.au> <14966.22671.446439.838872@pizda.ninka.net> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing "David S. Miller" wrote: > > ... > Finally, please do some tests on loopback. It is usually a great > way to get "pure software overhead" measurements of our TCP stack. Here we are. TCP and NFS/UDP over lo. Machine is a dual-PII. I didn't bother running CPU utilisation testing while benchmarking loopback, although this may be of some interest for SMP. I just looked at the throughput. Machine is a dual 500MHz PII (again). Memory read bandwidth is 320 meg/sec. Write b/w is 130 meg/sec. The working set is 60 ~300k files, everything cached. We run the following tests: 1: sendfile() to localhost, sender and receiver pinned to separate CPUs 2: sendfile() to localhost, sender and receiver pinned to the same CPU 3: sendfile() to localhost, no explicit pinning. 4, 5, 6: same as above, except we use send() in 8kbyte chunks. 
Repeat with and without zerocopy patch 2.4.1-2.

The receiver reads 64k hunks and throws them away. sendfile() sends the entire file.

Also, do an NFS mount of localhost, rsize=wsize=8192, see how long it takes to `cp' a 100 meg file from the "server" to /dev/null. The file is cached on the "server". Do this for the three pinning cases as well - all the NFS kernel processes were pinned as a group and `cp' was the other group.

                        sendfile()   send(8k)     NFS
                         Mbyte/s     Mbyte/s    Mbyte/s

No explicit bonding
  2.4.1:                  66600       70000      25600
  2.4.1-zc:              208000       69000      25000

Bond client and server to separate CPUs
  2.4.1:                  66700       68000      27800
  2.4.1-zc:              213047       66000      25700

Bond client and server to same CPU:
  2.4.1:                  56000       57000      23300
  2.4.1-zc:              176000       55000      22100

Much the same story. Big increase in sendfile() efficiency, small drop in send() and NFS unchanged.

The relative increase in sendfile() efficiency is much higher than with a real NIC, presumably because we've factored out the constant (and large) cost of the device driver.

All the bits and pieces to reproduce this are at

http://www.uow.edu.au/~andrewm/linux/#zc

-

From owner-netdev@oss.sgi.com Fri Feb 2 04:15:34 2001 Received: by oss.sgi.com id ; Fri, 2 Feb 2001 04:15:14 -0800 Received: from pat.uio.no ([129.240.130.16]:20984 "EHLO pat.uio.no") by oss.sgi.com with ESMTP id ; Fri, 2 Feb 2001 04:14:48 -0800 Received: from charged.uio.no ([129.240.86.49]) by pat.uio.no with esmtp (Exim 2.12 #7) id 14Of6o-0007V3-00; Fri, 2 Feb 2001 13:14:34 +0100 Received: from trondmy by charged.uio.no with local (Exim 2.12 #1) id 14Of6n-0003Gy-00; Fri, 2 Feb 2001 13:14:33 +0100 To: Andrew Morton Cc: "David S.
Miller" , lkml , "netdev@oss.sgi.com" Subject: Re: sendfile+zerocopy: fairly sexy (nothing to do with ECN) References: <3A728475.34CF841@uow.edu.au> <3A726087.764CC02E@uow.edu.au> <20010126222003.A11994@vitelus.com> <3A728475.34CF841@uow.edu.au> <14966.22671.446439.838872@pizda.ninka.net> <3A7A8822.CC5D8E4E@uow.edu.au> From: Trond Myklebust Date: 02 Feb 2001 13:14:33 +0100 In-Reply-To: Andrew Morton's message of "Fri, 02 Feb 2001 21:12:50 +1100" Message-ID: Lines: 18 X-Mailer: Gnus v5.6.45/XEmacs 21.1 - "Channel Islands" Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing >>>>> " " == Andrew Morton writes: > Much the same story. Big increase in sendfile() efficiency, > small drop in send() and NFS unchanged. This is normal. The server doesn't do zero copy reads, but instead copies from the page cache into an NFS-specific buffer using file.f_op->read(). Alexey and Dave's changes are therefore unlikely to register on NFS performance (other than on CPU use as has been mentioned before) until we implement a sendfile-like scheme for knfsd over TCP. I've been wanting to start doing that (and also to finish the client conversion to use the TCP zero-copy), but I'm pretty pressed for time at the moment. 
Cheers, Trond

From owner-netdev@oss.sgi.com Fri Feb 2 09:55:08 2001 Received: by oss.sgi.com id ; Fri, 2 Feb 2001 09:54:49 -0800 Received: from warden.digitalinsight.com ([208.29.163.2]:46745 "HELO warden.diginsite.com") by oss.sgi.com with SMTP id ; Fri, 2 Feb 2001 09:54:33 -0800 Received: from wlvexc01.diginsite.com by warden.diginsite.com via smtpd (for oss.sgi.com [216.32.174.190]) with SMTP; 2 Feb 2001 17:54:33 UT Received: by wlvexc01.diginsite.com with Internet Mail Service (5.5.2653.19) id ; Fri, 2 Feb 2001 09:55:46 -0800 Received: from dlang.diginsite.com ([10.200.255.252]) by viper.digitalinsight.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13) id DVN97X46; Fri, 2 Feb 2001 09:57:14 -0800 Date: Fri, 2 Feb 2001 09:51:21 -0800 (PST) From: David Lang To: Andrew Morton cc: "David S. Miller" , lkml , "netdev@oss.sgi.com" Subject: Re: sendfile+zerocopy: fairly sexy (nothing to do with ECN) In-Reply-To: <3A7A8822.CC5D8E4E@uow.edu.au> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing

I have been watching this thread with interest for a while now, but am wondering about the real-world use of this, given the performance penalty for write().

As I see it there are two basic cases you are saying this will help in.

1. webservers

2. other fileservers

I also freely admit that I don't know a lot about sendfile(), so it may have some capability that makes my concerns meaningless; if so, please let me know.

1a. for webservers that serve static content (and can therefore use sendfile) I don't see this as significant because, as your tests have been showing, even a modest machine can saturate your network (unless you are using gigE, at which point it takes a slightly larger machine)

1b.
for webservers that are not primarily serving static content, they have to use write() for the output from CGIs, etc. and therefore pay the performance penalty without being able to use sendfile() much to get the advantages. These machines are the ones that really need the performance, as the CGIs take a significant amount of your CPU.

2. for other fileservers sendfile() sounds like it would be useful if the client is reading the entire file, but what about the cases where the client is reading part of the file, or is writing to the file? In both of these cases it seems that the fileserver is back to the write() penalty. Does anyone have stats on the types of requests that fileservers are being asked for?

David Lang

On Fri, 2 Feb 2001, Andrew Morton wrote:

> Date: Fri, 02 Feb 2001 21:12:50 +1100
> From: Andrew Morton
> To: David S. Miller
> Cc: lkml ,
> "netdev@oss.sgi.com"
> Subject: Re: sendfile+zerocopy: fairly sexy (nothing to do with ECN)
>
> "David S. Miller" wrote:
> >
> > ...
> > Finally, please do some tests on loopback. It is usually a great
> > way to get "pure software overhead" measurements of our TCP stack.
>
> Here we are. TCP and NFS/UDP over lo.
>
> Machine is a dual-PII. I didn't bother running CPU utilisation
> testing while benchmarking loopback, although this may be of
> some interest for SMP. I just looked at the throughput.
>
> Machine is a dual 500MHz PII (again). Memory read bandwidth
> is 320 meg/sec. Write b/w is 130 meg/sec. The working set
> is 60 ~300k files, everything cached. We run the following
> tests:
>
> 1: sendfile() to localhost, sender and receiver pinned to
> separate CPUs
>
> 2: sendfile() to localhost, sender and receiver pinned to
> the same CPU
>
> 3: sendfile() to localhost, no explicit pinning.
>
> 4, 5, 6: same as above, except we use send() in 8kbyte
> chunks.
>
> Repeat with and without zerocopy patch 2.4.1-2.
>
> The receiver reads 64k hunks and throws them away. sendfile()
> sends the entire file.
>
> Also, do an NFS mount of localhost, rsize=wsize=8192, see how
> long it takes to `cp' a 100 meg file from the "server" to
> /dev/null. The file is cached on the "server". Do this for
> the three pinning cases as well - all the NFS kernel processes
> were pinned as a group and `cp' was the other group.
>
>
>                         sendfile()   send(8k)     NFS
>                          Mbyte/s     Mbyte/s    Mbyte/s
>
> No explicit bonding
>   2.4.1:                  66600       70000      25600
>   2.4.1-zc:              208000       69000      25000
>
> Bond client and server to separate CPUs
>   2.4.1:                  66700       68000      27800
>   2.4.1-zc:              213047       66000      25700
>
> Bond client and server to same CPU:
>   2.4.1:                  56000       57000      23300
>   2.4.1-zc:              176000       55000      22100
>
>
>
> Much the same story. Big increase in sendfile() efficiency,
> small drop in send() and NFS unchanged.
>
> The relative increase in sendfile() efficiency is much higher
> than with a real NIC, presumably because we've factored out
> the constant (and large) cost of the device driver.
>
> All the bits and pieces to reproduce this are at
>
> http://www.uow.edu.au/~andrewm/linux/#zc
>
> -
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> Please read the FAQ at http://www.tux.org/lkml/
>

From owner-netdev@oss.sgi.com Fri Feb 2 14:48:10 2001 Received: by oss.sgi.com id ; Fri, 2 Feb 2001 14:47:50 -0800 Received: from pizda.ninka.net ([216.101.162.242]:1928 "EHLO pizda.ninka.net") by oss.sgi.com with ESMTP id ; Fri, 2 Feb 2001 14:47:31 -0800 Received: (from davem@localhost) by pizda.ninka.net (8.9.3/8.9.3) id OAA06072; Fri, 2 Feb 2001 14:46:07 -0800 From: "David S.
Miller" MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <14971.14511.765806.838208@pizda.ninka.net> Date: Fri, 2 Feb 2001 14:46:07 -0800 (PST) To: David Lang Cc: Andrew Morton , lkml , "netdev@oss.sgi.com" Subject: Re: sendfile+zerocopy: fairly sexy (nothing to do with ECN) In-Reply-To: References: <3A7A8822.CC5D8E4E@uow.edu.au> X-Mailer: VM 6.75 under 21.1 (patch 13) "Crater Lake" XEmacs Lucid Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing David Lang writes: > 1a. for webservers that server static content (and can therefor use > sendfile) I don't see this as significant becouse as your tests have been > showing, even a modest machine can saturate your network (unless you are > useing gigE at which time it takes a skightly larger machine) Start using more than one interface, then it begins to become interesting. > 1b. for webservers that are not primarily serving static content, they > have to use write() for the output from cgi's, etc and therefor pay the > performance penalty without being able to use sendfile() much to get the > advantages. These machines are the ones that really need the performance > as the cgi's take a significant amount of your cpu. CGI's can be cached btw if the implementation is clever (f.e. CGI tells the web server that if the file used as input to the CGI does not change then the output from the CGI will not change, meaning CGI output is based solely on input, therefore CGI output can be cached by the web server). > 2. for other fileservers sendfile() sounds like it would be useful if the > client is reading the entire file, but what about the cases where the > client is reading part of the file, or is writing to the file. In both of > these cases it seems that the fileserver is back to the write() penalty. > does anyone have stats on the types of requests that fileservers are being > asked for? 
It helps no matter what part of the file the client reads.

sendfile() can be used on an arbitrary offset+len portion of a file; it is not limited to just sending an entire file.

Later, David S. Miller davem@redhat.com

From owner-netdev@oss.sgi.com Fri Feb 2 15:01:10 2001 Received: by oss.sgi.com id ; Fri, 2 Feb 2001 15:00:50 -0800 Received: from warden.digitalinsight.com ([208.29.163.2]:62086 "HELO warden.diginsite.com") by oss.sgi.com with SMTP id ; Fri, 2 Feb 2001 15:00:47 -0800 Received: from wlvexc01.diginsite.com by warden.diginsite.com via smtpd (for oss.sgi.com [216.32.174.190]) with SMTP; 2 Feb 2001 23:00:47 UT Received: by wlvexc01.diginsite.com with Internet Mail Service (5.5.2653.19) id ; Fri, 2 Feb 2001 15:02:12 -0800 Received: from dlang.diginsite.com ([10.200.255.252]) by viper.digitalinsight.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13) id DVN98HC8; Fri, 2 Feb 2001 15:03:36 -0800 Date: Fri, 2 Feb 2001 14:57:42 -0800 (PST) From: David Lang To: "David S. Miller" cc: Andrew Morton , lkml , "netdev@oss.sgi.com" Subject: Re: sendfile+zerocopy: fairly sexy (nothing to do with ECN) In-Reply-To: <14971.14511.765806.838208@pizda.ninka.net> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing

Thanks, that info on sendfile makes sense for the fileserver situation. For webservers we will have to see (many/most CGIs look at stuff from the client, so I still have doubts as to how much use caching will be).

David Lang

On Fri, 2 Feb 2001, David S. Miller wrote:

> Date: Fri, 2 Feb 2001 14:46:07 -0800 (PST)
> From: David S. Miller
> To: David Lang
> Cc: Andrew Morton , lkml ,
> "netdev@oss.sgi.com"
> Subject: Re: sendfile+zerocopy: fairly sexy (nothing to do with ECN)
>
>
> David Lang writes:
> > 1a.
for webservers that server static content (and can therefor use > > sendfile) I don't see this as significant becouse as your tests have been > > showing, even a modest machine can saturate your network (unless you are > > useing gigE at which time it takes a skightly larger machine) > > Start using more than one interface, then it begins to become > interesting. > > > 1b. for webservers that are not primarily serving static content, they > > have to use write() for the output from cgi's, etc and therefor pay the > > performance penalty without being able to use sendfile() much to get the > > advantages. These machines are the ones that really need the performance > > as the cgi's take a significant amount of your cpu. > > CGI's can be cached btw if the implementation is clever (f.e. CGI > tells the web server that if the file used as input to the CGI does > not change then the output from the CGI will not change, meaning CGI > output is based solely on input, therefore CGI output can be cached > by the web server). > > > 2. for other fileservers sendfile() sounds like it would be useful if the > > client is reading the entire file, but what about the cases where the > > client is reading part of the file, or is writing to the file. In both of > > these cases it seems that the fileserver is back to the write() penalty. > > does anyone have stats on the types of requests that fileservers are being > > asked for? > > It helps no matter what part of the file the client reads. > > sendfile() can be used on an arbitrary offset+len portion of > a file, it is not limited to just sending an entire fire. > > Later, > David S. 
Miller > davem@redhat.com > From owner-netdev@oss.sgi.com Fri Feb 2 15:11:10 2001 Received: by oss.sgi.com id ; Fri, 2 Feb 2001 15:10:50 -0800 Received: from pizda.ninka.net ([216.101.162.242]:16008 "EHLO pizda.ninka.net") by oss.sgi.com with ESMTP id ; Fri, 2 Feb 2001 15:10:41 -0800 Received: (from davem@localhost) by pizda.ninka.net (8.9.3/8.9.3) id PAA06190; Fri, 2 Feb 2001 15:09:13 -0800 From: "David S. Miller" MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <14971.15897.432460.25166@pizda.ninka.net> Date: Fri, 2 Feb 2001 15:09:13 -0800 (PST) To: David Lang Cc: Andrew Morton , lkml , "netdev@oss.sgi.com" Subject: Re: sendfile+zerocopy: fairly sexy (nothing to do with ECN) In-Reply-To: References: <14971.14511.765806.838208@pizda.ninka.net> X-Mailer: VM 6.75 under 21.1 (patch 13) "Crater Lake" XEmacs Lucid Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing David Lang writes: > Thanks, that info on sendfile makes sense for the fileserver situation. > for webservers we will have to see (many/most CGI's look at stuff from the > client so I still have doubts as to how much use cacheing will be) Also note that the decreased CPU utilization resulting from zerocopy sendfile leaves more CPU available for CGI execution. This was a point I forgot to make. Later, David S. 
Miller davem@redhat.com

From owner-netdev@oss.sgi.com Fri Feb 2 15:16:40 2001 Received: by oss.sgi.com id ; Fri, 2 Feb 2001 15:16:21 -0800 Received: from warden.digitalinsight.com ([208.29.163.2]:43178 "HELO warden.diginsite.com") by oss.sgi.com with SMTP id ; Fri, 2 Feb 2001 15:16:12 -0800 Received: from wlvexc01.diginsite.com by warden.diginsite.com via smtpd (for oss.sgi.com [216.32.174.190]) with SMTP; 2 Feb 2001 23:16:11 UT Received: by wlvexc01.diginsite.com with Internet Mail Service (5.5.2653.19) id ; Fri, 2 Feb 2001 15:17:38 -0800 Received: from dlang.diginsite.com ([10.200.255.252]) by viper.digitalinsight.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13) id DVN98HHT; Fri, 2 Feb 2001 15:19:02 -0800 Date: Fri, 2 Feb 2001 15:13:08 -0800 (PST) From: David Lang To: "David S. Miller" cc: Andrew Morton , lkml , "netdev@oss.sgi.com" Subject: Re: sendfile+zerocopy: fairly sexy (nothing to do with ECN) In-Reply-To: <14971.15897.432460.25166@pizda.ninka.net> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing

Right, assuming that there is enough sendfile() benefit to overcome the write() penalty from the stuff that can't be cached or sent from a file.

My question was basically: are there enough places where sendfile would actually be used to make it a net gain?

David Lang

On Fri, 2 Feb 2001, David S. Miller wrote:

> Date: Fri, 2 Feb 2001 15:09:13 -0800 (PST)
> From: David S. Miller
> To: David Lang
> Cc: Andrew Morton , lkml ,
> "netdev@oss.sgi.com"
> Subject: Re: sendfile+zerocopy: fairly sexy (nothing to do with ECN)
>
>
> David Lang writes:
> > Thanks, that info on sendfile makes sense for the fileserver situation.
> > for webservers we will have to see (many/most CGI's look at stuff from the
> > client so I still have doubts as to how much use cacheing will be)
>
> Also note that the decreased CPU utilization resulting from
> zerocopy sendfile leaves more CPU available for CGI execution.
>
> This was a point I forgot to make.
>
> Later,
> David S. Miller
> davem@redhat.com
>

From owner-netdev@oss.sgi.com Fri Feb 2 15:29:10 2001 Received: by oss.sgi.com id ; Fri, 2 Feb 2001 15:28:51 -0800 Received: from mail.hsnp.com ([205.161.174.10]:20033 "HELO netc.netc.com") by oss.sgi.com with SMTP id ; Fri, 2 Feb 2001 15:28:32 -0800 Received: (qmail 27181 invoked by uid 510); 2 Feb 2001 17:28:26 -0600 (CDT) Date: Fri, 2 Feb 2001 17:28:26 -0600 (CST) From: Jeff Barrow To: David Lang cc: "David S. Miller" , Andrew Morton , lkml , "netdev@oss.sgi.com" Subject: Re: sendfile+zerocopy: fairly sexy (nothing to do with ECN) In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing

Let's see.... all the work being done for clustering would definitely benefit... all the static images on your webserver--and static images make up most of the bandwidth from web servers (images, activeX controls, java apps, sound clips...)... NFS servers, Samba servers (both of which are used more than you may think)... email servers... Once Real Networks patches their Realserver to use sendfile (which shouldn't be all that hard), then that would help too....

I think that sendfile can be used in a LOT of applications, and the only ones that wouldn't benefit are mostly low-bandwidth anyway (CGI apps almost always return either a small html file or a small image file, then there's telnet and other interactive utilities...). Most applications that use a lot of bandwidth (and thus a lot of CPU time sending the packets) are capable of being patched to use sendfile.
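
As a rough illustration of the offset+len point made earlier in the thread, here is a minimal sketch of pushing just a byte range of a file down a socket with sendfile(). It goes through Python's os.sendfile() wrapper rather than the C call, and the helper name send_range is made up for this example; nothing here is taken from the zerocopy patch itself.

```python
import os
import socket


def send_range(sock: socket.socket, path: str, offset: int, length: int) -> int:
    """Send `length` bytes of `path`, starting at `offset`, via sendfile().

    send_range is a hypothetical helper for illustration only.
    Returns the number of bytes actually handed to the socket.
    """
    sent_total = 0
    with open(path, "rb") as f:
        while sent_total < length:
            # sendfile() may transfer fewer bytes than asked for, so loop
            # and advance the file offset by what has gone out so far.
            sent = os.sendfile(sock.fileno(), f.fileno(),
                               offset + sent_total, length - sent_total)
            if sent == 0:  # hit end of file before `length` bytes
                break
            sent_total += sent
    return sent_total
```

A server answering a partial read would call something like send_range(conn, path, 4096, 8192) instead of read()+write(), so the file data never has to pass through a user-space buffer.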
On Fri, 2 Feb 2001, David Lang wrote: > right, assuming that there is enough sendfile() benifit to overcome the > write() penalty from the stuff that can't be cached or sent from a file. > > my question was basicly are there enough places where sendfile would > actually be used to make it a net gain. > > David Lang > > On Fri, 2 Feb 2001, David S. Miller wrote: > > > Date: Fri, 2 Feb 2001 15:09:13 -0800 (PST) > > From: David S. Miller > > To: David Lang > > Cc: Andrew Morton , lkml , > > "netdev@oss.sgi.com" > > Subject: Re: sendfile+zerocopy: fairly sexy (nothing to do with ECN) > > > > > > David Lang writes: > > > Thanks, that info on sendfile makes sense for the fileserver situation. > > > for webservers we will have to see (many/most CGI's look at stuff from the > > > client so I still have doubts as to how much use cacheing will be) > > > > Also note that the decreased CPU utilization resulting from > > zerocopy sendfile leaves more CPU available for CGI execution. > > > > This was a point I forgot to make. > > > > Later, > > David S. Miller > > davem@redhat.com > > > From owner-netdev@oss.sgi.com Fri Feb 2 15:33:30 2001 Received: by oss.sgi.com id ; Fri, 2 Feb 2001 15:33:11 -0800 Received: from pizda.ninka.net ([216.101.162.242]:26760 "EHLO pizda.ninka.net") by oss.sgi.com with ESMTP id ; Fri, 2 Feb 2001 15:33:05 -0800 Received: (from davem@localhost) by pizda.ninka.net (8.9.3/8.9.3) id PAA06277; Fri, 2 Feb 2001 15:31:41 -0800 From: "David S. 
Miller" MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <14971.17245.660605.426205@pizda.ninka.net> Date: Fri, 2 Feb 2001 15:31:41 -0800 (PST) To: David Lang Cc: Andrew Morton , lkml , "netdev@oss.sgi.com" Subject: Re: sendfile+zerocopy: fairly sexy (nothing to do with ECN) In-Reply-To: References: <14971.15897.432460.25166@pizda.ninka.net> X-Mailer: VM 6.75 under 21.1 (patch 13) "Crater Lake" XEmacs Lucid Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing David Lang writes: > right, assuming that there is enough sendfile() benifit to overcome the > write() penalty from the stuff that can't be cached or sent from a file. > > my question was basicly are there enough places where sendfile would > actually be used to make it a net gain. There are non-performance issues as well (really, all of these points have been mentioned in this thread btw). One is that since paged SKBs use only single-order page allocations, the memory allocation subsystem is stressed less than the current scheme where SLAB allocates multi-order pages to satisfy allocations of linear SKB data buffers. This has consequences and benefits system wide. Later, David S. Miller davem@redhat.com From owner-netdev@oss.sgi.com Fri Feb 2 18:28:24 2001 Received: by oss.sgi.com id ; Fri, 2 Feb 2001 18:28:15 -0800 Received: from orange.csi.cam.ac.uk ([131.111.8.77]:60085 "EHLO orange.csi.cam.ac.uk") by oss.sgi.com with ESMTP id ; Fri, 2 Feb 2001 18:27:54 -0800 Received: from jas88 (helo=localhost) by orange.csi.cam.ac.uk with local-esmtp (Exim 3.22 #1) id 14OsQL-00043O-00; Sat, 03 Feb 2001 02:27:37 +0000 Date: Sat, 3 Feb 2001 02:27:37 +0000 (GMT) From: James Sutherland X-Sender: jas88@orange.csi.cam.ac.uk To: David Lang cc: "David S. 
Miller" , Andrew Morton , lkml , "netdev@oss.sgi.com" Subject: Re: sendfile+zerocopy: fairly sexy (nothing to do with ECN) In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Fri, 2 Feb 2001, David Lang wrote: > Thanks, that info on sendfile makes sense for the fileserver situation. > for webservers we will have to see (many/most CGI's look at stuff from the > client so I still have doubts as to how much use cacheing will be) CGI performance isn't directly affected by this - the whole point is to reduce the "cost" of handling static requests to zero (at least, as close as possible) leaving as much CPU as possible for the CGI to use. So sendfile won't help your CGI directly - it will just give your CGI more resources to work with. James. From owner-netdev@oss.sgi.com Fri Feb 2 23:24:26 2001 Received: by oss.sgi.com id ; Fri, 2 Feb 2001 23:24:17 -0800 Received: from pizda.ninka.net ([216.101.162.242]:10368 "EHLO pizda.ninka.net") by oss.sgi.com with ESMTP id ; Fri, 2 Feb 2001 23:23:51 -0800 Received: (from davem@localhost) by pizda.ninka.net (8.9.3/8.9.3) id XAA01189; Fri, 2 Feb 2001 23:22:38 -0800 From: "David S. 
Miller" MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <14971.45502.776808.711822@pizda.ninka.net> Date: Fri, 2 Feb 2001 23:22:38 -0800 (PST) To: linux-kernel@vger.kernel.org CC: netdev@oss.sgi.com Subject: [UPDATE] Zerocopy 2.4.1 rev 3 X-Mailer: VM 6.75 under 21.1 (patch 13) "Crater Lake" XEmacs Lucid Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing You know where to get it: ftp://ftp.kernel.org/pub/linux/kernel/people/davem/zerocopy-2.4.1-3.diff.gz Fixes: 1) pskb_expand_tail could corrupt SKB frag lists in some cases, leading to OOPS 2) Need to check for out of window data even in the partial packet cases of tcp_data_queue 3) Merged in some small net fixes from the AC patches. As of this moment, I know of no bugs (ie. corrupts data or crashes kernel) in the zerocopy patches. Some people have asked me about making a patch against the AC patches. It is doable, but would be quite a bit of work for me. If someone would like to do this and put those patches up somewhere, they can feel free to do so. Just let everyone on linux-kernel and netdev know about it. Probably, after the next zerocopy patch revision, I will ask Alan to add the zerocopy stuff to his tree anyways. Things really look good right now. Thanks. Later, David S. 
Miller davem@redhat.com

From owner-netdev@oss.sgi.com Sat Feb 3 04:55:48 2001 Received: by oss.sgi.com id ; Sat, 3 Feb 2001 04:55:37 -0800 Received: from [195.226.187.51] ([195.226.187.51]:49168 "HELO titan.bieringer.de") by oss.sgi.com with SMTP id ; Sat, 3 Feb 2001 04:55:26 -0800 Received: (qmail 21902 invoked from network); 3 Feb 2001 12:55:23 -0000 Received: from p3ee29468.dip.t-dialin.net (HELO worker.bieringer.de) (62.226.148.104) by mail.bieringer.de with SMTP; 3 Feb 2001 12:55:23 -0000 Message-Id: <5.0.2.1.0.20010203135305.02404eb8@mail.bieringer.de> X-Sender: peter@bieringer.de@mail.bieringer.de X-Mailer: QUALCOMM Windows Eudora Version 5.0.2 Date: Sat, 03 Feb 2001 13:56:42 +0100 To: netdev@oss.sgi.com From: Peter Bieringer Subject: IPv6 & 2.2.18: kernel crashes if removing link-local address Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; format=flowed Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing

Hi,

perhaps interesting - I remember the same happening in earlier times: if I shut down the IPv6 interface configuration, the kernel crashes when the automatically assigned (EUI-64) link-local address is removed (ok, stupid, but can happen...).

Most of the time, the kernel dump loops infinitely.

Any hints or suggestions on how to get more debug information for the kernel developers?
TIA, Peter

From owner-netdev@oss.sgi.com Sat Feb 3 05:15:28 2001 Received: by oss.sgi.com id ; Sat, 3 Feb 2001 05:15:18 -0800 Received: from [195.226.187.51] ([195.226.187.51]:49680 "HELO titan.bieringer.de") by oss.sgi.com with SMTP id ; Sat, 3 Feb 2001 05:15:06 -0800 Received: (qmail 22007 invoked from network); 3 Feb 2001 13:15:04 -0000 Received: from p3ee29468.dip.t-dialin.net (HELO worker.bieringer.de) (62.226.148.104) by mail.bieringer.de with SMTP; 3 Feb 2001 13:15:04 -0000 Message-Id: <5.0.2.1.0.20010203135649.00b1b090@mail.bieringer.de> X-Sender: peter@bieringer.de@mail.bieringer.de (Unverified) X-Mailer: QUALCOMM Windows Eudora Version 5.0.2 Date: Sat, 03 Feb 2001 14:16:23 +0100 To: netdev@oss.sgi.com From: Peter Bieringer Subject: IPv6 & 2.2.17 + 2.4.0: autoconfiguration works only on bootup Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; format=flowed Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing

Hi,

following scenario:

Gateway (running radvd) [00:50:BF:06:B4:F5] and a client with one interface (forwarding disabled) [00:e0:18:90:92:05].

Client boots, neighborhood detection works and the client takes the given prefix from radvd to assign a global address to the one and only interface.

14:00:54.475785 :: > ff02::1:ff90:9205: icmp6: neighbor sol: who has fe80::2e0:18ff:fe90:9205
14:00:58.476075 fe80::2e0:18ff:fe90:9205 > ff02::2: icmp6: router solicitation
14:00:58.502708 fe80::250:bfff:fe06:b4f5 > ff02::1: icmp6: router advertisement
14:00:58.996090 :: > ff02::1:ff90:9205: icmp6: neighbor sol: who has 3ffe:400:100:f101:2e0:18ff:fe90:9205

Ok, 3ffe:400:100:f101:2e0:18ff:fe90:9205 was added on the client interface, and routing entries are also added.
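
The addresses in the trace above follow directly from the two MAC addresses, which can make the capture easier to read. What follows is only an illustrative sketch and not part of the original message: it assumes the usual modified EUI-64 mapping for Ethernet (flip the universal/local bit, insert ff:fe in the middle) and a /64 prefix of 3ffe:400:100:f101::, which is inferred from the global address shown in the trace rather than stated by the poster. The function names are hypothetical.

```python
import ipaddress


def eui64_interface_id(mac: str) -> int:
    """Build a modified EUI-64 interface identifier from a 48-bit MAC."""
    octets = bytes(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC address")
    # Flip the universal/local bit of the first octet and insert ff:fe.
    iid = bytes([octets[0] ^ 0x02]) + octets[1:3] + b"\xff\xfe" + octets[3:6]
    return int.from_bytes(iid, "big")


def address_from_prefix(prefix: str, mac: str) -> ipaddress.IPv6Address:
    """Combine a /64 prefix (assumed length) with the interface identifier."""
    network = ipaddress.IPv6Network(prefix)
    return ipaddress.IPv6Address(int(network.network_address) | eui64_interface_id(mac))


def solicited_node(addr: ipaddress.IPv6Address) -> ipaddress.IPv6Address:
    """Solicited-node multicast group: ff02::1:ff00:0 plus the low 24 bits."""
    base = int(ipaddress.IPv6Address("ff02::1:ff00:0"))
    return ipaddress.IPv6Address(base | (int(addr) & 0xFFFFFF))


if __name__ == "__main__":
    client_mac = "00:e0:18:90:92:05"
    link_local = address_from_prefix("fe80::/64", client_mac)
    # Prefix below is inferred from the trace, not given explicitly.
    global_addr = address_from_prefix("3ffe:400:100:f101::/64", client_mac)
    print(link_local, global_addr, solicited_node(link_local))
```

Both the link-local and the global address share the same low 24 bits, which is why the two "neighbor sol" probes in the trace go to the same multicast group.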
But if I switch the network down and up (completely, or only the interface), the mechanism won't work:

14:01:22.037731 :: > ff02::1:ff90:9205: icmp6: neighbor sol: who has fe80::2e0:18ff:fe90:9205
14:01:26.038013 fe80::2e0:18ff:fe90:9205 > ff02::2: icmp6: router solicitation
14:01:26.142629 fe80::250:bfff:fe06:b4f5 > ff02::1: icmp6: router advertisement

The final "neighbor sol" is missing, and neither address nor routing is configured for global usage. It looks like the client does not accept the advertisement. But "accept_ra" is unchanged, and forwarding on the client side is still disabled. Is there a hidden switch somewhere, or something else like a retrigger?

Happens with 2.2.17 and 2.4.0.

Can someone reproduce this? Any hints?

TIA, Peter

From owner-netdev@oss.sgi.com Sat Feb 3 08:56:59 2001 Received: by oss.sgi.com id ; Sat, 3 Feb 2001 08:56:49 -0800 Received: from chiara.elte.hu ([157.181.150.200]:3848 "HELO chiara.elte.hu") by oss.sgi.com with SMTP id ; Sat, 3 Feb 2001 08:56:34 -0800 Received: by chiara.elte.hu (Postfix, from userid 17806) id 3C787186D; Sat, 3 Feb 2001 17:56:03 +0100 (CET) Date: Sat, 3 Feb 2001 17:54:46 +0100 (CET) From: Ingo Molnar Reply-To: To: "David S. Miller" Cc: Linux Kernel List , , Subject: [patch] Zerocopy 2.4.1 rev3 patch against 2.4.1-ac2 In-Reply-To: <14971.45502.776808.711822@pizda.ninka.net> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing

On Fri, 2 Feb 2001, David S. Miller wrote:
> Some people have asked me about making a patch against the AC patches.
> It is doable, but would be quite a bit of work for me.
i've done this for TUX anyway, so here is the 2.4.1-rev3 patch against the 2.4.1-ac2 tree: http://people.redhat.com/mingo/davem-zerocopy-patches/zerocopy-2.4.1-ac2-A0.bz2 Ingo From owner-netdev@oss.sgi.com Sat Feb 3 10:09:19 2001 Received: by oss.sgi.com id ; Sat, 3 Feb 2001 10:09:00 -0800 Received: from expanse.dds.nl ([194.109.10.118]:38667 "EHLO expanse.dds.nl") by oss.sgi.com with ESMTP id ; Sat, 3 Feb 2001 10:08:56 -0800 Received: (from ookhoi@localhost) by expanse.dds.nl (8.9.3/8.9.3) id TAA21680 for netdev@oss.sgi.com; Sat, 3 Feb 2001 19:07:58 +0100 Date: Sat, 3 Feb 2001 19:07:58 +0100 From: Ookhoi To: netdev@oss.sgi.com Subject: Fwd: Re: vaio doesn't boot with 2.4.1-ac1, stops at PCI: Probing PCI hardware Message-ID: <20010203190758.O3922@ookhoi.dds.nl> Reply-To: ookhoi@dds.nl Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.1.14i X-Uptime: 12:00pm up 3 days, 23:04, 22 users, load average: 0.72, 0.18, 0.08 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hi! I'm sorry to bother you directly, but I don't get any response to the messages below. :-( The problem is that I can't get linux to load a root image. I have an usb floppy drive from which a kernel loads fine, but then I can't load a root image from the floppy drive, and I can't mount / via nfs. Now a mount / via nfs is something which really should work. The problem is that ipconfig tries to configure the nic before the nic is initialized. So it says "IP-Config: No network devices available" (net/ipv4/ipconfig.c), a few seconds later the nic gets initialized (3com 575 cardbus, 3c59x), and another few seconds nfsroot failes immediately because it can't find an nfs server. There are no bootp messages at all. Somebody said that a pcmcia card is a bit delayed at boot time because of a problem with irq (irq storm) or something. 
That is something which seemes to be true, according to drivers/pcmcia/yenta.c: /* Get the PCMCIA kernel thread to complete the initialisation later. We can't do this here, because, er, because Linus says so :) */ socket->tq_task.routine = yenta_open_bh; socket->tq_task.data = socket; The suggetions in the messages below didn't work. Can you please tell me how to change the order in which the kernel does things at boottime? A pointer to docs is also appreciated of course. And is it the right way to do? I want to move the ipconfig below the nic to make nfsroot work. Or should I mail the pcmcia maintainer to get rid of the delay? Or somebody else? The combo pcmcia, bootp and nfsroot imho should work.. I would love to play with the kernel source, but don't know where to start. Thank you. Ookhoi Date: Fri, 26 Jan 2001 00:00:07 +0100 From: Ookhoi To: linux-kernel@vger.kernel.org Subject: pcmcia delay causes bootp not to work Hi, A few days ago I mailed that I can't get nfsroot to work because bootp tries to do its job before the cardbus card gets initialized. I got a message from somebody who said that the pcmcia devices are a bit delayed at boottime, which is also mentioned in the source. Unfortunately I'm too stupid to be able to change the source code, and get rid of the delay. And unfortunately, the guy who mailed me didn't respond at my cry for help, so now I try the list again. :-) Btw, I use lilo to load the kernel, and also tried: append="rootnfs=192.168.0.1:/usr/remote ip=192.168.0.4:192.168.0.1:192.168.0.1:255.255.255.0:vaio::" Why doesn't the network card gets configured with ip-address 192.168.0.4 ? Can it also be due to the delay? It lookes like the light on the nic lights up at the same time as the error message about nfs server not found. I don't see any traffic at all at the lan. (I'm now almost four weeks fighting my vaio to get linux on it. 
It is no problem to boot from the usb floppy drive, but then nfsroot doesn't work due to the pcmcia delay, and a root image on floppy doesn't work due to some usb problems. The kernels seemes to find and accept the usb floppy drive just fine, but then I can't make it to load the root image. It seemes as if /dev/sda doesn't get 'connected' with the fdd. I appreciate any help with this of course :-) Ookhoi Subject: Re: bootp starts before network device? > ookhoi@dds.nl said: > > It says: IP-Config: No network devices available. > > a few lines below that the nic (3com 575) is detected. Of course it > > fails to do the nfs mount. > > The kernel delays the initialisation of CardBus sockets to prevent it > from dying in an IRQ storm as soon as it registers the interrupt. The > CardBus sockets don't actually get initialised until later (from > keventd). > > Can you try changing the end of yenta_open() to call yenta_open_bh() > directly instead of queueing via schedule_task(). Thank you for your response. Unfortunately I'm no C expert at all, and I don't understand what to do with this piece of code: drivers/pcmcia/yenta.c:854 /* Get the PCMCIA kernel thread to complete the initialisation later. We can't do this here, because, er, because Linus says so :) */ socket->tq_task.routine = yenta_open_bh; socket->tq_task.data = socket; MOD_INC_USE_COUNT; schedule_task(&socket->tq_task); return 0; It makes perfect sense to what you said about the delay, and the delay makes perfect sense in bootp's complaining. :-) But now this is way over my head.. :-( Can you please help me with what to change? Thanks again! Ookhoi ==== Date: Fri, 26 Jan 2001 01:15:38 +0100 To: David Woodhouse Cc: linux-kernel@vger.kernel.org Subject: Re: pcmcia delay causes bootp not to work Hi David, > Er... no, don't try that patch. It'll oops. Try this instead. 
> > --- drivers/pcmcia/yenta.c 2000/12/05 13:30:42 1.1.2.23 > +++ drivers/pcmcia/yenta.c 2001/01/25 23:10:35 > @@ -859,7 +859,8 @@ > socket->tq_task.data = socket; > > MOD_INC_USE_COUNT; > - schedule_task(&socket->tq_task); > + // schedule_task(&socket->tq_task); > + yenta_open_bh(socket); > > return 0; > } Thank you. :-) Unfortunately, the bootp message "IP-Config: No network devices available." still comes before the initialisation of the network card, and thus the nf mount still failes. :-( (this is with a clean untarred linux tree, edit and compile, and I double checked the change in drivers/pcmcia/yenta.c) Is there an other way to initialize the nic before bootp kickes in? Ookhoi ----- End forwarded message ----- From owner-netdev@oss.sgi.com Sun Feb 4 09:32:41 2001 Received: by oss.sgi.com id ; Sun, 4 Feb 2001 09:32:21 -0800 Received: from snowball.ucd.ie ([193.1.132.97]:63239 "EHLO snowball.ucd.ie") by oss.sgi.com with ESMTP id ; Sun, 4 Feb 2001 09:32:10 -0800 Received: from genie.ucd.ie ([193.1.132.80]:33554 "EHLO genie.ucd.ie") by snowball.ucd.ie with ESMTP id ; Sun, 4 Feb 2001 17:32:02 +0000 Received: by genie.ucd.ie with local (Exim 3.03 #1 (Debian)) id 14PT16-0004b8-00; Sun, 04 Feb 2001 17:32:00 +0000 Date: Sun, 4 Feb 2001 17:32:00 +0000 From: Nikita Schmidt To: davies@maniac.ultranet.com Cc: netdev@oss.sgi.com Subject: Bug in de4x5 driver Message-ID: <20010204173200.A17660@snowball.ucd.ie> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="CE+1k2dSO48ffgeK" User-Agent: Mutt/1.0.1i Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing --CE+1k2dSO48ffgeK Content-Type: text/plain; charset=us-ascii Hello, I came across a bug in the de4x5 driver for Linux 2.4.1, which causes segfault during initialisation. In srom_search(), the driver walks the device list attached to a PCI bus and eventually finds the list head, which is contained in struct pci_bus, not pci_dev. 
Then, pci_dev_b() obviously returns nonsense, which eventually results in illegal memory reference.

This bug only manifests itself when the network adapter is not on the PCI bus 0.

I attach a patch that works for me.

Thank you,
Nikita

--CE+1k2dSO48ffgeK
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="de4x5.diff"

--- drivers/net/de4x5.c.orig	Sun Feb 4 17:17:57 2001
+++ drivers/net/de4x5.c	Sun Feb 4 17:16:32 2001
@@ -2301,6 +2301,9 @@
     for (walk = walk->next; walk != &dev->bus_list; walk = walk->next) {
 	struct pci_dev *this_dev = pci_dev_b(walk);
 
+	/* Skip the pci_bus list entry */
+	if (list_entry(walk, struct pci_bus, devices) == dev->bus) continue;
+
 	pb = this_dev->bus->number;
 	vendor = this_dev->vendor;
 	device = this_dev->device << 8;

--CE+1k2dSO48ffgeK--

From owner-netdev@oss.sgi.com Sun Feb 4 11:57:02 2001 Received: by oss.sgi.com id ; Sun, 4 Feb 2001 11:56:42 -0800 Received: from shell.cyberus.ca ([209.195.95.7]:12730 "EHLO shell.cyberus.ca") by oss.sgi.com with ESMTP id ; Sun, 4 Feb 2001 11:56:24 -0800 Received: from localhost (hadi@localhost) by shell.cyberus.ca (8.9.3/666/Cyberus Online Inc.) with ESMTP id OAA16311; Sun, 4 Feb 2001 14:48:34 -0500 (EST) X-Authentication-Warning: shell.cyberus.ca: hadi owned process doing -bs Date: Sun, 4 Feb 2001 14:48:34 -0500 (EST) From: jamal To: Rick Jones cc: Ion Badulescu , Andrew Morton , lkml , "netdev@oss.sgi.com" Subject: Re: Still not sexy! (Re: sendfile+zerocopy: fairly sexy (nothing todowith ECN) In-Reply-To: <3A77777D.E1A998FC@cup.hp.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing

On Tue, 30 Jan 2001, Rick Jones wrote:
> > > How does ZC/SG change the nature of the packets presented to the NIC?
> >
> > what do you mean? I am _sure_ you know how SG/ZC work. So i am suspecting
> > more than socratic view on life here.
Could be influence from Aristotle;-> > > Well, I don't know the specifics of Linux, but I gather from what I've > read on the list thusfar, that prior to implementing SG support, Linux > NIC drivers would copy packets into single contiguous buffers that were > then sent to the NIC yes? > yes. > If so, the implication is with SG going, that copy no longer takes > place, and so a chain of buffers is given to the NIC. > yes. > Also, if one is fully ZC :) pesky things like protocol headers can > naturally end-up in separate buffers. > yes. > So, now you have to ask how well any given NIC follows chains of > buffers. At what number of buffers is the overhead in the NIC of > following the chains enough to keep it from achieving link-rate? > hmmm... not sure how you would enforce this today or why you would want that. Alexey, Dave? The kernel should be able to break it into two buffers(with netperf, for example -- header + data). Ok, probably with tux-http 3 (header, data, trailler). > One way to try and deduce that would be to meld some of the SG and preSG > behaviours and copy packets into varying numbers of buffers per packet > and measure the resulting impact on throughput through the NIC. > If only time were on my hands i'd love to do this. Alas. NOTE also, that effect would also be an effect of the specif NIC. > rick jones > > As time marches on, the orders of magnitude of the constants may change, > but basic concepts still remain, and the "lessons" learned in the past > by one generation tend to get relearned in the next :) for example - > there is no such a thing as a free lunch... :) ;-> BTW, i am reading one of your papers (circa 1993 ;->, "we go fast with a little help from your apps") in which you make an interesting observation. That (figure 2) there is "a considerable increase in efficiency but not a considerable increase in throughput" .... I "scanned" to the end of the paper and dont see an explanation. 
I've made a somehow similar observation with the current zc patches and infact observed that throughput goes down with the linux zc patches. [This is being contested but no-one else is testing at gigE, so my word is the only truth]. Of course your paper doesnt talk about sendfile rather the page pinning + COW tricks (which are considered taboo in Linux) but i do sense a relationship. cheers, jamal PS:- I dont have "my" machines yet and i have a feeling it will be a while before i re-run the tests; however, i have created a patch for linux-sendfile with netperf. Please take a look at it at: http://www.cyberus.ca/~hadi/patch-nperf-sfile-linux.gz tell me if is missing anything and if it is ok, could you please merge in your tree? From owner-netdev@oss.sgi.com Sun Feb 4 14:22:41 2001 Received: by oss.sgi.com id ; Sun, 4 Feb 2001 14:22:31 -0800 Received: from mailer3.bham.ac.uk ([147.188.128.54]:15602 "EHLO mailer3.bham.ac.uk") by oss.sgi.com with ESMTP id ; Sun, 4 Feb 2001 14:22:10 -0800 Received: from bham.ac.uk ([147.188.128.127]) by mailer3.bham.ac.uk with esmtp (Exim 3.16 #2) id 14PXXW-0000UL-00; Sun, 04 Feb 2001 22:21:46 +0000 Received: from star.sr.bham.ac.uk ([147.188.32.230] ident=root) by bham.ac.uk with esmtp (Exim 3.16 #3) id 14PXXV-0001w5-00; Sun, 04 Feb 2001 22:21:45 +0000 Received: from pc24 by star.sr.bham.ac.uk (8.9.1b+Sun/SMI-SVR4) id WAA02964; Sun, 4 Feb 2001 22:21:39 GMT Date: Sun, 4 Feb 2001 22:21:39 +0000 (GMT) From: Mark Cooke X-Sender: To: Jamie Lokier cc: Andi Kleen , "Albert D. 
Cahalan" , John Fremlin , , , , , Subject: Re: [PATCH] dynamic IP support for 2.4.0 (SIOCKILLADDR) In-Reply-To: <20010129193136.A11035@pcep-jamie.cern.ch> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Mon, 29 Jan 2001, Jamie Lokier wrote: > Unfortunately getting the same IP is rare now, so I've been toying with > running a PPP tunnel through a fixed host out on the net. The tunnel > would be dropped and recreated with each new connection. My local link > IP would change, but the tunnel IP would not so connections to other > places, ssh etc. would all be from the tunnel IP. ciped is great for this. I use it to tunnel ssh from my home dialup to work. Very stable, and with cipe's shared keys, there's nothing too taxing about setting it up. I just have a call to /etc/init.d/ciped restart in my ppp up script. freeswan was another way I looked at , but ip/sec was horrible at the time and didn't (maybe still doesn't) deal with dynamic ip assignment nicely. 
Cheers, Mark -- +-------------------------------------------------------------------------+ Mark Cooke The views expressed above are mine and are not Systems Programmer necessarily representative of university policy University Of Birmingham URL: http://www.sr.bham.ac.uk/~mpc/ +-------------------------------------------------------------------------+ From owner-netdev@oss.sgi.com Sun Feb 4 17:02:51 2001 Received: by oss.sgi.com id ; Sun, 4 Feb 2001 17:02:41 -0800 Received: from lox.sandelman.ottawa.on.ca ([209.151.24.2]:54481 "EHLO lox.sandelman.ottawa.on.ca") by oss.sgi.com with ESMTP id ; Sun, 4 Feb 2001 17:02:27 -0800 Received: from nox.sandelman.ottawa.on.ca (nox.sandelman.ottawa.on.ca [209.151.24.6]) by lox.sandelman.ottawa.on.ca (8.8.7/8.8.8) with ESMTP id UAA28605; Sun, 4 Feb 2001 20:01:46 -0500 (EST) Received: from marajade.sandelman.ottawa.on.ca (marajade.sandelman.ottawa.on.ca [209.151.24.20]) by nox.sandelman.ottawa.on.ca (8.11.0/8.11.0) with ESMTP id f151Rtt08203 (using TLSv1/SSLv3 with cipher EDH-RSA-DES-CBC3-SHA (168 bits) verified OK); Sun, 4 Feb 2001 17:27:57 -0800 (PST) Received: from marajade.sandelman.ottawa.on.ca (marajade.sandelman.ottawa.on.ca [127.0.0.1]) by marajade.sandelman.ottawa.on.ca (8.11.0/8.11.0) with ESMTP id f150oKB21645; Sun, 4 Feb 2001 19:50:20 -0500 (EST) Message-Id: <200102050050.f150oKB21645@marajade.sandelman.ottawa.on.ca> To: John Fraizer From: mcr@solidum.com cc: Ben Greear , linux-atm , netdev Subject: Re: packet (ppp) over Sonet in Linux In-reply-to: Your message of "Wed, 31 Jan 2001 01:01:03 EST." Mime-Version: 1.0 (generated by tm-edit 7.108) Content-Type: text/plain; charset=US-ASCII Date: Sun, 04 Feb 2001 19:50:20 -0500 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing >>>>> "John" == John Fraizer writes: John> Anyone know what the streetprice for those two are though? Around $8K US. I should be getting one in a couple of months. 
From: "Larsen, Jacob Z (Jacob)" Subject: RE: [Fwd: OptiStar products] Return-Path: jlarsen@lucent.com Delivery-Date: Wed Dec 20 22:43:32 2000 Return-Path: Michael, Sorry about the mail error. Glad we caught up anyway. All you need is to fax a P.O. to our fax number at 972 671 5476. The product comcode is 408270759. The price is $7995. We ship 2 weeks (and it takes usually 24 hours, but sometimes 48). That means that you could have it before the end of the year or early January depending on when we get your P.O. If you have any questions call me at 917 690 1885. Regards, Jacob From owner-netdev@oss.sgi.com Sun Feb 4 21:16:02 2001 Received: by oss.sgi.com id ; Sun, 4 Feb 2001 21:15:42 -0800 Received: from pizda.ninka.net ([216.101.162.242]:4489 "EHLO pizda.ninka.net") by oss.sgi.com with ESMTP id ; Sun, 4 Feb 2001 21:15:40 -0800 Received: (from davem@localhost) by pizda.ninka.net (8.9.3/8.9.3) id VAA15689; Sun, 4 Feb 2001 21:13:36 -0800 From: "David S. Miller" MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <14974.13952.347437.536161@pizda.ninka.net> Date: Sun, 4 Feb 2001 21:13:36 -0800 (PST) To: jamal Cc: Rick Jones , Ion Badulescu , Andrew Morton , lkml , "netdev@oss.sgi.com" Subject: Re: Still not sexy! (Re: sendfile+zerocopy: fairly sexy (nothing todowith ECN) In-Reply-To: References: <3A77777D.E1A998FC@cup.hp.com> X-Mailer: VM 6.75 under 21.1 (patch 13) "Crater Lake" XEmacs Lucid Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing jamal writes: > > So, now you have to ask how well any given NIC follows chains of > > buffers. At what number of buffers is the overhead in the NIC of > > following the chains enough to keep it from achieving link-rate? > > > > hmmm... not sure how you would enforce this today or why you would > want that. Alexey, Dave? > The kernel should be able to break it into two buffers(with netperf, > for example -- header + data). 
> Ok, probably with tux-http 3 (header, data, trailler). First, just to make sure Jamal understands what Rick Jones is trying to make note of. He is trying to say that the cost of dealing with extra TX descriptor ring entries can begin to nullify the gains of zerocopy, depending upon HW implementation (both at the NIC and the PCI controller). Back to today, it is possible that this is an issue if your machine is near PCI bandwidth saturation before zerocopy for these tests. I think this may be one of the factors causing Jamal to see results Alexey cannot reproduce. Get two people with identical PCI host bridges, Acenic in identical PCI slot, I bet the numbers begin to jive. Currently, you get "1 + ((MTU + PAGE_SIZE - 1) / PAGE_SIZE)" buffers per packet when going over a zerocopy device using TCP. Later, David S. Miller davem@redhat.com From owner-netdev@oss.sgi.com Sun Feb 4 21:45:32 2001 Received: by oss.sgi.com id ; Sun, 4 Feb 2001 21:45:23 -0800 Received: from cwcsun41.cwc.nus.edu.sg ([137.132.163.102]:23802 "EHLO cwcsun41.cwc.nus.edu.sg") by oss.sgi.com with ESMTP id ; Sun, 4 Feb 2001 21:45:01 -0800 Received: from yaoqi ([172.16.3.22]) by cwcsun41.cwc.nus.edu.sg (8.9.3/8.9.3) with SMTP id NAA05473 for ; Mon, 5 Feb 2001 13:43:43 +0800 (SGT) Message-ID: <000e01c08f37$22b47940$160310ac@cwc.nus.edu.sg> From: "Yao Qi" To: Subject: ipv6 routing table Date: Mon, 5 Feb 2001 13:47:11 +0800 MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="----=_NextPart_000_000B_01C08F7A.23A6F420" X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 5.50.4522.1200 X-MimeOLE: Produced By Microsoft MimeOLE V5.50.4522.1200 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing This is a multi-part message in MIME format. ------=_NextPart_000_000B_01C08F7A.23A6F420 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable Hi, I enabled ipv6 in linux kernel 2.2.14. 
I can use ping6 now, but the routing table is still in ipv4. Can anybody tell me how to set up ipv6 routing table? Thanks. Yao Qi ------=_NextPart_000_000B_01C08F7A.23A6F420 Content-Type: text/html; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable
------=_NextPart_000_000B_01C08F7A.23A6F420-- From owner-netdev@oss.sgi.com Sun Feb 4 22:15:33 2001 Received: by oss.sgi.com id ; Sun, 4 Feb 2001 22:15:13 -0800 Received: from cx97923-a.phnx3.az.home.com ([24.9.112.194]:21262 "EHLO grok.yi.org") by oss.sgi.com with ESMTP id ; Sun, 4 Feb 2001 22:14:44 -0800 Received: from candelatech.com (IDENT:greear@localhost [127.0.0.1]) by grok.yi.org (8.9.3/8.8.7) with ESMTP id AAA12620; Mon, 5 Feb 2001 00:21:39 -0700 Message-ID: <3A7E5483.734F6450@candelatech.com> Date: Mon, 05 Feb 2001 00:21:39 -0700 From: Ben Greear Organization: Candela Technologies X-Mailer: Mozilla 4.72 [en] (X11; U; Linux 2.2.16 i586) X-Accept-Language: en MIME-Version: 1.0 To: mcr@solidum.com CC: John Fraizer , Ben Greear , linux-atm , netdev Subject: Re: packet (ppp) over Sonet in Linux References: <200102050050.f150oKB21645@marajade.sandelman.ottawa.on.ca> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing mcr@solidum.com wrote: > > >>>>> "John" == John Fraizer writes: > John> Anyone know what the streetprice for those two are though? > > Around $8K US. > > I should be getting one in a couple of months. I'd love to see any performance (like routing bps) numbers that you might find. 
If I get some first hand, I'll be sure to let you all know :) Thanks, Ben -- Ben Greear (greearb@candelatech.com) http://www.candelatech.com Author of ScryMUD: scry.wanfear.com 4444 (Released under GPL) http://scry.wanfear.com http://scry.wanfear.com/~greear From owner-netdev@oss.sgi.com Sun Feb 4 22:23:23 2001 Received: by oss.sgi.com id ; Sun, 4 Feb 2001 22:23:13 -0800 Received: from Overkill.EnterZone.Net ([66.35.65.2]:18466 "EHLO Overkill.EnterZone.Net") by oss.sgi.com with ESMTP id ; Sun, 4 Feb 2001 22:23:10 -0800 Received: from localhost (atm@localhost) by Overkill.EnterZone.Net (8.11.0/8.11.0) with ESMTP id f156MZn31933; Mon, 5 Feb 2001 01:22:35 -0500 Date: Mon, 5 Feb 2001 01:22:35 -0500 (EST) From: John Fraizer To: mcr@solidum.com cc: Ben Greear , linux-atm , netdev Subject: Re: packet (ppp) over Sonet in Linux In-Reply-To: <200102050050.f150oKB21645@marajade.sandelman.ottawa.on.ca> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Was that for the OC12 or the OC48? On Sun, 4 Feb 2001 mcr@solidum.com wrote: > > >>>>> "John" == John Fraizer writes: > John> Anyone know what the streetprice for those two are though? > > Around $8K US. > > I should be getting one in a couple of months. > > From: "Larsen, Jacob Z (Jacob)" > Subject: RE: [Fwd: OptiStar products] > Return-Path: jlarsen@lucent.com > Delivery-Date: Wed Dec 20 22:43:32 2000 > Return-Path: > > Michael, > > Sorry about the mail error. Glad we caught up anyway. > > All you need is to fax a P.O. to our fax number at 972 671 5476. > > The product comcode is 408270759. The price is $7995. > > We ship 2 weeks (and it takes usually 24 hours, but sometimes 48). That > means that you could have it before the end of the year or early January > depending on when we get your P.O. > > If you have any questions call me at 917 690 1885. 
> > Regards, > > Jacob > From owner-netdev@oss.sgi.com Mon Feb 5 00:16:14 2001 Received: by oss.sgi.com id ; Mon, 5 Feb 2001 00:15:54 -0800 Received: from shaku.sfc.wide.ad.jp ([203.178.143.49]:51121 "EHLO shaku.v6.linux.or.jp") by oss.sgi.com with ESMTP id ; Mon, 5 Feb 2001 00:15:34 -0800 Received: from YUMIKO.sfc.wide.ad.jp (dhcpw233.nc.u-tokyo.ac.jp [133.11.123.233]) by shaku.v6.linux.or.jp (8.11.0/3.7W) with ESMTP id f158Dqo28328 for ; Mon, 5 Feb 2001 17:13:53 +0900 Date: Mon, 05 Feb 2001 17:15:08 +0900 Message-ID: From: USAGI Project To: netdev@oss.sgi.com Subject: [ANN] 2nd STABLE release of USAGI Project Reply-To: usagi-core@linux-ipv6.org User-Agent: Wanderlust/1.1.1 (Purple Rain) REMI/1.14.1 (=?ISO-8859-4?Q?Mus?= =?ISO-8859-4?Q?higawa=F2sugi?=) Chao/1.14.0 (Momoyama) APEL/10.2 Emacs/20.7 (i386-*-nt5.0.2195) MULE/4.1 (AOI) Meadow/IPv6-1.13 Beta1++ (TANAHASHI:61) Organization: Keio University MIME-Version: 1.0 (generated by REMI 1.14.1 - =?ISO-8859-4?Q?=22Mushigawa=F2?= =?ISO-8859-4?Q?sugi=22?=) Content-Type: text/plain; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hello ! We are glad that we can announce the 2nd STABLE RELEASE of USAGI (UniverSAl playGround for Ipv6) on February 5th, 2001. On this release, we provide only kernel, but two kernels. One is linux-2.2.18-usagi kernel and the other is linux-2.4.0-usagi kernel. The USAGI Project is managed by volunteers and aims to provide better IPv6 environment on Linux freely. We try to improve Linux kernel, IPv6 related libraries and IPv6 applications. Please visit http://www.linux-ipv6.org/ for further details of USAGI project. The improved features are listed below. 
- Linux kernel-2.2.18-usagi-20010205
  Based on Linux kernel-2.2.18, we have improved and implemented
    + better source address selection,
    + ICMPv6 Node Information Queries,
    + SNMP statistics per device,
    + IPv6 khttpd,
    + re-joining all-node multicast address on network devices,
    + enabling double bind on the same port,
    + support backward compatibility of sin6_scope_id with old glibc,
    + enabling default route when ipv6 forwarding is enabled,
    + rejecting invalid ICMPs,
    + more compliance with RFC regarding
        Neighbor Discovery Protocol,
        Stateless Autoaddress Configuration,
        Multicast Listener Discovery Protocol,
    + processing extension headers,
    + INSTALL and Configuration documentation and
    + bug fixes against original kernel.

- Linux kernel-2.4.0-usagi-20010205
  Based on Linux kernel-2.4.0, we have improved and implemented the same as in kernel-2.2.18-usagi-20010205.

You can get the above source code from the following URL.

ftp://ftp.linux-ipv6.org/pub/usagi/stable/patch/

USAGI Project will release snapshot code every two weeks and stable code several times a year. We will announce the latest information via our web page. Please check our web page.

BTW, we also provide binary packages for some distributions. The binary packages will appear after February 12th, 2001. We will provide packages for the following distributions.

  RedHat
  debian
  Turbo Linux
  Vine Linux
  Kondara/MNU Linux

By the way, we manage the mailing list for USAGI users. If you have questions or advice, please join the mailing list. For more details, please see http://www.linux-ipv6.org/ml/ .

Thanks.

Related Web sites.
WIDE Project http://www.wide.ad.jp/ KAME Project http://www.kame.net/ TAHI Project http://www.tahi.org/ -- USAGI Project From owner-netdev@oss.sgi.com Mon Feb 5 00:55:13 2001 Received: by oss.sgi.com id ; Mon, 5 Feb 2001 00:54:54 -0800 Received: from gb.bnet.pl ([212.160.188.33]:1776 "HELO nic.nigdzie") by oss.sgi.com with SMTP id ; Mon, 5 Feb 2001 00:54:25 -0800 Received: (qmail 6116 invoked by uid 0); 4 Feb 2001 15:48:46 -0000 Received: (qmail 5817 invoked by uid 500); 4 Feb 2001 15:31:25 -0000 Date: Sun, 4 Feb 2001 16:31:25 +0100 From: Jacek Konieczny To: Peter Bieringer Cc: netdev@oss.sgi.com Subject: Re: IPv6 & 2.2.17 + 2.4.0: autoconfiguration works only on bootup Message-ID: <20010204163125.B2993@nic.nigdzie> References: <5.0.2.1.0.20010203135649.00b1b090@mail.bieringer.de> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <5.0.2.1.0.20010203135649.00b1b090@mail.bieringer.de>; from pb@bieringer.de on Sat, Feb 03, 2001 at 02:16:23PM +0100 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Sat, Feb 03, 2001 at 02:16:23PM +0100, Peter Bieringer wrote: > Hi, > > following scenario: > > Gateway (running radvd) [00:50:BF:06:B4:F5] and a client with one interface > (forwarding disabled) [00:e0:18:90:92:05]. > > Client boots, neigborhood detection works and take the given prefix from > radvd to assign a global address for the one and only interface. [...] > But if I switch down and up the network (completly or only interface), the > mechanism won't work: [...] > Happens with 2.2.17 and 2.4.0. > > Can someone reproduce this? Any hints? I have reported this bug twice (AFAIR I have found it in 2.2.14 kernel). And I have even sent a patch (which is apllied to PLD distributions' kernel (used by many people here in poland) sice then) which fixes this. The problem is, that "all-nodes" multicast address is added to interface on its creation (eg. 
when the module is loaded) and is removed when the interface goes down.
But it is not added again when the interface goes up.
This is most painful on machines which use DHCP for IPv4 address
allocation --- dhcpcd seems to put the interface up and down on startup.

Greets,
	Jacek

From owner-netdev@oss.sgi.com Mon Feb 5 01:13:34 2001
Received: by oss.sgi.com id ; Mon, 5 Feb 2001 01:13:23 -0800
Received: from gb.bnet.pl ([212.160.188.33]:46320 "HELO nic.nigdzie") by oss.sgi.com with SMTP id ; Mon, 5 Feb 2001 01:13:08 -0800
Received: (qmail 3326 invoked by uid 500); 5 Feb 2001 09:17:10 -0000
Date: Mon, 5 Feb 2001 10:17:10 +0100
From: Jacek Konieczny
To: netdev@oss.sgi.com
Subject: Re: [ANN] 2nd STABLE release of USAGI Project
Message-ID: <20010205101710.B2754@nic.nigdzie>
Mail-Followup-To: netdev@oss.sgi.com
References:
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: ; from sekiya@sfc.wide.ad.jp on Mon, Feb 05, 2001 at 05:15:08PM +0900
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing

On Mon, Feb 05, 2001 at 05:15:08PM +0900, USAGI Project wrote:
> The improved features are listed below.
>
> - Linux kernel-2.2.18-usagi-20010205
>   Based on Linux kernel-2.2.18, we have improved and implemented
>    + better source address selection,
Better? You mean....?

>    + ICMPv6 Node Information Queries,
>    + SNMP statistics per device,
>    + IPv6 khttpd,
>    + re-joining all-node multicast address on network devices,
>    + enabling double bind on the same port,
>    + support backward compatibility of sin6_scope_id with
>      old glibc,
>    + enabling default route when ipv6 forwarding is enabled,
Is this really needed? It is a very good feature, that default route is
not available when ipv6 forwarding (most of IPv6 address space should
never be forwarded by a default route).
>    + rejecting invalid ICMPs,
>    + more compliance with RFC regarding
>        Neighbor Discovery Protocol,
>        Stateless Autoaddress Configuration,
>        Multicast Listener Discovery Protocol,
>    + processing extension headers,
>    + INSTALL and Configuration documentation and
>    + bug fixes against original kernel.

Greets,
	Jacek

From owner-netdev@oss.sgi.com Mon Feb 5 01:23:44 2001
Received: by oss.sgi.com id ; Mon, 5 Feb 2001 01:23:34 -0800
Received: from shaku.sfc.wide.ad.jp ([203.178.143.49]:690 "EHLO shaku.v6.linux.or.jp") by oss.sgi.com with ESMTP id ; Mon, 5 Feb 2001 01:23:14 -0800
Received: from YUMIKO.sfc.wide.ad.jp (dhcpw233.nc.u-tokyo.ac.jp [133.11.123.233]) by shaku.v6.linux.or.jp (8.11.0/3.7W) with ESMTP id f159LUo00306; Mon, 5 Feb 2001 18:21:30 +0900
Date: Mon, 05 Feb 2001 18:22:45 +0900
Message-ID:
From: Yuji Sekiya
To: Jacek Konieczny
Cc: netdev@oss.sgi.com
Subject: Re: [ANN] 2nd STABLE release of USAGI Project
In-Reply-To: In your message of "Mon, 5 Feb 2001 10:17:10 +0100" <20010205101710.B2754@nic.nigdzie>
References: <20010205101710.B2754@nic.nigdzie>
User-Agent: Wanderlust/1.1.1 (Purple Rain) REMI/1.14.1 (=?ISO-8859-4?Q?Mus?= =?ISO-8859-4?Q?higawa=F2sugi?=) Chao/1.14.0 (Momoyama) APEL/10.2 Emacs/20.7 (i386-*-nt5.0.2195) MULE/4.1 (AOI) Meadow/IPv6-1.13 Beta1++ (TANAHASHI:61)
Organization: Keio University
MIME-Version: 1.0 (generated by REMI 1.14.1 - =?ISO-8859-4?Q?=22Mushigawa=F2?= =?ISO-8859-4?Q?sugi=22?=)
Content-Type: text/plain; charset=US-ASCII
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing

At Mon, 5 Feb 2001 10:17:10 +0100,
Jacek Konieczny wrote:

> On Mon, Feb 05, 2001 at 05:15:08PM +0900, USAGI Project wrote:
> > The improved features are listed below.
> >
> > - Linux kernel-2.2.18-usagi-20010205
> >   Based on Linux kernel-2.2.18, we have improved and implemented
> >    + better source address selection,
> Better? You mean....?
If you have IPv6 addresses on a interface, a better source address is selected automatically. > > + enabling default route when ipv6 forwarding is enabled, > Is this really needed? It is a very good feature, that default route is > not available when ipv6 forwarding (most of IPv6 address space should > never be forwarded by a default route). For routers in default free zone, it is true. But how about routers in leaf sites ? All router should have full routes ? -- Yuji Sekiya @ USAGI Project From owner-netdev@oss.sgi.com Mon Feb 5 01:33:44 2001 Received: by oss.sgi.com id ; Mon, 5 Feb 2001 01:33:34 -0800 Received: from salisbury.labs.futuretv.com ([194.216.164.17]:46843 "EHLO mail.futuretv.com") by oss.sgi.com with ESMTP id ; Mon, 5 Feb 2001 01:33:13 -0800 Received: from pig.labs.futuretv.com ([192.168.33.33]) by mail.futuretv.com with esmtp (Exim 3.12 #1) id 14PhzF-0006G0-00; Mon, 05 Feb 2001 09:31:05 +0000 Received: from localhost ([::ffff:127.0.0.1] helo=pig.labs.futuretv.com ident=pb) by pig.labs.futuretv.com with esmtp (Exim 3.12 #1) id 14PhzE-0003lS-00; Mon, 05 Feb 2001 09:31:04 +0000 X-Mailer: exmh version 2.1.1 10/15/1999 (debian) To: Yuji Sekiya cc: Jacek Konieczny , netdev@oss.sgi.com Subject: Re: [ANN] 2nd STABLE release of USAGI Project In-reply-to: Your message of "Mon, 05 Feb 2001 18:22:45 +0900." From: Philip Blundell Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Mon, 05 Feb 2001 09:31:04 +0000 Message-Id: Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing In message , Yuji Sekiya writes: >For routers in default free zone, it is true. But how about routers in >leaf sites ? All router should have full routes ? I don't see why not. A leaf site's provider ought to be able to aggregate the routes sufficiently that there would only be a manageable number. p. 
From owner-netdev@oss.sgi.com Mon Feb 5 01:40:04 2001
Received: by oss.sgi.com id ; Mon, 5 Feb 2001 01:39:44 -0800
Received: from shaku.sfc.wide.ad.jp ([203.178.143.49]:6066 "EHLO shaku.v6.linux.or.jp") by oss.sgi.com with ESMTP id ; Mon, 5 Feb 2001 01:39:36 -0800
Received: from YUMIKO.sfc.wide.ad.jp (dhcpw233.nc.u-tokyo.ac.jp [133.11.123.233]) by shaku.v6.linux.or.jp (8.11.0/3.7W) with ESMTP id f159bqo01434; Mon, 5 Feb 2001 18:37:53 +0900
Date: Mon, 05 Feb 2001 18:39:08 +0900
Message-ID:
From: Yuji Sekiya
To: pb@futuretv.com
Cc: jajcus@bnet.pl, netdev@oss.sgi.com
Subject: Re: [ANN] 2nd STABLE release of USAGI Project
In-Reply-To: In your message of "Mon, 05 Feb 2001 09:31:04 +0000"
References:
User-Agent: Wanderlust/1.1.1 (Purple Rain) REMI/1.14.1 (=?ISO-8859-4?Q?Mus?= =?ISO-8859-4?Q?higawa=F2sugi?=) Chao/1.14.0 (Momoyama) APEL/10.2 Emacs/20.7 (i386-*-nt5.0.2195) MULE/4.1 (AOI) Meadow/IPv6-1.13 Beta1++ (TANAHASHI:61)
Organization: Keio University
MIME-Version: 1.0 (generated by REMI 1.14.1 - =?ISO-8859-4?Q?=22Mushigawa=F2?= =?ISO-8859-4?Q?sugi=22?=)
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 8BIT
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing

At Mon, 05 Feb 2001 09:31:04 +0000,
Philip Blundell wrote:

> >For routers in default free zone, it is true. But how about routers in
> >leaf sites ? All router should have full routes ?
>
> I don't see why not. A leaf site's provider ought to be able to aggregate
> the routes sufficiently that there would only be a manageable number.

From the IPv6 address allocation policy, TLAs should aggregate their
routes, but the behavior of NLAs and SLAs is not defined.
So I think it is reasonable that we can choose whether to purge the
default route or not.
-- Yuji Sekiya @ USAGI Project From owner-netdev@oss.sgi.com Mon Feb 5 01:54:44 2001 Received: by oss.sgi.com id ; Mon, 5 Feb 2001 01:54:34 -0800 Received: from gb.bnet.pl ([212.160.188.33]:48626 "HELO nic.nigdzie") by oss.sgi.com with SMTP id ; Mon, 5 Feb 2001 01:54:21 -0800 Received: (qmail 3453 invoked by uid 500); 5 Feb 2001 09:58:22 -0000 Date: Mon, 5 Feb 2001 10:58:22 +0100 From: Jacek Konieczny To: netdev@oss.sgi.com Subject: Re: [ANN] 2nd STABLE release of USAGI Project Message-ID: <20010205105822.A3443@nic.nigdzie> Mail-Followup-To: netdev@oss.sgi.com References: <20010205101710.B2754@nic.nigdzie> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: ; from sekiya@sfc.wide.ad.jp on Mon, Feb 05, 2001 at 06:22:45PM +0900 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Mon, Feb 05, 2001 at 06:22:45PM +0900, Yuji Sekiya wrote: > If you have IPv6 addresses on a interface, a better source address > is selected automatically. > > > > + enabling default route when ipv6 forwarding is enabled, > > Is this really needed? It is a very good feature, that default route is > > not available when ipv6 forwarding (most of IPv6 address space should > > never be forwarded by a default route). > > For routers in default free zone, it is true. But how about routers in > leaf sites ? All router should have full routes ? 
No just a route for all unicast addresses: 2000::/3 (AFAIR) And (maybe) something similar for multicast (if no better multicast routing is available) Greets, Jacek From owner-netdev@oss.sgi.com Mon Feb 5 02:22:04 2001 Received: by oss.sgi.com id ; Mon, 5 Feb 2001 02:21:44 -0800 Received: from shaku.sfc.wide.ad.jp ([203.178.143.49]:21682 "EHLO shaku.v6.linux.or.jp") by oss.sgi.com with ESMTP id ; Mon, 5 Feb 2001 02:21:16 -0800 Received: from chiriko.linux-ipv6.org (dhcpw233.nc.u-tokyo.ac.jp [133.11.123.233]) by shaku.v6.linux.or.jp (8.11.0/3.7W) with ESMTP id f15AImo04278; Mon, 5 Feb 2001 19:18:48 +0900 Date: Mon, 05 Feb 2001 19:20:07 +0900 Message-ID: <87g0htsa7c.wl@chiriko.linux-ipv6.org> From: Yuji Sekiya To: Jacek Konieczny Cc: netdev@oss.sgi.com Subject: Re: [ANN] 2nd STABLE release of USAGI Project In-Reply-To: <20010205105822.A3443@nic.nigdzie> References: <20010205101710.B2754@nic.nigdzie> <20010205105822.A3443@nic.nigdzie> User-Agent: Wanderlust/2.4.1 (Stand By Me) SEMI/1.14.3 (Ushinoya) FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.3 Emacs/20.7 (i386-debian-linux-gnu) MULE/4.0 (HANANOEN) Organization: Keio University MIME-Version: 1.0 (generated by SEMI 1.14.3 - "Ushinoya") Content-Type: text/plain; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing At Mon, 5 Feb 2001 10:58:22 +0100, Jacek Konieczny wrote: > > > > + enabling default route when ipv6 forwarding is enabled, > > > Is this really needed? It is a very good feature, that default route is > > > not available when ipv6 forwarding (most of IPv6 address space should > > > never be forwarded by a default route). > > > > For routers in default free zone, it is true. But how about routers in > > leaf sites ? All router should have full routes ? > No just a route for all unicast addresses: 2000::/3 (AFAIR) > And (maybe) something similar for multicast (if no better multicast > routing is available) Actually it means default route. 
I can't see why you announce or add statically the route instead of default route. -- Yuji Sekiya From owner-netdev@oss.sgi.com Mon Feb 5 02:28:53 2001 Received: by oss.sgi.com id ; Mon, 5 Feb 2001 02:28:34 -0800 Received: from gb.bnet.pl ([212.160.188.33]:41971 "HELO nic.nigdzie") by oss.sgi.com with SMTP id ; Mon, 5 Feb 2001 02:28:22 -0800 Received: (qmail 3651 invoked by uid 500); 5 Feb 2001 10:32:36 -0000 Date: Mon, 5 Feb 2001 11:32:36 +0100 From: Jacek Konieczny To: netdev@oss.sgi.com Subject: Re: [ANN] 2nd STABLE release of USAGI Project Message-ID: <20010205113236.B3612@nic.nigdzie> Mail-Followup-To: netdev@oss.sgi.com References: <20010205101710.B2754@nic.nigdzie> <20010205105822.A3443@nic.nigdzie> <87g0htsa7c.wl@chiriko.linux-ipv6.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <87g0htsa7c.wl@chiriko.linux-ipv6.org>; from sekiya@sfc.wide.ad.jp on Mon, Feb 05, 2001 at 07:20:07PM +0900 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Mon, Feb 05, 2001 at 07:20:07PM +0900, Yuji Sekiya wrote: > At Mon, 5 Feb 2001 10:58:22 +0100, > Jacek Konieczny wrote: > > > > > > + enabling default route when ipv6 forwarding is enabled, > > > > Is this really needed? It is a very good feature, that default route is > > > > not available when ipv6 forwarding (most of IPv6 address space should > > > > never be forwarded by a default route). > > > > > > For routers in default free zone, it is true. But how about routers in > > > leaf sites ? All router should have full routes ? > > No just a route for all unicast addresses: 2000::/3 (AFAIR) > > And (maybe) something similar for multicast (if no better multicast > > routing is available) > > Actually it means default route. I can't see why you announce or add > statically the route instead of default route. This is not the same as default route. 
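
A minimal sketch of that distinction, assuming Python's standard
ipaddress module; the sample addresses and the helper name covered_by
are illustrative only and do not come from any router configuration:

```python
import ipaddress

# The two candidate routes under discussion.
DEFAULT_ROUTE = ipaddress.ip_network("::/0")       # matches every IPv6 address
GLOBAL_UNICAST = ipaddress.ip_network("2000::/3")  # 2000:: through 3fff:ffff:...

# Illustrative destinations (assumed examples, one per address class).
SAMPLES = {
    "global unicast": "2001:db8::1",
    "all-nodes multicast": "ff02::1",
    "link-local": "fe80::1",
    "site-local": "fec0::1",
}


def covered_by(network, address):
    """Return True if a route for `network` would match `address`."""
    return ipaddress.ip_address(address) in network


if __name__ == "__main__":
    for kind, addr in SAMPLES.items():
        print(kind, addr,
              "default:", covered_by(DEFAULT_ROUTE, addr),
              "2000::/3:", covered_by(GLOBAL_UNICAST, addr))
```

Under these assumptions every sample matches ::/0, while only the
global unicast sample matches 2000::/3.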
The prefix 2000::/3 contains neither multicast nor link-local or
site-local addresses, only global unicast addresses. When using a
default route you should (but I am almost sure most people wouldn't do
it) block those addresses by other means.
There are a lot of IPv4 packets with "private network" addresses going
through the Internet just because of badly configured routers -- all
unknown packets go out the default route even if they are "non-routable".

Greets,
	Jacek

From owner-netdev@oss.sgi.com Mon Feb 5 05:34:03 2001
Received: by oss.sgi.com id ; Mon, 5 Feb 2001 05:33:53 -0800
Received: from lox.sandelman.ottawa.on.ca ([209.151.24.2]:28104 "EHLO lox.sandelman.ottawa.on.ca") by oss.sgi.com with ESMTP id ; Mon, 5 Feb 2001 05:33:38 -0800
Received: from nox.sandelman.ottawa.on.ca (nox.sandelman.ottawa.on.ca [209.151.24.6]) by lox.sandelman.ottawa.on.ca (8.8.7/8.8.8) with ESMTP id IAA06528; Mon, 5 Feb 2001 08:32:59 -0500 (EST)
Received: from marajade.sandelman.ottawa.on.ca (marajade.sandelman.ottawa.on.ca [209.151.24.20]) by nox.sandelman.ottawa.on.ca (8.11.0/8.11.0) with ESMTP id f15Dx4t09095 (using TLSv1/SSLv3 with cipher EDH-RSA-DES-CBC3-SHA (168 bits) verified OK); Mon, 5 Feb 2001 05:59:05 -0800 (PST)
Received: from marajade.sandelman.ottawa.on.ca (marajade.sandelman.ottawa.on.ca [127.0.0.1]) by marajade.sandelman.ottawa.on.ca (8.11.0/8.11.0) with ESMTP id f15DLQB11782; Mon, 5 Feb 2001 08:21:34 -0500 (EST)
Message-Id: <200102051321.f15DLQB11782@marajade.sandelman.ottawa.on.ca>
To: Ben Greear
From: mcr@solidum.com
cc: John Fraizer , Ben Greear , linux-atm , netdev
Subject: Re: packet (ppp) over Sonet in Linux
In-reply-to: Your message of "Mon, 05 Feb 2001 00:21:39 MST."
<3A7E5483.734F6450@candelatech.com> Mime-Version: 1.0 (generated by tm-edit 7.108) Content-Type: text/plain; charset=US-ASCII Date: Mon, 05 Feb 2001 08:21:26 -0500 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing >>>>> "Ben" == Ben Greear writes: Ben> mcr@solidum.com wrote: >> >>>>> "John" == John Fraizer writes: John> Anyone know what the streetprice for those two are though? >> Around $8K US. >> >> I should be getting one in a couple of months. Ben> I'd love to see any performance (like routing bps) numbers that you Ben> might find. If I get some first hand, I'll be sure to let you all Ben> know :) I won't be doing performance testing on the card, but rather using it to do functional testing of another device. (A Smartbits/Adtech/Agilent/TBD will provide the performance testing of the device, but at much higher cost) ] Train travel features AC outlets with no take-off restrictions|gigabit is no[ ] Michael Richardson, Solidum Systems Oh where, oh where has|problem with[ ] mcr@solidum.com www.solidum.com the little fishy gone?|PAX.port 1100[ ] panic("Just another NetBSD/notebook using, kernel hacking, security guy"); [ From owner-netdev@oss.sgi.com Mon Feb 5 05:34:04 2001 Received: by oss.sgi.com id ; Mon, 5 Feb 2001 05:33:45 -0800 Received: from lox.sandelman.ottawa.on.ca ([209.151.24.2]:28360 "EHLO lox.sandelman.ottawa.on.ca") by oss.sgi.com with ESMTP id ; Mon, 5 Feb 2001 05:33:35 -0800 Received: from nox.sandelman.ottawa.on.ca (nox.sandelman.ottawa.on.ca [209.151.24.6]) by lox.sandelman.ottawa.on.ca (8.8.7/8.8.8) with ESMTP id IAA06535; Mon, 5 Feb 2001 08:33:07 -0500 (EST) Received: from marajade.sandelman.ottawa.on.ca (marajade.sandelman.ottawa.on.ca [209.151.24.20]) by nox.sandelman.ottawa.on.ca (8.11.0/8.11.0) with ESMTP id f15DxLt09098 (using TLSv1/SSLv3 with cipher EDH-RSA-DES-CBC3-SHA (168 bits) verified OK); Mon, 5 Feb 2001 05:59:22 -0800 (PST) Received: from marajade.sandelman.ottawa.on.ca 
(marajade.sandelman.ottawa.on.ca [127.0.0.1]) by marajade.sandelman.ottawa.on.ca (8.11.0/8.11.0) with ESMTP id f15DLpB11801; Mon, 5 Feb 2001 08:21:51 -0500 (EST) Message-Id: <200102051321.f15DLpB11801@marajade.sandelman.ottawa.on.ca> To: John Fraizer FroM: mcr@solidum.com cc: Ben Greear , linux-atm , netdev Subject: Re: packet (ppp) over Sonet in Linux In-reply-to: Your message of "Mon, 05 Feb 2001 01:22:35 EST." Mime-Version: 1.0 (generated by tm-edit 7.108) Content-Type: text/plain; charset=US-ASCII Date: Mon, 05 Feb 2001 08:21:51 -0500 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing >>>>> "John" == John Fraizer writes: John> Was that for the OC12 or the OC48? OC-48 AFAIK. John> On Sun, 4 Feb 2001 mcr@solidum.com wrote: >> >>>>> "John" == John Fraizer writes: John> Anyone know what the streetprice for those two are though? >> Around $8K US. >> >> I should be getting one in a couple of months. >> >> From: "Larsen, Jacob Z (Jacob)" Subject: RE: >> [Fwd: OptiStar products] Return-Path: jlarsen@lucent.com >> Delivery-Date: Wed Dec 20 22:43:32 2000 Return-Path: >> >> >> Michael, >> >> Sorry about the mail error. Glad we caught up anyway. >> >> All you need is to fax a P.O. to our fax number at 972 671 5476. >> >> The product comcode is 408270759. The price is $7995. >> >> We ship 2 weeks (and it takes usually 24 hours, but sometimes >> 48). That means that you could have it before the end of the year or >> early January depending on when we get your P.O. >> >> If you have any questions call me at 917 690 1885. 
>> >> Regards, >> >> Jacob >> From owner-netdev@oss.sgi.com Mon Feb 5 05:48:33 2001 Received: by oss.sgi.com id ; Mon, 5 Feb 2001 05:48:23 -0800 Received: from lsb-catv-1-p021.vtxnet.ch ([212.147.5.21]:47109 "EHLO almesberger.net") by oss.sgi.com with ESMTP id ; Mon, 5 Feb 2001 05:48:02 -0800 Received: (from almesber@localhost) by almesberger.net (8.9.3/8.9.3) id OAA09287; Mon, 5 Feb 2001 14:47:22 +0100 Date: Mon, 5 Feb 2001 14:47:22 +0100 From: Werner Almesberger To: Ookhoi Cc: netdev@oss.sgi.com Subject: Re: Fwd: Re: vaio doesn't boot with 2.4.1-ac1, stops at PCI: Probing PCI hardware Message-ID: <20010205144722.K7561@almesberger.net> References: <20010203190758.O3922@ookhoi.dds.nl> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20010203190758.O3922@ookhoi.dds.nl>; from ookhoi@dds.nl on Sat, Feb 03, 2001 at 07:07:58PM +0100 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Ookhoi wrote: > Now a mount / via nfs is something which really should work. It does, but in conjunction with PCMCIA, it gets very messy. You're exceeding the design limitations of nfsroot. You have to use initrd and change_root instead. You can find a description of how you can use initrd for an NFS root on the pivot_root(8) man page. I've documented using all this with PCMCIA a while back in the README of ftp://icaftp.epfl.ch/pub/people/almesber/misc/umount-root-6.tar.gz My example doesn't include bootp, so you'll have to add that one. If you haven't used initrd yet, I'd suggest you exercise all this first on a machine without PCMCIA. And only then deal with the added complexity of using PCMCIA. Good luck ! 
- Werner (ceterum censeo nfsroot esse delendum ;-) -- _________________________________________________________________________ / Werner Almesberger, ICA, EPFL, CH Werner.Almesberger@epfl.ch / /_IN_N_032__Tel_+41_21_693_6621__Fax_+41_21_693_6610_____________________/ From owner-netdev@oss.sgi.com Mon Feb 5 06:43:35 2001 Received: by oss.sgi.com id ; Mon, 5 Feb 2001 06:43:24 -0800 Received: from expanse.dds.nl ([194.109.10.118]:29448 "EHLO expanse.dds.nl") by oss.sgi.com with ESMTP id ; Mon, 5 Feb 2001 06:43:03 -0800 Received: (from ookhoi@localhost) by expanse.dds.nl (8.9.3/8.9.3) id PAA21291; Mon, 5 Feb 2001 15:42:02 +0100 Date: Mon, 5 Feb 2001 15:42:02 +0100 From: Ookhoi To: Werner Almesberger Cc: netdev@oss.sgi.com Subject: Re: Fwd: Re: vaio doesn't boot with 2.4.1-ac1, stops at PCI: Probing PCI hardware Message-ID: <20010205154202.U3922@ookhoi.dds.nl> Reply-To: ookhoi@dds.nl References: <20010203190758.O3922@ookhoi.dds.nl> <20010205144722.K7561@almesberger.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.1.14i In-Reply-To: <20010205144722.K7561@almesberger.net>; from Werner.Almesberger@epfl.ch on Mon, Feb 05, 2001 at 02:47:22PM +0100 X-Uptime: 12:00pm up 3 days, 23:04, 22 users, load average: 0.72, 0.18, 0.08 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hi Werner, > Ookhoi wrote: > > Now a mount / via nfs is something which really should work. > > It does, but in conjunction with PCMCIA, it gets very messy. You're > exceeding the design limitations of nfsroot. You have to use initrd > and change_root instead. Thanx, but the problem is that I boot from a usb floppy drive, and I can't access it after the kernel booted. All stuff for the usb drive is in the kernel and the drive is supported according to the information on the Internet, but I can't read a root image from the disk. 
It seems that people only get the drive to work with modules, and that
at boot time the driver needs a bit more time.

Initrd is strange anyway, as normally it asks for a floppy and to press
enter. With the usb floppy drive, it asks for the floppy and to press
enter, but instead of waiting, it just continues, most likely because
the drive gets /dev/sda assigned to it.

> You can find a description of how you can use initrd for an NFS root
> on the pivot_root(8) man page. I've documented using all this with
> PCMCIA a while back in the README of
> ftp://icaftp.epfl.ch/pub/people/almesber/misc/umount-root-6.tar.gz
>
> My example doesn't include bootp, so you'll have to add that one.
>
> If you haven't used initrd yet, I'd suggest you exercise all this
> first on a machine without PCMCIA. And only then deal with the
> added complexity of using PCMCIA.

What is the problem with pcmcia and nfsroot?

If I make the nic work, then nfsroot should work without even knowing
the nic is a pcmcia, shouldn't it?

It seems to me that the only problem is that ipconfig (bootp or the
info as kernel parameter ip) comes before the configuration of the nic
and therefore fails. So if I can make ipconfig kick in a bit later (after
the nic) I have an ip-address, no?

Anyway, I would like to play with this if possible. :-) Can you give me
a hint on how to move ipconfig behind the nic? Or do you have a tip for
me on this problem?

Tia!!
	Ookhoi

From owner-netdev@oss.sgi.com Mon Feb 5 07:02:34 2001
Received: by oss.sgi.com id ; Mon, 5 Feb 2001 07:02:25 -0800
Received: from lsb-catv-1-p021.vtxnet.ch ([212.147.5.21]:52997 "EHLO almesberger.net") by oss.sgi.com with ESMTP id ; Mon, 5 Feb 2001 07:02:10 -0800
Received: (from almesber@localhost) by almesberger.net (8.9.3/8.9.3) id QAA09432; Mon, 5 Feb 2001 16:01:55 +0100
Date: Mon, 5 Feb 2001 16:01:55 +0100
From: Werner Almesberger
To: Ookhoi
Cc: netdev@oss.sgi.com
Subject: Re: Fwd: Re: vaio doesn't boot with 2.4.1-ac1, stops at PCI: Probing PCI hardware
Message-ID: <20010205160155.D9343@almesberger.net>
References: <20010203190758.O3922@ookhoi.dds.nl> <20010205144722.K7561@almesberger.net> <20010205154202.U3922@ookhoi.dds.nl>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20010205154202.U3922@ookhoi.dds.nl>; from ookhoi@dds.nl on Mon, Feb 05, 2001 at 03:42:02PM +0100
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing

Ookhoi wrote:
> Thanx, but the problem is that I boot from a usb floppy drive, and I
> can't access it after the kernel booted.

That's why I'm suggesting to use initrd.

> Initrd is strange anyway, as normally it asks for a floppy and to press
> enter.

Uh ? No, initrd is loaded by the boot loader, not the kernel ...

If your boot loader has problems with the BIOS, maybe you should try
another one then, e.g. LILO almost certainly works for this.

> If I make the nic work, then nfsroot should work without even knowing
> the nic is a pcmcia, shouldn't it?

Okay, second possibility, which may or may not work: again, use initrd,
but this time, exit from initrd after bringing up the NIC.
Now you're using the old change root mechanism, which executes the "standard" root mount code for you, see also ftp://icaftp.epfl.ch/pub/people/almesber/booting/bootinglinux-current.ps.gz AFAIK, you can't set up the NIC without user-space PCMCIA support, so you still have to use initrd for this. Iff you figure out a way to do this, you could use a simple initrd that just sleeps for a few seconds, or loops until it finds the devices it wants. For constructing small initrds, see also ftp://icaftp.epfl.ch/pub/people/almesber/misc/newlib-linux/ newlib-linux-21.tar.gz (but in your case, just statically linking against glibc is probably okay, even if you waste some space this way.) - Werner -- _________________________________________________________________________ / Werner Almesberger, ICA, EPFL, CH Werner.Almesberger@epfl.ch / /_IN_N_032__Tel_+41_21_693_6621__Fax_+41_21_693_6610_____________________/ From owner-netdev@oss.sgi.com Mon Feb 5 07:30:25 2001 Received: by oss.sgi.com id ; Mon, 5 Feb 2001 07:30:05 -0800 Received: from expanse.dds.nl ([194.109.10.118]:44808 "EHLO expanse.dds.nl") by oss.sgi.com with ESMTP id ; Mon, 5 Feb 2001 07:30:02 -0800 Received: (from ookhoi@localhost) by expanse.dds.nl (8.9.3/8.9.3) id QAA21943; Mon, 5 Feb 2001 16:29:03 +0100 Date: Mon, 5 Feb 2001 16:29:03 +0100 From: Ookhoi To: Werner Almesberger Cc: netdev@oss.sgi.com Subject: Re: Fwd: Re: vaio doesn't boot with 2.4.1-ac1, stops at PCI: Probing PCI hardware Message-ID: <20010205162903.V3922@ookhoi.dds.nl> Reply-To: ookhoi@dds.nl References: <20010203190758.O3922@ookhoi.dds.nl> <20010205144722.K7561@almesberger.net> <20010205154202.U3922@ookhoi.dds.nl> <20010205160155.D9343@almesberger.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.1.14i In-Reply-To: <20010205160155.D9343@almesberger.net>; from Werner.Almesberger@epfl.ch on Mon, Feb 05, 2001 at 04:01:55PM +0100 X-Uptime: 12:00pm up 3 days, 23:04, 22 users, load average: 
0.72, 0.18, 0.08
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing

Thank you for your quick answer. :-)

> > Thanx, but the problem is that I boot from a usb floppy drive, and I
> > can't access it after the kernel booted.
>
> That's why I'm suggesting to use initrd.
>
> > Initrd is strange anyway, as normally it asks for a floppy and to press
> > enter.
>
> Uh ? No, initrd is loaded by the boot loader, not the kernel ...

So I don't need to read the initrd from the floppy with support in the
kernel for the usb floppy drive because the boot loader loads it? Hmm,
then I have to read up on initrd. :-) Will try that, thnx. :-)

> If your boot loader has problems with the BIOS, maybe you should try
> another one then, e.g. LILO almost certainly works for this.

Yeah, lilo boots the kernel fine.

> > If I make the nic work, then nfsroot should work without even knowing
> > the nic is a pcmcia, shouldn't it?
>
> Okay, second possibility, which may or may not work: again, use initrd,
> but this time, exit from initrd after bringing up the NIC. Now you're
> using the old change root mechanism, which executes the "standard" root
> mount code for you, see also
>
> ftp://icaftp.epfl.ch/pub/people/almesber/booting/bootinglinux-current.ps.gz
>
> AFAIK, you can't set up the NIC without user-space PCMCIA support, so
> you still have to use initrd for this. Iff you figure out a way to do
> this, you could use a simple initrd that just sleeps for a few seconds,
> or loops until it finds the devices it wants.

Okay, will try that if I can make initrd work.

> For constructing small initrds, see also
> ftp://icaftp.epfl.ch/pub/people/almesber/misc/newlib-linux/
> newlib-linux-21.tar.gz
> (but in your case, just statically linking against glibc is probably
> okay, even if you waste some space this way.)

Thanks a lot!!
Ookhoi From owner-netdev@oss.sgi.com Mon Feb 5 10:51:57 2001 Received: by oss.sgi.com id ; Mon, 5 Feb 2001 10:51:36 -0800 Received: from palrel1.hp.com ([156.153.255.242]:9478 "HELO palrel1.hp.com") by oss.sgi.com with SMTP id ; Mon, 5 Feb 2001 10:51:31 -0800 Received: from tardy.cup.hp.com (tardy.cup.hp.com [15.8.80.176]) by palrel1.hp.com (Postfix) with ESMTP id D915CC9B; Mon, 5 Feb 2001 10:51:30 -0800 (PST) Received: from cup.hp.com (localhost [127.0.0.1]) by tardy.cup.hp.com (8.9.3 (PHNE_18546)/8.9.3 SMKit7.02) with ESMTP id KAA00772; Mon, 5 Feb 2001 10:51:29 -0800 (PST) Message-ID: <3A7EF631.174DB262@cup.hp.com> Date: Mon, 05 Feb 2001 10:51:29 -0800 From: Rick Jones Organization: the Unofficial HP X-Mailer: Mozilla 4.75 [en] (X11; U; HP-UX B.11.00 9000/785) X-Accept-Language: en MIME-Version: 1.0 To: jamal Cc: Ion Badulescu , Andrew Morton , lkml , "netdev@oss.sgi.com" Subject: Re: Still not sexy! (Re: sendfile+zerocopy: fairly sexy (nothing todowith ECN) References: Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing > > As time marches on, the orders of magnitude of the constants may change, > > but basic concepts still remain, and the "lessons" learned in the past > > by one generation tend to get relearned in the next :) for example - > > there is no such a thing as a free lunch... :) > > ;-> > BTW, i am reading one of your papers (circa 1993 ;->, "we go fast with a > little help from your apps") in which you make an interesting > observation. That (figure 2) there is "a considerable increase in > efficiency but not a considerable increase in throughput" .... I "scanned" > to the end of the paper and dont see an explanation. 
That would be the copyavoidance paper using the very old G30 with the
HP-PB (sometimes called PeanutButter) bus :)
(http://ftp.cup.hp.com/dist/networking/briefs/)

No, back then we were not going to describe the dirty laundry of the G30
hardware :) The limiter appears to have been the bus converter from the
SGC (?) main bus of the Novas (8x7,F,G,H,I) to the HP-PB bus. The chip
was (appropriately enough) codenamed "BOA" and it was a constrictor :)

I never had a chance to carry out the tests on an older 852 system -
those have slower CPUs, but HP-PB was _the_ bus in the system.
Prototypes leading to the HP-PB FDDI card achieved 10 MB/s on an 832
system using UDP - this was back in the 1988-1989 timeframe iirc.

> I've made a somehow similar observation with the current zc patches and
> infact observed that throughput goes down with the linux zc patches.
> [This is being contested but no-one else is testing at gigE, so my word is
> the only truth].
> Of course your paper doesnt talk about sendfile rather the page pinning +
> COW tricks (which are considered taboo in Linux) but i do sense a
> relationship.

Well, the HP-PB FDDI card did follow buffer chains rather well, and
there was no mapping overhead on a Nova - it was a non-coherent I/O
subsystem and DMA was done exclusively with physical addresses (and
requisite pre-DMA flushes on outbound, and purges on inbound - another
reason why copy-avoidance was such a win overheadwise).

Also, there was no throughput drop when going to copyavoidance in that
stuff. So, I'd say that while some things might "feel" similar, it does
not go much deeper than that.

rick

> PS:- I dont have "my" machines yet and i have a feeling it will be a while
> before i re-run the tests; however, i have created a patch for
> linux-sendfile with netperf. Please take a look at it at:
> http://www.cyberus.ca/~hadi/patch-nperf-sfile-linux.gz
> tell me if is missing anything and if it is ok, could you please merge in
> your tree?

I will take a look.
-- ftp://ftp.cup.hp.com/dist/networking/misc/rachel/ these opinions are mine, all mine; HP might not want them anyway... :) feel free to email, OR post, but please do NOT do BOTH... my email address is raj in the cup.hp.com domain... From owner-netdev@oss.sgi.com Mon Feb 5 17:42:48 2001 Received: by oss.sgi.com id ; Mon, 5 Feb 2001 17:42:29 -0800 Received: from pizda.ninka.net ([216.101.162.242]:60289 "EHLO pizda.ninka.net") by oss.sgi.com with ESMTP id ; Mon, 5 Feb 2001 17:41:57 -0800 Received: (from davem@localhost) by pizda.ninka.net (8.9.3/8.9.3) id RAA15958; Mon, 5 Feb 2001 17:40:30 -0800 From: "David S. Miller" MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <14975.22030.765975.161693@pizda.ninka.net> Date: Mon, 5 Feb 2001 17:40:30 -0800 (PST) To: linux-kernel@vger.kernel.org CC: netdev@oss.sgi.com Subject: [UPDATE] New zerocopy against 2.4.2-pre1 X-Mailer: VM 6.75 under 21.1 (patch 13) "Crater Lake" XEmacs Lucid Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing In the usual spot: ftp://ftp.kernel.org/pub/linux/kernel/people/davem/zerocopy-2.4.2p1-1.diff.gz Changes since last installment: 1) Merge in lots of AC patch fixes, from Alan. 2) Use more reasonable MTU for loopback under Zerocopy, basically it's 16K + sizeof TCP/IP headers now. Later, David S. 
Miller davem@redhat.com From owner-netdev@oss.sgi.com Mon Feb 5 17:45:18 2001 Received: by oss.sgi.com id ; Mon, 5 Feb 2001 17:44:58 -0800 Received: from cwcsun41.cwc.nus.edu.sg ([137.132.163.102]:5300 "EHLO cwcsun41.cwc.nus.edu.sg") by oss.sgi.com with ESMTP id ; Mon, 5 Feb 2001 17:44:50 -0800 Received: from yaoqi ([172.16.3.22]) by cwcsun41.cwc.nus.edu.sg (8.9.3/8.9.3) with SMTP id JAA12049 for ; Tue, 6 Feb 2001 09:43:38 +0800 (SGT) Message-ID: <001801c08fde$c3935be0$160310ac@cwc.nus.edu.sg> From: "Yao Qi" To: Subject: ipv6 router Date: Tue, 6 Feb 2001 09:45:42 +0800 MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="----=_NextPart_000_0011_01C09021.9243A0C0" X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 5.50.4522.1200 X-MimeOLE: Produced By Microsoft MimeOLE V5.50.4522.1200 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing This is a multi-part message in MIME format. ------=_NextPart_000_0011_01C09021.9243A0C0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable Hi,=20 To set up an ipv4 router, we need to enable the net.ipv4.ip_forwarding. = Is there anything similar to set up an ipv6 router, such as = net.ipv6.ip_forwarding=3D1? Thanks for all your help. Yao Qi ------=_NextPart_000_0011_01C09021.9243A0C0 Content-Type: text/html; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable
------=_NextPart_000_0011_01C09021.9243A0C0-- From owner-netdev@oss.sgi.com Mon Feb 5 23:17:20 2001 Received: by oss.sgi.com id ; Mon, 5 Feb 2001 23:17:01 -0800 Received: from mail.bieringer.de ([195.226.187.51]:43794 "HELO titan.bieringer.de") by oss.sgi.com with SMTP id ; Mon, 5 Feb 2001 23:16:33 -0800 Received: (qmail 18221 invoked from network); 6 Feb 2001 07:16:29 -0000 Received: from p3e9b8e26.dip.t-dialin.net (HELO worker.bieringer.de) (62.155.142.38) by mail.bieringer.de with SMTP; 6 Feb 2001 07:16:29 -0000 Message-Id: <5.0.2.1.0.20010206081551.00ae2110@mail.bieringer.de> X-Sender: list4peter@mail.bieringer.de X-Mailer: QUALCOMM Windows Eudora Version 5.0.2 Date: Tue, 06 Feb 2001 08:17:51 +0100 To: "Yao Qi" , From: Peter Bieringer Subject: Re: ipv6 router In-Reply-To: <001801c08fde$c3935be0$160310ac@cwc.nus.edu.sg> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; format=flowed Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing At 02:45 06.02.2001, Yao Qi wrote: >To set up an ipv4 router, we need to enable the net.ipv4.ip_forwarding. Is >there anything similar to set up an ipv6 router, such as >net.ipv6.ip_forwarding=1? Yes, per interface (decided on received packets) and globally (all/default) i.e. 
find /proc -name forwarding |grep ipv6
/proc/sys/net/ipv6/conf/sit1/forwarding
/proc/sys/net/ipv6/conf/eth1/forwarding
/proc/sys/net/ipv6/conf/sit0/forwarding
/proc/sys/net/ipv6/conf/eth0/forwarding
/proc/sys/net/ipv6/conf/lo/forwarding
/proc/sys/net/ipv6/conf/default/forwarding
/proc/sys/net/ipv6/conf/all/forwarding

Peter

From owner-netdev@oss.sgi.com Tue Feb 6 00:06:51 2001 Received: by oss.sgi.com id ; Tue, 6 Feb 2001 00:06:41 -0800 Received: from shaku.sfc.wide.ad.jp ([203.178.143.49]:61107 "EHLO shaku.v6.linux.or.jp") by oss.sgi.com with ESMTP id ; Tue, 6 Feb 2001 00:06:29 -0800 Received: from chiriko.linux-ipv6.org (dhcpw233.nc.u-tokyo.ac.jp [133.11.123.233]) by shaku.v6.linux.or.jp (8.11.0/3.7W) with ESMTP id f1683Mo21280; Tue, 6 Feb 2001 17:03:22 +0900 Date: Tue, 06 Feb 2001 17:04:46 +0900 Message-ID: <87elxcs0dd.wl@chiriko.linux-ipv6.org> From: Yuji Sekiya To: Jacek Konieczny Cc: netdev@oss.sgi.com Subject: Re: [ANN] 2nd STABLE release of USAGI Project In-Reply-To: <20010205113236.B3612@nic.nigdzie> References: <20010205101710.B2754@nic.nigdzie> <20010205105822.A3443@nic.nigdzie> <87g0htsa7c.wl@chiriko.linux-ipv6.org> <20010205113236.B3612@nic.nigdzie> User-Agent: Wanderlust/2.4.1 (Stand By Me) SEMI/1.14.3 (Ushinoya) FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.3 Emacs/20.7 (i386-debian-linux-gnu) MULE/4.0 (HANANOEN) Organization: Keio University MIME-Version: 1.0 (generated by SEMI 1.14.3 - "Ushinoya") Content-Type: text/plain; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing

At Mon, 5 Feb 2001 11:32:36 +0100, Jacek Konieczny wrote:
> > Actually it means default route. I can't see why you announce or add
> > statically the route instead of default route.
>
> This is not the same as default route. Prefix 2000::/3 does not contain
> multicast nor link-local and site-local addresses. Only global unicast
> addresses.

Have you ever seen IPv6 routing table ??
8) In routing table, default routes for multicast and link-local address exist. fe80::/10 :: UA 256 0 0 eth0 ff00::/8 :: UA 256 0 0 eth0 So if a default route exist on routing table, multicast and link-local packets are not affected. In case of site-local, IMO, it is broken spec and never work 8)8)8). Anyway, if you want to mention about filtering invalid packets with purging default route, I think it should be solved as OPERATIONAL issues not TECHNICAL issues. Then I think we should be able to choice using a specific route or a default route for default unicast routing. Even if the linux kernel accept default route, you can use a specific route for unicast routing without default route. Regards. -- Yuji Sekiya From owner-netdev@oss.sgi.com Tue Feb 6 05:42:54 2001 Received: by oss.sgi.com id ; Tue, 6 Feb 2001 05:42:44 -0800 Received: from pak145.pakuni.net ([205.138.121.145]:16113 "EHLO postal.paktronix.com") by oss.sgi.com with ESMTP id ; Tue, 6 Feb 2001 05:42:29 -0800 Received: from netmonster.pakint.net (netmonster [192.168.3.13]) by postal.paktronix.com (8.9.3/8.9.3) with ESMTP id IAA09681; Tue, 6 Feb 2001 08:33:22 -0600 Date: Tue, 6 Feb 2001 07:35:47 -0600 (CST) From: "Matthew G. Marsh" X-Sender: mgm@netmonster.pakint.net To: Yao Qi cc: netdev@oss.sgi.com Subject: Re: ipv6 routing table In-Reply-To: <000e01c08f37$22b47940$160310ac@cwc.nus.edu.sg> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Mon, 5 Feb 2001, Yao Qi wrote: > Hi, > > I enabled ipv6 in linux kernel 2.2.14. I can use ping6 now, but the > routing table is still in ipv4. Can anybody tell me how to set up ipv6 > routing table? Using IPROUTE2 = ip -6 route add ... I suspect if you have ip6 running then you already have route entries. ip -6 route list should show them. Have not used ifconfig/route for 3 years now and never under v6 so can't help you if those are your utilities. 
> Thanks. > > Yao Qi -------------------------------------------------- Matthew G. Marsh, President Paktronix Systems LLC 1506 North 59th Street Omaha NE 68104 Phone: (402) 932-7250 Email: mgm@paktronix.com WWW: http://www.paktronix.com -------------------------------------------------- From owner-netdev@oss.sgi.com Tue Feb 6 08:00:47 2001 Received: by oss.sgi.com id ; Tue, 6 Feb 2001 08:00:27 -0800 Received: from gale.cs.cornell.edu ([128.84.154.54]:35345 "EHLO marduk.litech.org") by oss.sgi.com with ESMTP id ; Tue, 6 Feb 2001 08:00:10 -0800 Received: from lutchann (helo=localhost) by marduk.litech.org with local-esmtp (Exim 3.20 #3) id 14QAX7-0001ly-00; Tue, 06 Feb 2001 10:59:57 -0500 Date: Tue, 6 Feb 2001 10:59:57 -0500 (EST) From: Nathan Lutchansky To: "usagi-core@linux-ipv6.org" cc: "netdev@oss.sgi.com" Subject: Re: [ANN] 2nd STABLE release of USAGI Project In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Mon, 5 Feb 2001, USAGI Project wrote: > We are glad that we can announce the 2nd STABLE RELEASE of USAGI > (UniverSAl playGround for Ipv6) on February 5th, 2001. On this > release, we provide only kernel, but two kernels. One is > linux-2.2.18-usagi kernel and the other is linux-2.4.0-usagi kernel. > > - Linux kernel-2.4.0-usagi-20010205 > Based on Linux kernel-2.4.0, we have improved and implemented > same as kernel-2.2.18-usagi-20010205. Will the patch for 2.4.0 apply to 2.4.1? Or are you planning to release a 2.4.1 snap soon? -Nathan -- +-------------------+---------------------+------------------------+ | Nathan Lutchansky | lutchann@litech.org | Lithium Technologies | +------------------------------------------------------------------+ | I dread success. To have succeeded is to have finished one's | | business on earth... I like a state of continual becoming, | | with a goal in front and not behind. 
- George Bernard Shaw | +------------------------------------------------------------------+ From owner-netdev@oss.sgi.com Tue Feb 6 09:44:07 2001 Received: by oss.sgi.com id ; Tue, 6 Feb 2001 09:43:58 -0800 Received: from mail.bieringer.de ([195.226.187.51]:13843 "HELO titan.bieringer.de") by oss.sgi.com with SMTP id ; Tue, 6 Feb 2001 09:43:36 -0800 Received: (qmail 22931 invoked from network); 6 Feb 2001 17:43:33 -0000 Received: from pd9502bca.dip.t-dialin.net (HELO worker.bieringer.de) (217.80.43.202) by mail.bieringer.de with SMTP; 6 Feb 2001 17:43:33 -0000 Message-Id: <5.0.2.1.0.20010206184415.019ca8a8@mail.bieringer.de> X-Sender: list4peter@mail.bieringer.de X-Mailer: QUALCOMM Windows Eudora Version 5.0.2 Date: Tue, 06 Feb 2001 18:44:55 +0100 To: Nathan Lutchansky , "usagi-core@linux-ipv6.org" From: Peter Bieringer Subject: Re: [ANN] 2nd STABLE release of USAGI Project Cc: "netdev@oss.sgi.com" In-Reply-To: References: Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; format=flowed Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing At 16:59 06.02.2001, Nathan Lutchansky wrote: >On Mon, 5 Feb 2001, USAGI Project wrote: > > > We are glad that we can announce the 2nd STABLE RELEASE of USAGI > > (UniverSAl playGround for Ipv6) on February 5th, 2001. On this > > release, we provide only kernel, but two kernels. One is > > linux-2.2.18-usagi kernel and the other is linux-2.4.0-usagi kernel. > > > > - Linux kernel-2.4.0-usagi-20010205 > > Based on Linux kernel-2.4.0, we have improved and implemented > > same as kernel-2.2.18-usagi-20010205. > >Will the patch for 2.4.0 apply to 2.4.1? Or are you planning to release a >2.4.1 snap soon? -Nathan Current patch will work on 2.4.1, too. Also the 2.2.18 works on 2.2.19pre8 (tested yesterday!) 
Peter From owner-netdev@oss.sgi.com Tue Feb 6 09:50:07 2001 Received: by oss.sgi.com id ; Tue, 6 Feb 2001 09:49:47 -0800 Received: from mean.netppl.fi ([195.242.208.16]:64265 "EHLO mean.netppl.fi") by oss.sgi.com with ESMTP id ; Tue, 6 Feb 2001 09:49:33 -0800 Received: from evil.netppl.fi (root@evil.netppl.fi [195.242.209.201]) by mean.netppl.fi (8.9.3/8.9.3) with ESMTP id TAA27672 for ; Tue, 6 Feb 2001 19:49:19 +0200 Received: (from pp@localhost) by evil.netppl.fi (8.9.3/8.9.3) id TAA06077 for netdev@oss.sgi.com; Tue, 6 Feb 2001 19:49:19 +0200 Date: Tue, 6 Feb 2001 19:49:19 +0200 From: Pekka Pietikainen To: netdev@oss.sgi.com Subject: zerocopy results on GigE Message-ID: <20010206194919.A633@netppl.fi> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0pre3i Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Here's some benchmarks I ran today, which look quite similar to what Jamal was getting. Jumbo frames, sender is dual pIII/500 with 32/66 PCI, receiver a dual pII/450 with 32/33, both have 1MB Alteons. CPU use is measured using cyclesoak. SO_RCVBUF/SO_SNDBUF set to 512k, no other sockopts touched. I did the tests with a nice little modular network tester I've been hacking on, ttcp/gensink give similar results... 
zerocopy-2.4.1-2, writes were done in 512k chunks (except sendfile()
where the file was transmitted all at once)

Test                              bandwidth  CPU(rcv)    CPU(transmit)
writing from buffer               65MB/s     45%         30%
same with MSG_TRUNC on receiver   81MB/s     21%         40%

64MB file:
read/write from file              48MB/s     34%         53%
read/write from file+MSG_TRUNC    48MB/s     14%         53%
sendfile()                        62MB/s     45%         8%
sendfile()+MSG_TRUNC              80MB/s     21%         14%
mmap()/write                      64MB/s     45%         35%
mmap()/write+MSG_TRUNC            81MB/s     21%         52%

128MB file: (128M memory so causes some paging which caused cpu use and
performance to bounce around)
read/write from file              43MB/s     30%         55%
+MSG_TRUNC                        44MB/s     12%         56%
sendfile()                        62MB/s     45%         33% (+-5%)
+MSG_TRUNC                        81MB/s     21%         47%
mmap()/write                      40MB/s     27% (+-7%)  80%
+MSG_TRUNC                        45MB/s     12%         80%

2.4.2-pre1
writing from buffer               70MB/s     54%         33%
same with MSG_TRUNC on receiver   98MB/s     19%         46%

64M file
read/write from file              50MB/s     33%         50%
+MSG_TRUNC                        51MB/s     16%         53%
sendfile()                        68MB/s     48%         41%
+MSG_TRUNC                        93MB/s     36%         56%
mmap()/write                      57MB/s     39%         32%
+MSG_TRUNC                        87MB/s     28%         53%

128M file
read/write from file              44MB/s     31%         55%
sendfile()                        64MB/s     47%         60%
sendfile+MSG_TRUNC                64MB/s     23%         60%
mmap()/write                      33MB/s     26%         70%

And for comparison:
STP                               98MB/s     2.8%        17%

Might be that the problems are in fact caused by the non-zc related "optimizations" in the acenic driver, I'll try playing with it a bit more tomorrow.
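
For readers who want to reproduce the sendfile() rows, the sender side can be sketched roughly as follows. This is a minimal sketch, not the tester used for the numbers above: the helper names `request_512k_buffers` and `send_file_all_at_once` are hypothetical, and the code assumes a Linux system where `sendfile()` accepts a socket as the output descriptor.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Ask for 512 KB socket buffers, as in the test setup described above.
 * The kernel may clamp the request, so the result is only advisory here. */
static void request_512k_buffers(int sock)
{
    int size = 512 * 1024;

    (void)setsockopt(sock, SOL_SOCKET, SO_SNDBUF, &size, sizeof(size));
    (void)setsockopt(sock, SOL_SOCKET, SO_RCVBUF, &size, sizeof(size));
}

/* Send the whole file at `path` over `sock` using sendfile().
 * Returns the number of bytes sent, or -1 on error. */
static long send_file_all_at_once(int sock, const char *path)
{
    struct stat st;
    off_t offset = 0;
    int fd;

    assert(path != NULL);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    while (offset < st.st_size) {
        /* sendfile() advances `offset` by the number of bytes it sent;
         * a short or zero-length transfer is treated as an error here. */
        ssize_t n = sendfile(sock, fd, &offset,
                             (size_t)(st.st_size - offset));
        if (n <= 0) {
            close(fd);
            return -1;
        }
    }
    close(fd);
    return (long)offset;
}
```

The loop is there because `sendfile()` may transfer fewer bytes than requested; a benchmark would time the whole call to `send_file_all_at_once()` on a connected TCP socket.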
-- Pekka Pietikainen

From owner-netdev@oss.sgi.com Tue Feb 6 10:40:28 2001 Received: by oss.sgi.com id ; Tue, 6 Feb 2001 10:40:08 -0800 Received: from [63.93.198.67] ([63.93.198.67]:6712 "EHLO mercury.mayannetworks.com") by oss.sgi.com with ESMTP id ; Tue, 6 Feb 2001 10:39:59 -0800 Received: from fs-phx.mayannetworks.com (fs-phx.mayannetworks.com [10.4.1.3]) by mercury.mayannetworks.com (8.9.3/8.9.3) with ESMTP id KAA11315; Tue, 6 Feb 2001 10:39:56 -0800 (PST) Received: from mayannetworks.com (bgreear@[10.4.1.247]) by fs-phx.mayannetworks.com (8.8.8/8.8.8) with ESMTP id LAA09990; Tue, 6 Feb 2001 11:39:52 -0700 (MST) Message-ID: <3A804411.F8EE54D0@mayannetworks.com> Date: Tue, 06 Feb 2001 11:36:01 -0700 From: Ben Greear X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.2.16-22 i686) X-Accept-Language: en MIME-Version: 1.0 To: Pekka Pietikainen CC: netdev@oss.sgi.com Subject: Re: zerocopy results on GigE References: <20010206194919.A633@netppl.fi> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing

Pekka Pietikainen wrote:
>
> Here's some benchmarks I ran today, which look quite similar to what
> Jamal was getting.
>
> Jumbo frames, sender is dual pIII/500 with 32/66 PCI, receiver a dual pII/450
> with 32/33, both have 1MB Alteons. CPU use is measured using cyclesoak.
> SO_RCVBUF/SO_SNDBUF set to 512k, no other sockopts touched.
>
> I did the tests with a nice little modular network tester I've been
> hacking on, ttcp/gensink give similar results...

I assume this copies (at least logically) the data to/from user-space? If so, would routing numbers be even better, with regard to throughput? For instance, does anyone have any numbers on routing between two GigE interfaces??

Also, what does MSG_TRUNC do?? It seems to have a serious effect on traffic!
THanks, Ben -- Ben Greear (bgreear@mayannetworks.com) http://www.mayannetworks.com NAM Team, Phoenix http://www-internal/~bgreear Phone: 602-325-2043 Personal Cell Phone: 602-502-6887 From owner-netdev@oss.sgi.com Tue Feb 6 10:48:58 2001 Received: by oss.sgi.com id ; Tue, 6 Feb 2001 10:48:38 -0800 Received: from mean.netppl.fi ([195.242.208.16]:31755 "EHLO mean.netppl.fi") by oss.sgi.com with ESMTP id ; Tue, 6 Feb 2001 10:48:31 -0800 Received: from evil.netppl.fi (root@evil.netppl.fi [195.242.209.201]) by mean.netppl.fi (8.9.3/8.9.3) with ESMTP id UAA28632; Tue, 6 Feb 2001 20:48:19 +0200 Received: (from pp@localhost) by evil.netppl.fi (8.9.3/8.9.3) id UAA22095; Tue, 6 Feb 2001 20:48:19 +0200 Date: Tue, 6 Feb 2001 20:48:19 +0200 From: Pekka Pietikainen To: Ben Greear Cc: Pekka Pietikainen , netdev@oss.sgi.com Subject: Re: zerocopy results on GigE Message-ID: <20010206204819.A22011@netppl.fi> References: <20010206194919.A633@netppl.fi> <3A804411.F8EE54D0@mayannetworks.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0pre3i In-Reply-To: <3A804411.F8EE54D0@mayannetworks.com> Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Tue, Feb 06, 2001 at 11:36:01AM -0700, Ben Greear wrote: > I assume this copies (at least logically) the data to/from user-space? Yup, this is user-space to user-space (including the STP numbers :) ), except the MSG_TRUNC which makes the kernel skip the copy to userspace on the receiver. 
Useless in real life, but nice for testing maximum transmission speed (since it shows how fast things would be if the receiver could keep up) -- Pekka Pietikainen From owner-netdev@oss.sgi.com Tue Feb 6 13:32:09 2001 Received: by oss.sgi.com id ; Tue, 6 Feb 2001 13:31:59 -0800 Received: from mta6.snfc21.pbi.net ([206.13.28.240]:45776 "EHLO mta6.snfc21.pbi.net") by oss.sgi.com with ESMTP id ; Tue, 6 Feb 2001 13:31:32 -0800 Received: from kryptonite ([206.171.33.88]) by mta6.snfc21.pbi.net (Sun Internet Mail Server sims.3.5.2000.01.05.12.18.p9) with SMTP id <0G8C004CVU6ZXD@mta6.snfc21.pbi.net> for netdev@oss.sgi.com; Tue, 6 Feb 2001 13:25:59 -0800 (PST) Date: Tue, 06 Feb 2001 13:26:36 -0800 From: David Brownell Subject: Help on a network driver ... To: netdev@oss.sgi.com Message-id: <059f01c09083$82ed9e20$6800000a@brownell.org> MIME-version: 1.0 X-Mailer: Microsoft Outlook Express 5.50.4133.2400 Content-type: text/plain; charset="iso-8859-1" Content-transfer-encoding: 7bit X-MSMail-Priority: Normal X-MimeOLE: Produced By Microsoft MimeOLE V5.50.4133.2400 X-Priority: 3 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hi, I've hit a sticking point with a network driver, and am wondering if anyone here can provide some helpful suggestions. Seems like maybe I'm doing something the network layer doesn't like, but it's not been very helpful about helping figure out just what! So it's a good time to ask for help from folk who know the network stack better. DRIVER: "drivers/usb/usbnet.c" (2.4.1-ac2), earlier kernels have an earlier version with the same problem ("drivers/usb/net1080.c") which handled fewer device types. Basically, USB is used as a network, following the same model used by Win32 drivers for these devices: Ethernet-ish devices, with bridging so that end-users can use simple configurations (don't need to enable IP routing etc). 
Win32 appears to have something less powerful than the "bridge" driver built into its networking stack (no loops allowed). "Laplink" style usb-to-usb cables work this way, and smart-enough USB devices (iPaq, Yopy ... ;-) have been known to do the same thing. I've seen throughput up to about 700 KByte/sec with USB 1.1 ("12 Mbit/s"). I hope to see USB 2.0 ("480 Mbit/s") devices doing this sometime later this year! PROBLEM: TCP connections won't establish -- sometimes. The same driver executable may work one day, fail the next. If I watch things with "tcpdump" what I'll see is TCP setup packets (say for FTP, SSH, rpcinfo) arriving ... but no acks getting sent back. Meanwhile, "ping" traffic flies (both directions) without any problems at all. A while back I computed checksums of packets on both sides (tx/rx) and they were the same ... suggesting that if the data got corrupted, it happened after netif_rx(). The network service in question is "live" enough to access by loopback (on the server) or through a regular Ethernet connection, though I've sometimes seen strange messages from the networking code about not being able to find a route to an interface hosted by this driver. So what I'd like is suggestions that can help me figure out why TCP seems to be ignoring these packets ... though I'd not turn down a patch if one were offered! - Dave From owner-netdev@oss.sgi.com Tue Feb 6 13:38:39 2001 Received: by oss.sgi.com id ; Tue, 6 Feb 2001 13:38:29 -0800 Received: from pizda.ninka.net ([216.101.162.242]:6027 "EHLO pizda.ninka.net") by oss.sgi.com with ESMTP id ; Tue, 6 Feb 2001 13:38:16 -0800 Received: (from davem@localhost) by pizda.ninka.net (8.9.3/8.9.3) id NAA20106; Tue, 6 Feb 2001 13:36:32 -0800 From: "David S. 
Miller" MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <14976.28256.593782.781889@pizda.ninka.net> Date: Tue, 6 Feb 2001 13:36:32 -0800 (PST) To: Pekka Pietikainen Cc: netdev@oss.sgi.com Subject: Re: zerocopy results on GigE In-Reply-To: <20010206194919.A633@netppl.fi> References: <20010206194919.A633@netppl.fi> X-Mailer: VM 6.75 under 21.1 (patch 13) "Crater Lake" XEmacs Lucid Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing

Pekka Pietikainen writes:
 > Jumbo frames, sender is dual pIII/500 with 32/66 PCI, receiver a dual pII/450
 > with 32/33, both have 1MB Alteons. CPU use is measured using cyclesoak.
 > SO_RCVBUF/SO_SNDBUF set to 512k, no other sockopts touched.

Strange, cpu usage is close to nothing for sendfile cases yet full bandwidth is not obtained. Here is what I am getting on UltraSparc systems:

ZC results:

2.4.2pre1+zerocopy+sendfile  acenic-->Syskonnect  120 MB/sec, 32% load
2.4.2pre1+zerocopy           acenic-->Syskonnect   94 MB/sec, 52% load
2.4.2pre1+sendfile           acenic-->Syskonnect  110 MB/sec, 49% load
2.4.2pre1                    acenic-->Syskonnect   80 MB/sec, 49% load

I am using the Netgear (ie. 512K memory) acenic cards as the sender.

Your idea to experiment with removing the non-zerocopy related tweaks from the zerocopy patches may prove fruitful, please let everyone know what you do or do not discover here.

Later, David S.
Miller davem@redhat.com From owner-netdev@oss.sgi.com Tue Feb 6 14:04:09 2001 Received: by oss.sgi.com id ; Tue, 6 Feb 2001 14:03:49 -0800 Received: from smtp1.cern.ch ([137.138.128.38]:62726 "EHLO smtp1.cern.ch") by oss.sgi.com with ESMTP id ; Tue, 6 Feb 2001 14:03:18 -0800 Received: from lxplus015.cern.ch (IDENT:root@lxplus015.cern.ch [137.138.161.112]) by smtp1.cern.ch (8.9.3/8.9.3) with ESMTP id XAA30079; Tue, 6 Feb 2001 23:02:55 +0100 (MET) Received: (from jes@localhost) by lxplus015.cern.ch (8.9.3/8.9.3) id XAA26471; Tue, 6 Feb 2001 23:02:55 +0100 To: "David S. Miller" Cc: Pekka Pietikainen , netdev@oss.sgi.com Subject: Re: zerocopy results on GigE References: <20010206194919.A633@netppl.fi> <14976.28256.593782.781889@pizda.ninka.net> From: Jes Sorensen Date: 06 Feb 2001 23:02:54 +0100 In-Reply-To: "David S. Miller"'s message of "Tue, 6 Feb 2001 13:36:32 -0800 (PST)" Message-ID: Lines: 17 User-Agent: Gnus/5.070096 (Pterodactyl Gnus v0.96) Emacs/20.4 MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing >>>>> "David" == David S Miller writes: David> Pekka Pietikainen writes: >> Jumbo frames, sender is dual pIII/500 with 32/66 PCI, receiver a >> dual pII/450 with 32/33, both have 1MB Alteons. CPU use is measured >> using cyclesoak. SO_RCVBUF/SO_SNDBUF set to 512k, no other >> sockopts touched. David> Strange, cpu usage is close to nothing for sendfile cases yet David> full bandwidth is not obtained. Here is what I am getting on David> UltraSparc systems: One thing that might be worth investigating is that the AceNIC has a high latency for reading buffer descriptors. One of the plans I have is to linearize small skb's before handing them to the NIC. 
Jes From owner-netdev@oss.sgi.com Tue Feb 6 16:24:30 2001 Received: by oss.sgi.com id ; Tue, 6 Feb 2001 16:24:20 -0800 Received: from [203.126.247.144] ([203.126.247.144]:49398 "EHLO esngs144.nortelnetworks.com") by oss.sgi.com with ESMTP id ; Tue, 6 Feb 2001 16:24:02 -0800 Received: from zsngd101.asiapac.nortel.com (actually znsgd101) by esngs144.nortelnetworks.com; Wed, 7 Feb 2001 08:22:51 +0800 Received: from zctwb003.asiapac.nortel.com ([47.152.32.111]) by zsngd101.asiapac.nortel.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13) id D84A6570; Wed, 7 Feb 2001 08:23:19 +0800 Received: from asiapacificm01.nt.com (pwold011.asiapac.nortel.com [47.181.193.45]) by zctwb003.asiapac.nortel.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13) id 1NV4D9QW; Wed, 7 Feb 2001 11:23:15 +1100 Message-ID: <3A809569.A9313C6F@asiapacificm01.nt.com> Date: Wed, 07 Feb 2001 00:23:05 +0000 From: "Andrew Morton" Organization: Nortel Networks, Wollongong Australia X-Mailer: Mozilla 4.61 [en] (X11; I; Linux 2.4.1-pre10 i686) X-Accept-Language: en MIME-Version: 1.0 To: "David S. Miller" CC: Pekka Pietikainen , netdev@oss.sgi.com Subject: Re: zerocopy results on GigE References: <20010206194919.A633@netppl.fi> <14976.28256.593782.781889@pizda.ninka.net> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-Orig: Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing "David S. Miller" wrote: > > Strange, cpu usage is close to nothing for sendfile cases yet full > bandwidth is not obtained. Is it possible that the receiver is going into discard, and TCP is backing off? 
From owner-netdev@oss.sgi.com Tue Feb 6 16:46:20 2001 Received: by oss.sgi.com id ; Tue, 6 Feb 2001 16:46:10 -0800 Received: from pizda.ninka.net ([216.101.162.242]:38284 "EHLO pizda.ninka.net") by oss.sgi.com with ESMTP id ; Tue, 6 Feb 2001 16:45:49 -0800 Received: (from davem@localhost) by pizda.ninka.net (8.9.3/8.9.3) id QAA21070; Tue, 6 Feb 2001 16:43:56 -0800 From: "David S. Miller" MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <14976.39500.484616.490586@pizda.ninka.net> Date: Tue, 6 Feb 2001 16:43:56 -0800 (PST) To: "Andrew Morton" Cc: Pekka Pietikainen , netdev@oss.sgi.com Subject: Re: zerocopy results on GigE In-Reply-To: <3A809569.A9313C6F@asiapacificm01.nt.com> References: <20010206194919.A633@netppl.fi> <14976.28256.593782.781889@pizda.ninka.net> <3A809569.A9313C6F@asiapacificm01.nt.com> X-Mailer: VM 6.75 under 21.1 (patch 13) "Crater Lake" XEmacs Lucid Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Andrew Morton writes: > "David S. Miller" wrote: > > > > Strange, cpu usage is close to nothing for sendfile cases yet full > > bandwidth is not obtained. > > Is it possible that the receiver is going into discard, > and TCP is backing off? Or even worse, the Gige cards are emitting flow control frames. Later, David S. 
Miller davem@redhat.com From owner-netdev@oss.sgi.com Tue Feb 6 18:07:11 2001 Received: by oss.sgi.com id ; Tue, 6 Feb 2001 18:06:50 -0800 Received: from smtprch2.nortelnetworks.com ([192.135.215.15]:58077 "EHLO smtprch2.nortel.com") by oss.sgi.com with ESMTP id ; Tue, 6 Feb 2001 18:06:32 -0800 Received: from zrchb213.us.nortel.com (actually zrchb213) by smtprch2.nortel.com; Tue, 6 Feb 2001 20:01:19 -0600 Received: from zctwb003.asiapac.nortel.com ([47.152.32.111]) by zrchb213.us.nortel.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13) id 1LJK2B60; Tue, 6 Feb 2001 20:06:06 -0600 Received: from asiapacificm01.nt.com (pwold011.asiapac.nortel.com [47.181.193.45]) by zctwb003.asiapac.nortel.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13) id 1NV4D9T0; Wed, 7 Feb 2001 13:06:03 +1100 Message-ID: <3A80AD84.680BF8AA@asiapacificm01.nt.com> Date: Wed, 07 Feb 2001 02:05:56 +0000 From: "Andrew Morton" Organization: Nortel Networks, Wollongong Australia X-Mailer: Mozilla 4.61 [en] (X11; I; Linux 2.4.1-pre10 i686) X-Accept-Language: en MIME-Version: 1.0 To: "David S. Miller" CC: Pekka Pietikainen , netdev@oss.sgi.com Subject: Re: zerocopy results on GigE References: <3A809569.A9313C6F@asiapacificm01.nt.com>, <20010206194919.A633@netppl.fi> <14976.28256.593782.781889@pizda.ninka.net> <3A809569.A9313C6F@asiapacificm01.nt.com> <14976.39500.484616.490586@pizda.ninka.net> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-Orig: Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing "David S. Miller" wrote: > > Andrew Morton writes: > > "David S. Miller" wrote: > > > > > > Strange, cpu usage is close to nothing for sendfile cases yet full > > > bandwidth is not obtained. > > > > Is it possible that the receiver is going into discard, > > and TCP is backing off? > > Or even worse, the Gige cards are emitting flow control frames. > What would trigger that behaviour? 
Presumably, the lack of any Rx DMA descriptors. It would be interesting to bump up acenic's RX_RING_SIZE and friends, as well as the backlog queue.

From owner-netdev@oss.sgi.com Tue Feb 6 23:45:43 2001 Received: by oss.sgi.com id ; Tue, 6 Feb 2001 23:45:23 -0800 Received: from pizda.ninka.net ([216.101.162.242]:12672 "EHLO pizda.ninka.net") by oss.sgi.com with ESMTP id ; Tue, 6 Feb 2001 23:45:12 -0800 Received: (from davem@localhost) by pizda.ninka.net (8.9.3/8.9.3) id XAA01253; Tue, 6 Feb 2001 23:43:40 -0800 From: "David S. Miller" MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <14976.64684.579919.558725@pizda.ninka.net> Date: Tue, 6 Feb 2001 23:43:40 -0800 (PST) To: linux-kernel@vger.kernel.org CC: netdev@oss.sgi.com Subject: [UPDATE] New zerocopy patch. X-Mailer: VM 6.75 under 21.1 (patch 13) "Crater Lake" XEmacs Lucid Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing

Against 2.4.2-pre1:

ftp://ftp.kernel.org/pub/linux/kernel/people/davem/zerocopy-2.4.2p1-2.diff.gz

Only one notable change since the last installment, but an important one:

1) When doing paged SKB sendmsg(), use csum_and_copy_from_user instead of copy_from_user. The problem is that there appears to be some performance bug with some x86 processors when doing non-8-byte aligned memcpy operations via rep/movsl (P-II Mendocino is one known chip with the problem).

So this change aims to remove this x86 anomaly from the zerocopy performance characteristics so we can see if there are some real implementation issues compared to running without the zerocopy patch applied.

This is not to say that the x86 memcpy performance thing is being ignored; Linus and others are working on what to do about that separately.

Please test, thanks.

Later, David S.
Miller davem@redhat.com From owner-netdev@oss.sgi.com Wed Feb 7 01:46:12 2001 Received: by oss.sgi.com id ; Wed, 7 Feb 2001 01:46:03 -0800 Received: from colin.muc.de ([193.149.48.1]:37898 "HELO colin.muc.de") by oss.sgi.com with SMTP id ; Wed, 7 Feb 2001 01:45:44 -0800 Received: by colin.muc.de id <140600-2>; Wed, 7 Feb 2001 10:45:27 +0100 Message-ID: <20010207104520.05560@colin.muc.de> From: Andi Kleen To: Andrew Morton Cc: "David S. Miller" , Pekka Pietikainen , netdev@oss.sgi.com Subject: Re: zerocopy results on GigE References: <20010206194919.A633@netppl.fi> <14976.28256.593782.781889@pizda.ninka.net> <3A809569.A9313C6F@asiapacificm01.nt.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 0.88e In-Reply-To: <3A809569.A9313C6F@asiapacificm01.nt.com>; from Andrew Morton on Wed, Feb 07, 2001 at 01:25:14AM +0100 Date: Wed, 7 Feb 2001 10:45:21 +0100 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Wed, Feb 07, 2001 at 01:25:14AM +0100, Andrew Morton wrote: > "David S. Miller" wrote: > > > > Strange, cpu usage is close to nothing for sendfile cases yet full > > bandwidth is not obtained. > > Is it possible that the receiver is going into discard, > and TCP is backing off? You can check all that by reading the TCP_INFO socket option from inside the process after a gap occurs. It unfortunately does not work from outside the process currently, except with some dirty unsafe hacks for socket /proc open. 
-Andi From owner-netdev@oss.sgi.com Wed Feb 7 01:58:32 2001 Received: by oss.sgi.com id ; Wed, 7 Feb 2001 01:58:13 -0800 Received: from pop3.galileo.co.il ([199.203.130.130]:9375 "EHLO galileo5.galileo.co.il") by oss.sgi.com with ESMTP id ; Wed, 7 Feb 2001 01:57:59 -0800 Received: from galileo.co.il (rabeeh@linux2.galileo.co.il [10.2.40.2]) by galileo.co.il (8.8.5/8.8.5) with ESMTP id LAA12458; Wed, 7 Feb 2001 11:58:05 +0200 (GMT-2) Message-ID: <3A811B95.4080303@galileo.co.il> Date: Wed, 07 Feb 2001 11:55:33 +0200 From: Rabeeh Khoury Organization: Galileo Technology User-Agent: Mozilla/5.0 (X11; U; Linux 2.2.14-5.0 i686; en-US; m18) Gecko/20001107 Netscape6/6.0 X-Accept-Language: en MIME-Version: 1.0 To: David Brownell CC: netdev@oss.sgi.com Subject: Re: Help on a network driver ... References: <059f01c09083$82ed9e20$6800000a@brownell.org> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hi, I had a problem like this - If you have a socket buffer that you are going to handle to the upper layers, usually you write - netif_rx(skb); just add the following line - skb->pkt_type = PACKET_HOST; before the 'netif_rx(skb);' line. The reason for that is that the TCP layer doesn't accept any packet that the pkt_type is not a PACKET_HOST. This feature was added to TCP layer in order to distnguish between packets that are for the host and packets which are not, which is usually noticed when the port works in promiscuous mode ! Regards, Rabeeh David Brownell wrote: > Hi, > > I've hit a sticking point with a network driver, and am wondering if > anyone here can provide some helpful suggestions. Seems like maybe > I'm doing something the network layer doesn't like, but it's not been > very helpful about helping figure out just what! So it's a good time > to ask for help from folk who know the network stack better. 
> > > DRIVER: "drivers/usb/usbnet.c" (2.4.1-ac2), earlier kernels have > an earlier version with the same problem ("drivers/usb/net1080.c") > which handled fewer device types. > > Basically, USB is used as a network, following the same model used > by Win32 drivers for these devices: Ethernet-ish devices, with > bridging so that end-users can use simple configurations (don't need > to enable IP routing etc). Win32 appears to have something less > powerful than the "bridge" driver built into its networking stack (no > loops allowed). "Laplink" style usb-to-usb cables work this way, and > smart-enough USB devices (iPaq, Yopy ... ;-) have been known to > do the same thing. I've seen throughput up to about 700 KByte/sec > with USB 1.1 ("12 Mbit/s"). I hope to see USB 2.0 ("480 Mbit/s") > devices doing this sometime later this year! > > > PROBLEM: TCP connections won't establish -- sometimes. > The same driver executable may work one day, fail the next. > > If I watch things with "tcpdump" what I'll see is TCP setup packets > (say for FTP, SSH, rpcinfo) arriving ... but no acks getting sent > back. Meanwhile, "ping" traffic flies (both directions) without any > problems at all. A while back I computed checksums of packets > on both sides (tx/rx) and they were the same ... suggesting that if > the data got corrupted, it happened after netif_rx(). The network > service in question is "live" enough to access by loopback (on > the server) or through a regular Ethernet connection, though I've > sometimes seen strange messages from the networking code about > not being able to find a route to an interface hosted by this driver. > > > So what I'd like is suggestions that can help me figure out why TCP > seems to be ignoring these packets ... though I'd not turn down a > patch if one were offered! 
> > - Dave > > > > > From owner-netdev@oss.sgi.com Wed Feb 7 04:50:53 2001 Received: by oss.sgi.com id ; Wed, 7 Feb 2001 04:50:34 -0800 Received: from mean.netppl.fi ([195.242.208.16]:28166 "EHLO mean.netppl.fi") by oss.sgi.com with ESMTP id ; Wed, 7 Feb 2001 04:50:05 -0800 Received: from evil.netppl.fi (root@evil.netppl.fi [195.242.209.201]) by mean.netppl.fi (8.9.3/8.9.3) with ESMTP id OAA08478 for ; Wed, 7 Feb 2001 14:49:50 +0200 Received: (from pp@localhost) by evil.netppl.fi (8.9.3/8.9.3) id OAA22945 for netdev@oss.sgi.com; Wed, 7 Feb 2001 14:49:46 +0200 Date: Wed, 7 Feb 2001 14:49:46 +0200 From: Pekka Pietikainen To: netdev@oss.sgi.com Subject: Re: zerocopy results on GigE Message-ID: <20010207144946.A16003@netppl.fi> References: <20010206194919.A633@netppl.fi> <14976.28256.593782.781889@pizda.ninka.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0pre3i In-Reply-To: <14976.28256.593782.781889@pizda.ninka.net> Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Tue, Feb 06, 2001 at 01:36:32PM -0800, David S. Miller wrote: > Your idea to experiment with removing the non-zerocopy related tweaks > from the zerocopy patches may prove fruitful, please let everyone > know what you do or do not discover here. That seems to be it. straight after reboot (this is the MSG_TRUNC test): [root@toy3 /root]# /src/yantt/yantt -c 192.168.9.2 Setting send window to 524288 (set by kernel to 1048576) Setting receive window to 524288 (set by kernel to 1048576) Connecting to port 5000 on host 192.168.9.2 315097088 80.984162MB/s 315097088 80.834002MB/s 315097088 80.853424MB/s 315097088 98.895074MB/s 315097088 98.731376MB/s 315097088 98.792009MB/s 315097088 99.132581MB/s 315097088 99.081884MB/s 315097088 99.156591MB/s dmesg: acenic.c: v0.50 02/02/2001 Jes Sorensen, linux-acenic@SunSITE.dk http://home.cern.ch/~jes/gige/acenic.html eth1: Alteon AceNIC Gigabit Ethernet at 0xfd6fc000, irq 9 Tigon II (Rev. 
6), Firmware: 12.4.11, MAC: 00:60:6d:21:01:b2 PCI bus width: 32 bits, speed: 33MHz, latency: 64 clks PCI memory write & invalidate enabled by BIOS, enabling counter measures Enabling PCI Fast Back to Back eth1: Firmware up and running eth1: Optical link UP (FD+; FC: TX+, RX+) eth1: Enabling Jumbo frame support eth1: Jumbo ring flushed acenic.c: v0.50 02/02/2001 Jes Sorensen, linux-acenic@SunSITE.dk http://home.cern.ch/~jes/gige/acenic.html eth1: Alteon AceNIC Gigabit Ethernet at 0xfd6fc000, irq 9 Tigon II (Rev. 6), Firmware: 12.4.11, MAC: 00:60:6d:21:01:b2 PCI bus width: 32 bits, speed: 33MHz, latency: 64 clks Disabling PCI memory write and invalidate eth1: Firmware up and running eth1: Optical link UP eth1: Enabling Jumbo frame support eth1: Jumbo ring flushed acenic.c: v0.50 02/02/2001 Jes Sorensen, linux-acenic@SunSITE.dk http://home.cern.ch/~jes/gige/acenic.html eth1: Alteon AceNIC Gigabit Ethernet at 0xfd6fc000, irq 9 Tigon II (Rev. 6), Firmware: 12.4.11, MAC: 00:60:6d:21:01:b2 PCI bus width: 32 bits, speed: 33MHz, latency: 64 clks eth1: Firmware up and running eth1: Optical link UP (FD+; FC: TX+, RX+) eth1: Enabling Jumbo frame support One noticeable thing is that on the 3rd time, it didn't disable PCI memory write/invalidate like it did on the previous times. I'll continue poking around and see if I can make it work straight after rebooting. 
-- Pekka Pietikainen From owner-netdev@oss.sgi.com Wed Feb 7 06:14:04 2001 Received: by oss.sgi.com id ; Wed, 7 Feb 2001 06:13:54 -0800 Received: from mean.netppl.fi ([195.242.208.16]:19466 "EHLO mean.netppl.fi") by oss.sgi.com with ESMTP id ; Wed, 7 Feb 2001 06:13:47 -0800 Received: from evil.netppl.fi (root@evil.netppl.fi [195.242.209.201]) by mean.netppl.fi (8.9.3/8.9.3) with ESMTP id QAA10419 for ; Wed, 7 Feb 2001 16:13:34 +0200 Received: (from pp@localhost) by evil.netppl.fi (8.9.3/8.9.3) id QAA26976 for netdev@oss.sgi.com; Wed, 7 Feb 2001 16:13:34 +0200 Date: Wed, 7 Feb 2001 16:13:34 +0200 From: Pekka Pietikainen To: netdev@oss.sgi.com Subject: Re: zerocopy results on GigE Message-ID: <20010207161334.A26508@netppl.fi> References: <20010206194919.A633@netppl.fi> <14976.28256.593782.781889@pizda.ninka.net> <20010207144946.A16003@netppl.fi> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0pre3i In-Reply-To: <20010207144946.A16003@netppl.fi> Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Wed, Feb 07, 2001 at 02:49:46PM +0200, Pekka Pietikainen wrote: > One noticeable thing is that on the 3rd time, it didn't disable PCI memory > write/invalidate like it did on the previous times. I'll continue poking > around and see if I can make it work straight after rebooting. Ha! 
--- acenic.c~ Wed Feb 7 15:00:51 2001
+++ acenic.c Wed Feb 7 15:08:45 2001
@@ -487,7 +487,7 @@
 static int max_tx_desc[ACE_MAX_MOD_PARMS];
 static int max_rx_desc[ACE_MAX_MOD_PARMS];
 static int tx_ratio[ACE_MAX_MOD_PARMS];
-static int dis_pci_mem_inval[ACE_MAX_MOD_PARMS];
+static int dis_pci_mem_inval[ACE_MAX_MOD_PARMS] = {1, 1, 1, 1, 1, 1, 1, 1};
 
 static char version[] __initdata =
   "acenic.c: v0.50 02/02/2001 Jes Sorensen, linux-acenic@SunSITE.dk\n"

-- Pekka Pietikainen From owner-netdev@oss.sgi.com Wed Feb 7 06:53:14 2001 Received: by oss.sgi.com id ; Wed, 7 Feb 2001 06:53:05 -0800 Received: from mean.netppl.fi ([195.242.208.16]:50960 "EHLO mean.netppl.fi") by oss.sgi.com with ESMTP id ; Wed, 7 Feb 2001 06:52:46 -0800 Received: from evil.netppl.fi (root@evil.netppl.fi [195.242.209.201]) by mean.netppl.fi (8.9.3/8.9.3) with ESMTP id QAA11524 for ; Wed, 7 Feb 2001 16:52:32 +0200 Received: (from pp@localhost) by evil.netppl.fi (8.9.3/8.9.3) id QAA28874 for netdev@oss.sgi.com; Wed, 7 Feb 2001 16:52:32 +0200 Date: Wed, 7 Feb 2001 16:52:32 +0200 From: Pekka Pietikainen To: netdev@oss.sgi.com Subject: Re: zerocopy results on GigE Message-ID: <20010207165232.A27548@netppl.fi> References: <20010206194919.A633@netppl.fi> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0pre3i In-Reply-To: <20010206194919.A633@netppl.fi> Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Tue, Feb 06, 2001 at 07:49:19PM +0200, Pekka Pietikainen wrote: Kick ass! Basically it's no worse than non-zc in anything (and it's slightly better even in tests where there is no zc happening, probably due to more aggressive acenic interrupt coalescing etc. settings).

>                            2.4.2-pre1             zerocopy-2.4.2-2
>
> writing from buffer        70MB/s  54%  33%       72MB/s   40%  25%
> +MSG_TRUNC on receiver     98MB/s  19%  46%       100MB/s  28%  60%
>
> 64M file
>
> read/write from file       50MB/s  33%  50%       54MB/s   40%  59%
> +MSG_TRUNC                 51MB/s  16%  53%       54MB/s   17%  60%
> sendfile()                 68MB/s  48%  41%       72MB/s   58%  11%
> +MSG_TRUNC                 93MB/s  36%  56%       100MB/s  31%  23%
> mmap()/write               57MB/s  39%  32%       70MB/s   59%  47%
> +MSG_TRUNC                 87MB/s  28%  53%       100MB/s  32%  63%

-- Pekka Pietikainen From owner-netdev@oss.sgi.com Wed Feb 7 07:28:14 2001 Received: by oss.sgi.com id ; Wed, 7 Feb 2001 07:28:04 -0800 Received: from pizda.ninka.net ([216.101.162.242]:8323 "EHLO pizda.ninka.net") by oss.sgi.com with ESMTP id ; Wed, 7 Feb 2001 07:27:58 -0800 Received: (from davem@localhost) by pizda.ninka.net (8.9.3/8.9.3) id HAA12820; Wed, 7 Feb 2001 07:26:26 -0800 From: "David S. Miller" MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <14977.26914.120545.466680@pizda.ninka.net> Date: Wed, 7 Feb 2001 07:26:26 -0800 (PST) To: Pekka Pietikainen Cc: netdev@oss.sgi.com Subject: Re: zerocopy results on GigE In-Reply-To: <20010207165232.A27548@netppl.fi> References: <20010206194919.A633@netppl.fi> <20010207165232.A27548@netppl.fi> X-Mailer: VM 6.75 under 21.1 (patch 13) "Crater Lake" XEmacs Lucid Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Pekka Pietikainen writes: > On Tue, Feb 06, 2001 at 07:49:19PM +0200, Pekka Pietikainen wrote: > > Kick ass! > OK, so what is the final change you made to reach this state? Can you send a patch? Later, David S.
Miller davem@redhat.com From owner-netdev@oss.sgi.com Wed Feb 7 08:35:05 2001 Received: by oss.sgi.com id ; Wed, 7 Feb 2001 08:34:54 -0800 Received: from shaku.sfc.wide.ad.jp ([203.178.143.49]:40375 "EHLO shaku.v6.linux.or.jp") by oss.sgi.com with ESMTP id ; Wed, 7 Feb 2001 08:34:35 -0800 Received: from shaku.sfc.wide.ad.jp ([::1]) by shaku.v6.linux.or.jp (8.11.0/3.7W) with ESMTP id f17GWko15023; Thu, 8 Feb 2001 01:32:46 +0900 Date: Thu, 08 Feb 2001 01:32:46 +0900 Message-ID: From: Yuji Sekiya To: netdev@oss.sgi.com Cc: usagi-core@linux-ipv6.org Subject: Re: (usagi-core 01672) Re: [ANN] 2nd STABLE release of USAGI Project In-Reply-To: <5.0.2.1.0.20010206184415.019ca8a8@mail.bieringer.de> References: <5.0.2.1.0.20010206184415.019ca8a8@mail.bieringer.de> User-Agent: Wanderlust/2.4.0 (Rio) REMI/1.14.2 (=?SHIFT_JIS?Q?Hokuhoku-=D2?= =?SHIFT_JIS?Q?shima?=) Chao/1.14.1 (=?ISO-8859-1?Q?Rokujiz=F2?=) APEL/10.2 MULE XEmacs/21.1 (patch 9) (Canyonlands) (i686-pc-linux) Organization: Keio University MIME-Version: 1.0 (generated by REMI 1.14.2 - =?SHIFT_JIS?Q?=22Hokuhoku-=D2s?= =?SHIFT_JIS?Q?hima=22?=) Content-Type: text/plain; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing At Tue, 06 Feb 2001 18:44:55 +0100, Peter Bieringer wrote: > > > - Linux kernel-2.4.0-usagi-20010205 > > > Based on Linux kernel-2.4.0, we have improved and implemented > > > same as kernel-2.2.18-usagi-20010205. > > > >Will the patch for 2.4.0 apply to 2.4.1? Or are you planning to release a > >2.4.1 snap soon? -Nathan > > Current patch will work on 2.4.1, too. Also the 2.2.18 works on 2.2.19pre8 > (tested yesterday!) Our cvs tree has already been synchronized with 2.4.1. Regards. 
-- Yuji Sekiya From owner-netdev@oss.sgi.com Wed Feb 7 11:41:44 2001 Received: by oss.sgi.com id ; Wed, 7 Feb 2001 11:41:34 -0800 Received: from minus.inr.ac.ru ([193.233.7.97]:13322 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Wed, 7 Feb 2001 11:41:18 -0800 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id WAA05181; Wed, 7 Feb 2001 22:39:58 +0300 From: kuznet@ms2.inr.ac.ru Message-Id: <200102071939.WAA05181@ms2.inr.ac.ru> Subject: Re: zerocopy results on GigE To: davem@redhat.COM (David S. Miller) Date: Wed, 7 Feb 2001 22:39:58 +0300 (MSK) Cc: netdev@oss.sgi.com In-Reply-To: <14976.28256.593782.781889@pizda.ninka.net> from "David S. Miller" at Feb 7, 1 00:45:00 am X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Content-Length: 2335 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hello! Dave writes: > Strange, cpu usage is close to nothing for sendfile cases yet full > bandwidth is not obtained. He-he-he... It was the first puzzle, which I observed after zerocopy started to work. Throughput on Intel increses insignificantly, increases on alpha, but a lot of room for further increase remained (about 20% of cpu). Actually, even without zerocopy cpu at sender is underloaded a bit. Until now I have no idea, why this happens and how to fight this. Actually, this smells like a bug in TCP. But tcpdump does not discover anything pathological. Andrew writes: > Is it possible that the receiver is going into discard, > and TCP is backing off? In this case we would see not numbers ~90MB/sec, but something more spectacular. Even Jamal's 68MB/sec is not enough bad number for losses. 8) No, this is impossible. TCP has no such pathologies. Dave writes: > Or even worse, the Gige cards are emitting flow control frames. This does not happen. This can be diagnosed with acestat though. > What would trigger that behaviour? Presumably, the lack > of any Rx DMA descriptors. 
Shortage of MAC descriptors inside NIC, which can happen not only when there are not enough of RX descriptors. In practice flow control is triggered only if dma is slower than nic (I have never seen this on intel, but had to fight with this on alpha), or if one side emits stream of small packets with rate >100Kpps. This does not happen with TCP, especially, with jumbo mtu. Disabling flow control does not change TCP behaviour, by the way. Jes wrote: > One thing that might be worth investigating is that the AceNIC has > a high latency for reading buffer descriptors. One of the plans I have > is to linearize small skb's before handing them to the NIC. Small skbs in these tests are ACKs, they are linear. Also, even with host ring, all the fragment descriptors are read in one DMA transaction. Or do you mean reading data chunks, not descriptors? In any case, maximal latency is 5-7usec, which is not a big number for TCP with jumbo mtu, where latency is dominated by bulk dma. But, if my arithmetics is correct, this really puts theoretical limit on transmission of 1500 byte frames: ~90MB/sec. (BTW, Jes, you enabled tx host ring in the latest driver. Did you notice that it increases latency by ~1 usec?) Alexey From owner-netdev@oss.sgi.com Wed Feb 7 13:21:55 2001 Received: by oss.sgi.com id ; Wed, 7 Feb 2001 13:21:35 -0800 Received: from kanga.kvack.org ([216.129.200.3]:27141 "EHLO kanga.kvack.org") by oss.sgi.com with ESMTP id ; Wed, 7 Feb 2001 13:21:32 -0800 Received: (from localhost user: 'blah', uid#63042) by kanga.kvack.org with SMTP id ; Wed, 7 Feb 2001 16:18:11 -0500 Date: Wed, 7 Feb 2001 16:18:11 -0500 (EST) From: "Benjamin C.R. LaHaise" To: kuznet@ms2.inr.ac.ru cc: "David S. 
Miller" , netdev@oss.sgi.com Subject: Re: zerocopy results on GigE In-Reply-To: <200102071939.WAA05181@ms2.inr.ac.ru> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Wed, 7 Feb 2001 kuznet@ms2.inr.ac.ru wrote: > In any case, maximal latency is 5-7usec, which is not a big number > for TCP with jumbo mtu, where latency is dominated by bulk dma. > But, if my arithmetics is correct, this really puts theoretical > limit on transmission of 1500 byte frames: ~90MB/sec. > (BTW, Jes, you enabled tx host ring in the latest driver. > Did you notice that it increases latency by ~1 usec?) How many fragments are there in the typical tx packet? Are these machines with 64 bit / 66 MHz PCI, or just plain old 32 bit 33 MHz? If it's a 33 MHz bus, the overhead for starting new PCI cycles to gather fragments could be maxing out the bandwidth. A PCI analyzer could be helpful =) -ben From owner-netdev@oss.sgi.com Wed Feb 7 14:45:17 2001 Received: by oss.sgi.com id ; Wed, 7 Feb 2001 14:45:08 -0800 Received: from smtp1.cern.ch ([137.138.128.38]:46352 "EHLO smtp1.cern.ch") by oss.sgi.com with ESMTP id ; Wed, 7 Feb 2001 14:44:52 -0800 Received: from lxplus015.cern.ch (IDENT:root@lxplus015.cern.ch [137.138.161.112]) by smtp1.cern.ch (8.9.3/8.9.3) with ESMTP id XAA22690; Wed, 7 Feb 2001 23:44:38 +0100 (MET) Received: (from jes@localhost) by lxplus015.cern.ch (8.9.3/8.9.3) id XAA24611; Wed, 7 Feb 2001 23:44:36 +0100 To: kuznet@ms2.inr.ac.ru Cc: davem@redhat.COM (David S. 
Miller), netdev@oss.sgi.com Subject: Re: zerocopy results on GigE References: <200102071939.WAA05181@ms2.inr.ac.ru> From: Jes Sorensen Date: 07 Feb 2001 23:44:36 +0100 In-Reply-To: kuznet@ms2.inr.ac.ru's message of "Wed, 7 Feb 2001 22:39:58 +0300 (MSK)" Message-ID: Lines: 35 User-Agent: Gnus/5.070096 (Pterodactyl Gnus v0.96) Emacs/20.4 MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing >>>>> "ANK" == kuznet writes: ANK> Jes wrote: >> One thing that might be worth investigating is that the AceNIC has >> a high latency for reading buffer descriptors. One of the plans I >> have is to linearize small skb's before handing them to the NIC. ANK> Small skbs in these tests are ACKs, they are linear. ANK> Also, even with host ring, all the fragment descriptors are read ANK> in one DMA transaction. Or do you mean reading data chunks, not ANK> descriptors? I don't remember all the details, I just remember Ted Schroeder (one of the Alteon founders) recommending me to linearize small transfers as loading buffer descriptors could cost up to 5us. ANK> In any case, maximal latency is 5-7usec, which is not a big ANK> number for TCP with jumbo mtu, where latency is dominated by bulk ANK> dma. But, if my arithmetics is correct, this really puts ANK> theoretical limit on transmission of 1500 byte frames: ~90MB/sec. ANK> (BTW, Jes, you enabled tx host ring in the latest driver. Did ANK> you notice that it increases latency by ~1 usec?) The numbers for Jumbo MTU's are not all that exciting, what really matters is how we perform on 1.5K packets. 95% of the switches on the market don't do 9K packets hence very very few people use it ;-( No I didn't notice the 1us extra latency, I made the change to reduce the slow writes to PCI shared mem which are becoming even more significant now with the increase in host memory speed and no increase in PCI speed.
If it becomes a real issue we can stick the non host ring support back in. Jes From owner-netdev@oss.sgi.com Thu Feb 8 00:09:42 2001 Received: by oss.sgi.com id ; Thu, 8 Feb 2001 00:09:23 -0800 Received: from asbestos.linuxcare.com.au ([203.17.0.30]:3326 "EHLO halfway") by oss.sgi.com with ESMTP id ; Thu, 8 Feb 2001 00:09:06 -0800 Received: from halfway ([127.0.0.1] helo=linuxcare.com.au ident=rusty) by halfway with esmtp (Exim 3.22 #1 (Debian)) id 14QTTs-0005Yn-00; Wed, 07 Feb 2001 23:13:52 +1100 From: Rusty Russell To: "David S. Miller" , netdev@oss.sgi.com Subject: Is mac address reference safe w/ zero copy? Date: Wed, 07 Feb 2001 23:13:51 +1100 Message-Id: Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hi Dave, et al, The following code in route.c:ip_route_input_slow()

	if (dev->hard_header_len) {
		int i;
		unsigned char *p = skb->mac.raw;
		printk(KERN_WARNING "ll header: ");
		for (i=0; i<dev->hard_header_len; i++, p++) {
			printk("%02x", *p);

Is this safe w/ paged skbs? Can I assume this in the ipt_mac mac-address-matching netfilter module (which can only be called on incoming or forwarded packets)? Rusty. -- Premature optmztion is rt of all evl. --DK From owner-netdev@oss.sgi.com Thu Feb 8 00:16:12 2001 Received: by oss.sgi.com id ; Thu, 8 Feb 2001 00:15:53 -0800 Received: from pizda.ninka.net ([216.101.162.242]:21639 "EHLO pizda.ninka.net") by oss.sgi.com with ESMTP id ; Thu, 8 Feb 2001 00:15:45 -0800 Received: (from davem@localhost) by pizda.ninka.net (8.9.3/8.9.3) id AAA06396; Thu, 8 Feb 2001 00:14:11 -0800 From: "David S. Miller" MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <14978.21843.569243.187755@pizda.ninka.net> Date: Thu, 8 Feb 2001 00:14:11 -0800 (PST) To: Rusty Russell Cc: netdev@oss.sgi.com Subject: Re: Is mac address reference safe w/ zero copy?
In-Reply-To: References: X-Mailer: VM 6.75 under 21.1 (patch 13) "Crater Lake" XEmacs Lucid Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Rusty Russell writes:

> Hi Dave, et al,
>
> The following code in route.c:ip_route_input_slow()
>
> 	if (dev->hard_header_len) {
> 		int i;
> 		unsigned char *p = skb->mac.raw;
> 		printk(KERN_WARNING "ll header: ");
> 		for (i=0; i<dev->hard_header_len; i++, p++) {
> 			printk("%02x", *p);
>
> Is this safe w/ paged skbs? Can I assume this in the ipt_mac
> mac-address-matching netfilter module (which can only be called on
> incoming or forwarded packets)?

It is safe unless you can show a place where an ethernet header lands somewhere other than skb->data on input :-) The format of input SKBs really has not changed at the device header level. Later, David S. Miller davem@redhat.com From owner-netdev@oss.sgi.com Thu Feb 8 08:05:36 2001 Received: by oss.sgi.com id ; Thu, 8 Feb 2001 08:05:26 -0800 Received: from se1.cogenit.fr ([195.68.53.173]:37386 "EHLO se1.cogenit.fr") by oss.sgi.com with ESMTP id ; Thu, 8 Feb 2001 08:05:11 -0800 Received: (from romieu@localhost) by se1.cogenit.fr (8.11.1/8.11.1) id f18G53T10740 for netdev@oss.sgi.com; Thu, 8 Feb 2001 17:05:03 +0100 Date: Thu, 8 Feb 2001 17:05:02 +0100 From: Francois Romieu To: netdev@oss.sgi.com Subject: slowdown nt <-> linux Message-ID: <20010208170502.A10274@se1.cogenit.fr> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0pre3us X-Organisation: Marie's fan club Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hi, I've got some strange behaviours between an NT server and a Linux box. Say:

1 - Linux 1 -> NT client 1: fast (ftp, smb)
2 - NT server 1 -> NT client 1: fast (ftp, smb)
3 - NT server 1 -> NT server 2: fast (ftp, smb)
4 - Linux 1 -> NT server 1: slow as hell (ftp, smb)

fast = a few MB/s on a 100Mb/s link. slow = 50-100 kB/s on the same link. Everybody is on the same switch.
Linux is 2.2.18pre15. FWIW NT has SP6 applied. I ran tcpdump on an smb transfer (case 4) and I'm frankly confused:

[...]
14:53:52.356875 NT.1881 > gw-rdc-1.netbios-ssn: P 8857:8921(64) ack 89256 win 8760 (DF) [tos 0x20]
14:53:52.360751 gw-rdc-1.netbios-ssn > NT.1881: P 89256:90716(1460) ack 8921 win 32120 (DF) [tos 0x10]
14:53:52.360791 gw-rdc-1.netbios-ssn > NT.1881: P 90716:92176(1460) ack 8921 win 32120 (DF) [tos 0x10]
14:53:52.360812 gw-rdc-1.netbios-ssn > NT.1881: P 92176:93636(1460) ack 8921 win 32120 (DF) [tos 0x10]
14:53:52.360830 gw-rdc-1.netbios-ssn > NT.1881: . 93636:95096(1460) ack 8921 win 32120 (DF) [tos 0x10]
                                                  ^^^^^
14:53:52.360854 gw-rdc-1.netbios-ssn > NT.1881: . 95096:96556(1460) ack 8921 win 32120 (DF) [tos 0x10]
14:53:52.360874 gw-rdc-1.netbios-ssn > NT.1881: P 96556:97511(955) ack 8921 win 32120 (DF) [tos 0x10]
14:53:52.356875 NT.1881 > gw-rdc-1.netbios-ssn: . ack 92176 win 8760 (DF) [tos 0x20]
14:53:52.356875 NT.1881 > gw-rdc-1.netbios-ssn: . ack 93636 win 8760 (DF) [tos 0x20]
                                                      ^^^^^
14:53:52.356875 NT.1881 > gw-rdc-1.netbios-ssn: . ack 93636 win 8760 (DF) [tos 0x20]
                                                      ^^^^^

Two acks with the same timestamp (+/-10ms slot I guess, no?).

14:53:52.556900 gw-rdc-1.netbios-ssn > NT.1881: . 93636:95096(1460) ack 8921 win 32120 (DF) [tos 0x10]
14:53:52.556877 NT.1881 > gw-rdc-1.netbios-ssn: . ack 97511 win 8760 (DF) [tos 0x20]
14:53:52.556877 NT.1881 > gw-rdc-1.netbios-ssn: P 8921:8985(64) ack 97511 win 8760 (DF) [tos 0x20]
14:53:52.558006 gw-rdc-1.netbios-ssn > NT.1881: P 97511:98971(1460) ack 8985 win 32120 (DF) [tos 0x10]

At normal speed (case 1), I never see this kind of ack sequence. What does it mean?
-- Ueimor From owner-netdev@oss.sgi.com Thu Feb 8 08:34:45 2001 Received: by oss.sgi.com id ; Thu, 8 Feb 2001 08:34:35 -0800 Received: from se1.cogenit.fr ([195.68.53.173]:12043 "EHLO se1.cogenit.fr") by oss.sgi.com with ESMTP id ; Thu, 8 Feb 2001 08:34:30 -0800 Received: (from romieu@localhost) by se1.cogenit.fr (8.11.1/8.11.1) id f18GYR711211 for netdev@oss.sgi.com; Thu, 8 Feb 2001 17:34:27 +0100 Date: Thu, 8 Feb 2001 17:34:26 +0100 From: Francois Romieu To: netdev@oss.sgi.com Subject: Re: slowdown nt <-> linux Message-ID: <20010208173426.A10997@se1.cogenit.fr> References: <20010208170502.A10274@se1.cogenit.fr> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: 8bit X-Mailer: Mutt 1.0pre3us In-Reply-To: <20010208170502.A10274@se1.cogenit.fr> X-Organisation: Marie's fan club Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Francois Romieu écrit : [...] Ok, it happens on the 8th packet that Linux sent, something around 16ko... No comment. -- Ueimor From owner-netdev@oss.sgi.com Thu Feb 8 09:22:15 2001 Received: by oss.sgi.com id ; Thu, 8 Feb 2001 09:22:06 -0800 Received: from pop3.galileo.co.il ([199.203.130.130]:10886 "EHLO galileo5.galileo.co.il") by oss.sgi.com with ESMTP id ; Thu, 8 Feb 2001 09:21:50 -0800 Received: from galileo.co.il (rabeeh@linux2.galileo.co.il [10.2.40.2]) by galileo.co.il (8.8.5/8.8.5) with ESMTP id TAA24679 for ; Thu, 8 Feb 2001 19:22:12 +0200 (GMT-2) Message-ID: <3A82D52A.6000003@galileo.co.il> Date: Thu, 08 Feb 2001 19:19:38 +0200 From: Rabeeh Khoury Organization: Galileo Technology User-Agent: Mozilla/5.0 (X11; U; Linux 2.2.14-5.0 i686; en-US; m18) Gecko/20001107 Netscape6/6.0 X-Accept-Language: en MIME-Version: 1.0 To: netdev@oss.sgi.com Subject: Linux router Content-Type: text/html; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hi All,

I would like to ask whether Linux, acting as a router, has the following functionality:

Is it possible for the kernel to route packets according to the destination port AND the interface on which the packet was received?

If so, please point me to documentation on it (if available), and tell me what it is called in the kernel source tree.

If not, are there any alternative ways, such as adding packages to the kernel sources, to get this functionality?

Regards,
Rabeeh
From owner-netdev@oss.sgi.com Thu Feb 8 10:40:36 2001 Received: by oss.sgi.com id ; Thu, 8 Feb 2001 10:40:16 -0800 Received: from minus.inr.ac.ru ([193.233.7.97]:9222 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Thu, 8 Feb 2001 10:40:08 -0800 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id VAA22345; Thu, 8 Feb 2001 21:39:43 +0300 From: kuznet@ms2.inr.ac.ru Message-Id: <200102081839.VAA22345@ms2.inr.ac.ru> Subject: Re: zerocopy results on GigE To: blah@kvack.org (Benjamin C.R. LaHaise) Date: Thu, 8 Feb 2001 21:39:43 +0300 (MSK) Cc: davem@redhat.COM, netdev@oss.sgi.com In-Reply-To: from "Benjamin C.R. LaHaise" at Feb 7, 1 04:18:11 pm X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Content-Length: 949 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hello! > How many fragments are there in the typical tx packet? 2 or 3 for mtu=1500 (about 66% two-fragment and 33% three-fragment packets). Well, you may count this assuming that header (ethernet + ip + tcp/udp) is the first fragment, and payload consists of 4K pages. > Are these machines > with 64 bit / 66 MHz PCI, or just plain old 32 bit 33 MHz? If it's a 33 > MHz bus, the overhead for starting new PCI cycles to gather fragments > could be maxing out the bandwidth. A PCI analyzer could be helpful =) Yes, all the experimental data are about 32/33 pci. Actually, it is not pci latency, but plain straight acenic bug. F.e. e1000 does not have this overhead and reaches pps rates at least twice more than acenic not straining muscles. 8) >=5 usecs per dma transaction is something wicked, but it is not a fatal flaw. At least, it should not be sensed by TCP, it is parallel to software/protocol latencies and completely hidden.
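The two-thirds/one-third split above can be reproduced with a small counting sketch. It is only an illustration under stated assumptions: the payload is taken from consecutive 4K pages of one file (as with sendfile()), every segment carries 1460 bytes (1500 MTU, no TCP options), and the header always forms one extra fragment. The function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

enum { PAGE_BYTES = 4096 };

/* Number of 4K pages touched by the payload bytes [start, start + len). */
unsigned pages_spanned(unsigned long start, unsigned len)
{
    assert(len > 0);
    return (unsigned)((start + len - 1) / PAGE_BYTES - start / PAGE_BYTES) + 1;
}

/* Share of back-to-back segments of `mss` bytes whose payload sits in a
 * single page, i.e. packets built from header + 1 page = 2 fragments.
 * The remaining segments straddle a page boundary and need 3 fragments. */
double two_fragment_share(unsigned mss, unsigned long nsegs)
{
    unsigned long single = 0;

    assert(nsegs > 0);
    for (unsigned long k = 0; k < nsegs; k++)
        if (pages_spanned(k * mss, mss) == 1)
            single++;
    return (double)single / (double)nsegs;
}
```

Under these assumptions `two_fragment_share(1460, 1024)` comes out at about 0.64, so roughly 64% of packets have two fragments and 36% have three, which is in line with the 66%/33% estimate.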
Alexey From owner-netdev@oss.sgi.com Thu Feb 8 11:00:06 2001 Received: by oss.sgi.com id ; Thu, 8 Feb 2001 10:59:56 -0800 Received: from minus.inr.ac.ru ([193.233.7.97]:32775 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Thu, 8 Feb 2001 10:59:31 -0800 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id VAA23074; Thu, 8 Feb 2001 21:59:09 +0300 From: kuznet@ms2.inr.ac.ru Message-Id: <200102081859.VAA23074@ms2.inr.ac.ru> Subject: Re: zerocopy results on GigE To: jes@linuxcare.com (Jes Sorensen) Date: Thu, 8 Feb 2001 21:59:09 +0300 (MSK) Cc: davem@redhat.COM, netdev@oss.sgi.com In-Reply-To: from "Jes Sorensen" at Feb 7, 1 11:44:36 pm X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Content-Length: 2026 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hello! > I don't remember all the details, I just remember Ted Schroeder (one of > the Alteon founders) recommending me to linearize small transfers as > loading buffer descriptors could cost up to 5us. It is even > 5usec. ~5 usec is plain featureless mode. Each feature adds ~1usec: tx host ring? +1usec. tx checksumming? +1usec Well, driver should not be bothered about this: it is problem of protocol not to generate silly packets shredded to small pieces. F.e. current TCP _does_ generate telnet or lat_tcp packets with 1 byte fragment. So what? Latency does not change, these 5usecs are something ridiculous comparing to latency caused by broken mitigation timer in acenic. Throughput is bogus in any case. > ANK> dma. But, if my arithmetics is correct, this really puts > ANK> theoretical limit on transmission of 1500 byte frames: ~90MB/sec. ... > The numbers for Jumbo MTU's are not all that exciting, what really > matters is how we perform on 1.5K packets. 95% of the switches on the > market don't do 9K packets hence very very few people use it ;-( I said exactly about 1500 mtu.
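A figure of the order of ~90MB/sec for 1500-byte frames can be reproduced with a back-of-the-envelope model. The sketch below is an assumption about how such a limit might arise, not the calculation actually used in the thread: each frame is charged its wire time on the link plus a fixed per-frame NIC latency that does not overlap with transmission. The function name and the overhead constant are illustrative.

```c
#include <assert.h>
#include <stdio.h>

/* Bytes on the wire per Ethernet frame beyond the MTU-sized payload:
 * 14 header + 4 FCS + 8 preamble/SFD + 12 inter-frame gap. */
enum { ETH_WIRE_OVERHEAD = 14 + 4 + 8 + 12 };

/* Throughput in MB/s (10^6 bytes/s) of MTU-sized frames, assuming every
 * frame costs its wire time plus a fixed, non-overlapped latency. */
double frame_limited_mbytes_per_sec(unsigned mtu, double link_bits_per_sec,
                                    double per_frame_latency_usec)
{
    assert(mtu > 0 && link_bits_per_sec > 0.0 && per_frame_latency_usec >= 0.0);

    double wire_usec = (mtu + ETH_WIRE_OVERHEAD) * 8.0 / link_bits_per_sec * 1e6;

    /* bytes per microsecond is numerically the same as MB/s */
    return mtu / (wire_usec + per_frame_latency_usec);
}
```

With mtu=1500 on a 1 Gbit/s link, zero extra latency gives about 122 MB/s, while 5 usec per frame gives about 87 MB/s, which is the same order as the quoted limit. With a 9000-byte jumbo MTU the same 5 usec costs only a few percent, consistent with the remark that jumbo-frame latency is dominated by bulk DMA.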
To sense these 5usecs on this, we should reach such rates, where it becomes important for beginning. We did not. We are still bounded by software latencies. > No I didn't notice the 1us extra latency, I made the change to reduce > the slow writes to PCI shared mem which are becoming even more > significant now with the increase in host memory speed and no increase > in PCI speed. If it becomes a real issue we can stick the non host > ring support back in. Well, I reminded this because it were you who bothered about latency. 8)8) Actually, this feature was added to tux some time ago exactly by the same reason. 8) Plus, it allows to load whole set of fragment descriptors at one DMA transaction. Not big win, if to believe to Ingo's results, but something yet. tx host ring reduces maximal pps, reached with acenic by 20%. That's all. So what? No problems, we are not going to compete with XXMegapps switches. Alexey From owner-netdev@oss.sgi.com Thu Feb 8 11:21:56 2001 Received: by oss.sgi.com id ; Thu, 8 Feb 2001 11:21:36 -0800 Received: from minus.inr.ac.ru ([193.233.7.97]:59655 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Thu, 8 Feb 2001 11:21:27 -0800 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id WAA23368; Thu, 8 Feb 2001 22:20:56 +0300 From: kuznet@ms2.inr.ac.ru Message-Id: <200102081920.WAA23368@ms2.inr.ac.ru> Subject: Re: Is mac address reference safe w/ zero copy? To: rusty@linuxcare.COM.AU (Rusty Russell) Date: Thu, 8 Feb 2001 22:20:56 +0300 (MSK) Cc: netdev@oss.sgi.com In-Reply-To: from "Rusty Russell" at Feb 8, 1 11:15:00 am X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Content-Length: 850 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hello! The answer to the question is "yes". > The following code in route.c:ip_route_input_slow() > > if (dev->hard_header_len) { But this code is buggy even without split skbs! Good discovery. 
8) mac.raw+dev->hard_header_len is not guaranteed to be inside packet. It should terminate at skb->tail. Generally, each level has its headers on their usual place. If you are inside IP, mac and ip headers are there, where you expect them. But udp/tcp/icmp headers are _not_, they are pulled only by udp/tcp/icmp modules. Actually, the biggest advantage of split skbs is that it forces to think about such questions. 8) Look f.e. into 2.2: trans.proxy and masquerading make hundred of illegal accesses to protocol headers. Paul, could you send me snapshots of your work on this? Probably, we could make this faster and more clean then. Alexey From owner-netdev@oss.sgi.com Thu Feb 8 13:11:27 2001 Received: by oss.sgi.com id ; Thu, 8 Feb 2001 13:11:08 -0800 Received: from mta5.snfc21.pbi.net ([206.13.28.241]:38883 "EHLO mta5.snfc21.pbi.net") by oss.sgi.com with ESMTP id ; Thu, 8 Feb 2001 13:10:38 -0800 Received: from kryptonite ([206.170.6.124]) by mta5.snfc21.pbi.net (Sun Internet Mail Server sims.3.5.2000.01.05.12.18.p9) with SMTP id <0G8G00JJ0I9TBW@mta5.snfc21.pbi.net> for netdev@oss.sgi.com; Thu, 8 Feb 2001 12:58:43 -0800 (PST) Date: Thu, 08 Feb 2001 12:59:29 -0800 From: David Brownell Subject: Re: Help on a network driver ... [strange TCP problems] To: Rabeeh Khoury Cc: netdev@oss.sgi.com Message-id: <054701c09212$0806bda0$6800000a@brownell.org> MIME-version: 1.0 X-Mailer: Microsoft Outlook Express 5.50.4133.2400 Content-type: text/plain; charset="iso-8859-1" Content-transfer-encoding: 7bit X-MSMail-Priority: Normal X-MimeOLE: Produced By Microsoft MimeOLE V5.50.4133.2400 References: <059f01c09083$82ed9e20$6800000a@brownell.org> <3A811B95.4080303@galileo.co.il> X-Priority: 3 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Rebeeh, > just add the following line - > > skb->pkt_type = PACKET_HOST; That's being done by default in the sk_buff constructors, and nobody's changing that default. 
Plus, if that were the problem (vs say memory corruption, for which there is no evidence at all) then TCP would never work at all (not the symptom :-). I just got a report of a symptom, likely related, where tcpdump shows the ACK is getting sent and then dropped. (Below, it's not even getting sent.) And in fact, I just saw this version a few moments ago myself. I've seen this problem with this driver off and on since about test8; it might have been there in test6/test7 when the driver was first developed (it's still "experimental") without being noticed ("intermittent"), but in any case this is rather perplexing. What do folk do when trying to get TCP to tell them what's up? Enabling TCP_DEBUG or NETDEBUG is no help at all. - Dave > > PROBLEM: TCP connections won't establish -- sometimes. > > The same driver executable may work one day, fail the next. > > > > If I watch things with "tcpdump" what I'll see is TCP setup packets > > (say for FTP, SSH, rpcinfo) arriving ... but no acks getting sent > > back. Meanwhile, "ping" traffic flies (both directions) without any > > problems at all. A while back I computed checksums of packets > > on both sides (tx/rx) and they were the same ... 
From owner-netdev@oss.sgi.com Thu Feb 8 16:46:40 2001 Received: by oss.sgi.com id ; Thu, 8 Feb 2001 16:46:30 -0800 Received: from lobo.net ([216.84.148.4]:40465 "HELO lobo.net") by oss.sgi.com with SMTP id ; Thu, 8 Feb 2001 16:46:05 -0800 Received: (qmail 22646 invoked from network); 9 Feb 2001 00:43:13 -0000 Received: from tc10-086.lobo.net (HELO lobo.net) (216.84.148.176) by lobo.net with SMTP; 9 Feb 2001 00:43:13 -0000 Message-ID: <3A833BB9.B3B087C7@lobo.net> Date: Thu, 08 Feb 2001 17:37:14 -0700 From: John Brockmeyer Organization: BCS X-Mailer: Mozilla 4.05 [en] (Win95; I) MIME-Version: 1.0 To: netdev@oss.sgi.com Subject: docs on network device driver design, 2.4 vs 2.2 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing I teach Linux Device Drivers courses for Intel and IBM, among others. I include in it a chapter on network device drivers for linux 2.2. I have read several network device drivers for 2.4 but would appreciate any info you might have on a general network device driver design document, since much has changed since 2.2. Sorry to bother you John Brockmeyer From owner-netdev@oss.sgi.com Thu Feb 8 17:36:10 2001 Received: by oss.sgi.com id ; Thu, 8 Feb 2001 17:36:00 -0800 Received: from laurin.munich.netsurf.de ([194.64.166.1]:26526 "EHLO laurin.munich.netsurf.de") by oss.sgi.com with ESMTP id ; Thu, 8 Feb 2001 17:35:52 -0800 Received: from fred.muc.de (noidentity@ns1139.munich.netsurf.de [195.180.235.139]) by laurin.munich.netsurf.de (8.9.3/8.9.3) with ESMTP id CAA22409; Fri, 9 Feb 2001 02:35:45 +0100 (MET) Received: by fred.muc.de (Postfix, from userid 500) id 0750FE3913; Fri, 9 Feb 2001 02:33:51 +0100 (CET) Date: Fri, 9 Feb 2001 02:33:50 +0100 From: Andi Kleen To: David Brownell Cc: Rabeeh Khoury , netdev@oss.sgi.com Subject: Re: Help on a network driver ... 
[strange TCP problems]
Message-ID: <20010209023350.B15861@fred.local>
References: <059f01c09083$82ed9e20$6800000a@brownell.org> <3A811B95.4080303@galileo.co.il> <054701c09212$0806bda0$6800000a@brownell.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 1.0.1i
In-Reply-To: <054701c09212$0806bda0$6800000a@brownell.org>; from david-b@pacbell.net on Thu, Feb 08, 2001 at 10:12:14PM +0100
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing

On Thu, Feb 08, 2001 at 10:12:14PM +0100, David Brownell wrote:
> I just got a report of a symptom, likely related, where
> tcpdump shows the ACK is getting sent and then dropped.
> (Below, it's not even getting sent.) And in fact, I just saw
> this version a few moments ago myself.

This usually points to checksum problems. tcpdump doesn't check checksums.

>
> I've seen this problem with this driver off and on since
> about test8; it might have been there in test6/test7 when
> the driver was first developed (it's still "experimental")
> without being noticed ("intermittent"), but in any case
> this is rather perplexing. What do folk do when trying
> to get TCP to tell them what's up? Enabling TCP_DEBUG
> or NETDEBUG is no help at all.

Add printks or ktrace calls until you get enlightened. You could e.g. instrument __kfree_skb to see where the skb is freed (using __builtin_return_address(0) when you're using the right compiler).

-Andi
--
This is like TV. I don't like TV.
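The instrumentation idea in the reply above can be pictured with a small user-space sketch. It is only an illustration under stated assumptions: `struct fake_buf`, `fake_alloc()` and `traced_free()` are hypothetical stand-ins, not the kernel's `struct sk_buff` or `__kfree_skb`, and `__builtin_return_address(0)` plus the `noinline` attribute assume GCC or a compatible compiler. In a kernel build the `fprintf` would be a `printk`.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for a packet buffer; NOT the kernel's sk_buff. */
struct fake_buf {
    void  *data;
    size_t len;
};

/* Caller address recorded by the most recent traced_free() call. */
static void *last_free_caller;

/* Allocate a buffer with len bytes of payload (hypothetical helper). */
static struct fake_buf *fake_alloc(size_t len)
{
    struct fake_buf *b = malloc(sizeof(*b));
    assert(b != NULL);
    b->data = malloc(len);
    assert(b->data != NULL);
    b->len = len;
    return b;
}

/*
 * Instrumented free routine. noinline keeps a real call frame, so that
 * __builtin_return_address(0) (a GCC builtin) yields the address the
 * function will return to, i.e. a location inside whoever freed the buffer.
 */
__attribute__((noinline))
static void traced_free(struct fake_buf *b)
{
    last_free_caller = __builtin_return_address(0);
    fprintf(stderr, "buffer %p freed from %p\n", (void *)b, last_free_caller);
    free(b->data);
    free(b);
}
```

The printed address can then be matched against a symbol table (for a kernel, System.map) to see which code path dropped the packet.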
From owner-netdev@oss.sgi.com Thu Feb 8 17:36:30 2001 Received: by oss.sgi.com id ; Thu, 8 Feb 2001 17:36:11 -0800 Received: from laurin.munich.netsurf.de ([194.64.166.1]:24734 "EHLO laurin.munich.netsurf.de") by oss.sgi.com with ESMTP id ; Thu, 8 Feb 2001 17:35:52 -0800 Received: from fred.muc.de (noidentity@ns1139.munich.netsurf.de [195.180.235.139]) by laurin.munich.netsurf.de (8.9.3/8.9.3) with ESMTP id CAA22410; Fri, 9 Feb 2001 02:35:45 +0100 (MET) Received: by fred.muc.de (Postfix, from userid 500) id D5AFBE3447; Fri, 9 Feb 2001 02:31:03 +0100 (CET) Date: Fri, 9 Feb 2001 02:31:03 +0100 From: Andi Kleen To: John Brockmeyer Cc: netdev@oss.sgi.com Subject: Re: docs on network device driver design, 2.4 vs 2.2 Message-ID: <20010209023103.A15861@fred.local> References: <3A833BB9.B3B087C7@lobo.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0.1i In-Reply-To: <3A833BB9.B3B087C7@lobo.net>; from jab@lobo.net on Fri, Feb 09, 2001 at 01:47:29AM +0100 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Fri, Feb 09, 2001 at 01:47:29AM +0100, John Brockmeyer wrote: > I teach Linux Device Drivers courses for Intel and IBM, among others. I include > in it a chapter on > network device drivers for linux 2.2. > I have read several network device drivers for 2.4 but would appreciate any info > you might > have on a general network device driver design document, since much has changed > since 2.2. Try http://www.firstfloor.org/~andi/softnet Unfortunately some minor things have changed again since these mails. -Andi -- This is like TV. I don't like TV. 
From owner-netdev@oss.sgi.com Thu Feb 8 20:31:12 2001 Received: by oss.sgi.com id ; Thu, 8 Feb 2001 20:31:02 -0800 Received: from mta6.snfc21.pbi.net ([206.13.28.240]:53656 "EHLO mta6.snfc21.pbi.net") by oss.sgi.com with ESMTP id ; Thu, 8 Feb 2001 20:30:47 -0800 Received: from kryptonite ([206.170.7.83]) by mta6.snfc21.pbi.net (Sun Internet Mail Server sims.3.5.2000.01.05.12.18.p9) with SMTP id <0G8H00GJ631JIC@mta6.snfc21.pbi.net> for netdev@oss.sgi.com; Thu, 8 Feb 2001 20:27:20 -0800 (PST) Date: Thu, 08 Feb 2001 18:16:34 -0800 From: David Brownell Subject: Re: Help on a network driver ... To: Rabeeh Khoury Cc: netdev@oss.sgi.com Message-id: <06a401c09250$b5510a40$6800000a@brownell.org> MIME-version: 1.0 X-Mailer: Microsoft Outlook Express 5.50.4133.2400 Content-type: text/plain; charset="iso-8859-1" Content-transfer-encoding: 7bit X-MSMail-Priority: Normal X-MimeOLE: Produced By Microsoft MimeOLE V5.50.4133.2400 References: <059f01c09083$82ed9e20$6800000a@brownell.org> <3A811B95.4080303@galileo.co.il> X-Priority: 3 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing > The reason for that is that the TCP layer doesn't accept any packet that > the pkt_type is not a PACKET_HOST. This feature was added to TCP layer > in order to distnguish between packets that are for the host and packets > which are not, which is usually noticed when the port works in > promiscuous mode ! Turns out pkt_type may have figured in the problem (didn't look too much at that code) because some intermediate layer set pkt_type to the wrong value -- not the driver, a multicast bit in the MAC address was set wrong. Nothing gave a diagnostic. 
- Dave

From owner-netdev@oss.sgi.com Fri Feb 9 00:23:03 2001
Received: by oss.sgi.com id ; Fri, 9 Feb 2001 00:22:53 -0800
Received: from pizda.ninka.net ([216.101.162.242]:22400 "EHLO pizda.ninka.net") by oss.sgi.com with ESMTP id ; Fri, 9 Feb 2001 00:22:44 -0800
Received: (from davem@localhost) by pizda.ninka.net (8.9.3/8.9.3) id AAA01092; Fri, 9 Feb 2001 00:21:14 -0800
From: "David S. Miller"
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Message-ID: <14979.43130.731593.90703@pizda.ninka.net>
Date: Fri, 9 Feb 2001 00:21:14 -0800 (PST)
To: linux-kernel@vger.kernel.org
CC: netdev@oss.sgi.com
Subject: [UPDATE] zerocopy patch against 2.4.2-pre2
X-Mailer: VM 6.75 under 21.1 (patch 13) "Crater Lake" XEmacs Lucid
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing

As usual:

ftp://ftp.kernel.org/pub/linux/kernel/people/davem/zerocopy-2.4.2p2-1.diff.gz

It's updated to be against the latest (2.4.2-pre2) and I've removed the non-zerocopy related fixes from the patch (because I've sent them under separate cover to Linus).

Enjoy. As usual, I am very seriously interested in any bugs or performance problems introduced by this patch. Thanks.

Later,
David S.
Miller davem@redhat.com From owner-netdev@oss.sgi.com Fri Feb 9 02:47:44 2001 Received: by oss.sgi.com id ; Fri, 9 Feb 2001 02:47:35 -0800 Received: from zikova.cvut.cz ([147.32.235.100]:41993 "EHLO zikova.cvut.cz") by oss.sgi.com with ESMTP id ; Fri, 9 Feb 2001 02:47:10 -0800 Received: from vcnet.vc.cvut.cz (vcnet.vc.cvut.cz [147.32.240.61]) by zikova.cvut.cz (8.9.0.Beta5/8.9.0.Beta5) with ESMTP id LAA65070; Fri, 9 Feb 2001 11:45:57 +0100 Received: from VCNET/SpoolDir by vcnet.vc.cvut.cz (Mercury 1.21); 9 Feb 101 11:46:00 MET-1MEST Received: from SpoolDir by VCNET (Mercury 1.30); 9 Feb 101 11:45:44 MET-1MEST From: "Petr Vandrovec" Organization: CC CTU Prague To: netdev@oss.sgi.com Date: Fri, 9 Feb 2001 11:45:36 MET-1 MIME-Version: 1.0 Content-type: text/plain; charset=US-ASCII Content-transfer-encoding: 7BIT Subject: assertion tp->lost_out == 0 failed CC: davem@redhat.com X-mailer: Pegasus Mail v3.40 Message-ID: <14FAF640553C@vcnet.vc.cvut.cz> Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hi Dave, hi others, I know that 2.4.0-test7-pre4 is very old, so just FYI, today at 11:32 CET platan.vc.cvut.cz logged following message: KERNEL: assertion (tp->lost_out == 0) failed at tcp_input.c(1173):tcp_remove_reno_sacks I'll compile new kernel ASAP, but as it happened once during 119 days uptime, it may be hard to reproduce it ;-) Thanks, Petr Vandrovec vandrove@vc.cvut.cz (and yes, it is different machine than one for which I reported other failed assertions during November/December) From owner-netdev@oss.sgi.com Fri Feb 9 04:03:26 2001 Received: by oss.sgi.com id ; Fri, 9 Feb 2001 04:02:55 -0800 Received: from c4.h061013036.is.net.tw ([61.13.36.4]:36872 "EHLO exchsmtp.via.com.tw") by oss.sgi.com with ESMTP id ; Fri, 9 Feb 2001 04:02:33 -0800 Received: by EXCHSMTP with Internet Mail Service (5.5.2650.21) id ; Fri, 9 Feb 2001 20:02:23 +0800 Message-ID: <611C3E2A972ED41196EF0050DA92E076010B346F@EXCHANGE2> From: Yiping Chen To: 
"'netdev@oss.sgi.com'" Subject: Question about priority transmit in linux. Date: Fri, 9 Feb 2001 20:02:44 +0800 MIME-Version: 1.0 X-Mailer: Internet Mail Service (5.5.2650.21) Content-Type: text/plain; charset="BIG5" Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hi, sorry to bother you. I am a network adapter linux driver programmer, I am trying to write driver support vlan. But I don't know where to get the IEEE 802.1Q tag information including priority and VLAN ID from upper protocol. Whether we can get such informations from skbuff structure? I have read Intel NIC driver, they get this kind information from cb field in skbuff structure when transmit frame. But I am not sure what's the meaning of cb field in skbuff structure. Would you mind giving me any suggestions ? Thanks!! -------------------------------------------------- Yiping Chen VIA Technologies, Inc. LAN Software 533 Chung Cheng Road 8F Hsin Tien, Taipei Taiwan TEL : 886-2-22185452 EXT.7512 FAX : 886-2-22187527 E-mail : YipingChen@via.com.tw From owner-netdev@oss.sgi.com Fri Feb 9 12:23:58 2001 Received: by oss.sgi.com id ; Fri, 9 Feb 2001 12:23:38 -0800 Received: from minus.inr.ac.ru ([193.233.7.97]:14854 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Fri, 9 Feb 2001 12:23:21 -0800 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id XAA11696; Fri, 9 Feb 2001 23:22:59 +0300 From: kuznet@ms2.inr.ac.ru Message-Id: <200102092022.XAA11696@ms2.inr.ac.ru> Subject: Re: assertion tp->lost_out == 0 failed To: VANDROVE@vc.cvut.cz (Petr Vandrovec) Date: Fri, 9 Feb 2001 23:22:59 +0300 (MSK) Cc: netdev@oss.sgi.com In-Reply-To: <14FAF640553C@vcnet.vc.cvut.cz> from "Petr Vandrovec" at Feb 9, 1 02:15:00 pm X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Content-Length: 284 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hello! 
> I'll compile new kernel ASAP, but as it happened once during > 119 days uptime, it may be hard to reproduce it ;-) Do not worry too much. Information that this happens is more than enough to find reason of incorrect logic assumption. (And this is surely harmless.) Alexey From owner-netdev@oss.sgi.com Sat Feb 10 00:26:02 2001 Received: by oss.sgi.com id ; Sat, 10 Feb 2001 00:25:43 -0800 Received: from mail.bieringer.de ([195.226.187.51]:6664 "HELO titan.bieringer.de") by oss.sgi.com with SMTP id ; Sat, 10 Feb 2001 00:25:32 -0800 Received: (qmail 8498 invoked from network); 10 Feb 2001 08:25:29 -0000 Received: from p3e9b8e55.dip.t-dialin.net (HELO worker.bieringer.de) (62.155.142.85) by mail.bieringer.de with SMTP; 10 Feb 2001 08:25:29 -0000 Message-Id: <5.0.2.1.0.20010210092546.0222bd28@mail.bieringer.de> X-Sender: list4peter@mail.bieringer.de X-Mailer: QUALCOMM Windows Eudora Version 5.0.2 Date: Sat, 10 Feb 2001 09:26:53 +0100 To: netdev@oss.sgi.com From: Peter Bieringer Subject: Re: IPv6 & 2.2.17 + 2.4.0: autoconfiguration works only on bootup In-Reply-To: <20010204163125.B2993@nic.nigdzie> References: <5.0.2.1.0.20010203135649.00b1b090@mail.bieringer.de> <5.0.2.1.0.20010203135649.00b1b090@mail.bieringer.de> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; format=flowed Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing At 16:31 04.02.2001, Jacek Konieczny wrote: >On Sat, Feb 03, 2001 at 02:16:23PM +0100, Peter Bieringer wrote: > > following scenario: > > > > Gateway (running radvd) [00:50:BF:06:B4:F5] and a client with one > interface > > (forwarding disabled) [00:e0:18:90:92:05]. > > > > Client boots, neigborhood detection works and take the given prefix from > > radvd to assign a global address for the one and only interface. >[...] > > But if I switch down and up the network (completly or only interface), the > > mechanism won't work: >[...] > > Happens with 2.2.17 and 2.4.0. 
> > > > Can someone reproduce this? Any hints? >I have reported this bug twice (AFAIR I have found it in 2.2.14 kernel). >And I have even sent a patch (which is apllied to PLD distributions' >kernel (used by many people here in poland) sice then) which fixes this. >The problem is, that "all-nodes" multicast address is added to interface >on its creation (eg. when module is loaded) and is removed, when >interface goes down. But it is not added again when interface goes up. >It is the most painful on machines which use DHCP for IPv4 address >allocation --- dhcpcd seems to put interface up and down on startup. Applying newest USAGI kernel patches solve this problem. Anyone knows when they will be migrated to standard kernel? Peter From owner-netdev@oss.sgi.com Sat Feb 10 03:57:44 2001 Received: by oss.sgi.com id ; Sat, 10 Feb 2001 03:57:34 -0800 Received: from laurin.munich.netsurf.de ([194.64.166.1]:2987 "EHLO laurin.munich.netsurf.de") by oss.sgi.com with ESMTP id ; Sat, 10 Feb 2001 03:57:19 -0800 Received: from fred.muc.de (noidentity@ns1207.munich.netsurf.de [195.180.235.207]) by laurin.munich.netsurf.de (8.9.3/8.9.3) with ESMTP id MAA01433; Sat, 10 Feb 2001 12:57:12 +0100 (MET) Received: by fred.muc.de (Postfix, from userid 500) id 87C56E3911; Fri, 9 Feb 2001 14:43:45 +0100 (CET) Date: Fri, 9 Feb 2001 14:43:45 +0100 From: Andi Kleen To: Yiping Chen Cc: "'netdev@oss.sgi.com'" Subject: Re: Question about priority transmit in linux. Message-ID: <20010209144345.A3742@fred.local> References: <611C3E2A972ED41196EF0050DA92E076010B346F@EXCHANGE2> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0.1i In-Reply-To: <611C3E2A972ED41196EF0050DA92E076010B346F@EXCHANGE2>; from YipingChen@via.com.tw on Fri, Feb 09, 2001 at 01:04:16PM +0100 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Fri, Feb 09, 2001 at 01:04:16PM +0100, Yiping Chen wrote: > Hi, sorry to bother you. 
> I am a network adapter linux driver programmer, I am trying to write driver > support vlan. > But I don't know where to get the IEEE 802.1Q tag information including > priority and VLAN ID from upper protocol. > Whether we can get such informations from skbuff structure? > I have read Intel NIC driver, they get this kind information from cb field > in skbuff structure when transmit frame. > But I am not sure what's the > meaning of cb field in skbuff structure. > Would you mind giving me any suggestions ? Linux currently has no support for VLANs in any standard kernel, so therefore there is no standard way to get it. Intel iirc has an own VLAN implementation, there are also others. -Andi From owner-netdev@oss.sgi.com Sat Feb 10 05:19:46 2001 Received: by oss.sgi.com id ; Sat, 10 Feb 2001 05:19:36 -0800 Received: from f00f.stub.clear.net.nz ([203.167.224.51]:3590 "HELO metastasis.f00f.org") by oss.sgi.com with SMTP id ; Sat, 10 Feb 2001 05:19:16 -0800 Received: by metastasis.f00f.org (Postfix, from userid 1000) id C46D3A509; Sun, 11 Feb 2001 02:19:13 +1300 (NZDT) Date: Sun, 11 Feb 2001 02:19:13 +1300 From: Chris Wedgwood To: Andi Kleen Cc: Yiping Chen , netdev@oss.sgi.com, Dave Miller , Alexey Kuznetsov Subject: Re: Question about priority transmit in linux. Message-ID: <20010211021913.D9570@metastasis.f00f.org> References: <611C3E2A972ED41196EF0050DA92E076010B346F@EXCHANGE2> <20010209144345.A3742@fred.local> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <20010209144345.A3742@fred.local>; from ak@muc.de on Fri, Feb 09, 2001 at 02:43:45PM +0100 X-No-Archive: Yes Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Fri, Feb 09, 2001 at 02:43:45PM +0100, Andi Kleen wrote: Linux currently has no support for VLANs in any standard kernel, so therefore there is no standard way to get it. 
Intel iirc has an own VLAN implementation, there are also others. I'd really like to see one of the two (three) implementations out there cleaned up and integrated real soon if possible. There is .1q aware hardware so having some infrastructure will help guide people there. Not to mention, it's quite a useful thing to have, something NetBSD has had for a while and FreeBSD for a little less time now. Dave/Alexey -- have either of you two looked at the code that is out there and considered how difficult integration might be? --cw From owner-netdev@oss.sgi.com Sat Feb 10 12:21:51 2001 Received: by oss.sgi.com id ; Sat, 10 Feb 2001 12:21:42 -0800 Received: from cx97923-a.phnx3.az.home.com ([24.9.112.194]:24325 "EHLO grok.yi.org") by oss.sgi.com with ESMTP id ; Sat, 10 Feb 2001 12:21:26 -0800 Received: from candelatech.com (IDENT:greear@localhost [127.0.0.1]) by grok.yi.org (8.9.3/8.8.7) with ESMTP id OAA16035; Sat, 10 Feb 2001 14:29:11 -0700 Message-ID: <3A85B2A7.AB1C222C@candelatech.com> Date: Sat, 10 Feb 2001 14:29:11 -0700 From: Ben Greear Organization: Candela Technologies X-Mailer: Mozilla 4.72 [en] (X11; U; Linux 2.2.16 i586) X-Accept-Language: en MIME-Version: 1.0 To: Chris Wedgwood CC: Andi Kleen , Yiping Chen , netdev@oss.sgi.com, Dave Miller , Alexey Kuznetsov Subject: Re: Question about priority transmit in linux. References: <611C3E2A972ED41196EF0050DA92E076010B346F@EXCHANGE2> <20010209144345.A3742@fred.local> <20010211021913.D9570@metastasis.f00f.org> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Chris Wedgwood wrote: > > On Fri, Feb 09, 2001 at 02:43:45PM +0100, Andi Kleen wrote: > > Linux currently has no support for VLANs in any standard kernel, > so therefore there is no standard way to get it. Intel iirc has > an own VLAN implementation, there are also others. 
> > I'd really like to see one of the two (three) implementations out > there cleaned up and integrated real soon if possible. > > There is .1q aware hardware so having some infrastructure will help > guide people there. Not to mention, it's quite a useful thing to > have, something NetBSD has had for a while and FreeBSD for a little > less time now. > > Dave/Alexey -- have either of you two looked at the code that is out > there and considered how difficult integration might be? I believe I've cleaned up most of the complaints I've heard in the past with my VLAN patch (http://scry.wanfear.com/~greear/vlan.html) I would be honored if it found it's way into the kernel. It has been in use for some time and seems to be pretty stable (there is a small compile problem when compiled as a module..I'll fix when I update the patch to the next 2.4.2 kernel) If anyone has any comments/suggestions, please let me know..and feel free to cc the vlan@scry.wanfear.com mailing list (if you're subscribed) Thanks, Ben -- Ben Greear (greearb@candelatech.com) http://www.candelatech.com Author of ScryMUD: scry.wanfear.com 4444 (Released under GPL) http://scry.wanfear.com http://scry.wanfear.com/~greear From owner-netdev@oss.sgi.com Sat Feb 10 20:14:45 2001 Received: by oss.sgi.com id ; Sat, 10 Feb 2001 20:14:35 -0800 Received: from mx2.idealab.com ([64.208.8.4]:34757 "HELO corleone.idealab.com") by oss.sgi.com with SMTP id ; Sat, 10 Feb 2001 20:14:21 -0800 Received: (qmail 16394 invoked by alias); 11 Feb 2001 04:14:10 -0000 Received: (qmail 16366 invoked from network); 11 Feb 2001 04:14:09 -0000 Received: from unknown (HELO mindspring.com) (63.194.21.72) by corleone.idealab.com with SMTP; 11 Feb 2001 04:14:09 -0000 Message-ID: <3A8610AB.9040105@mindspring.com> Date: Sat, 10 Feb 2001 20:10:19 -0800 From: Jeff McClure User-Agent: Mozilla/5.0 (X11; U; Linux 2.4.1 i686; en-US; m18) Gecko/20010208 X-Accept-Language: en MIME-Version: 1.0 To: netdev@oss.sgi.com Subject: IPChains support in 
2.4.1
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing

Hey Team,

I'm not sure who to report this to; this is the best I could figure out by looking through the source code and the MAINTAINERS file.

I'm using 2.4.1, and had some difficulty getting the IPChains backward compatibility working. The net of it is I couldn't get it working as a module no matter what I did (the module complained when it loaded), but it works fine compiled into the kernel.

I hope you find this message helpful.

Jeff McClure

From owner-netdev@oss.sgi.com Sun Feb 11 00:51:28 2001
Received: by oss.sgi.com id ; Sun, 11 Feb 2001 00:51:18 -0800
Received: from colin.muc.de ([193.149.48.1]:42000 "HELO colin.muc.de") by oss.sgi.com with SMTP id ; Sun, 11 Feb 2001 00:51:00 -0800
Received: by colin.muc.de id <140565-3>; Sun, 11 Feb 2001 09:50:46 +0100
Message-ID: <20010211095035.11498@colin.muc.de>
From: Andi Kleen
To: Jeff McClure
Cc: netdev@oss.sgi.com
Subject: Re: IPChains support in 2.4.1
References: <3A8610AB.9040105@mindspring.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 0.88e
In-Reply-To: <3A8610AB.9040105@mindspring.com>; from Jeff McClure on Sun, Feb 11, 2001 at 05:15:23AM +0100
Date: Sun, 11 Feb 2001 09:50:36 +0100
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing

On Sun, Feb 11, 2001 at 05:15:23AM +0100, Jeff McClure wrote:
> Hey Team,
>
> I'm not sure who to report this to; this is the best I could figure out
> by looking through the source code and the MAINTAINERS file.
>
> I'm using 2.4.1, and had some difficulty getting the IPChains backward
> compatibility working. The net of it is I couldn't get it working as a
> module no matter what I did (the module complained when it loaded), but
> it works fine compiled into the kernel.
>
> I hope you find this message helpful.
Not particularly; you forgot to include the complaints of the module. Here modular ipchains works fine.

-Andi

From owner-netdev@oss.sgi.com Sun Feb 11 05:04:28 2001
Received: by oss.sgi.com id ; Sun, 11 Feb 2001 05:04:19 -0800
Received: from horus.its.uow.edu.au ([130.130.68.25]:2979 "EHLO horus.its.uow.edu.au") by oss.sgi.com with ESMTP id ; Sun, 11 Feb 2001 05:04:02 -0800
Received: from uow.edu.au (wumpus.its.uow.edu.au [130.130.68.12]) by horus.its.uow.edu.au (8.9.3/8.9.3) with ESMTP id AAA23440; Mon, 12 Feb 2001 00:03:44 +1100 (EST)
Message-ID: <3A868FF3.BC7F6679@uow.edu.au>
Date: Mon, 12 Feb 2001 00:13:23 +1100
From: Andrew Morton
X-Mailer: Mozilla 4.7 [en] (X11; I; Linux 2.4.2-pre2 i586)
X-Accept-Language: en
MIME-Version: 1.0
To: "David S. Miller"
CC: linux-kernel@vger.kernel.org, netdev@oss.sgi.com
Subject: Re: [UPDATE] zerocopy patch against 2.4.2-pre2
References: <14979.43130.731593.90703@pizda.ninka.net>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing

"David S. Miller" wrote:
>
> As usual:
>
> ftp://ftp.kernel.org/pub/linux/kernel/people/davem/zerocopy-2.4.2p2-1.diff.gz
>
> It's updated to be against the latest (2.4.2-pre2) and I've removed
> the non-zerocopy related fixes from the patch (because I've sent them
> under separate cover to Linus).
>

Changing the memory copy function did make some difference in my setup. But the performance drop on send(8k) is only approx 10%, partly because I changed the way I'm testing it - `cyclesoak' is now penalised more heavily by cache misses, and the amount of cache missing which networking inflicts on cyclesoak is basically the same, whether or not the ZC patch is applied.
I tried a number of things to try to optimise this situation on an SG-capable NIC with the ZC patch:

    while (more_to_send) {
        read(fd, buf, 8192);
        send(sock, buf, 8192);
    }

Things I tried:

- Use the csum_copy() functions

- Use copy_from_user()

- Use copy_from_user if src and dest are 8-byte aligned, else use csum_copy.

- Force data alignment. Explain:

  If an application sends a few bytes to a connection (say, some headers) and then starts pumping bulk data down the same connection, we end up in the situation where the source of a copy_from_user is poorly aligned, and it *stays* that way for the whole operation. This is because new, incoming data is always tacked onto the end of the socket write buffer.

  Copying from a poorly aligned source address takes 1.5 to 2 times as long, depending upon the combination of source-cached and dest-cached.

  So I special-cased this in tcp_sendmsg: if we see a large write from userspace and we're poorly aligned then just send out a single undersized frame so we can drop back into alignment.

  This didn't make a lot of difference, which perhaps indicates that the dominating factor is misses, not alignment. If it _is_ misses, they're probably due to aliasing - Ingo said his toy has 2 megs of full-speed L2.

- skbuff_cache. Explain:

  When we build an skbuff for ZC transmit it is always the same size - it only holds the headers. The data is put into the fragment buffer. So I created a slab cache for skbuffs whose data length is <= 256 bytes, and used that. This didn't make much difference.

send(8k), no SG                                           19.2%
send(8k), SG, csum_copy                                   20.3%
send(8k), SG, copy_from_user                              20.9%
send(8k), SG, choose copy                                 20.6%  (huh?)
send(8k), SG, page-aligned, choose copy                   20.3%
send(8k), SG, page-aligned, csum_copy                     20.2%
send(8k), SG, csum_copy, skbuff_cache                     20.5%  (huh?)
send(8k), SG, csum_copy, skbuff_cache, page-aligned       20.2%
send(8k), SG, copy_from_user, skbuff_cache, page-aligned  20.2%

That's all pretty uninteresting, except for the observation that not using Pentium string ops on data which isn't 8-byte aligned is the biggest win. And the two huhs, the first of which is bizarre. I've checked that code over and over:

    if (((long)_from | (long)_to) & 7)
        csum_and_copy()
    else
        copy_from_user()

and it's slower than an unconditional csum_and_copy(). Weird.

The profiles are more interesting:

send(8k), no SG 18.2%
=========================================
c0224734 tcp_transmit_skb                 47   0.0347
c01127dc schedule                         54   0.0340
c021599c ip_output                        54   0.1688
c010a768 handle_IRQ_event                 55   0.4583
c02041ec skb_release_data                 60   0.5357
c0211068 ip_route_input                   69   0.1938
c022abac tcp_v4_rcv                       75   0.0470
c0215adc ip_queue_xmit                    76   0.0571
c0204410 skb_clone                        85   0.1986
c0219a54 tcp_sendmsg_copy                 99   0.0270
c02209fc tcp_clean_rtx_queue             101   0.1153
c02042c4 __kfree_skb                     113   0.3404
c024a3cc csum_partial_copy_generic       436   1.7581
c0125580 file_read_actor                 548   6.5238
00000000 total                          2874   0.0021

send(8k), SG, csum copy 20.3%
=========================================
c0211068 ip_route_input                   47   0.1320
c011be60 del_timer                        49   0.6806
c021599c ip_output                        49   0.1531
c010a768 handle_IRQ_event                 56   0.4667
c022abac tcp_v4_rcv                       66   0.0414
c02041ec skb_release_data                 69   0.6161
c0215adc ip_queue_xmit                    69   0.0518
c0204410 skb_clone                        70   0.1636
c02042c4 __kfree_skb                      96   0.2892
c02209fc tcp_clean_rtx_queue             100   0.1142
c021b6fc tcp_sendmsg                     109   0.0439
c021a8ac do_tcp_sendpages                152   0.0440
c024a3cc csum_partial_copy_generic       520   2.0968
c0125580 file_read_actor                 615   7.3214
00000000 total                          3222   0.0024

Note that in each and every profile which I've taken with the ZC patch, file_read_actor() took a big hit. That's the read from the kernel into userspace. Which confirms that ZC is more cache-hungry.

Also, note that the sum of ZC's do_tcp_sendpages and tcp_sendmsg is considerably higher than non-ZC's tcp_sendmsg_copy.
These profiles also show us the aggregate impact of the ZC patch:
3222/2874 = 1.12. Twelve percent.

The device driver cost isn't shown here because it's in a module.
Here is a profile with a non-modular driver:

c020833c skb_release_data                    54   0.3857
c0218d4c ip_queue_xmit                       63   0.0497
c010a768 handle_IRQ_event                    70   0.5833
c024d4f8 __strncpy_from_user                 74   2.0556
c021e53c tcp_sendmsg                         75   0.0302
c022d9ec tcp_v4_rcv                          77   0.0482
c022383c tcp_clean_rtx_queue                 90   0.1027
c0208434 __kfree_skb                        104   0.4000
c021d6ec do_tcp_sendpages                   122   0.0353
c01d3a08 boomerang_rx                       150   0.1276
c01d32ac boomerang_interrupt                287   0.2781
c01d2c94 boomerang_start_xmit               324   0.4475
c024d1cc csum_partial_copy_generic          443   1.7863
c01255f0 file_read_actor                    643   7.6548
00000000 total                             3852   0.0028

Device driver cost is 15-20% of the kernel occupancy.

We can pull ZC's 12% cost down a little bit by using a slab cache on
the skbuffs, and by forcing better copy alignment (not sure how
though). Also perhaps by doing some colouring on the data structures
which are being used.

These differences are all quite small, which makes this stuff pretty
tricky. I now need to go off and hunt down a Pentium performance
counter patch which takes less than a year to understand, and find out
where these alleged misses are happening.
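For reference, the 8-byte alignment test quoted earlier in this message can be exercised in user space. The sketch below is only an illustration under stated assumptions: `choose_copy`, `copy_chunk` and the `copy_path` enum are hypothetical names, and a plain memcpy() stands in for both kernel copy routines, so it shows which path the test selects rather than the cost of either path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical labels for the two copy routines named in the message. */
enum copy_path { PATH_PLAIN_COPY, PATH_CSUM_COPY };

/*
 * Same test as the quoted fragment: if either pointer is not 8-byte
 * aligned, pick the checksum-and-copy path, otherwise the plain copy.
 */
static enum copy_path choose_copy(const void *from, const void *to)
{
	return (((uintptr_t)from | (uintptr_t)to) & 7) ? PATH_CSUM_COPY
						       : PATH_PLAIN_COPY;
}

/*
 * User-space stand-in: memcpy() replaces both kernel routines
 * (an assumption of this sketch), and the chosen path is returned
 * so a caller can count how often each one would be taken.
 */
static enum copy_path copy_chunk(void *to, const void *from, size_t len)
{
	assert(to != NULL && from != NULL);
	memcpy(to, from, len);
	return choose_copy(from, to);
}
```

Counting the returned paths over a real send pattern would show how often a connection that starts with a few header bytes stays on the unaligned path.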
- From owner-netdev@oss.sgi.com Sun Feb 11 11:54:23 2001 Received: by oss.sgi.com id ; Sun, 11 Feb 2001 11:54:03 -0800 Received: from cx97923-a.phnx3.az.home.com ([24.9.112.194]:5126 "EHLO grok.yi.org") by oss.sgi.com with ESMTP id ; Sun, 11 Feb 2001 11:53:49 -0800 Received: from candelatech.com (IDENT:greear@localhost [127.0.0.1]) by grok.yi.org (8.9.3/8.8.7) with ESMTP id OAA01616 for ; Sun, 11 Feb 2001 14:01:47 -0700 Message-ID: <3A86FDBB.32E04656@candelatech.com> Date: Sun, 11 Feb 2001 14:01:47 -0700 From: Ben Greear Organization: Candela Technologies X-Mailer: Mozilla 4.72 [en] (X11; U; Linux 2.2.16 i586) X-Accept-Language: en MIME-Version: 1.0 To: "netdev@oss.sgi.com" Subject: testing list (sorry) Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Wonder if this will go through... Ben -- Ben Greear (greearb@candelatech.com) http://www.candelatech.com Author of ScryMUD: scry.wanfear.com 4444 (Released under GPL) http://scry.wanfear.com http://scry.wanfear.com/~greear From owner-netdev@oss.sgi.com Sun Feb 11 12:04:03 2001 Received: by oss.sgi.com id ; Sun, 11 Feb 2001 12:03:53 -0800 Received: from cx97923-a.phnx3.az.home.com ([24.9.112.194]:13574 "EHLO grok.yi.org") by oss.sgi.com with ESMTP id ; Sun, 11 Feb 2001 12:03:37 -0800 Received: from candelatech.com (IDENT:greear@localhost [127.0.0.1]) by grok.yi.org (8.9.3/8.8.7) with ESMTP id OAA01661 for ; Sun, 11 Feb 2001 14:11:38 -0700 Message-ID: <3A87000A.6F65E998@candelatech.com> Date: Sun, 11 Feb 2001 14:11:38 -0700 From: Ben Greear Organization: Candela Technologies X-Mailer: Mozilla 4.72 [en] (X11; U; Linux 2.2.16 i586) X-Accept-Language: en MIME-Version: 1.0 To: netdev@oss.sgi.com Subject: Re: Question about priority transmit in linux. 
References: <611C3E2A972ED41196EF0050DA92E076010B346F@EXCHANGE2> <20010209144345.A3742@fred.local> <20010211021913.D9570@metastasis.f00f.org> <3A85B2A7.AB1C222C@candelatech.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing This didn't seem to make the list earlier...or I hosed up my mail setup somehow.... Ben Greear wrote: > > Chris Wedgwood wrote: > > > > On Fri, Feb 09, 2001 at 02:43:45PM +0100, Andi Kleen wrote: > > > > Linux currently has no support for VLANs in any standard kernel, > > so therefore there is no standard way to get it. Intel iirc has > > an own VLAN implementation, there are also others. > > > > I'd really like to see one of the two (three) implementations out > > there cleaned up and integrated real soon if possible. > > > > There is .1q aware hardware so having some infrastructure will help > > guide people there. Not to mention, it's quite a useful thing to > > have, something NetBSD has had for a while and FreeBSD for a little > > less time now. > > > > Dave/Alexey -- have either of you two looked at the code that is out > > there and considered how difficult integration might be? > > I believe I've cleaned up most of the complaints I've heard in the > past with my VLAN patch (http://scry.wanfear.com/~greear/vlan.html) > > I would be honored if it found it's way into the kernel. 
It has > been in use for some time and seems to be pretty stable (there > is a small compile problem when compiled as a module..I'll fix > when I update the patch to the next 2.4.2 kernel) > > If anyone has any comments/suggestions, please let me know..and > feel free to cc the vlan@scry.wanfear.com mailing list (if you're > subscribed) > > Thanks, > Ben -- Ben Greear (greearb@candelatech.com) http://www.candelatech.com Author of ScryMUD: scry.wanfear.com 4444 (Released under GPL) http://scry.wanfear.com http://scry.wanfear.com/~greear From owner-netdev@oss.sgi.com Mon Feb 12 22:59:03 2001 Received: by oss.sgi.com id ; Mon, 12 Feb 2001 22:58:44 -0800 Received: from pizda.ninka.net ([216.101.162.242]:60802 "EHLO pizda.ninka.net") by oss.sgi.com with ESMTP id ; Mon, 12 Feb 2001 22:58:28 -0800 Received: (from davem@localhost) by pizda.ninka.net (8.9.3/8.9.3) id WAA03317; Mon, 12 Feb 2001 22:56:46 -0800 From: "David S. Miller" MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <14984.55981.974147.573306@pizda.ninka.net> Date: Mon, 12 Feb 2001 22:56:45 -0800 (PST) To: Andrew Morton Cc: linux-kernel@vger.kernel.org, netdev@oss.sgi.com Subject: Re: [UPDATE] zerocopy patch against 2.4.2-pre2 In-Reply-To: <3A868FF3.BC7F6679@uow.edu.au> References: <14979.43130.731593.90703@pizda.ninka.net> <3A868FF3.BC7F6679@uow.edu.au> X-Mailer: VM 6.75 under 21.1 (patch 13) "Crater Lake" XEmacs Lucid Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Andrew Morton writes: > Changing the memory copy function did make some difference > in my setup. But the performance drop on send(8k) is only approx 10%, > partly because I changed the way I'm testing it - `cyclesoak' is > now penalised more heavily by cache misses, and amount of cache > missing which networking causes cyclesoak is basically the same, > whether or not the ZC patch is applied. 
Ok ok ok, but are we at the point where there are no sizable "over the wire" performance anomalies anymore? That is what is important, what are the localhost bandwidth measurements looking like for you now with/without the patch applied? I want to reach a known state where we can conclude "over the wire is about as good or better than before, but there is a cpu/cache usage penalty from the zerocopy stuff". This is important. It lets us get to the next stage which is to use your tools, numbers, and some profiling to see if we can get some of that cpu overhead back. Later, David S. Miller davem@redhat.com From owner-netdev@oss.sgi.com Mon Feb 12 23:15:14 2001 Received: by oss.sgi.com id ; Mon, 12 Feb 2001 23:15:04 -0800 Received: from pizda.ninka.net ([216.101.162.242]:4483 "EHLO pizda.ninka.net") by oss.sgi.com with ESMTP id ; Mon, 12 Feb 2001 23:14:55 -0800 Received: (from davem@localhost) by pizda.ninka.net (8.9.3/8.9.3) id XAA03384; Mon, 12 Feb 2001 23:13:15 -0800 From: "David S. Miller" MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <14984.56970.908321.405130@pizda.ninka.net> Date: Mon, 12 Feb 2001 23:13:14 -0800 (PST) To: linux-kernel@vger.kernel.org CC: netdev@oss.sgi.com Subject: [UPDATE] zerocopy + powder rule X-Mailer: VM 6.75 under 21.1 (patch 13) "Crater Lake" XEmacs Lucid Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing The only change is to update things to 2.4.2-pre3: ftp://ftp.kernel.org/pub/linux/kernel/people/davem/zerocopy-2.4.2p3-1.diff.gz All the reports I am getting now appear to be consistent, and they all basically show me that: 1) There are no known bugs (as in things that crash the kernel or corrupt data) 2) The loopback etc. raw performance anomalies have been killed by the P-II Mendocino unaligned memcpy workaround. 3) The acenic/gbit performance anomalies have been cured by reverting the PCI mem_inval tweaks. 
4) The zerocopy patches have a small yet non-neglible cpu usage cost for normal write/send/sendmsg. If this truly is the current state of affairs, then I am pretty happy as this is where I wanted things to be when I first began to publish these zerocopy diffs. The next step is to begin profiling things heavily to see if we can back some of that extra cpu usage the pages SKBs afford us. Due to the powder rule (Lake Tahoe received 6 or so feet of snow this past weekend) I will be a bit quiet until Friday night. However, I'll be doing my own profiling of the zerocopy stuff on my laptop while I'm up there. Later, David Snowboard Miller davem@redhat.com From owner-netdev@oss.sgi.com Tue Feb 13 05:19:15 2001 Received: by oss.sgi.com id ; Tue, 13 Feb 2001 05:19:05 -0800 Received: from isis.its.uow.edu.au ([130.130.68.21]:60393 "EHLO isis.its.uow.edu.au") by oss.sgi.com with ESMTP id ; Tue, 13 Feb 2001 05:18:55 -0800 Received: from uow.edu.au (wumpus.its.uow.edu.au [130.130.68.12]) by isis.its.uow.edu.au (8.9.3/8.9.3) with ESMTP id AAA29829; Wed, 14 Feb 2001 00:17:30 +1100 (EST) Message-ID: <3A89362E.A0DE6C14@uow.edu.au> Date: Wed, 14 Feb 2001 00:27:10 +1100 From: Andrew Morton X-Mailer: Mozilla 4.7 [en] (X11; I; Linux 2.4.2-pre2 i586) X-Accept-Language: en MIME-Version: 1.0 To: "David S. Miller" CC: linux-kernel@vger.kernel.org, netdev@oss.sgi.com Subject: Re: [UPDATE] zerocopy patch against 2.4.2-pre2 References: <3A868FF3.BC7F6679@uow.edu.au>, <14979.43130.731593.90703@pizda.ninka.net> <3A868FF3.BC7F6679@uow.edu.au> <14984.55981.974147.573306@pizda.ninka.net> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing "David S. Miller" wrote: > > Andrew Morton writes: > > Changing the memory copy function did make some difference > > in my setup. 
But the performance drop on send(8k) is only approx 10%,
> > partly because I changed the way I'm testing it - `cyclesoak' is
> > now penalised more heavily by cache misses, and amount of cache
> > missing which networking causes cyclesoak is basically the same,
> > whether or not the ZC patch is applied.
>
> Ok ok ok, but are we at the point where there are no sizable "over the
> wire" performance anomalies anymore? That is what is important, what
> are the localhost bandwidth measurements looking like for you now
> with/without the patch applied?

Using 2.4.2-pre3 + zerocopy-2.4.2p3-1.diff

All numbers in megs/sec

zcc/zcs is doing read(8k)/send(8k) to localhost.

                                  zcc/zcs    bw_tcp
On the dual 500MHz PII:
  Unpatched:                         70        66
  Patched:                           67        66

Single 500MHz PII:
  Unpatched:                         58        54
  Patched:                           49        52

Single 650MHz PIII Coppermine:
  Unpatched:                        140      180-250
  Patched:                          107       159

With or without ZC, there is Weird Stuff happening with local
networking. Throughput is all over the place.

- With zcs reporting throughput once per second, the numbers were
  jumping around by +/-10%. Had to bump the averaging period to 5
  seconds to make much sense of it. With a real network, they're
  rock solid.

- The difference between the PII and PIII is far beyond anything I see
  with any other workload.

- The difference between zcc/zcs and bw_tcp on the PIII is interesting.
  It's still apparent when zcc/zcs uses a 64k transfer buffer, like
  bw_tcp. zcc/zcs is doing file system reads, whereas bw_tcp isn't.
  But the discrepancy isn't there on the PII.

- On the unpatched kernel, I saw one bw_tcp run after a reboot report
  410 Mbytes/sec. Thereafter it's around 210. err.. make that 180.
  No, make that 254. WTF?

Amongst all the noise it seems there's a problem on the PIII but not
the PII.

It's getting very lonely testing this stuff. It would be useful if
someone else could help out - at least running the bw_tcp tests.
It's pretty simple: bw_tcp -s ; bw_tcp 0 > I want to reach a known state where we can conclude "over the wire is > about as good or better than before, but there is a cpu/cache usage > penalty from the zerocopy stuff". > > This is important. It lets us get to the next stage which is to > use your tools, numbers, and some profiling to see if we can get > some of that cpu overhead back. Seems, with the 100baseT NIC the performance drop on the Coppermine is only half that of the Mendocino. I _think_ the Mendocino is only 4-way associative, but reports vary on this. Coppermine is 8-way. - From owner-netdev@oss.sgi.com Tue Feb 13 07:50:26 2001 Received: by oss.sgi.com id ; Tue, 13 Feb 2001 07:50:16 -0800 Received: from mgw-x1.nokia.com ([131.228.20.21]:45238 "EHLO mgw-x1.nokia.com") by oss.sgi.com with ESMTP id ; Tue, 13 Feb 2001 07:50:05 -0800 Received: from esvir05nok.ntc.nokia.com (esvir05nokt.ntc.nokia.com [172.21.143.37]) by mgw-x1.nokia.com (Switch-2.1.0/Switch-2.1.0) with ESMTP id f1DFnaS10058 for ; Tue, 13 Feb 2001 17:49:37 +0200 (EET) Received: from esebh03nok.ntc.nokia.com (unverified) by esvir05nok.ntc.nokia.com (Content Technologies SMTPRS 4.2.1) with ESMTP id for ; Tue, 13 Feb 2001 17:50:02 +0200 Received: by esebh03nok with Internet Mail Service (5.5.2652.78) id <1LYNVFZH>; Tue, 13 Feb 2001 17:50:02 +0200 Message-ID: <2D6CADE9B0C6D411A27500508BB3CBD063CD76@eseis15nok> From: Imran.Patel@nokia.com To: netdev@oss.sgi.com Subject: icmpv6 Date: Tue, 13 Feb 2001 17:50:01 +0200 MIME-Version: 1.0 X-Mailer: Internet Mail Service (5.5.2652.78) Content-Type: text/plain; charset="iso-8859-1" Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing hello, i am currently working on a nat-pt (rfc 2766) implementation on linux. But i have just run into some sort of problem. i would like to convert between icmpv4 and icmpv6. But the skbuff handed over to me doesn't have any icmpv6 member in the transport header. 
I have seen in the skbuff.h and i have found out that there is no icmpv6hdr member in the h union. so how do i get a reference to the icmpv6 data. I have used icmphdr instead and i am getting weird results. ps: i am not on the list TIA, imran From owner-netdev@oss.sgi.com Tue Feb 13 08:48:48 2001 Received: by oss.sgi.com id ; Tue, 13 Feb 2001 08:48:38 -0800 Received: from colin.muc.de ([193.149.48.1]:46599 "HELO colin.muc.de") by oss.sgi.com with SMTP id ; Tue, 13 Feb 2001 08:48:16 -0800 Received: by colin.muc.de id <140608-3>; Tue, 13 Feb 2001 17:48:02 +0100 Message-ID: <20010213174759.47213@colin.muc.de> From: Andi Kleen To: Imran.Patel@nokia.com Cc: netdev@oss.sgi.com Subject: Re: icmpv6 References: <2D6CADE9B0C6D411A27500508BB3CBD063CD76@eseis15nok> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 0.88e In-Reply-To: <2D6CADE9B0C6D411A27500508BB3CBD063CD76@eseis15nok>; from Imran.Patel@nokia.com on Tue, Feb 13, 2001 at 04:51:21PM +0100 Date: Tue, 13 Feb 2001 17:48:00 +0100 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Tue, Feb 13, 2001 at 04:51:21PM +0100, Imran.Patel@nokia.com wrote: > hello, > i am currently working on a nat-pt (rfc 2766) implementation on linux. But i > have just run into some sort of problem. i would like to convert between > icmpv4 and icmpv6. But the skbuff handed over to me doesn't have any icmpv6 > member in the transport header. I have seen in the skbuff.h and i have found > out that there is no icmpv6hdr member in the h union. so how do i get a > reference to the icmpv6 data. I have used icmphdr instead and i am getting > weird results. You didn't describe how you hook into the stack. 
In case of a netfilter module you currently need to walk the complete Ipv6 header chain to find the protocol header you want, beginning with the ipv6 header (->nh.ipv6h) -Andi From owner-netdev@oss.sgi.com Tue Feb 13 09:03:18 2001 Received: by oss.sgi.com id ; Tue, 13 Feb 2001 09:03:08 -0800 Received: from mgw-x3.nokia.com ([131.228.20.26]:39934 "EHLO mgw-x3.nokia.com") by oss.sgi.com with ESMTP id ; Tue, 13 Feb 2001 09:02:54 -0800 Received: from esvir07nok.ntc.nokia.com (esvir07nokt.ntc.nokia.com [172.21.143.39]) by mgw-x3.nokia.com (Switch-2.1.0/Switch-2.1.0) with ESMTP id f1DH3Ej25790 for ; Tue, 13 Feb 2001 19:03:14 +0200 (EET) Received: from esebh12nok.ntc.nokia.com (unverified) by esvir07nok.ntc.nokia.com (Content Technologies SMTPRS 4.2.1) with ESMTP id for ; Tue, 13 Feb 2001 19:02:50 +0200 Received: by esebh12nok with Internet Mail Service (5.5.2652.78) id <1LYRCTBB>; Tue, 13 Feb 2001 19:02:50 +0200 Message-ID: <2D6CADE9B0C6D411A27500508BB3CBD063CD79@eseis15nok> From: Imran.Patel@nokia.com To: netdev@oss.sgi.com Subject: FW: icmpv6 Date: Tue, 13 Feb 2001 19:02:49 +0200 MIME-Version: 1.0 X-Mailer: Internet Mail Service (5.5.2652.78) Content-Type: text/plain; charset="iso-8859-1" Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing > -----Original Message----- > From: Patel Imran (NET/Helsinki) > Sent: 13. February 2001 18:53 > To: 'ext Andi Kleen' > Subject: RE: icmpv6 > > > > You didn't describe how you hook into the stack. In case of a > > netfilter > > module you currently need to walk the complete Ipv6 header > > chain to find > > the protocol header you want, beginning with the ipv6 header > > (->nh.ipv6h) > > i hook into pre-routing hook. but the point is i can't access > the icmpv6 header by skb->h.icmpv6h because icmp6 is not > listed as one of the protocol in the transport header union of skbuff. 
> > regards, > imran > From owner-netdev@oss.sgi.com Tue Feb 13 10:25:37 2001 Received: by oss.sgi.com id ; Tue, 13 Feb 2001 10:25:17 -0800 Received: from colin.muc.de ([193.149.48.1]:37130 "HELO colin.muc.de") by oss.sgi.com with SMTP id ; Tue, 13 Feb 2001 10:25:12 -0800 Received: by colin.muc.de id <140598-2>; Tue, 13 Feb 2001 19:25:01 +0100 Message-ID: <20010213192458.31598@colin.muc.de> From: Andi Kleen To: Imran.Patel@nokia.com Cc: netdev@oss.sgi.com Subject: Re: FW: icmpv6 References: <2D6CADE9B0C6D411A27500508BB3CBD063CD79@eseis15nok> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 0.88e In-Reply-To: <2D6CADE9B0C6D411A27500508BB3CBD063CD79@eseis15nok>; from Imran.Patel@nokia.com on Tue, Feb 13, 2001 at 06:04:00PM +0100 Date: Tue, 13 Feb 2001 19:24:59 +0100 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Tue, Feb 13, 2001 at 06:04:00PM +0100, Imran.Patel@nokia.com wrote: > > > > -----Original Message----- > > From: Patel Imran (NET/Helsinki) > > Sent: 13. February 2001 18:53 > > To: 'ext Andi Kleen' > > Subject: RE: icmpv6 > > > > > > > You didn't describe how you hook into the stack. In case of a > > > netfilter > > > module you currently need to walk the complete Ipv6 header > > > chain to find > > > the protocol header you want, beginning with the ipv6 header > > > (->nh.ipv6h) > > > > i hook into pre-routing hook. but the point is i can't access > > the icmpv6 header by skb->h.icmpv6h because icmp6 is not > > listed as one of the protocol in the transport header union of skbuff. In prerouting the stack hasn't tried to parse any extension header yet. You have to do that yourself. 
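A minimal user-space sketch of the header walk described above, assuming the whole IPv6 packet sits in one contiguous byte buffer. `find_icmpv6` is a hypothetical helper, not a kernel API, and the NEXTHDR_* values are defined locally from the IANA protocol numbers. In a netfilter hook the same loop would start from the IPv6 header (->nh.ipv6h) instead of a raw buffer. Header types not listed in the switch (ESP, for example) simply end the search.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Protocol numbers, defined locally for this sketch. */
#define NEXTHDR_HOP        0	/* hop-by-hop options */
#define NEXTHDR_ROUTING   43	/* routing header */
#define NEXTHDR_FRAGMENT  44	/* fragment header */
#define NEXTHDR_AUTH      51	/* authentication header */
#define NEXTHDR_ICMP      58	/* ICMPv6 */
#define NEXTHDR_DEST      60	/* destination options */

/*
 * Return the byte offset of the ICMPv6 header inside pkt, or -1 if the
 * packet is not IPv6, is truncated, is a non-first fragment, or carries
 * some other upper-layer protocol.
 */
static long find_icmpv6(const uint8_t *pkt, size_t len)
{
	size_t off = 40;		/* fixed IPv6 header is 40 bytes */
	uint8_t nexthdr;

	assert(pkt != NULL);
	if (len < 40 || (pkt[0] >> 4) != 6)
		return -1;
	nexthdr = pkt[6];		/* Next Header field */

	for (;;) {
		size_t hdrlen;

		if (nexthdr == NEXTHDR_ICMP)	/* type, code, checksum */
			return off + 4 <= len ? (long)off : -1;

		switch (nexthdr) {
		case NEXTHDR_HOP:
		case NEXTHDR_ROUTING:
		case NEXTHDR_DEST:
			if (off + 2 > len)
				return -1;
			/* length is in 8-byte units, first 8 not counted */
			hdrlen = ((size_t)pkt[off + 1] + 1) * 8;
			break;
		case NEXTHDR_FRAGMENT:
			if (off + 8 > len)
				return -1;
			/* only the first fragment holds the ICMPv6 header */
			if (((pkt[off + 2] << 8) | pkt[off + 3]) & 0xfff8)
				return -1;
			hdrlen = 8;
			break;
		case NEXTHDR_AUTH:
			if (off + 2 > len)
				return -1;
			/* length is in 4-byte units, minus 2 */
			hdrlen = ((size_t)pkt[off + 1] + 2) * 4;
			break;
		default:
			return -1;	/* some other protocol */
		}
		if (off + hdrlen > len)
			return -1;
		nexthdr = pkt[off];	/* first byte of each ext header */
		off += hdrlen;
	}
}
```

The returned offset can then be cast to an ICMPv6 header structure by the caller, which avoids relying on a member in the skbuff transport header union.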
-Andi From owner-netdev@oss.sgi.com Tue Feb 13 11:17:07 2001 Received: by oss.sgi.com id ; Tue, 13 Feb 2001 11:16:58 -0800 Received: from mplspop4.mpls.uswest.net ([204.147.80.14]:36111 "HELO mplspop4.mpls.uswest.net") by oss.sgi.com with SMTP id ; Tue, 13 Feb 2001 11:16:47 -0800 Received: (qmail 5606 invoked from network); 13 Feb 2001 19:15:29 -0000 Received: from gate-mn.bravidacorp.com (HELO rick.bravidacorp.com) (38.192.40.194) by mplspop4.mpls.uswest.net with SMTP; 13 Feb 2001 19:15:29 -0000 Received: (from rick@localhost) by rick.bravidacorp.com (8.11.0/8.11.0) id f1DJFSV01777 for netdev@oss.sgi.com; Tue, 13 Feb 2001 13:15:28 -0600 Date: Tue, 13 Feb 2001 13:15:28 -0600 Message-ID: <20010213131528.A1746@bravidacorp.com> From: "Rick Richardson" To: netdev@oss.sgi.com Subject: External Loopback with crossed route entries Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing I saw the thread/argument on a 2.0.36 patch so that you could do external ethernet loopback with a crossover cable using crossed route entries. The thread was dated 03/29/1999. Apparently, the patch was rejected by Alan Cox. Well, it turns out I'm just beating my head against this same wall. But using kernel 2.4.0. Can't seem to get crossed routing with an external loopback to work. I need this for a big simulation project I am working on. Any idea if 2.4.0 supports crossed routing between two ethernet ports out of the box? If not, is there a patch yet for 2.4.0? 
-Rick -- Rick Richardson rick@bravidacorp.com http://home.mn.rr.com/richardsons/ From owner-netdev@oss.sgi.com Tue Feb 13 15:28:39 2001 Received: by oss.sgi.com id ; Tue, 13 Feb 2001 15:28:29 -0800 Received: from horus.its.uow.edu.au ([130.130.68.25]:21725 "EHLO horus.its.uow.edu.au") by oss.sgi.com with ESMTP id ; Tue, 13 Feb 2001 15:28:10 -0800 Received: from uow.edu.au (wumpus.its.uow.edu.au [130.130.68.12]) by horus.its.uow.edu.au (8.9.3/8.9.3) with ESMTP id KAA17344; Wed, 14 Feb 2001 10:27:52 +1100 (EST) Message-ID: <3A89C2F5.853775B8@uow.edu.au> Date: Tue, 13 Feb 2001 23:27:49 +0000 From: Andrew Morton X-Mailer: Mozilla 4.61 [en] (X11; I; Linux 2.4.1-pre10 i686) X-Accept-Language: en MIME-Version: 1.0 To: Rick Richardson CC: netdev@oss.sgi.com Subject: Re: External Loopback with crossed route entries References: <20010213131528.A1746@bravidacorp.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Rick Richardson wrote: > > I saw the thread/argument on a 2.0.36 patch so that you could do > external ethernet loopback with a crossover cable using crossed route > entries. The thread was dated 03/29/1999. Apparently, the patch was > rejected by Alan Cox. > > Well, it turns out I'm just beating my head against this same wall. > But using kernel 2.4.0. Can't seem to get crossed routing with an > external loopback to work. I need this for a big simulation project I > am working on. > > Any idea if 2.4.0 supports crossed routing between two ethernet ports > out of the box? If not, is there a patch yet for 2.4.0? I bet you can do it with policy routing. Send us a little ASCII network diagram. 
From owner-netdev@oss.sgi.com Tue Feb 13 15:57:38 2001 Received: by oss.sgi.com id ; Tue, 13 Feb 2001 15:57:28 -0800 Received: from mplspop2.mpls.uswest.net ([204.147.80.4]:28688 "HELO mplspop2.mpls.uswest.net") by oss.sgi.com with SMTP id ; Tue, 13 Feb 2001 15:57:16 -0800 Received: (qmail 42303 invoked from network); 13 Feb 2001 23:57:15 -0000 Received: from gate-mn.bravidacorp.com (HELO rick.bravidacorp.com) (38.192.40.194) by mplspop2.mpls.uswest.net with SMTP; 13 Feb 2001 23:57:15 -0000 Received: (from rick@localhost) by rick.bravidacorp.com (8.11.0/8.11.0) id f1DNvDv02418; Tue, 13 Feb 2001 17:57:13 -0600 Date: Tue, 13 Feb 2001 17:57:13 -0600 Message-ID: <20010213175713.B1249@bravidacorp.com> From: "Rick Richardson" To: "Andrew Morton" Cc: netdev@oss.sgi.com Subject: Re: External Loopback with crossed route entries References: <20010213131528.A1746@bravidacorp.com> <3A89C2F5.853775B8@uow.edu.au> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <3A89C2F5.853775B8@uow.edu.au>; from andrewm@uow.edu.au on Tue, Feb 13, 2001 at 11:27:49PM +0000 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Tue, Feb 13, 2001 at 11:27:49PM +0000, Andrew Morton wrote: > Rick Richardson wrote: > > > > I saw the thread/argument on a 2.0.36 patch so that you could do > > external ethernet loopback with a crossover cable using crossed route > > entries. The thread was dated 03/29/1999. Apparently, the patch was > > rejected by Alan Cox. > > > > Well, it turns out I'm just beating my head against this same wall. > > But using kernel 2.4.0. Can't seem to get crossed routing with an > > external loopback to work. I need this for a big simulation project I > > am working on. > > > > Any idea if 2.4.0 supports crossed routing between two ethernet ports > > out of the box? If not, is there a patch yet for 2.4.0? > > I bet you can do it with policy routing. 
Send us a little > ASCII network diagram. +----------------------+ | | | +------------------+| | | eth0 192.168.1.1 |----------+ | +------------------+| | | | | crossover cable | +------------------+| | | | eth1 192.168.2.1 |----------+ | +------------------+| | | | PC with two I/F's | +----------------------+ A "ping 192.168.2.1" should go out the top port (eth0), over the crossover cable, and back in to the lower port (eth1). I tried these crossed route entries that I could not get to work... # route add -host 192.168.2.1 eth0 # route add -host 192.168.1.1 eth1 -Rick -- Rick Richardson rick@bravidacorp.com http://home.mn.rr.com/richardsons/ Genetic manipulation will lead to swimmers born with flipper feet. Hows the IOC gonna stop that? From owner-netdev@oss.sgi.com Tue Feb 13 17:03:19 2001 Received: by oss.sgi.com id ; Tue, 13 Feb 2001 17:03:10 -0800 Received: from horus.its.uow.edu.au ([130.130.68.25]:11237 "EHLO horus.its.uow.edu.au") by oss.sgi.com with ESMTP id ; Tue, 13 Feb 2001 17:02:41 -0800 Received: from uow.edu.au (wumpus.its.uow.edu.au [130.130.68.12]) by horus.its.uow.edu.au (8.9.3/8.9.3) with ESMTP id MAA12305; Wed, 14 Feb 2001 12:02:27 +1100 (EST) Message-ID: <3A89D921.F5B18A96@uow.edu.au> Date: Wed, 14 Feb 2001 01:02:25 +0000 From: Andrew Morton X-Mailer: Mozilla 4.61 [en] (X11; I; Linux 2.4.1-pre10 i686) X-Accept-Language: en MIME-Version: 1.0 To: Rick Richardson CC: Andrew Morton , netdev@oss.sgi.com Subject: Re: External Loopback with crossed route entries References: <20010213131528.A1746@bravidacorp.com> <3A89C2F5.853775B8@uow.edu.au> <20010213175713.B1249@bravidacorp.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Rick Richardson wrote: > > +----------------------+ > | | > | +------------------+| eth0 > | | eth0 192.168.1.1 |----------+ > | +------------------+| | > | | | crossover cable > | +------------------+| eth1 
| > | | eth1 192.168.2.1 |----------+ > | +------------------+| > | | > | PC with two I/F's | > +----------------------+ > > A "ping 192.168.2.1" should go out the top port (eth0), over the > crossover cable, and back in to the lower port (eth1). > Interesting. # There's probably a simpler way of doing this # # Open the links ip link set eth0 up ip link set eth1 up # Give them an IP ip addr add 192.168.1.1 dev eth0 ip addr add 192.168.1.2 dev eth1 # kill the local routes ip route del 192.168.1.1 dev eth0 table local ip route del 192.168.1.2 dev eth1 table local # Provide the crossed routes ip route add 192.168.1.2 dev eth0 ip route add 192.168.1.1 dev eth1 # OK, now packets cross over the loopback. Now what? This will send packets across the loopback, but when they arrive back, they of course loop-unto-TTL-expiry. What do you want to do with the incoming packets? From owner-netdev@oss.sgi.com Tue Feb 13 18:48:09 2001 Received: by oss.sgi.com id ; Tue, 13 Feb 2001 18:47:49 -0800 Received: from cx97923-a.phnx3.az.home.com ([24.9.112.194]:47373 "EHLO grok.yi.org") by oss.sgi.com with ESMTP id ; Tue, 13 Feb 2001 18:47:20 -0800 Received: from candelatech.com (IDENT:greear@localhost [127.0.0.1]) by grok.yi.org (8.9.3/8.8.7) with ESMTP id UAA11809; Tue, 13 Feb 2001 20:56:12 -0700 Message-ID: <3A8A01DC.1504DCCD@candelatech.com> Date: Tue, 13 Feb 2001 20:56:12 -0700 From: Ben Greear Organization: Candela Technologies X-Mailer: Mozilla 4.72 [en] (X11; U; Linux 2.2.16 i586) X-Accept-Language: en MIME-Version: 1.0 To: Andrew Morton CC: Rick Richardson , netdev@oss.sgi.com Subject: Re: External Loopback with crossed route entries References: <20010213131528.A1746@bravidacorp.com> <3A89C2F5.853775B8@uow.edu.au> <20010213175713.B1249@bravidacorp.com> <3A89D921.F5B18A96@uow.edu.au> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Andrew Morton wrote: > > Rick 
Richardson wrote: > > > > > +----------------------+ > > | | > > | +------------------+| eth0 > > | | eth0 192.168.1.1 |----------+ > > | +------------------+| | > > | | | crossover cable > > | +------------------+| eth1 | > > | | eth1 192.168.2.1 |----------+ > > | +------------------+| > > | | > > | PC with two I/F's | > > +----------------------+ > > > > A "ping 192.168.2.1" should go out the top port (eth0), over the > > crossover cable, and back in to the lower port (eth1). > > > > Interesting. > > # There's probably a simpler way of doing this [snip] Why not use source-based routing, and use ping with the -I command. Here's an example from a product I've built: (vlan0009 will be your eth1 or something...) ip ru del from 192.168.9.3 lookup 7 # Setup for device: vlan0009 IP: 192.168.9.3 ip link set vlan0009 down ip link set vlan0009 up ip addr flush dev vlan0009 ip address add 192.168.9.3/24 broadcast 192.168.9.255 dev vlan0009 ip link set dev vlan0009 up ip link set vlan0009 txqueuelen 400 ip ru add from 192.168.9.3/32 table 7 ip route add 192.168.9.0/24 via 192.168.9.3 table 7 ip route add 0/0 via 192.168.9.1 dev vlan0009 table 7 -- Ben Greear (greearb@candelatech.com) http://www.candelatech.com Author of ScryMUD: scry.wanfear.com 4444 (Released under GPL) http://scry.wanfear.com http://scry.wanfear.com/~greear From owner-netdev@oss.sgi.com Tue Feb 13 19:28:30 2001 Received: by oss.sgi.com id ; Tue, 13 Feb 2001 19:28:10 -0800 Received: from msp-65-25-214-194.mn.rr.com ([65.25.214.194]:62359 "EHLO msp-65-25-214-194.mn.rr.com") by oss.sgi.com with ESMTP id ; Tue, 13 Feb 2001 19:27:56 -0800 Received: (from rick@localhost) by msp-65-25-214-194.mn.rr.com (8.11.0/8.8.7) id f1E3Rnl13921; Tue, 13 Feb 2001 21:27:49 -0600 Date: Tue, 13 Feb 2001 21:27:49 -0600 From: Rick Richardson To: Andrew Morton Cc: netdev@oss.sgi.com Subject: Re: External Loopback with crossed route entries Message-ID: <20010213212749.C6488@bravidacorp.com> References: 
<20010213131528.A1746@bravidacorp.com> <3A89C2F5.853775B8@uow.edu.au> <20010213175713.B1249@bravidacorp.com> <3A89D921.F5B18A96@uow.edu.au> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <3A89D921.F5B18A96@uow.edu.au>; from andrewm@uow.edu.au on Wed, Feb 14, 2001 at 01:02:25AM +0000 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing I didn't have any luck with your approach: $ netstat -rn Kernel IP routing table Destination Gateway Genmask Flags MSS Window irtt Iface 192.168.11.2 0.0.0.0 255.255.255.255 UH 40 0 0 eth1 192.168.11.1 0.0.0.0 255.255.255.255 UH 40 0 0 eth2 192.168.200.0 0.0.0.0 255.255.255.0 U 40 0 0 vmnet1 172.16.0.0 0.0.0.0 255.255.0.0 U 40 0 0 eth0 127.0.0.0 0.0.0.0 255.0.0.0 U 40 0 0 lo 0.0.0.0 172.16.0.1 0.0.0.0 UG 40 0 0 eth0 $ ping 192.168.11.1 connect: Invalid argument There is a tiny patch for kernel 2.0.36 that made external loopback with crossed route entries work: http://www.uwsg.iu.edu/hypermail/linux/kernel/9903.3/0614.html Description: Fixes problem with external loopback, using crossed route entries. Version: 2.0.36 (& 2.0.37-pre9) Testing: Connected two NICs together, using an external cross-over cable, swapped route entries (to force packets out the opposite interfaces), and then issued pings, telnets, & ftps to both addreseses. Unplugged the cable to verify pings, etc. were appropriately halted. Request: Please CC bmoyle@redcreek.com with any comments/feedback. The patch was rejected with some argument and never made it into any kernels: http://www.uwsg.iu.edu/hypermail/linux/kernel/9903.3/0887.html http://www.uwsg.iu.edu/hypermail/linux/kernel/9903.3/0948.html Now, given the size of the patch for 2.0.36, one would think it would be an easy thing to update the patch for 2.4.0. Anybody know for sure? 
-Rick On Wed, Feb 14, 2001 at 01:02:25AM +0000, Andrew Morton wrote: > Rick Richardson wrote: > > > > > +----------------------+ > > | | > > | +------------------+| eth0 > > | | eth0 192.168.1.1 |----------+ > > | +------------------+| | > > | | | crossover cable > > | +------------------+| eth1 | > > | | eth1 192.168.2.1 |----------+ > > | +------------------+| > > | | > > | PC with two I/F's | > > +----------------------+ > > > > A "ping 192.168.2.1" should go out the top port (eth0), over the > > crossover cable, and back in to the lower port (eth1). > > > > Interesting. > > # There's probably a simpler way of doing this > # > # Open the links > ip link set eth0 up > ip link set eth1 up > > # Give them an IP > ip addr add 192.168.1.1 dev eth0 > ip addr add 192.168.1.2 dev eth1 > > # kill the local routes > ip route del 192.168.1.1 dev eth0 table local > ip route del 192.168.1.2 dev eth1 table local > > # Provide the crossed routes > ip route add 192.168.1.2 dev eth0 > ip route add 192.168.1.1 dev eth1 > > # OK, now packets cross over the loopback. Now what? > > > This will send packets across the loopback, but when > they arrive back, they of course loop-unto-TTL-expiry. > What do you want to do with the incoming packets? Ping, telnet, ftp, ttcp, etc. between the two ports. -- Rick Richardson rick@bravidacorp.com http://home.mn.rr.com/richardsons/ Twins Cities traffic animations are at http://members.nbci.com/tctraffic/#1 I've never used Linux, and probably never will. -- Me, 06/22/94 I'm using Linux. 
-- Me, 12/14/95

From owner-netdev@oss.sgi.com Wed Feb 14 02:52:02 2001
Received: by oss.sgi.com id ; Wed, 14 Feb 2001 02:51:53 -0800
Received: from [192.72.45.189] ([192.72.45.189]:48011 "EHLO stsl.siemens.com.tw") by oss.sgi.com with ESMTP id ; Wed, 14 Feb 2001 02:51:31 -0800
Received: from stslex.siemens.com.tw (stslex [192.72.45.13]) by stsl.siemens.com.tw (8.9.1/8.9.1) with ESMTP id TAA18445 for ; Wed, 14 Feb 2001 19:00:11 +0800 (CST)
Received: by stslex.siemens.com.tw with Internet Mail Service (5.5.2448.0) id <1ZQAWC97>; Wed, 14 Feb 2001 18:51:32 +0800
Message-ID: <92C0C0AC8AE8D411864300105A835CBB50144A@stslex.siemens.com.tw>
From: Moter Du
To: netdev@oss.sgi.com
Subject: ndisc_send_ns: always use link-local addr of the leaving device
Date: Wed, 14 Feb 2001 18:51:30 +0800
MIME-Version: 1.0
X-Mailer: Internet Mail Service (5.5.2448.0)
Content-Type: multipart/alternative; boundary="----_=_NextPart_001_01C09674.16F6FA12"
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing

This message is in MIME format. Since your mail reader does not understand
this format, some or all of this message may not be legible.

------_=_NextPart_001_01C09674.16F6FA12
Content-Type: text/plain; charset="Big5"

Patch in ndisc.c, function ndisc_send_ns;
Without the patch, the following scenario illustrates how it violates RFC2461 :

               Link0
      +---------------------+
      |eth0                 |eth0
[Test Node]        [Router Under Test]
      |eth1                 |eth1
      +---------------------+
               Link1

- At [Test Node] interface eth0, ping global-unicast address of eth1 of [Router Under Test]
- [Router Under Test] will perform NS for address resolution on Link0

Without the patch:
  NS contains global-unicast address of eth1 of [RUT] as Source Address (ERROR!!)
With the patch:
  NS contains link-local address of eth0 of [RUT] as Source Address


--- linux-2.4.0-ORIG/net/ipv6/ndisc.c  Wed Nov 29 13:53:45 2000
+++ linux-2.4.0/net/ipv6/ndisc.c Mon Feb 12 13:32:48 2001
@@ -403,13 +407,18 @@
 		return;
 	}

-	if (saddr == NULL) {
+// MWI: 20010212:
+// - Force Neighbour Solicitation always use link-local Source Address
+//   of the leaving device, because here 'saddr' may be belong to other devices
+//   thus using it violates RFC2461.
+//
+//	if (saddr == NULL) {
 		if (ipv6_get_lladdr(dev, &addr_buf)) {
 			kfree_skb(skb);
 			return;
 		}
 		saddr = &addr_buf;
-	}
+//	}

 	if (ndisc_build_ll_hdr(skb, dev, daddr, neigh, len) == 0) {
 		kfree_skb(skb);

> Sincerely
> Moter Du
>
> ============================================================
> Moter Du                              D410, Siemens STSL
> phone   +886 2 25186011
> Fax     +886 2 25053866
> mailto:moter_du@stsl.siemens.com.tw
============================================================

------_=_NextPart_001_01C09674.16F6FA12
Content-Type: text/html; charset="Big5"
Content-Transfer-Encoding: quoted-printable

ndisc_send_ns: always use link-local addr of the leaving device

------_=_NextPart_001_01C09674.16F6FA12-- From owner-netdev@oss.sgi.com Wed Feb 14 08:35:35 2001 Received: by oss.sgi.com id ; Wed, 14 Feb 2001 08:35:14 -0800 Received: from minus.inr.ac.ru ([193.233.7.97]:25605 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Wed, 14 Feb 2001 08:35:08 -0800 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id TAA32036; Wed, 14 Feb 2001 19:34:19 +0300 From: kuznet@ms2.inr.ac.ru Message-Id: <200102141634.TAA32036@ms2.inr.ac.ru> Subject: Re: ndisc_send_ns: always use link-local addr of the leaving device To: Moter_Du@stsl.siemens.COM.TW (Moter Du) Date: Wed, 14 Feb 2001 19:34:19 +0300 (MSK) Cc: netdev@oss.sgi.com In-Reply-To: <92C0C0AC8AE8D411864300105A835CBB50144A@stslex.siemens.com.tw> from "Moter Du" at Feb 14, 1 02:15:01 pm X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 573 Lines: 22 Hello! > Without the patch, the following scenario illustrates how it violates > RFC2461 : Please, cite the place which is violated. (I'm sorry, lazy to search myself). > NS contains global-unicast address of eth1 of [RUT] as Source Address > (ERROR!!) I see no error. If it is global scope, it is on link by definition. But bug is really present: it could be link local for another link! > With the patch: > NS contains link-local address of eth0 of [RUT] as Source Address BTW this is surely error, killing all the idea behind solicitation avoidance. 
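As a reading aid for this thread: section 7.2.2 of RFC 2461 says that when the source address of the packet prompting the solicitation is one of the addresses assigned to the outgoing interface, that address SHOULD be used as the source of the Neighbor Solicitation, and otherwise any one of the addresses assigned to that interface should be used. A minimal C sketch of that selection rule follows. It is not the kernel's ndisc_send_ns code; the function name ns_source, its parameters, and the choice of the first interface address as the fallback are assumptions made purely for illustration.

```c
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>

/*
 * Illustrative only: choose the source address for a Neighbor Solicitation.
 * iface_addrs holds the n addresses assigned to the outgoing interface.
 * prompt_src is the source address of the packet that triggered the
 * solicitation (may be NULL if there is none).
 */
const struct in6_addr *ns_source(const struct in6_addr *prompt_src,
                                 const struct in6_addr *iface_addrs,
                                 size_t n)
{
    size_t i;

    assert(iface_addrs != NULL && n > 0);

    if (prompt_src != NULL) {
        for (i = 0; i < n; i++) {
            /* Prompting source is assigned to this interface: reuse it. */
            if (memcmp(prompt_src, &iface_addrs[i], sizeof(*prompt_src)) == 0)
                return &iface_addrs[i];
        }
    }

    /* Otherwise any address of the interface will do; assume the first. */
    return &iface_addrs[0];
}
```

In the scenario Moter Du describes, the prompting source would be the global address of eth1 while the outgoing interface is eth0, so this sketch would fall through to an eth0 address. Which eth0 address to prefer in that case is exactly what the thread is arguing about.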
Alexey From owner-netdev@oss.sgi.com Wed Feb 14 12:52:46 2001 Received: by oss.sgi.com id ; Wed, 14 Feb 2001 12:52:27 -0800 Received: from router-100M.swansea.linux.org.uk ([194.168.151.17]:57617 "EHLO the-village.bc.nu") by oss.sgi.com with ESMTP id ; Wed, 14 Feb 2001 12:52:02 -0800 Received: from alan by the-village.bc.nu with local (Exim 2.12 #1) id 14T8tv-00060Y-00; Wed, 14 Feb 2001 20:51:47 +0000 Subject: Re: MTU and 2.4.x kernel To: roger@maths.grace.cri.nz Date: Wed, 14 Feb 2001 20:51:45 +0000 (GMT) Cc: linux-kernel@vger.kernel.org, roger@kea.grace.cri.nz In-Reply-To: <200102142039.PAA07913@whio.grace.cri.nz> from "roger@maths.grace.cri.nz" at Feb 14, 2001 03:39:09 PM X-Mailer: ELM [version 2.5 PL1] MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-Id: From: Alan Cox Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 1203 Lines: 30 > Kernel 2.4.x apparently disregards my ppp options MTU setting of 552 > and sets mss=536 (=> MTU=576). Kernel 2.2.16 sets mss=512 correctly. > Is this a kernel bug or what? The kernel is entitled to set an MSS that may cause fragmentation. So no it isnt a bug. 536 + 40 = 576 Im not sure why it made that choice but it is allowed to. (cc'd to netdev to see if they know) > Description: Typically Netscape/Lynx will connect to a remote site but > will not download (it will hang indefinitely). When the browser is in Typically indicates your ISP has path mtu problems. > the browser is locked for almost all remote sites, I _am_ able to > connect to (the web page of) the proxy server itself. And after I do > this the browser is *unlocked*, and I can connect/download from any web > address. However this only lasts for 5 minutes or so, after which time That would be a cached pmtu for that connection. I suspect the connections via the proxy server are not sending back valid ICMP fragmentation required frames for path mtu discovery. 
That would suggest the problem is the ISP. 2.2 happened to cover this up for
the case of a single host directly connected to a modem with a low mtu.

Alan

From owner-netdev@oss.sgi.com Wed Feb 14 17:40:49 2001
Received: by oss.sgi.com id ; Wed, 14 Feb 2001 17:40:30 -0800
Received: from [192.72.45.189] ([192.72.45.189]:31378 "EHLO stsl.siemens.com.tw") by oss.sgi.com with ESMTP id ; Wed, 14 Feb 2001 17:40:05 -0800
Received: from stslex.siemens.com.tw (stslex [192.72.45.13]) by stsl.siemens.com.tw (8.9.1/8.9.1) with ESMTP id JAA24948; Thu, 15 Feb 2001 09:48:12 +0800 (CST)
Received: by stslex.siemens.com.tw with Internet Mail Service (5.5.2448.0) id <1ZQAW188>; Thu, 15 Feb 2001 09:39:31 +0800
Message-ID: <92C0C0AC8AE8D411864300105A835CBB50144D@stslex.siemens.com.tw>
From: Moter Du
To: "'kuznet@ms2.inr.ac.ru'"
Cc: netdev@oss.sgi.com
Subject: Re: ndisc_send_ns: always use link-local addr of the leaving device
Date: Thu, 15 Feb 2001 09:39:25 +0800
MIME-Version: 1.0
X-Mailer: Internet Mail Service (5.5.2448.0)
Content-Type: multipart/alternative; boundary="----_=_NextPart_001_01C096F0.2432CB80"
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing
Content-Length: 9280
Lines: 257

This message is in MIME format. Since your mail reader does not understand
this format, some or all of this message may not be legible.

------_=_NextPart_001_01C096F0.2432CB80
Content-Type: text/plain; charset="Big5"
Content-Transfer-Encoding: quoted-printable

4.3.  Neighbor Solicitation Message Format

   IP Fields:

      Source Address
                     Either an address assigned to the interface from
                     which this message is sent or (if Duplicate Address
                     Detection is in progress [ADDRCONF]) the
                     unspecified address.

7.2.2.  Sending Neighbor Solicitations
(the 2nd para.)
   If the source address of the packet prompting the solicitation is the
   same as one of the addresses assigned to the outgoing interface, that
   address SHOULD be placed in the IP Source Address of the outgoing
   solicitation.  Otherwise, any one of the addresses assigned to the
   interface should be used.  Using the prompting packet's source
   address when possible insures that the recipient of the Neighbor
   Solicitation installs in its Neighbor Cache the IP address that is
   highly likely to be used in subsequent return traffic belonging to
   the prompting packet's "connection".

Specifications are listed above for your reference.  Note, in the
illustration using [RUT] eth1 global-unicast address as Source Address of
NS always violates the specification, because it's none of the addresses
assigned to the outgoing interface (i.e., eth0).

Any scope of addresses of [RUT] eth0 may be used as Source Address of NS.
The patch uses link-local always and I see no improper at all.  Could you
show me more hints for "solicitation avoidance"?

> Sincerely
> Moter Du
>
> ============================================================
> Moter Du                              D410, Siemens STSL
> phone   +886 2 25186011
> Fax     +886 2 25053866
> mailto:moter_du@stsl.siemens.com.tw
============================================================


-----Original Message-----
From: kuznet@ms2.inr.ac.ru [mailto:kuznet@ms2.inr.ac.ru]
Sent: February 15, 2001 12:34 AM
To: Moter_Du@stsl.siemens.COM.TW
Cc: netdev@oss.sgi.com
Subject: Re: ndisc_send_ns: always use link-local addr of the leaving
device


Hello!

> Without the patch, the following scenario illustrates how it violates
> RFC2461 :

Please, cite the place which is violated. (I'm sorry, lazy to search myself).


>   NS contains global-unicast address of eth1 of [RUT] as Source Address
> (ERROR!!)

I see no error. If it is global scope, it is on link by definition.

But bug is really present: it could be link local for another link!

> With the patch:
>   NS contains link-local address of eth0 of [RUT] as Source Address

BTW this is surely error, killing all the idea behind solicitation
avoidance.

Alexey

------_=_NextPart_001_01C096F0.2432CB80
Content-Type: text/html; charset="Big5"
Content-Transfer-Encoding: quoted-printable

Re: ndisc_send_ns: always use link-local addr of the leaving device

------_=_NextPart_001_01C096F0.2432CB80-- From owner-netdev@oss.sgi.com Wed Feb 14 21:32:30 2001 Received: by oss.sgi.com id ; Wed, 14 Feb 2001 21:32:10 -0800 Received: from tolkien.sys.i.kyoto-u.ac.jp ([130.54.156.161]:26571 "EHLO ais.sys.i.kyoto-u.ac.jp") by oss.sgi.com with ESMTP id ; Wed, 14 Feb 2001 21:31:44 -0800 Received: from zaidan (zaidan [130.54.156.182]) by ais.sys.i.kyoto-u.ac.jp (8.9.3/3.7W) with SMTP id OAA22369 for ; Thu, 15 Feb 2001 14:30:14 +0900 (JST) From: "Ishtiaq Ahmed" To: Subject: TCP Algorithm Date: Thu, 15 Feb 2001 14:35:02 +0900 Message-ID: MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2910.0) In-reply-To: <20010215050920Z553705-486+1415@oss.sgi.com> Importance: Normal X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2919.6700 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 353 Lines: 11 I am new to this mailing list. I would like to ask a question. Could any body please tell me what kind of TCP algorithm have been implemented in Linux-2.4.0? Is there any documentation available to read about the implementation. I know that its TCP is being supported by several options like SACK, FACK, DSACK, ECN, window_scaling etc. 
Ishtiaq Ahmed From owner-netdev@oss.sgi.com Thu Feb 15 05:19:18 2001 Received: by oss.sgi.com id ; Thu, 15 Feb 2001 05:19:08 -0800 Received: from smtp1.cern.ch ([137.138.128.38]:32270 "EHLO smtp1.cern.ch") by oss.sgi.com with ESMTP id ; Thu, 15 Feb 2001 05:18:51 -0800 Received: from lxplus015.cern.ch (IDENT:root@lxplus015.cern.ch [137.138.161.112]) by smtp1.cern.ch (8.9.3/8.9.3) with ESMTP id OAA24090 for ; Thu, 15 Feb 2001 14:18:43 +0100 (MET) Received: (from jes@localhost) by lxplus015.cern.ch (8.9.3/8.9.3) id OAA25359; Thu, 15 Feb 2001 14:18:42 +0100 Date: Thu, 15 Feb 2001 14:18:42 +0100 Message-Id: <200102151318.OAA25359@lxplus015.cern.ch> X-Authentication-Warning: lxplus015.cern.ch: jes set sender to jes@linuxcare.com using -f From: Jes Sorensen To: netdev@oss.sgi.com Subject: 2.2.19pre9 TCP changes? Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 538 Lines: 15 Hi Anyone noticed changes with TCP connection timeouts in recent 2.2.x kernels? I upgraded to 2.2.19pre9 (from 2.2.16pre5) this week as I needed to enable wireless support. However I am not seeing a lot of cases where sessions in particular irc sessions time out constantly if i don't output data over the channel regularly. I just changed the tcp keepalive settings to 500 secs from 7200 to see if that helps. I dunno if it is related to the Wavelan driver in some way either. Anyway I just wanted to hear if it's a known issue? 
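For a single long-lived connection such as an IRC session, the idle time before keepalive probes start can also be set per socket rather than system-wide. A minimal C sketch, assuming a kernel that provides the TCP_KEEPIDLE socket option (the tcp(7) manual page lists it as available since Linux 2.4, so it would not apply to the 2.2.x kernel discussed here). The helper name set_keepalive_idle is made up for illustration.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: enable keepalive on a TCP socket and start probing
 * after idle_secs seconds without traffic.
 * Returns 0 on success, -1 if either setsockopt() call fails.
 */
int set_keepalive_idle(int fd, int idle_secs)
{
    int on = 1;

    assert(idle_secs > 0);

    /* Turn keepalive probing on for this socket. */
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
        return -1;

    /* Override the system-wide idle time for this socket only. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE,
                   &idle_secs, sizeof(idle_secs)) < 0)
        return -1;

    return 0;
}
```

Calling set_keepalive_idle(fd, 500) on a connected socket would mirror the 500 second value mentioned above for that one connection, leaving the system default of 7200 untouched.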
Jes From owner-netdev@oss.sgi.com Thu Feb 15 07:20:58 2001 Received: by oss.sgi.com id ; Thu, 15 Feb 2001 07:20:38 -0800 Received: from router-100M.swansea.linux.org.uk ([194.168.151.17]:58886 "EHLO the-village.bc.nu") by oss.sgi.com with ESMTP id ; Thu, 15 Feb 2001 07:20:14 -0800 Received: from alan by the-village.bc.nu with local (Exim 2.12 #1) id 14TQCo-0008OC-00 for netdev@oss.sgi.com; Thu, 15 Feb 2001 15:20:26 +0000 Received: from vger.kernel.org ([199.183.24.194]) by the-village.bc.nu with esmtp (Exim 2.12 #1) id 14TQ92-0008Nf-00 for alan@lxorguk.ukuu.org.uk; Thu, 15 Feb 2001 15:16:33 +0000 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id ; Thu, 15 Feb 2001 10:15:18 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id ; Thu, 15 Feb 2001 10:15:08 -0500 Received: from kamov.deltanet.ro ([193.226.175.59]:33546 "HELO kamov.deltanet.ro") by vger.kernel.org with SMTP id ; Thu, 15 Feb 2001 10:14:56 -0500 Received: from home.ppetru.net (home.ppetru.net [193.230.129.57]) by kamov.deltanet.ro (Postfix) with ESMTP id 4FBD6FBE2 for ; Thu, 15 Feb 2001 17:14:47 +0200 (EET) Received: by home.ppetru.net (Postfix, from userid 500) id 42BA322E6E; Thu, 15 Feb 2001 17:14:45 +0200 (EET) Date: Thu, 15 Feb 2001 17:14:45 +0200 From: Petru Paler To: linux-kernel@vger.kernel.org Subject: 2.4.1: TCP assertion failed Message-ID: <20010215171445.A2327@ppetru.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.3.14i X-Mailing-List: linux-kernel@vger.kernel.org Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 749 Lines: 20 Moderately-high (couple hundred thousand hits a day) loaded web server running 2.4.1 (no other patches). I got this twice in the syslog after 15 days uptime: KERNEL: assertion (tp->lost_out == 0) failed at tcp_input.c(1202):tcp_remove_reno_sacks (between lots of "TCP: peer xxxx shrinks window xxxx:xxx:xxxxxx. 
Bad, what else can I say?" which I understand are harmless) Let me know if anyone needs more info/tests/etc. -- Petru Paler, mailto:ppetru@ppetru.net http://www.ppetru.net - ICQ: 41817235 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ From owner-netdev@oss.sgi.com Thu Feb 15 09:58:48 2001 Received: by oss.sgi.com id ; Thu, 15 Feb 2001 09:58:39 -0800 Received: from minus.inr.ac.ru ([193.233.7.97]:45330 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Thu, 15 Feb 2001 09:58:15 -0800 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id UAA19454; Thu, 15 Feb 2001 20:57:50 +0300 From: kuznet@ms2.inr.ac.ru Message-Id: <200102151757.UAA19454@ms2.inr.ac.ru> Subject: Re: 2.2.19pre9 TCP changes? To: jes@linuxcare.COM (Jes Sorensen) Date: Thu, 15 Feb 2001 20:57:50 +0300 (MSK) Cc: netdev@oss.sgi.com In-Reply-To: <200102151318.OAA25359@lxplus015.cern.ch> from "Jes Sorensen" at Feb 15, 1 04:45:01 pm X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 147 Lines: 9 Hello! > Anyway I just wanted to hear if it's a known issue? No, this is some news. Please, try to make tcpdump of misbehaving session. 
Alexey From owner-netdev@oss.sgi.com Thu Feb 15 10:27:39 2001 Received: by oss.sgi.com id ; Thu, 15 Feb 2001 10:27:29 -0800 Received: from minus.inr.ac.ru ([193.233.7.97]:2579 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Thu, 15 Feb 2001 10:27:15 -0800 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id VAA19838; Thu, 15 Feb 2001 21:25:49 +0300 From: kuznet@ms2.inr.ac.ru Message-Id: <200102151825.VAA19838@ms2.inr.ac.ru> Subject: Re: 2.4.1: TCP assertion failed To: ppetru@ppetru.NET (Petru Paler) Date: Thu, 15 Feb 2001 21:25:49 +0300 (MSK) Cc: netdev@oss.sgi.com In-Reply-To: <20010215171445.A2327@ppetru.net> from "Petru Paler" at Feb 15, 1 06:45:00 pm X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 345 Lines: 14 Hello! > KERNEL: assertion (tp->lost_out == 0) failed at tcp_input.c(1202):tcp_remove_reno_sacks This is harmless. > (between lots of "TCP: peer xxxx shrinks window xxxx:xxx:xxxxxx. Bad, what else > can I say?" which I understand are harmless) And this is not harmless. If you can identify client shrinking window, please, do this. 
Alexey From owner-netdev@oss.sgi.com Thu Feb 15 10:30:09 2001 Received: by oss.sgi.com id ; Thu, 15 Feb 2001 10:29:50 -0800 Received: from kamov.deltanet.ro ([193.226.175.59]:28427 "HELO kamov.deltanet.ro") by oss.sgi.com with SMTP id ; Thu, 15 Feb 2001 10:29:37 -0800 Received: from home.ppetru.net (home.ppetru.net [193.230.129.57]) by kamov.deltanet.ro (Postfix) with ESMTP id EEC1EF7FA; Thu, 15 Feb 2001 20:29:20 +0200 (EET) Received: by home.ppetru.net (Postfix, from userid 500) id ABF5022E6E; Thu, 15 Feb 2001 20:29:08 +0200 (EET) Date: Thu, 15 Feb 2001 20:29:08 +0200 From: Petru Paler To: kuznet@ms2.inr.ac.ru Cc: netdev@oss.sgi.com Subject: Re: 2.4.1: TCP assertion failed Message-ID: <20010215202908.A3210@ppetru.net> References: <20010215171445.A2327@ppetru.net> <200102151825.VAA19838@ms2.inr.ac.ru> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.3.14i In-Reply-To: <200102151825.VAA19838@ms2.inr.ac.ru>; from kuznet@ms2.inr.ac.ru on Thu, Feb 15, 2001 at 09:25:49PM +0300 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 622 Lines: 19 On Thu, Feb 15, 2001 at 09:25:49PM +0300, kuznet@ms2.inr.ac.ru wrote: > > KERNEL: assertion (tp->lost_out == 0) failed at tcp_input.c(1202):tcp_remove_reno_sacks > > This is harmless. Ok. > > (between lots of "TCP: peer xxxx shrinks window xxxx:xxx:xxxxxx. Bad, what else > > can I say?" which I understand are harmless) > > And this is not harmless. If you can identify client shrinking window, > please, do this. Identify, what ? Do you want me to paste the actual numbers (I've got plenty of them in all my web and mail server logs). 
-- Petru Paler, mailto:ppetru@ppetru.net http://www.ppetru.net - ICQ: 41817235 From owner-netdev@oss.sgi.com Thu Feb 15 10:35:39 2001 Received: by oss.sgi.com id ; Thu, 15 Feb 2001 10:35:19 -0800 Received: from minus.inr.ac.ru ([193.233.7.97]:7443 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Thu, 15 Feb 2001 10:35:03 -0800 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id VAA19995; Thu, 15 Feb 2001 21:33:21 +0300 From: kuznet@ms2.inr.ac.ru Message-Id: <200102151833.VAA19995@ms2.inr.ac.ru> Subject: Re: 2.4.1: TCP assertion failed To: ppetru@ppetru.net (Petru Paler) Date: Thu, 15 Feb 2001 21:33:21 +0300 (MSK) Cc: netdev@oss.sgi.com In-Reply-To: <20010215202908.A3210@ppetru.net> from "Petru Paler" at Feb 15, 1 08:29:08 pm X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 159 Lines: 10 Hello! > Identify, what ? What OS do they use. Until now curcumstances when window is shrunk are unknown. Absolutely. No known OSes make such shit. 
Alexey From owner-netdev@oss.sgi.com Thu Feb 15 10:38:19 2001 Received: by oss.sgi.com id ; Thu, 15 Feb 2001 10:38:09 -0800 Received: from kamov.deltanet.ro ([193.226.175.59]:29707 "HELO kamov.deltanet.ro") by oss.sgi.com with SMTP id ; Thu, 15 Feb 2001 10:38:00 -0800 Received: from home.ppetru.net (home.ppetru.net [193.230.129.57]) by kamov.deltanet.ro (Postfix) with ESMTP id 0CD6FF7FA; Thu, 15 Feb 2001 20:37:38 +0200 (EET) Received: by home.ppetru.net (Postfix, from userid 500) id D501C22E6E; Thu, 15 Feb 2001 20:37:37 +0200 (EET) Date: Thu, 15 Feb 2001 20:37:37 +0200 From: Petru Paler To: kuznet@ms2.inr.ac.ru Cc: netdev@oss.sgi.com Subject: Re: 2.4.1: TCP assertion failed Message-ID: <20010215203737.B3210@ppetru.net> References: <20010215202908.A3210@ppetru.net> <200102151833.VAA19995@ms2.inr.ac.ru> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.3.14i In-Reply-To: <200102151833.VAA19995@ms2.inr.ac.ru>; from kuznet@ms2.inr.ac.ru on Thu, Feb 15, 2001 at 09:33:21PM +0300 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 498 Lines: 16 On Thu, Feb 15, 2001 at 09:33:21PM +0300, kuznet@ms2.inr.ac.ru wrote: > > Identify, what ? > > What OS do they use. I have no way to do that, these servers do mail and web service for a couple hundred thousand domains so clients are all over the world.. > Until now curcumstances when window is shrunk are unknown. Absolutely. > No known OSes make such shit. 
Looks like theory is different from practice again :) -- Petru Paler, mailto:ppetru@ppetru.net http://www.ppetru.net - ICQ: 41817235 From owner-netdev@oss.sgi.com Thu Feb 15 10:43:09 2001 Received: by oss.sgi.com id ; Thu, 15 Feb 2001 10:42:50 -0800 Received: from mail.iwr.uni-heidelberg.de ([129.206.104.30]:17151 "EHLO mail.iwr.uni-heidelberg.de") by oss.sgi.com with ESMTP id ; Thu, 15 Feb 2001 10:42:39 -0800 Received: from kenzo.iwr.uni-heidelberg.de (IDENT:root@kenzo.iwr.uni-heidelberg.de [129.206.120.29]) by mail.iwr.uni-heidelberg.de (8.11.1/8.11.1) with ESMTP id f1FIgYT20127; Thu, 15 Feb 2001 19:42:34 +0100 (MET) Received: from localhost (bogdan@localhost) by kenzo.iwr.uni-heidelberg.de (8.9.3/8.9.3) with ESMTP id TAA17012; Thu, 15 Feb 2001 19:42:34 +0100 Date: Thu, 15 Feb 2001 19:42:34 +0100 (CET) From: Bogdan Costescu To: Petru Paler cc: , Subject: Re: 2.4.1: TCP assertion failed In-Reply-To: <20010215203737.B3210@ppetru.net> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 645 Lines: 20 On Thu, 15 Feb 2001, Petru Paler wrote: > > What OS do they use. > > I have no way to do that, these servers do mail and web service for a > couple hundred thousand domains so clients are all over the world.. You said that you have IPs. You can use nmap (http://www.nmap.org) to identify the OS. You do need however direct link to the hosts, it doesn't work if they are dialed-up... 
Sincerely, Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: Bogdan.Costescu@IWR.Uni-Heidelberg.De From owner-netdev@oss.sgi.com Thu Feb 15 10:53:09 2001 Received: by oss.sgi.com id ; Thu, 15 Feb 2001 10:52:59 -0800 Received: from minus.inr.ac.ru ([193.233.7.97]:20755 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Thu, 15 Feb 2001 10:52:40 -0800 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id VAA20184; Thu, 15 Feb 2001 21:52:17 +0300 From: kuznet@ms2.inr.ac.ru Message-Id: <200102151852.VAA20184@ms2.inr.ac.ru> Subject: Re: 2.4.1: TCP assertion failed To: ppetru@ppetru.net (Petru Paler) Date: Thu, 15 Feb 2001 21:52:17 +0300 (MSK) Cc: netdev@oss.sgi.com In-Reply-To: <20010215203737.B3210@ppetru.net> from "Petru Paler" at Feb 15, 1 08:37:37 pm X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 348 Lines: 12 Hello! > I have no way to do that, these servers do mail and web service for a > couple hundred thousand domains so clients are all over the world.. Usual story. I hope soon or early we will be luck with this and someone will reply: "Hey, indeed, I see one address, which is in one hop of us! Wait awhile, I am making some calls..." 
8) Alexey From owner-netdev@oss.sgi.com Thu Feb 15 11:50:30 2001 Received: by oss.sgi.com id ; Thu, 15 Feb 2001 11:50:21 -0800 Received: from colin.muc.de ([193.149.48.1]:33288 "HELO colin.muc.de") by oss.sgi.com with SMTP id ; Thu, 15 Feb 2001 11:49:55 -0800 Received: by colin.muc.de id <140661-3>; Thu, 15 Feb 2001 20:49:27 +0100 Message-ID: <20010215204924.34431@colin.muc.de> From: Andi Kleen To: kuznet@ms2.inr.ac.ru Cc: Petru Paler , netdev@oss.sgi.com Subject: Re: 2.4.1: TCP assertion failed References: <20010215203737.B3210@ppetru.net> <200102151852.VAA20184@ms2.inr.ac.ru> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 0.88e In-Reply-To: <200102151852.VAA20184@ms2.inr.ac.ru>; from kuznet@ms2.inr.ac.ru on Thu, Feb 15, 2001 at 07:53:44PM +0100 Date: Thu, 15 Feb 2001 20:49:24 +0100 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 424 Lines: 12 On Thu, Feb 15, 2001 at 07:53:44PM +0100, kuznet@ms2.inr.ac.ru wrote: > I hope soon or early we will be luck with this and someone will > reply: "Hey, indeed, I see one address, which is in one hop of us! > Wait awhile, I am making some calls..." 8) My personal guess is that it's one of the TCP bandwidth limiter proxy boxes again. These are unfortunately rather hard to detect, and it's not the client's fault. 
-Andi From owner-netdev@oss.sgi.com Thu Feb 15 12:07:40 2001 Received: by oss.sgi.com id ; Thu, 15 Feb 2001 12:07:30 -0800 Received: from foobar.napster.com ([64.124.41.10]:19727 "EHLO foobar.napster.com") by oss.sgi.com with ESMTP id ; Thu, 15 Feb 2001 12:07:22 -0800 Received: from wagner.napster.com (mail.napster.com [63.108.185.112]) by foobar.napster.com (8.9.3/8.9.3) with ESMTP id MAA19636 for ; Thu, 15 Feb 2001 12:07:21 -0800 Received: from napster.com (gw.napster.com [63.108.185.120]) by wagner.napster.com (8.9.3/8.9.3) with ESMTP id MAA13438 for ; Thu, 15 Feb 2001 12:07:21 -0800 Message-ID: <3A8C36FA.27BE7618@napster.com> Date: Thu, 15 Feb 2001 12:07:22 -0800 From: Jordan Mendelson Organization: Napster, Inc. X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.4.0 i686) X-Accept-Language: en MIME-Version: 1.0 To: netdev@oss.sgi.com Subject: sending pkt_too_big to self Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 470 Lines: 19 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Just noticed a few of these messages on several of our web servers: sending pkt_too_big to self sending pkt_too_big to self sending pkt_too_big to self sending pkt_too_big to self They don't happen too frequently. VA Linux boxes running 2.4.0 proper. Also, in case someone is super bored there is a mispelling of "we're" in the comment above where this message is printed out... 
Jordan From owner-netdev@oss.sgi.com Thu Feb 15 12:19:10 2001 Received: by oss.sgi.com id ; Thu, 15 Feb 2001 12:19:00 -0800 Received: from minus.inr.ac.ru ([193.233.7.97]:59652 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Thu, 15 Feb 2001 12:18:42 -0800 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id XAA20992; Thu, 15 Feb 2001 23:18:18 +0300 From: kuznet@ms2.inr.ac.ru Message-Id: <200102152018.XAA20992@ms2.inr.ac.ru> Subject: Re: =?Big5?B?pl7C0DogbmRpc2Nfc2VuZF9uczogYWx3YXlzIHVzZSBsaW5rLWxvY2Fs?= To: Moter_Du@stsl.siemens.com.tw (Moter Du) Date: Thu, 15 Feb 2001 23:18:18 +0300 (MSK) Cc: netdev@oss.sgi.com In-Reply-To: <92C0C0AC8AE8D411864300105A835CBB50144D@stslex.siemens.com.tw> from "Moter Du" at Feb 15, 1 09:39:25 am X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 742 Lines: 22 Hello! > always violates the specification, because it's none of the addresses > assigned to the outgoing interface (i.e., eth0). All global scope addresses are "assigned" to all the interfaces in this sense. You observe this fact, when seeing that packet is sent from interface eth1 using address, which is address of eth0. > show me more hints for "solicitation avoidance"? If you sent solicitation to target, target will want to talk to you and will have to solicit you if you did not give information about your address. This does not happen in your case, because you enforced situation of pathological asymmetrical routing. However, soliciting host has no way to know that target is not configured to reply symmetrically. 
Alexey From owner-netdev@oss.sgi.com Thu Feb 15 12:25:49 2001 Received: by oss.sgi.com id ; Thu, 15 Feb 2001 12:25:30 -0800 Received: from minus.inr.ac.ru ([193.233.7.97]:64516 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Thu, 15 Feb 2001 12:25:10 -0800 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id XAA21049; Thu, 15 Feb 2001 23:24:43 +0300 From: kuznet@ms2.inr.ac.ru Message-Id: <200102152024.XAA21049@ms2.inr.ac.ru> Subject: Re: 2.4.1: TCP assertion failed To: ak@muc.de (Andi Kleen) Date: Thu, 15 Feb 2001 23:24:43 +0300 (MSK) Cc: ppetru@ppetru.net, netdev@oss.sgi.com In-Reply-To: <20010215204924.34431@colin.muc.de> from "Andi Kleen" at Feb 15, 1 08:49:24 pm X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 311 Lines: 11 Hello! > My personal guess is that it's one of the TCP bandwidth limiter proxy > boxes again. These are unfortunately rather hard to detect, and it's not > the client's fault. Yes. If it is true, this can be ignored. Transient shrinks are harmless. If this is client's stack or some mobile agent... 
Alexey From owner-netdev@oss.sgi.com Thu Feb 15 14:18:10 2001 Received: by oss.sgi.com id ; Thu, 15 Feb 2001 14:17:51 -0800 Received: from colin.muc.de ([193.149.48.1]:56588 "HELO colin.muc.de") by oss.sgi.com with SMTP id ; Thu, 15 Feb 2001 14:17:28 -0800 Received: by colin.muc.de id <140650-3>; Thu, 15 Feb 2001 23:17:16 +0100 Message-ID: <20010215231715.26269@colin.muc.de> From: Andi Kleen To: Jordan Mendelson Cc: netdev@oss.sgi.com Subject: Re: sending pkt_too_big to self References: <3A8C36FA.27BE7618@napster.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 0.88e In-Reply-To: <3A8C36FA.27BE7618@napster.com>; from Jordan Mendelson on Thu, Feb 15, 2001 at 09:08:16PM +0100 Date: Thu, 15 Feb 2001 23:17:15 +0100 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 640 Lines: 21 On Thu, Feb 15, 2001 at 09:08:16PM +0100, Jordan Mendelson wrote: > Just noticed a few of these messages on several of our web servers: > > sending pkt_too_big to self > sending pkt_too_big to self > sending pkt_too_big to self > sending pkt_too_big to self > > They don't happen too frequently. VA Linux boxes running 2.4.0 proper. This is a relatively harmless race, which can happen when a pmtu event happens while after new packets got queued, but haven't been send out yet. It is more a debugging message. Comment it out if it bothers you. It should probably move into a #ifdef NETDEBUG -Andi P.S.: does 2.4 work for you now? 
From owner-netdev@oss.sgi.com Thu Feb 15 16:09:41 2001 Received: by oss.sgi.com id ; Thu, 15 Feb 2001 16:09:31 -0800 Received: from m205-3-p14.warwick.net ([208.242.205.119]:2308 "EHLO circuit.moureaux.com") by oss.sgi.com with ESMTP id ; Thu, 15 Feb 2001 16:09:23 -0800 Received: from localhost (IDENT:statux@localhost [127.0.0.1]) by circuit.moureaux.com (8.9.3/8.9.3) with ESMTP id TAA01319; Thu, 15 Feb 2001 19:09:52 -0500 Date: Thu, 15 Feb 2001 19:09:52 -0500 (EST) From: Statux X-Sender: To: Jes Sorensen cc: Subject: Re: 2.2.19pre9 TCP changes? In-Reply-To: <200102151318.OAA25359@lxplus015.cern.ch> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 925 Lines: 29 IRC generally uses pings to check the connection... so unless the server has these disabled or spread very far apart, it doesn't really apply to this problem. TCP, etc, has an option to keep the connection alive... which I think you noted on a user level (as opposed to the protocol level)... On Thu, 15 Feb 2001, Jes Sorensen wrote: > Hi > > Anyone noticed changes with TCP connection timeouts in recent 2.2.x > kernels? I upgraded to 2.2.19pre9 (from 2.2.16pre5) this week as I > needed to enable wireless support. However I am not seeing a lot of > cases where sessions in particular irc sessions time out constantly if i > don't output data over the channel regularly. > > I just changed the tcp keepalive settings to 500 secs from 7200 to see > if that helps. I dunno if it is related to the Wavelan driver in some > way either. > > Anyway I just wanted to hear if it's a known issue? 
> > Jes > -- -Statux From owner-netdev@oss.sgi.com Thu Feb 15 21:29:16 2001 Received: by oss.sgi.com id ; Thu, 15 Feb 2001 21:28:57 -0800 Received: from [192.72.45.189] ([192.72.45.189]:41452 "EHLO stsl.siemens.com.tw") by oss.sgi.com with ESMTP id ; Thu, 15 Feb 2001 21:28:34 -0800 Received: from stslex.siemens.com.tw (stslex [192.72.45.13]) by stsl.siemens.com.tw (8.9.1/8.9.1) with ESMTP id NAA16501; Fri, 16 Feb 2001 13:38:06 +0800 (CST) Received: by stslex.siemens.com.tw with Internet Mail Service (5.5.2448.0) id <1ZQAWL40>; Fri, 16 Feb 2001 13:29:22 +0800 Message-ID: <92C0C0AC8AE8D411864300105A835CBB501453@stslex.siemens.com.tw> From: Moter Du To: "'kuznet@ms2.inr.ac.ru'" Cc: netdev@oss.sgi.com Subject: =?Big5?B?pl7C0Dogpl7C0DogbmRpc2Nfc2VuZF9uczogYWx3YXlzIHVzZSBsaW5r?= =?Big5?B?LWxvY2Fs?= Date: Fri, 16 Feb 2001 13:29:15 +0800 MIME-Version: 1.0 X-Mailer: Internet Mail Service (5.5.2448.0) Content-Type: multipart/alternative; boundary="----_=_NextPart_001_01C097D9.6AE95D70" Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 14060 Lines: 288 ------_=_NextPart_001_01C097D9.6AE95D70 Content-Type: text/html; charset="Big5"
Hi Alexey:
 
Thanks for the fine description. Although solicitation avoidance doesn't happen in my case, I agree it always needs to be taken into consideration.
 
However, I do not agree that "all global scoped addresses are assigned to all the interfaces in this sense". In my case, I configured the RUT this way:
 
RUT> ifconfig eth0 add 3ffe:501:ff:100::1/64
RUT> ifconfig eth1 add 3ffe:501:ff:101::2/64
 
It's clear that the former global address was assigned to eth0 only, and the latter one to eth1 only. Period.
 
The illustration in my first mail was done this way:
 
TN> ping6 3ffe:501:ff:101::2 -I eth0
 
    TN                     RUT
eth0  eth1             eth0   eth1
|     | ECHO_REQUEST     |      |
|----------------------->|      |
|     |                  |  fw  |
|     |                  |.....>|
|     |                  |      |
|     |                  |  fw  |
|     |                  |<.....|
|     | NS               |      |
|<-----------------------|      |
|     | NA               |      |
|----------------------->|      |
|     | ECHO_REPLY       |      |
|<-----------------------|      |
|     |                  |      |
 
It's clear that ECHO_REQUEST and ECHO_REPLY will be as follows:
 
ECHO_REQUEST:
  - Source Address=any of addresses of TN eth0
  - Destination Address=3ffe:501:ff:101::2 (global address of RUT eth1)
 
ECHO_REPLY:
  - Source Address=3ffe:501:ff:101::2 (global address of RUT eth1, the same as ECHO_REQUEST Destination Address)
  - Destination Address=any of addresses of TN eth0 (the same as ECHO_REQUEST Source Address)
 
 
My point is that here the NS will contain 3ffe:501:ff:101::2 as its "Source Address", which goes against what RFC 2461 specifies.
 
The patch always uses fe80::1 (the link-local address of RUT eth0) as the NS's Source Address, and I think it should not cause any further side effects.
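
To make the rule concrete, here is a rough sketch of "always use the link-local address of the outgoing interface" as the NS source. This is only an illustration in Python, not the kernel patch itself; the function name pick_ns_source and the address list for eth0 are assumptions made up for the example.

```python
import ipaddress


def pick_ns_source(interface_addrs):
    """Return the first link-local (fe80::/10) address in the list, or None.

    interface_addrs: IPv6 addresses (as strings) assigned to the interface
    that the Neighbor Solicitation is about to leave through.
    """
    for text in interface_addrs:
        addr = ipaddress.IPv6Address(text)
        if addr.is_link_local:
            return addr
    return None


# Hypothetical address list for RUT eth0: its global address plus fe80::1.
eth0_addrs = ["3ffe:501:ff:100::1", "fe80::1"]
print(pick_ns_source(eth0_addrs))
```

With this rule the address of eth1 (3ffe:501:ff:101::2) is never a candidate for an NS sent out of eth0, because only addresses assigned to eth0 are examined.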
 

>       Sincerely
>       Moter Du
>
>       ============================================================
>       Moter Du                                  D410, Siemens STSL
>       phone   +886 2 25186011
>       Fax     +886 2 25053866
>      
mailto:moter_du@stsl.siemens.com.tw
                ============================================================


-----Original Message-----
From: kuznet@ms2.inr.ac.ru [mailto:kuznet@ms2.inr.ac.ru]
Sent: 16 February 2001, 04:18 AM
To: Moter_Du@stsl.siemens.com.tw
Cc: netdev@oss.sgi.com
Subject: Re: Re: ndisc_send_ns: always use link-local


Hello!

> always violates the specification, because it's none of the addresses
> assigned to the outgoing interface (i.e., eth0).

All global scope addresses are "assigned" to all the interfaces in this sense.

You observe this fact, when seeing that packet is sent from
interface eth1 using address, which is address of eth0.


> show me more hints for "solicitation avoidance"?

If you sent solicitation to target, target will want to talk to you
and will have to solicit you if you did not give information about
your address.

This does not happen in your case, because you enforced situation
of pathological asymmetrical routing. However, soliciting host has
no way to know that target is not configured to reply symmetrically.

Alexey

------_=_NextPart_001_01C097D9.6AE95D70-- From owner-netdev@oss.sgi.com Fri Feb 16 07:50:30 2001 Received: by oss.sgi.com id ; Fri, 16 Feb 2001 07:50:21 -0800 Received: from shell.cyberus.ca ([209.195.95.7]:59595 "EHLO shell.cyberus.ca") by oss.sgi.com with ESMTP id ; Fri, 16 Feb 2001 07:50:05 -0800 Received: from localhost (hadi@localhost) by shell.cyberus.ca (8.9.3/666/Cyberus Online Inc.) with ESMTP id KAA22833; Fri, 16 Feb 2001 10:48:18 -0500 (EST) X-Authentication-Warning: shell.cyberus.ca: hadi owned process doing -bs Date: Fri, 16 Feb 2001 10:48:18 -0500 (EST) From: jamal To: cc: Andi Kleen , , Subject: Re: 2.4.1: TCP assertion failed In-Reply-To: <200102152024.XAA21049@ms2.inr.ac.ru> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 434 Lines: 21 On Thu, 15 Feb 2001 kuznet@ms2.inr.ac.ru wrote: > Hello! > > > My personal guess is that it's one of the TCP bandwidth limiter proxy > > boxes again. These are unfortunately rather hard to detect, and it's not > > the client's fault. > > Yes. If it is true, this can be ignored. Transient shrinks are harmless. > > If this is client's stack or some mobile agent... > I smell a packeteer somewhere along the path. cheers, jamal From owner-netdev@oss.sgi.com Fri Feb 16 11:01:11 2001 Received: by oss.sgi.com id ; Fri, 16 Feb 2001 11:00:51 -0800 Received: from kamov.deltanet.ro ([193.226.175.59]:40978 "HELO kamov.deltanet.ro") by oss.sgi.com with SMTP id ; Fri, 16 Feb 2001 11:00:38 -0800 Received: from home.ppetru.net (home.ppetru.net [193.230.129.57]) by kamov.deltanet.ro (Postfix) with ESMTP id D6823FBF0 for ; Fri, 16 Feb 2001 21:00:28 +0200 (EET) Received: by home.ppetru.net (Postfix, from userid 500) id 35B1222FEC; Fri, 16 Feb 2001 21:00:24 +0200 (EET) Date: Fri, 16 Feb 2001 21:00:24 +0200 From: Petru Paler To: netdev@oss.sgi.com Subject: syncookie monsters ? 
Message-ID: <20010216210024.B1900@ppetru.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.3.14i Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 221 Lines: 8 http://cr.yp.to/syncookies.html (at the bottom of the page). So, what is the actual truth ? Are syncookies good to use for loaded servers ? -- Petru Paler, mailto:ppetru@ppetru.net http://www.ppetru.net - ICQ: 41817235 From owner-netdev@oss.sgi.com Fri Feb 16 11:38:10 2001 Received: by oss.sgi.com id ; Fri, 16 Feb 2001 11:38:01 -0800 Received: from minus.inr.ac.ru ([193.233.7.97]:9732 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Fri, 16 Feb 2001 11:37:42 -0800 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id WAA07225; Fri, 16 Feb 2001 22:36:52 +0300 From: kuznet@ms2.inr.ac.ru Message-Id: <200102161936.WAA07225@ms2.inr.ac.ru> Subject: Re: syncookie monsters ? To: ppetru@ppetru.NET (Petru Paler) Date: Fri, 16 Feb 2001 22:36:52 +0300 (MSK) Cc: netdev@oss.sgi.com In-Reply-To: <20010216210024.B1900@ppetru.net> from "Petru Paler" at Feb 16, 1 10:15:00 pm X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 373 Lines: 13 Hello! > So, what is the actual truth ? Are syncookies good to use for > loaded servers ? Provided server is configured so that sending cookies is never triggered and considering each event of triggering syncookies as sign of misconfiguration rather than an attack. 
So, they are good when used with care (despite of their inventor is either cretin or plain cad) Alexey From owner-netdev@oss.sgi.com Fri Feb 16 11:40:30 2001 Received: by oss.sgi.com id ; Fri, 16 Feb 2001 11:40:10 -0800 Received: from kamov.deltanet.ro ([193.226.175.59]:50194 "HELO kamov.deltanet.ro") by oss.sgi.com with SMTP id ; Fri, 16 Feb 2001 11:39:58 -0800 Received: from home.ppetru.net (home.ppetru.net [193.230.129.57]) by kamov.deltanet.ro (Postfix) with ESMTP id 9245DFBEC; Fri, 16 Feb 2001 21:39:52 +0200 (EET) Received: by home.ppetru.net (Postfix, from userid 500) id 0EC4C23050; Fri, 16 Feb 2001 21:39:48 +0200 (EET) Date: Fri, 16 Feb 2001 21:39:48 +0200 From: Petru Paler To: kuznet@ms2.inr.ac.ru Cc: netdev@oss.sgi.com Subject: Re: syncookie monsters ? Message-ID: <20010216213948.F1900@ppetru.net> References: <20010216210024.B1900@ppetru.net> <200102161936.WAA07225@ms2.inr.ac.ru> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.3.14i In-Reply-To: <200102161936.WAA07225@ms2.inr.ac.ru>; from kuznet@ms2.inr.ac.ru on Fri, Feb 16, 2001 at 10:36:52PM +0300 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 476 Lines: 13 On Fri, Feb 16, 2001 at 10:36:52PM +0300, kuznet@ms2.inr.ac.ru wrote: > > So, what is the actual truth ? Are syncookies good to use for > > loaded servers ? > > Provided server is configured so that sending cookies is never > triggered and considering each event of triggering syncookies > as sign of misconfiguration rather than an attack. So they have no use in defeating SYN flood attacks ? 
-- Petru Paler, mailto:ppetru@ppetru.net http://www.ppetru.net - ICQ: 41817235 From owner-netdev@oss.sgi.com Fri Feb 16 12:16:31 2001 Received: by oss.sgi.com id ; Fri, 16 Feb 2001 12:16:21 -0800 Received: from minus.inr.ac.ru ([193.233.7.97]:9222 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Fri, 16 Feb 2001 12:15:59 -0800 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id XAA07506; Fri, 16 Feb 2001 23:15:45 +0300 From: kuznet@ms2.inr.ac.ru Message-Id: <200102162015.XAA07506@ms2.inr.ac.ru> Subject: Re: syncookie monsters ? To: ppetru@ppetru.net (Petru Paler) Date: Fri, 16 Feb 2001 23:15:45 +0300 (MSK) Cc: netdev@oss.sgi.com In-Reply-To: <20010216213948.F1900@ppetru.net> from "Petru Paler" at Feb 16, 1 09:39:48 pm X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 154 Lines: 8 Hello! > So they have no use in defeating SYN flood attacks ? Sorry? They are useful for this, only for this. Otherwise, why to enable them? 8) Alexey From owner-netdev@oss.sgi.com Fri Feb 16 12:18:30 2001 Received: by oss.sgi.com id ; Fri, 16 Feb 2001 12:18:20 -0800 Received: from kamov.deltanet.ro ([193.226.175.59]:13331 "HELO kamov.deltanet.ro") by oss.sgi.com with SMTP id ; Fri, 16 Feb 2001 12:18:10 -0800 Received: from home.ppetru.net (home.ppetru.net [193.230.129.57]) by kamov.deltanet.ro (Postfix) with ESMTP id 1A4E3FBEC; Fri, 16 Feb 2001 22:18:03 +0200 (EET) Received: by home.ppetru.net (Postfix, from userid 500) id 0A2A4232E1; Fri, 16 Feb 2001 22:18:02 +0200 (EET) Date: Fri, 16 Feb 2001 22:18:02 +0200 From: Petru Paler To: kuznet@ms2.inr.ac.ru Cc: netdev@oss.sgi.com Subject: Re: syncookie monsters ? 
Message-ID: <20010216221802.A4251@ppetru.net> References: <20010216213948.F1900@ppetru.net> <200102162015.XAA07506@ms2.inr.ac.ru> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.3.14i In-Reply-To: <200102162015.XAA07506@ms2.inr.ac.ru>; from kuznet@ms2.inr.ac.ru on Fri, Feb 16, 2001 at 11:15:45PM +0300 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 399 Lines: 12 On Fri, Feb 16, 2001 at 11:15:45PM +0300, kuznet@ms2.inr.ac.ru wrote: > > So they have no use in defeating SYN flood attacks ? > > Sorry? They are useful for this, only for this. Otherwise, > why to enable them? 8) That's what I thought too; I didn't get your affirmation regarding signs of misconfiguration, though. -- Petru Paler, mailto:ppetru@ppetru.net http://www.ppetru.net - ICQ: 41817235 From owner-netdev@oss.sgi.com Fri Feb 16 12:28:31 2001 Received: by oss.sgi.com id ; Fri, 16 Feb 2001 12:28:21 -0800 Received: from minus.inr.ac.ru ([193.233.7.97]:37638 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Fri, 16 Feb 2001 12:28:06 -0800 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id XAA07604; Fri, 16 Feb 2001 23:27:52 +0300 From: kuznet@ms2.inr.ac.ru Message-Id: <200102162027.XAA07604@ms2.inr.ac.ru> Subject: Re: syncookie monsters ? To: ppetru@ppetru.net (Petru Paler) Date: Fri, 16 Feb 2001 23:27:52 +0300 (MSK) Cc: netdev@oss.sgi.com In-Reply-To: <20010216221802.A4251@ppetru.net> from "Petru Paler" at Feb 16, 1 10:18:02 pm X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 285 Lines: 12 Hello! > That's what I thought too; I didn't get your affirmation regarding > signs of misconfiguration, though. 8) 100%-10^-6 sign of misconfiguration is that syncookies are started, which is printed on console. _Real_ SYN floods are not more frequent than World Wars. 
8) Alexey From owner-netdev@oss.sgi.com Fri Feb 16 12:31:50 2001 Received: by oss.sgi.com id ; Fri, 16 Feb 2001 12:31:31 -0800 Received: from kamov.deltanet.ro ([193.226.175.59]:2052 "HELO kamov.deltanet.ro") by oss.sgi.com with SMTP id ; Fri, 16 Feb 2001 12:31:16 -0800 Received: from home.ppetru.net (home.ppetru.net [193.230.129.57]) by kamov.deltanet.ro (Postfix) with ESMTP id 73E98FBE4; Fri, 16 Feb 2001 22:31:10 +0200 (EET) Received: by home.ppetru.net (Postfix, from userid 500) id 5465A232E1; Fri, 16 Feb 2001 22:31:10 +0200 (EET) Date: Fri, 16 Feb 2001 22:31:10 +0200 From: Petru Paler To: kuznet@ms2.inr.ac.ru Cc: netdev@oss.sgi.com Subject: Re: syncookie monsters ? Message-ID: <20010216223110.B4251@ppetru.net> References: <20010216221802.A4251@ppetru.net> <200102162027.XAA07604@ms2.inr.ac.ru> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.3.14i In-Reply-To: <200102162027.XAA07604@ms2.inr.ac.ru>; from kuznet@ms2.inr.ac.ru on Fri, Feb 16, 2001 at 11:27:52PM +0300 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 536 Lines: 17 On Fri, Feb 16, 2001 at 11:27:52PM +0300, kuznet@ms2.inr.ac.ru wrote: > > That's what I thought too; I didn't get your affirmation regarding > > signs of misconfiguration, though. > > 8) 100%-10^-6 sign of misconfiguration is that syncookies are started, > which is printed on console. Ok, I get it now. > _Real_ SYN floods are not more frequent than World Wars. 
8) Well I get flooded a couple of times every day (working for a romanian ISP, though :) -- Petru Paler, mailto:ppetru@ppetru.net http://www.ppetru.net - ICQ: 41817235 From owner-netdev@oss.sgi.com Fri Feb 16 16:15:00 2001 Received: by oss.sgi.com id ; Fri, 16 Feb 2001 16:14:40 -0800 Received: from router-100M.swansea.linux.org.uk ([194.168.151.17]:62724 "EHLO the-village.bc.nu") by oss.sgi.com with ESMTP id ; Fri, 16 Feb 2001 16:14:23 -0800 Received: from alan by the-village.bc.nu with local (Exim 2.12 #1) id 14Tv1Y-0004jV-00 for netdev@oss.sgi.com; Sat, 17 Feb 2001 00:14:52 +0000 Received: from vger.kernel.org ([199.183.24.194]) by the-village.bc.nu with esmtp (Exim 2.12 #1) id 14Tuwf-0004ie-00 for alan@lxorguk.ukuu.org.uk; Sat, 17 Feb 2001 00:09:51 +0000 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id ; Fri, 16 Feb 2001 19:08:40 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id ; Fri, 16 Feb 2001 19:08:29 -0500 Received: from 041imtd176.chartermi.net ([24.247.41.176]:50842 "EHLO oof.netnation.com") by vger.kernel.org with ESMTP id ; Fri, 16 Feb 2001 19:08:13 -0500 Received: from sim by oof.netnation.com with local (Exim 3.22 #1 (Debian)) id 14Tuv0-0003oX-00 for ; Fri, 16 Feb 2001 19:08:06 -0500 Date: Fri, 16 Feb 2001 19:08:05 -0500 From: Simon Kirby To: linux-kernel@vger.kernel.org Subject: 2.4 TCP(?) timeouts Message-ID: <20010216190805.A14603@stormix.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.3.12i X-Mailing-List: linux-kernel@vger.kernel.org Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 1513 Lines: 33 Hello, Today we put 2.4.1 on our mail server after having see it perform well on some other boxes. It seems now we are receiving a few calls every hour from customers reporting that the server tends to hang and eventually time out on them when downloading mail. 
All customers that have reported this problem so far are on a dialup connection. Apparently the server will stop transmitting data (or the client seems to think so), and then their mail client will time out. I noticed that the 2.4.1 on my desktop seems to time out SSH connections to servers that have become unreachable in about 10 seconds or so, which is many times faster than 2.2 which used to sit for hours before it timed out (if at all). I'm not sure if this is related. I would expect the client to attempt to retransmit some ACKs and eventually get some RSTs back if this were the case. Has anybody seen similar problems? The box was previously running 2.2.19pre8 and no customers reported such problems. We're using cucipop w/ldap on a dual PIII 800 MHz box with 1.5 GB of RAM. Simon- [ Stormix Technologies Inc. ][ NetNation Communications Inc. ] [ sim@stormix.com ][ sim@netnation.com ] [ Opinions expressed are not necessarily those of my employers. ] - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ From owner-netdev@oss.sgi.com Sat Feb 17 10:42:57 2001 Received: by oss.sgi.com id ; Sat, 17 Feb 2001 10:42:38 -0800 Received: from minus.inr.ac.ru ([193.233.7.97]:1548 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Sat, 17 Feb 2001 10:42:26 -0800 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id VAA27901; Sat, 17 Feb 2001 21:41:58 +0300 From: kuznet@ms2.inr.ac.ru Message-Id: <200102171841.VAA27901@ms2.inr.ac.ru> Subject: Re: Alexey Kuznetsov's FTP site down? To: james@UnLambda.COM (James A. Crippen) Date: Sat, 17 Feb 2001 21:41:58 +0300 (MSK) Cc: netdev@oss.sgi.com In-Reply-To: from "James A.
Crippen" at Feb 16, 1 11:15:01 pm X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 126 Lines: 8 Hello! > Any idea what's up here? ftp service was disabled for several days. Sorry, hard disk groaned too loudly. 8) Alexey From owner-netdev@oss.sgi.com Sun Feb 18 00:21:00 2001 Received: by oss.sgi.com id ; Sun, 18 Feb 2001 00:20:40 -0800 Received: from f00f.stub.clear.net.nz ([203.167.224.51]:39431 "HELO metastasis.f00f.org") by oss.sgi.com with SMTP id ; Sun, 18 Feb 2001 00:20:27 -0800 Received: by metastasis.f00f.org (Postfix, from userid 1000) id 3720FA59F; Sun, 18 Feb 2001 21:20:24 +1300 (NZDT) Date: Sun, 18 Feb 2001 21:20:24 +1300 From: Chris Wedgwood To: Andi Kleen Cc: kuznet@ms2.inr.ac.ru, Petru Paler , netdev@oss.sgi.com Subject: Re: 2.4.1: TCP assertion failed Message-ID: <20010218212024.B28243@metastasis.f00f.org> References: <20010215203737.B3210@ppetru.net> <200102151852.VAA20184@ms2.inr.ac.ru> <20010215204924.34431@colin.muc.de> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <20010215204924.34431@colin.muc.de>; from ak@muc.de on Thu, Feb 15, 2001 at 08:49:24PM +0100 X-No-Archive: Yes Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 460 Lines: 15 On Thu, Feb 15, 2001 at 08:49:24PM +0100, Andi Kleen wrote: My personal guess is that it's one of the TCP bandwidth limiter proxy boxes again. These are unfortunately rather hard to detect, and it's not the client's fault. as far as i know, Packeteer are the _only_ company that make such beasts... 
i have several sitting around gathering dust if someone wants me to set one up with a host and give them access to it, then let me know --cw From owner-netdev@oss.sgi.com Sun Feb 18 09:31:12 2001 Received: by oss.sgi.com id ; Sun, 18 Feb 2001 09:31:03 -0800 Received: from minus.inr.ac.ru ([193.233.7.97]:50193 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Sun, 18 Feb 2001 09:30:48 -0800 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id UAA24636; Sun, 18 Feb 2001 20:29:48 +0300 From: kuznet@ms2.inr.ac.ru Message-Id: <200102181729.UAA24636@ms2.inr.ac.ru> Subject: Re: 2.4.1: TCP assertion failed To: cw@f00f.org (Chris Wedgwood) Date: Sun, 18 Feb 2001 20:29:48 +0300 (MSK) Cc: ak@muc.de, ppetru@ppetru.net, netdev@oss.sgi.com In-Reply-To: <20010218212024.B28243@metastasis.f00f.org> from "Chris Wedgwood" at Feb 18, 1 09:20:24 pm X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 397 Lines: 13 Hello! > as far as i know, Packeteer are the _only_ company that make such > beasts... i have several sitting around gathering dust > > if someone wants me to set one up with a host and give them access to > it, then let me know Yes! Actually it is enough that you made some ftp download from dust.inr.ac.ru from host sitting behind this device and tell me time to find message easier. 
Alexey From owner-netdev@oss.sgi.com Sun Feb 18 09:52:02 2001 Received: by oss.sgi.com id ; Sun, 18 Feb 2001 09:51:43 -0800 Received: from web119.mail.yahoo.com ([205.180.60.120]:3347 "HELO web119.yahoomail.com") by oss.sgi.com with SMTP id ; Sun, 18 Feb 2001 09:51:28 -0800 Received: (qmail 5009 invoked by uid 60001); 18 Feb 2001 17:51:28 -0000 Message-ID: <20010218175128.5008.qmail@web119.yahoomail.com> Received: from [156.153.255.250] by web119.yahoomail.com; Sun, 18 Feb 2001 09:51:28 PST Date: Sun, 18 Feb 2001 09:51:28 -0800 (PST) From: Cacophonix Subject: Re: 2.4.1: TCP assertion failed To: kuznet@ms2.inr.ac.ru, Chris Wedgwood Cc: netdev@oss.sgi.com In-Reply-To: <200102181729.UAA24636@ms2.inr.ac.ru> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 934 Lines: 26 If possible you may want to test in the presence of asymmetric routing - that often causes problems with devices like these that rewrite tcp window advertisements in the middle of the network - i.e, traffic in the forward direction flows through the device, but in the reverse direction does not. --karthik --- kuznet@ms2.inr.ac.ru wrote: > Hello! > > > as far as i know, Packeteer are the _only_ company that make such > > beasts... i have several sitting around gathering dust > > > > if someone wants me to set one up with a host and give them access to > > it, then let me know > > Yes! Actually it is enough that you made some ftp download from > dust.inr.ac.ru from host sitting behind this device and tell me > time to find message easier. > > Alexey __________________________________________________ Do You Yahoo!? Get personalized email addresses from Yahoo! Mail - only $35 a year! 
http://personal.mail.yahoo.com/ From owner-netdev@oss.sgi.com Sun Feb 18 09:57:31 2001 Received: by oss.sgi.com id ; Sun, 18 Feb 2001 09:57:11 -0800 Received: from shell.cyberus.ca ([209.195.95.7]:37836 "EHLO shell.cyberus.ca") by oss.sgi.com with ESMTP id ; Sun, 18 Feb 2001 09:56:59 -0800 Received: from localhost (hadi@localhost) by shell.cyberus.ca (8.9.3/666/Cyberus Online Inc.) with ESMTP id MAA26441; Sun, 18 Feb 2001 12:55:55 -0500 (EST) X-Authentication-Warning: shell.cyberus.ca: hadi owned process doing -bs Date: Sun, 18 Feb 2001 12:55:55 -0500 (EST) From: jamal To: Cacophonix cc: , Chris Wedgwood , Subject: Re: 2.4.1: TCP assertion failed In-Reply-To: <20010218175128.5008.qmail@web119.yahoomail.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 665 Lines: 18 On Sun, 18 Feb 2001, Cacophonix wrote: > If possible you may want to test in the presence of asymmetric routing - that > often causes problems with devices like these that rewrite tcp window advertisements > in the middle of the network - i.e, traffic in the forward direction flows through > the device, but in the reverse direction does not. Actually, route asymmetry may not be much of a factor here. Most of these devices are at the client's edge as opposed to the server end i.e they protect the client's network portion of the resources. So you will come and go via them. (in any case the above test would be sufficient if done one way). 
cheers, jamal From owner-netdev@oss.sgi.com Sun Feb 18 10:21:03 2001 Received: by oss.sgi.com id ; Sun, 18 Feb 2001 10:20:53 -0800 Received: from web119.mail.yahoo.com ([205.180.60.120]:53774 "HELO web119.yahoomail.com") by oss.sgi.com with SMTP id ; Sun, 18 Feb 2001 10:20:24 -0800 Received: (qmail 6278 invoked by uid 60001); 18 Feb 2001 18:20:24 -0000 Message-ID: <20010218182024.6277.qmail@web119.yahoomail.com> Received: from [156.153.255.250] by web119.yahoomail.com; Sun, 18 Feb 2001 10:20:24 PST Date: Sun, 18 Feb 2001 10:20:24 -0800 (PST) From: Cacophonix Subject: Re: 2.4.1: TCP assertion failed To: jamal Cc: kuznet@ms2.inr.ac.ru, Chris Wedgwood , netdev@oss.sgi.com In-Reply-To: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 1226 Lines: 34 Depends. I've seen networks where the device was on the ethernet prior to the routers attaching to the WAN - in larger networks there tend to be multiple such ethernet "hub LANs" and multiple WAN links for redundancy. (BTW, I'm referring to the case where the device is on the WAN edge of the client site, but could be in the server site as well) The workaround to get the packeteer in the path in both directions was to play around with ospf costs on the LAN interfaces of the router. Of course, as you mention, a device that does naughty things could have problems in the simple cases as well - just pointing out cases where I've seen failure in the past... --karthik --- jamal wrote: > > > Actually, route asymmetry may not be much of a factor here. > Most of these devices are at the client's edge as opposed to the server > end i.e they protect the client's network portion of the resources. So you > will come and go via them. > (in any case the above test would be sufficient if done one way). > > cheers, > jamal > __________________________________________________ Do You Yahoo!? 
Get personalized email addresses from Yahoo! Mail - only $35 a year! http://personal.mail.yahoo.com/ From owner-netdev@oss.sgi.com Sun Feb 18 10:34:02 2001 Received: by oss.sgi.com id ; Sun, 18 Feb 2001 10:33:53 -0800 Received: from f00f.stub.clear.net.nz ([203.167.224.51]:42759 "HELO metastasis.f00f.org") by oss.sgi.com with SMTP id ; Sun, 18 Feb 2001 10:33:31 -0800 Received: by metastasis.f00f.org (Postfix, from userid 1000) id 1139CA59F; Mon, 19 Feb 2001 07:33:26 +1300 (NZDT) Date: Mon, 19 Feb 2001 07:33:26 +1300 From: Chris Wedgwood To: kuznet@ms2.inr.ac.ru Cc: ak@muc.de, ppetru@ppetru.net, netdev@oss.sgi.com Subject: Re: 2.4.1: TCP assertion failed Message-ID: <20010219073325.A29113@metastasis.f00f.org> References: <20010218212024.B28243@metastasis.f00f.org> <200102181729.UAA24636@ms2.inr.ac.ru> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <200102181729.UAA24636@ms2.inr.ac.ru>; from kuznet@ms2.inr.ac.ru on Sun, Feb 18, 2001 at 08:29:48PM +0300 X-No-Archive: Yes Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 319 Lines: 10 On Sun, Feb 18, 2001 at 08:29:48PM +0300, kuznet@ms2.inr.ac.ru wrote: Yes! Actually it is enough that you made some ftp download from dust.inr.ac.ru from host sitting behind this device and tell me time to find message easier. I will go dig one up later today and let you know the details then. 
--cw From owner-netdev@oss.sgi.com Sun Feb 18 10:36:02 2001 Received: by oss.sgi.com id ; Sun, 18 Feb 2001 10:35:42 -0800 Received: from f00f.stub.clear.net.nz ([203.167.224.51]:43527 "HELO metastasis.f00f.org") by oss.sgi.com with SMTP id ; Sun, 18 Feb 2001 10:35:34 -0800 Received: by metastasis.f00f.org (Postfix, from userid 1000) id F1E03A5C0; Mon, 19 Feb 2001 07:35:31 +1300 (NZDT) Date: Mon, 19 Feb 2001 07:35:31 +1300 From: Chris Wedgwood To: Cacophonix Cc: kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com Subject: Re: 2.4.1: TCP assertion failed Message-ID: <20010219073531.B29113@metastasis.f00f.org> References: <200102181729.UAA24636@ms2.inr.ac.ru> <20010218175128.5008.qmail@web119.yahoomail.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <20010218175128.5008.qmail@web119.yahoomail.com>; from cacophonix@yahoo.com on Sun, Feb 18, 2001 at 09:51:28AM -0800 X-No-Archive: Yes Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 683 Lines: 16 On Sun, Feb 18, 2001 at 09:51:28AM -0800, Cacophonix wrote: If possible you may want to test in the presence of asymmetric routing - that often causes problems with devices like these that rewrite tcp window advertisements in the middle of the network - i.e, traffic in the forward direction flows through the device, but in the reverse direction does not. The packeteers detect asymmetric flows and revert to queuing; the only rate-shape when they can see the whole conversation. Because I work for a carrier and we have a non-negligible amount of asymmetric routing (it's normal I say, not argue) when we had packeteers they would get really upset. 
--cw From owner-netdev@oss.sgi.com Sun Feb 18 10:41:02 2001 Received: by oss.sgi.com id ; Sun, 18 Feb 2001 10:40:43 -0800 Received: from shell.cyberus.ca ([209.195.95.7]:39372 "EHLO shell.cyberus.ca") by oss.sgi.com with ESMTP id ; Sun, 18 Feb 2001 10:40:31 -0800 Received: from localhost (hadi@localhost) by shell.cyberus.ca (8.9.3/666/Cyberus Online Inc.) with ESMTP id NAA26489; Sun, 18 Feb 2001 13:35:39 -0500 (EST) X-Authentication-Warning: shell.cyberus.ca: hadi owned process doing -bs Date: Sun, 18 Feb 2001 13:35:39 -0500 (EST) From: jamal To: Cacophonix cc: , Chris Wedgwood , Subject: Re: 2.4.1: TCP assertion failed In-Reply-To: <20010218182024.6277.qmail@web119.yahoomail.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 1142 Lines: 31 Policy management sounds like hell in the scenario you describe (if you cant enforce single point of entry/exit into the network for a flow) Mostly because you have stateful policies that dont get propagated between the devices that provide entry into the network. Note that this will also be difficult to do for ordinary QoS but packeteer mucking with headers makes it a _lot_ more difficult. cheers, jamal On Sun, 18 Feb 2001, Cacophonix wrote: > Depends. I've seen networks where the device was on the ethernet prior to > the routers attaching to the WAN - in larger networks there tend to be multiple > such ethernet "hub LANs" and multiple WAN links for redundancy. (BTW, I'm > referring to the case where the device is on the WAN edge of the client site, > but could be in the server site as well) > > The workaround to get the packeteer in the path in both directions was to play > around with ospf costs on the LAN interfaces of the router. 
> > Of course, as you mention, a device that does naughty things could have problems > in the simple cases as well - just pointing out cases where I've seen failure in > the past... > From owner-netdev@oss.sgi.com Sun Feb 18 16:22:36 2001 Received: by oss.sgi.com id ; Sun, 18 Feb 2001 16:22:17 -0800 Received: from pizda.ninka.net ([216.101.162.242]:23424 "EHLO pizda.ninka.net") by oss.sgi.com with ESMTP id ; Sun, 18 Feb 2001 16:21:54 -0800 Received: (from davem@localhost) by pizda.ninka.net (8.9.3/8.9.3) id QAA01385; Sun, 18 Feb 2001 16:20:00 -0800 From: "David S. Miller" MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <14992.26288.233276.641078@pizda.ninka.net> Date: Sun, 18 Feb 2001 16:20:00 -0800 (PST) To: linux-kernel@vger.kernel.org CC: netdev@oss.sgi.com Subject: [UPDATE] Zerocopy BETA 1, against 2.4.2-pre4 X-Mailer: VM 6.75 under 21.1 (patch 13) "Crater Lake" XEmacs Lucid Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 896 Lines: 25 I'm calling this "BETA 1" because I currently feel that all performance and other issues have been addressed and that the patch is up for serious consideration for inclusion into a future 2.4.x release: ftp://ftp.kernel.org/pub/linux/kernel/people/davem/zerocopy-2.4.2p4-1.diff.gz Besides merging to 2.4.2-pre4 the main change in this release is a totally revamped paged-SKB sendmsg implementation by Alexey. I truly believe now that bandwidth/latency is back to where we were before the zerocopy patches, and preliminary testing done by Andrew Morton supports this. (actually, in my own testing, latency over loopback seems to have improved) Some verbose TCP debugging is enabled in this release, most of the messages are harmless %99 of the time. If these messages bother you just set "FASTRETRANS_DEBUG" back to "1" in include/net/tcp.h Thanks. Later, David S. 
Miller davem@redhat.com From owner-netdev@oss.sgi.com Mon Feb 19 03:34:51 2001 Received: by oss.sgi.com id ; Mon, 19 Feb 2001 03:34:41 -0800 Received: from internal.nci.com.au ([203.38.215.137]:12563 "EHLO internal.nci.com.au") by oss.sgi.com with ESMTP id ; Mon, 19 Feb 2001 03:34:20 -0800 Received: from w95vmware.ns.com (ppp1.nci.com.au [172.30.0.161]) by internal.nci.com.au (8.9.3/8.9.3) with SMTP id WAA01076 for ; Mon, 19 Feb 2001 22:04:56 +1030 Message-Id: <3.0.6.32.20010219215602.00856770@203.16.214.248> X-Sender: ns@203.16.214.248 X-Mailer: QUALCOMM Windows Eudora Light Version 3.0.6 (32) Date: Mon, 19 Feb 2001 21:56:02 +1000 To: netdev@oss.sgi.com From: Richard Sharpe Subject: UDP and Dest Unreachable, Port Unreachables Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 1295 Lines: 39 Hi Alan, I am doing some work with Samba which involves sending a UDP datagram and waiting for a response from a Samba or Windows server. Every now and then, I get a timeout on the socket I am waiting for input from (actually in a select statement) and I see, in the trace, that the Linux machine responds with a Dest Unreach, Port Unreach when the timeout occurs. Now, the way I do this is: Open socket and specify a local address (INADDR_ANY, port=138), and remote address (some IP and port=138). Construct and send the packet using the socket above. Call routine to wait for input on the socket above. I wonder what the actual semantics are for receiving UDP datagrams? Will the kernel receive datagrams if a socket is open with the appropriate SIP, SPort, DIP, DPORT combination, or, does there have to be a read scheduled, either via a select, or an actual read? I find this problem on 2.2.18 and 2.4.0. It seems unlikely to be a bug, but I cannot think of a set of circumstances that may cause this behaviour. Can anyone see what the problem might be? 
Regards ------- Richard Sharpe, sharpe@ns.aus.com Samba (Team member, www.samba.org), Ethereal (Team member, www.ethereal.com) Contributing author, SAMS Teach Yourself Samba in 24 Hours Author, Special Edition, Using Samba From owner-netdev@oss.sgi.com Mon Feb 19 10:52:25 2001 Received: by oss.sgi.com id ; Mon, 19 Feb 2001 10:52:06 -0800 Received: from admin.csn.ul.ie ([136.201.105.1]:7952 "HELO admin.csn.ul.ie") by oss.sgi.com with SMTP id ; Mon, 19 Feb 2001 10:51:45 -0800 Received: from holly.csn.ul.ie (holly.csn.ul.ie [136.201.105.4]) by admin.csn.ul.ie (Postfix) with ESMTP id 786D13006; Mon, 19 Feb 2001 18:51:38 +0000 (GMT) Received: from skynet.csn.ul.ie (skynet [136.201.105.2]) by holly.csn.ul.ie (Postfix) with ESMTP id 66CA32B2B8; Mon, 19 Feb 2001 18:51:38 +0000 (GMT) Received: by skynet.csn.ul.ie (Postfix, from userid 2139) id 3F122A806; Mon, 19 Feb 2001 18:51:34 +0000 (GMT) Received: from localhost (localhost [127.0.0.1]) by skynet.csn.ul.ie (Postfix) with ESMTP id 3CE33A802; Mon, 19 Feb 2001 18:51:34 +0000 (GMT) Date: Mon, 19 Feb 2001 18:51:34 +0000 (GMT) From: Dave Airlie X-X-Sender: To: Richard Sharpe Cc: Subject: Re: UDP and Dest Unreachable, Port Unreachables In-Reply-To: <3.0.6.32.20010219215602.00856770@203.16.214.248> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 1788 Lines: 60 Hi Richard, This is just a guess from the outside, but Linux reports UDP errors on the next socket operation I think, so if the UDP datagram you send fails, the error will come back on the next select.. Does tcpdump see a reply? Dave. On Mon, 19 Feb 2001, Richard Sharpe wrote: > Hi Alan, > > I am doing some work with Samba which involves sending a UDP datagram and > waiting for a response from a Samba or Windows server. 
> > Every now and then, I get a timeout on the socket I am waiting for input > from (atcually in a select statement) and I see, in the trace, that the > Linux machine responds with a Dest Unreach, Port Unreach when the timeout > occurs. > > Now, the way I do this is: > > Open socket and specify a local address (INADDR_ANY, port=138), and > remote address (some IP and port =138). > > Construct and send the packet using the socket above > > Call routine to wait for input on socked above. > > I wonder what the actual semantics are for receiving UDP datagams? Will the > kernel receive datagrams if a socket is open with the appropriate SIP, > SPort, DIP, DPORT combination, or, does there have to be a read scheduled, > either via a select, or an actual read? > > I find this problem on 2.2.18 and 2.4.0. > > It seems unlikely to be a bug, but I cannot think of a set of circumstances > that may cause this behaviour? > > Can anyone see what the problem might be? > > Regards > ------- > Richard Sharpe, sharpe@ns.aus.com > Samba (Team member, www.samba.org), Ethereal (Team member, www.ethereal.com) > Contributing author, SAMS Teach Yourself Samba in 24 Hours > Author, Special Edition, Using Samba > > > -- David Airlie, Software Engineer http://www.skynet.ie/~airlied / airlied@skynet.ie pam_smb / Linux DecStation / Linux VAX / ILUG person From owner-netdev@oss.sgi.com Mon Feb 19 12:21:36 2001 Received: by oss.sgi.com id ; Mon, 19 Feb 2001 12:21:26 -0800 Received: from internal.nci.com.au ([203.38.215.137]:17927 "EHLO internal.nci.com.au") by oss.sgi.com with ESMTP id ; Mon, 19 Feb 2001 12:21:04 -0800 Received: from w95vmware.ns.com (ppp1.nci.com.au [172.30.0.161]) by internal.nci.com.au (8.9.3/8.9.3) with SMTP id GAA01797; Tue, 20 Feb 2001 06:48:38 +1030 Message-Id: <3.0.6.32.20010220062808.008676d0@203.16.214.248> X-Sender: ns@203.16.214.248 X-Mailer: QUALCOMM Windows Eudora Light Version 3.0.6 (32) Date: Tue, 20 Feb 2001 06:28:08 +1000 To: Dave Airlie From: Richard Sharpe 
Subject: Re: UDP and Dest Unreachable, Port Unreachables Cc: netdev@oss.sgi.com In-Reply-To: References: <3.0.6.32.20010219215602.00856770@203.16.214.248> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 2369 Lines: 77 At 06:51 PM 2/19/01 +0000, Dave Airlie wrote: > >Hi Richard, > >This is just a guess from the outside, but Linux reports UDP errors on the >next socket operation I think, so if the UDP datagram you send fails, the >error will come back on the next select.. No, I saw the packet go out, and the response come back, but I saw an ICMP dest unreachable, port unreachable from the response. I thought there was a read happening, and the port was in use, I believe, but perhaps there is a funny timing window. >Does tcpdump see a reply? > >Dave. > >On Mon, 19 Feb 2001, Richard Sharpe wrote: > >> Hi Alan, >> >> I am doing some work with Samba which involves sending a UDP datagram and >> waiting for a response from a Samba or Windows server. >> >> Every now and then, I get a timeout on the socket I am waiting for input >> from (atcually in a select statement) and I see, in the trace, that the >> Linux machine responds with a Dest Unreach, Port Unreach when the timeout >> occurs. >> >> Now, the way I do this is: >> >> Open socket and specify a local address (INADDR_ANY, port=138), and >> remote address (some IP and port =138). >> >> Construct and send the packet using the socket above >> >> Call routine to wait for input on socked above. >> >> I wonder what the actual semantics are for receiving UDP datagams? Will the >> kernel receive datagrams if a socket is open with the appropriate SIP, >> SPort, DIP, DPORT combination, or, does there have to be a read scheduled, >> either via a select, or an actual read? >> >> I find this problem on 2.2.18 and 2.4.0. 
>> >> It seems unlikely to be a bug, but I cannot think of a set of circumstances >> that may cause this behaviour? >> >> Can anyone see what the problem might be? >> >> Regards >> ------- >> Richard Sharpe, sharpe@ns.aus.com >> Samba (Team member, www.samba.org), Ethereal (Team member, www.ethereal.com) >> Contributing author, SAMS Teach Yourself Samba in 24 Hours >> Author, Special Edition, Using Samba >> >> >> > >-- >David Airlie, Software Engineer >http://www.skynet.ie/~airlied / airlied@skynet.ie >pam_smb / Linux DecStation / Linux VAX / ILUG person > > > Regards ------- Richard Sharpe, sharpe@ns.aus.com Samba (Team member, www.samba.org), Ethereal (Team member, www.ethereal.com) Contributing author, SAMS Teach Yourself Samba in 24 Hours Author, Special Edition, Using Samba From owner-netdev@oss.sgi.com Mon Feb 19 16:25:32 2001 Received: by oss.sgi.com id ; Mon, 19 Feb 2001 16:25:22 -0800 Received: from inspiron.swusa.com ([207.214.125.61]:12680 "HELO saw.sw.com.sg") by oss.sgi.com with SMTP id ; Mon, 19 Feb 2001 16:25:06 -0800 Received: (qmail 22965 invoked by uid 577); 20 Feb 2001 00:25:08 -0000 Message-ID: <20010219162508.A22917@saw.sw.com.sg> Date: Mon, 19 Feb 2001 16:25:08 -0800 From: Andrey Savochkin To: Richard Sharpe Cc: netdev@oss.sgi.com Subject: Re: UDP and Dest Unreachable, Port Unreachables References: <3.0.6.32.20010219215602.00856770@203.16.214.248> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 0.93.2i In-Reply-To: <3.0.6.32.20010219215602.00856770@203.16.214.248>; from "Richard Sharpe" on Mon, Feb 19, 2001 at 09:56:02PM Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 462 Lines: 11 On Mon, Feb 19, 2001 at 09:56:02PM +1000, Richard Sharpe wrote: > I wonder what the actual semantics are for receiving UDP datagams? 
Will the > kernel receive datagrams if a socket is open with the appropriate SIP, > SPort, DIP, DPORT combination, or, does there have to be a read scheduled, > either via a select, or an actual read? The datagrams should be received and queued regardless of whether some process executed select or read. Best regards Andrey From owner-netdev@oss.sgi.com Mon Feb 19 21:55:53 2001 Received: by oss.sgi.com id ; Mon, 19 Feb 2001 21:55:43 -0800 Received: from mgw-x3.nokia.com ([131.228.20.26]:5854 "EHLO mgw-x3.nokia.com") by oss.sgi.com with ESMTP id ; Mon, 19 Feb 2001 21:55:24 -0800 Received: from esvir07nok.ntc.nokia.com (esvir07nokt.ntc.nokia.com [172.21.143.39]) by mgw-x3.nokia.com (Switch-2.1.0/Switch-2.1.0) with ESMTP id f1K5tou08993 for ; Tue, 20 Feb 2001 07:55:50 +0200 (EET) Received: from esebh12nok.ntc.nokia.com (unverified) by esvir07nok.ntc.nokia.com (Content Technologies SMTPRS 4.2.1) with ESMTP id ; Tue, 20 Feb 2001 07:55:21 +0200 Received: from tolnx04.europe.nokia.com ([172.24.106.61]) by esebh12nok.ntc.nokia.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2652.78) id 18LJHT2Z; Tue, 20 Feb 2001 07:55:20 +0200 Received: (from jepeters@localhost) by tolnx04.europe.nokia.com (8.11.2/8.11.2) id f1K5tFM10760; Tue, 20 Feb 2001 14:55:15 +0900 X-Authentication-Warning: tolnx04.europe.nokia.com: jepeters set sender to jens-ulrik.petersen@nokia.com using -f From: Jens-Ulrik Petersen To: usagi-users@linux-ipv6.org, kuznet@ms2.inr.ac.ru Cc: netdev@oss.sgi.com Subject: usagi ipv6 and linux source Date: 20 Feb 2001 14:55:14 +0900 Message-ID: User-Agent: Gnus/5.0807 (Gnus v5.8.7) XEmacs/21.2 =?ISO-8859-1?Q?(Peisino=1B,Ak=1B(B?=) MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 1340 Lines: 32 [I am not a subscriber to either of the above lists from the above (neutral) address, so I don't know if it will reach all the recipients 
the first time.] It is my understanding that there are some serious flaws in the IPv6 implementation in linux-2.4. Perhaps I am mistaken, but from my own experience I find that Linux hosts seem to have problems with address autoconfiguration and don't seem to like router advertisements from FreeBSD-4 ipv6 routers. (Presumably there is a corresponding bug in the Linux "rtadvd"?) While I know that the Usagi project is still in active development and also involved in work on glibc's ipv6 support and ip6 tools, I think the issues with the kernel are so important for Linux's ipv6 compliance and interworking with other vendors that they should be addressed as soon as possible. So I write to ask if it isn't possible to have the most essential fixes from the Usagi patches merged into the main kernel source as soon as possible, rather than making most of the Linux community using the mainstream kernel wait a year or whatever before a more complete merge hopefully takes place. For the good of Linux I encourage all of those involved to work to this end swiftly and amicably, so that we have a fine working IPv6 implementation in Linux. 
Yours sincerely, Jens Petersen -- include usual_disclaimer From owner-netdev@oss.sgi.com Tue Feb 20 00:02:54 2001 Received: by oss.sgi.com id ; Tue, 20 Feb 2001 00:02:44 -0800 Received: from mail.bieringer.de ([195.226.187.51]:530 "HELO titan.bieringer.de") by oss.sgi.com with SMTP id ; Tue, 20 Feb 2001 00:02:27 -0800 Received: (qmail 8031 invoked from network); 20 Feb 2001 08:02:24 -0000 Received: from p3e9b8e7b.dip.t-dialin.net (HELO worker.bieringer.de) (62.155.142.123) by mail.bieringer.de with SMTP; 20 Feb 2001 08:02:24 -0000 Message-Id: <5.0.2.1.0.20010220085131.02768688@mail.bieringer.de> X-Sender: list4peter@mail.bieringer.de X-Mailer: QUALCOMM Windows Eudora Version 5.0.2 Date: Tue, 20 Feb 2001 09:03:56 +0100 To: Jens-Ulrik Petersen , usagi-users@linux-ipv6.org, kuznet@ms2.inr.ac.ru From: Peter Bieringer Subject: Re: usagi ipv6 and linux source Cc: netdev@oss.sgi.com In-Reply-To: Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; format=flowed Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 1173 Lines: 29 At 06:55 20.02.2001, Jens-Ulrik Petersen wrote: >... >So I write to ask if it isn't possible to have the most essential >fixes from the Usagi patches merged into the main kernel source as >soon as possible, rather than making most of Linux community using the >mainstream kernel wait a year or whatever before a more complete merge >hopefully takes place. I would also wish that (and I'm sure there are a "few" others around). Some additional reasons: RedHat's 7.1 beta (fisher) and Linux-Mandrake 8.0 beta (cooker) enable IPv6 in their kernel packages, initscripts and part of applications. But they do not use (sure for RedHat, probably for Mandrake) the USAGI patch for kernel building. Therefore, in the near future many not fully compatible IPv6-enabled distributions will be rolled out en masse. 
Every user who wants to be fully IPv6 compatible has to apply patches to the kernel source and recompile the kernel. Ok, the IPv6 users have done this for some years now, but now we should try to go one step further and skip the "recompile kernel step", because it's possible now. And please merge it into both major releases: 2.2.x and 2.4.x TIA, Peter From owner-netdev@oss.sgi.com Tue Feb 20 02:52:55 2001 Received: by oss.sgi.com id ; Tue, 20 Feb 2001 02:52:46 -0800 Received: from fw134054.kitanet.ne.jp ([210.237.134.54]:50445 "EHLO fw134054.kitanet.ne.jp") by oss.sgi.com with ESMTP id ; Tue, 20 Feb 2001 02:52:39 -0800 Received: from dom.osaru.yi.org (fw134054.kitanet.ne.jp [210.237.134.54]) by fw134054.kitanet.ne.jp (8.9.3/8.9.3) with ESMTP id TAA00536; Tue, 20 Feb 2001 19:52:33 +0900 Date: Tue, 20 Feb 2001 19:44:16 +0900 Message-ID: From: KANDA Mitsuru / =?ISO-2022-JP?B?GyRCP0BFRBsoQiAbJEI9PBsoQg==?= To: netdev@oss.sgi.com, davem@redhat.com Cc: kanda@nn.iij4u.or.jp Subject: typo ? in ip6_output() User-Agent: Wanderlust/2.4.1 (Stand By Me) Emacs/20.7 Mule/4.1 (AOI) X-GnuPG-fingerprint: 9A35 D378 F084 9EA4 EFBA 925B 1C93 B376 F0EF BE59 X-URL: http://www.osaru.yi.org/~mk/ X-My-AutoMobile: M2-1001 chassis#030 MIME-Version: 1.0 (generated by SEMI 1.14.3 - "Ushinoya") Content-Type: text/plain; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 419 Lines: 21 Hello, I found a typo in ip6_output() (net/ipv6/ip6_output.c.). --- ip6_output.c 2000/12/29 07:06:58 1.5 +++ ip6_output.c 2001/02/20 10:44:57 @@ -118,7 +118,7 @@ is not supported in any case. 
*/ if (newskb) - NF_HOOK(PF_INET, NF_IP6_POST_ROUTING, newskb, NULL, + NF_HOOK(PF_INET6, NF_IP6_POST_ROUTING, newskb, NULL, newskb->dev, ip6_dev_loopback_xmit); regards, KANDA Mitsuru From owner-netdev@oss.sgi.com Tue Feb 20 03:53:44 2001 Received: by oss.sgi.com id ; Tue, 20 Feb 2001 03:53:34 -0800 Received: from laurin.munich.netsurf.de ([194.64.166.1]:60858 "EHLO laurin.munich.netsurf.de") by oss.sgi.com with ESMTP id ; Tue, 20 Feb 2001 03:53:16 -0800 Received: from fred.muc.de (noidentity@ns1041.munich.netsurf.de [195.180.235.41]) by laurin.munich.netsurf.de (8.9.3/8.9.3) with ESMTP id MAA19292; Tue, 20 Feb 2001 12:53:08 +0100 (MET) Received: by fred.muc.de (Postfix, from userid 500) id D2D9DE3447; Tue, 20 Feb 2001 12:23:50 +0100 (CET) Date: Tue, 20 Feb 2001 12:23:50 +0100 From: Andi Kleen To: Andrey Savochkin Cc: Richard Sharpe , netdev@oss.sgi.com Subject: Re: UDP and Dest Unreachable, Port Unreachables Message-ID: <20010220122350.A7684@fred.local> References: <3.0.6.32.20010219215602.00856770@203.16.214.248> <20010219162508.A22917@saw.sw.com.sg> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0.1i In-Reply-To: <20010219162508.A22917@saw.sw.com.sg>; from saw@saw.sw.com.sg on Tue, Feb 20, 2001 at 01:26:26AM +0100 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 732 Lines: 19 On Tue, Feb 20, 2001 at 01:26:26AM +0100, Andrey Savochkin wrote: > On Mon, Feb 19, 2001 at 09:56:02PM +1000, Richard Sharpe wrote: > > I wonder what the actual semantics are for receiving UDP datagams? Will the > > kernel receive datagrams if a socket is open with the appropriate SIP, > > SPort, DIP, DPORT combination, or, does there have to be a read scheduled, > > either via a select, or an actual read? > > The datagrams should be received and queued regardless of whether some > process executed select or read. ... except when the socket buffer overflows, but even then it shouldn't send a ICMP. 
ICMP means that the socket is closed or was not bound to a matching address

-Andi

-- 
This is like TV. I don't like TV.

From owner-netdev@oss.sgi.com Tue Feb 20 04:29:24 2001 Received: by oss.sgi.com id ; Tue, 20 Feb 2001 04:29:15 -0800 Received: from hq.pm.waw.pl ([195.116.170.10]:47882 "EHLO hq.pm.waw.pl") by oss.sgi.com with ESMTP id ; Tue, 20 Feb 2001 04:28:55 -0800 Received: (from uucp@localhost) by hq.pm.waw.pl with UUCP id f1KCOmT18243 for netdev@oss.sgi.com; Tue, 20 Feb 2001 13:24:48 +0100 Received: (from khc@localhost) by intrepid.pm.waw.pl (8.11.0/8.11.0) id f1KCPNR26887; Tue, 20 Feb 2001 13:25:23 +0100 To: netdev@oss.sgi.com Subject: net packet queue scheduler, packet_type and proto handlers Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII From: Krzysztof Halasa Date: 20 Feb 2001 13:25:22 +0100 Message-ID: Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 3288 Lines: 91

Hi,

What do you think about the following change?

We currently have the following structure used for registering protocol handlers as well as bridges, wiretaps etc. (include/linux/netdevice.h):

struct packet_type
{
	unsigned short		type;	/* This is really htons(ether_type). */
	struct net_device	*dev;	/* NULL is wildcarded here */
	int			(*func) (struct sk_buff *, struct net_device *,
					 struct packet_type *);
	void			*data;	/* Private to the packet type */
	struct packet_type	*next;
};

The func() is the protocol handler. It's required to free an skb or to pass it elsewhere. Its return value is "int", but it's in fact unused, and the handlers return values at random.
OTOH we have stacked protocol handlers which effectively strip headers from an skb, retrieve the protocol number and pass the skb to the next protocol handler, either using netif_rx (which causes problems with a packet being sent to taps twice, and other problems) or doing things like this (net/ax25/ax25_in.c):

	switch (skb->data[1]) {
#ifdef CONFIG_INET
	case AX25_P_IP:
		skb_pull(skb,2);	/* drop PID/CTRL */
		skb->h.raw = skb->data;
		skb->nh.raw = skb->data;
		skb->dev = dev;
		skb->pkt_type = PACKET_HOST;
		skb->protocol = htons(ETH_P_IP);
		ip_rcv(skb, dev, ptype);
		break;

	case AX25_P_ARP:
		skb_pull(skb,2);
		skb->h.raw = skb->data;
		skb->nh.raw = skb->data;
		skb->dev = dev;
		skb->pkt_type = PACKET_HOST;
		skb->protocol = htons(ETH_P_ARP);
		arp_rcv(skb, dev, ptype);
	...

As the number of protocols grows and some of them can be modular, it doesn't seem wise to hardcode every protocol handler name in all such stacked handlers, especially when we have the same info in the ptype_base[] table (net/core/dev.c).

What I think would be better is to make the handler (func()) return a meaningful value: 0 if the skb has been freed/accepted and non-0 if the handler has stripped a header from it and it should be re-examined. The above code would then read:

	switch (skb->data[1]) {
#ifdef CONFIG_INET
	case AX25_P_IP:
		skb_pull(skb,2);	/* drop PID/CTRL */
		skb->protocol = htons(ETH_P_IP);
		break;

	case AX25_P_ARP:
		skb_pull(skb,2);
		skb->protocol = htons(ETH_P_ARP);
		break;
	...
	}

	skb->h.raw = skb->data;
	skb->nh.raw = skb->data;
	skb->dev = dev;
	skb->pkt_type = PACKET_HOST;
	return 1; /* skb changed, returned to upper layer for re-inspection */

Of course, taps and bridges wouldn't be allowed to ask for re-inspection :-)

What do you think about this change? Does anything depend on the current behavior?
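
The return-value convention proposed above can be tried out in userspace. What follows is a minimal mock, not kernel code: struct mock_skb, the MOCK_* constants, the handler table and the MAX_REINSPECT bound are all assumptions made for illustration. Only the meaning of the handler's return value (0 = consumed, non-0 = header stripped, look the protocol up again) is taken from the proposal.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for struct sk_buff (hypothetical, illustration only). */
struct mock_skb {
	unsigned short protocol;	/* protocol id, host byte order in this mock */
	const unsigned char *data;	/* start of the current header */
	size_t len;			/* bytes remaining from data */
};

/* Illustrative ids; the mock does not depend on the real kernel values. */
#define MOCK_P_AX25		0x0002
#define MOCK_P_IP		0x0800
#define MOCK_P_ARP		0x0806
#define MOCK_AX25_PID_IP	0xCC
#define MOCK_AX25_PID_ARP	0xCD
#define MAX_REINSPECT		8	/* assumed bound to stop handler loops */

typedef int (*handler_fn)(struct mock_skb *skb);

static int ip_seen, arp_seen;

/* Final handlers: consume the packet and return 0. */
static int mock_ip_rcv(struct mock_skb *skb)  { (void)skb; ip_seen++;  return 0; }
static int mock_arp_rcv(struct mock_skb *skb) { (void)skb; arp_seen++; return 0; }

/* Stacked handler: strip CTRL/PID, set the inner protocol, return non-0. */
static int mock_ax25_rcv(struct mock_skb *skb)
{
	if (skb->len < 2)
		return 0;		/* runt frame: treated as consumed */
	switch (skb->data[1]) {
	case MOCK_AX25_PID_IP:  skb->protocol = MOCK_P_IP;  break;
	case MOCK_AX25_PID_ARP: skb->protocol = MOCK_P_ARP; break;
	default:
		return 0;		/* unknown PID: treated as consumed */
	}
	skb->data += 2;			/* plays the role of skb_pull(skb, 2) */
	skb->len -= 2;
	return 1;			/* header stripped, please re-inspect */
}

static const struct { unsigned short type; handler_fn func; } mock_ptypes[] = {
	{ MOCK_P_AX25, mock_ax25_rcv },
	{ MOCK_P_IP,   mock_ip_rcv },
	{ MOCK_P_ARP,  mock_arp_rcv },
};

/* Dispatch loop: re-run the lookup while handlers ask for re-inspection.
 * Returns the number of handlers that ran. */
static int mock_dispatch(struct mock_skb *skb)
{
	int hops = 0;

	assert(skb != NULL);
	while (hops < MAX_REINSPECT) {
		handler_fn func = NULL;
		size_t i;

		for (i = 0; i < sizeof mock_ptypes / sizeof mock_ptypes[0]; i++)
			if (mock_ptypes[i].type == skb->protocol)
				func = mock_ptypes[i].func;
		if (func == NULL)
			break;		/* no handler registered: drop */
		hops++;
		if (func(skb) == 0)
			break;		/* packet consumed */
	}
	return hops;
}
```

Because the lookup happens inside the loop, the stacked handler never has to name ip_rcv() or arp_rcv(). The bound on re-inspections is just one possible guard against a handler that keeps returning non-0.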
-- Krzysztof Halasa Network Administrator From owner-netdev@oss.sgi.com Tue Feb 20 20:28:11 2001 Received: by oss.sgi.com id ; Tue, 20 Feb 2001 20:27:51 -0800 Received: from asbestos.linuxcare.com.au ([203.17.0.30]:61680 "EHLO halfway") by oss.sgi.com with ESMTP id ; Tue, 20 Feb 2001 20:27:39 -0800 Received: from halfway ([127.0.0.1] helo=linuxcare.com.au ident=rusty) by halfway with esmtp (Exim 3.22 #1 (Debian)) id 14VQsF-0007fj-00; Wed, 21 Feb 2001 15:27:31 +1100 From: Rusty Russell To: Harald Welte Cc: netfilter-devel@us5.samba.org, netdev@oss.sgi.com Subject: Re: [PATCH] SO_ORIGINAL_DST and sockaddr_in In-reply-to: Your message of "Thu, 15 Feb 2001 09:21:40 BST." <20010215092140.Z27130@coruscant.gnumonks.org> Date: Wed, 21 Feb 2001 15:27:31 +1100 Message-Id: Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 718 Lines: 18 In message <20010215092140.Z27130@coruscant.gnumonks.org> you write: > On Wed, Feb 07, 2001 at 06:30:07PM +0100, Balazs Scheidler wrote: > > Hi, > > > > SO_ORIGINAL_DST requires a sockaddr buffer with size equal to sizeof(struct > > sockaddr_in)), this is broken in my opinion, a buffer with at least > > sizeof(struct sockaddr_in)) bytes should be enough. Trivial patch is below: > > I think you're right. there's no point in rejecting a 'too big' buffer. Is there a point in allowing a too-big buffer? I know that getpeername() and getsockname() do, but it's an indication of an error on the user code, to me. Is there some convincing argument I am missing? Rusty. -- Premature optmztion is rt of all evl. 
--DK From owner-netdev@oss.sgi.com Tue Feb 20 23:41:33 2001 Received: by oss.sgi.com id ; Tue, 20 Feb 2001 23:41:14 -0800 Received: from cpu2747.adsl.bellglobal.com ([207.236.55.216]:32495 "EHLO grendel.conscoop.ottawa.on.ca") by oss.sgi.com with ESMTP id ; Tue, 20 Feb 2001 23:41:07 -0800 Received: (from rgb@localhost) by grendel.conscoop.ottawa.on.ca (8.11.1/8.11.1) id f1L7g3710953; Wed, 21 Feb 2001 02:42:03 -0500 Date: Wed, 21 Feb 2001 02:42:03 -0500 From: Richard Guy Briggs To: Linux Ipsec mailing list , NetFilter mailing list , Linux Network Development mailing list Cc: Hugh Daniel , John Gilmore , Hugh Redelmeier , Henry Spencer Subject: FreeS/WAN redesign thoughts (KLIPS, IPSEC) Message-ID: <20010221024203.H9886@grendel.conscoop.ottawa.on.ca> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 11000 Lines: 268 -----BEGIN PGP SIGNED MESSAGE----- Here is a third edition of the FreeS/WAN redesign plans. Please pick it apart. Some glaring errors have been fixed. A thousand pardons for the potentially multiple posting, I feel like a complete boob forgetting to include a subject line... FreeS/WAN IPSEC -- KLIPS2 DESIGN THOUGHTS ========================================= Wed Feb 21 02:17:58 EST 2001 This document was originally written 2.5 weeks after OLS2000, inspired from a meeting with Rusty and Marc in Montreal in November 1999 and two meetings at OLS2000.
Current kernel version reference is 2.4.0

The idea is to redesign KLIPS (kernel parts of FreeS/WAN) to avoid all the 'stoopid routing tricks' (TM) to which we have had to resort over the last 2+ years by disassociating any ipsec devices from physical devices and to add a proper SPDB to do proper incoming IPSEC policy checks.

We are hoping to use existing pattern-matching tools rather than invent our own. NetFilter appears to have all the pattern matching capabilities, but is limited in other ways.

There is also a significant interest in enabling FreeS/WAN to communicate with routing daemons and be able to do load sharing and failover:

	http://www.quintillion.com/fdis/moat/ipsec+routing/

This is an exploratory document. Please comment, particularly if I have missed or misunderstood something, to the linux-ipsec, netfilter-devel or netdev lists.

The basic architecture of NetFilter is:

--->[1]--->(ROUTE)--->[3]--->[4]--->    where:
             |            ^             [1] NF_IP_PRE_ROUTING
             |            |             [2] NF_IP_LOCAL_IN
             |         (ROUTE)          [3] NF_IP_FORWARD
             v            |             [4] NF_IP_POST_ROUTING
            [2]          [5]            [5] NF_IP_LOCAL_OUT
             |            ^
             |            |
             v            |

The basic path through the kernel as it concerns IPSEC for the three types of packets is as follows:

IN:
	NIC
	sanity check
	NF_IP_PRE_ROUTING
	route-in
	ip-options processing
	defragment
	NF_IP_LOCAL_IN
	layer3demux
	application

FORWARD:
	NIC
	sanity check
	NF_IP_PRE_ROUTING
	routing-in
	ip-options processing
	ttl decrement and check
	NF_IP_FORWARD
	fragment
	NF_IP_POST_ROUTING
	output()
	NIC

OUT:
	application
	layer3mux
	NF_IP_LOCAL_OUT
	route-out
	NF_IP_POST_ROUTING
	output()
	NIC

Destination NAT (port forwarding) gets applied in NF_IP_PRE_ROUTING and NF_IP_LOCAL_OUT, and Source NAT (masquerading) gets applied in NF_IP_POST_ROUTING. Filtering is applied in NF_IP_LOCAL_IN, NF_IP_FORWARD and NF_IP_LOCAL_OUT.

Hook processing order would generally be:

	NF_IP_PRI_IPSEC_IN?
	NF_IP_PRI_CONNTRACK
	NF_IP_PRI_IPSEC_IN?
	NF_IP_PRI_MANGLE
	NF_IP_PRI_NAT_DST
	NF_IP_PRI_FILTER
	NF_IP_PRI_NAT_SRC
	NF_IP_PRI_IPSEC_OUT

Not all modules are present at each hook. I am still uncertain whether IPSEC_IN should be before or after CONNTRACK. Any comments?

- -----------

There is more than one possible approach. The following is not exhaustive. So far, the first is much better thought out and is preferred.

--- 1 ---

Treat incoming IPSEC encapsulation as a layer 3 protocol and decapsulate it at the Layer 3 demultiplexer.

An incoming packet starts off with a sanity check. It then goes through all the NF_IP_PRE_ROUTING hooks starting with the SPDB checking. Since it is a fresh ESP or AH packet, it will not have any nfmarks and unless that outer IP header should have been processed by another SG in between, no policy will have been required, letting it through. The rest of the NF_IP_PRE_ROUTING hooks may cause it to be DNATed and defragmented. It then goes through routing which thinks it is a local packet, deals with any outer header IP options, then defragmentation and NF_IP_LOCAL_IN filter (allow ESP,AH) before getting to ipsec_rcv() where the outer bundle is authenticated and decrypted and nfmarked to indicate what decapsulation happened before being passed back to netif_rx(). The next IP header is now visible.

The packet now gets re-injected at the beginning. It goes through the incoming sanity check again, getting checked at NF_IP_PRE_ROUTING for policy using previously set nfmark from decryption. It may again be DNATed and defragmented. Routing looks at the now-visible next IP header and routes it locally or via the forward hook.

If it is a local packet, IP options and defragmentation are processed. NF_IP_LOCAL_IN then gets to check filtering policy for other L3 protocols. If it is the endpoint for multiple bundles, it is sent back to netif_rx(), having exposed the next IP header.

If it is not a local packet, routing has selected a route, potentially through an existing virtual IPSEC device, one per connection, not per physical I/F. IP options and TTL are processed before being filtered at NF_IP_FORWARD, fragmented, then sent to NF_IP_POST_ROUTING.

If it is a locally generated packet, it would go through normal filtering at NF_IP_LOCAL_OUT, then go through routing, then go to NF_IP_POST_ROUTING.

At NF_IP_POST_ROUTING, an IPSEC matching module would make a decision about the fate of the packet. It would have several possible targets:

ACCEPT would allow the packet through with no processing.

ENCRYPT would steal the packet. If the SA(s) do(es)n't exist, it would send up an ACQUIRE to all listening key management daemons and stash the last copy of the packet, waiting for the appropriate SA(s). If or once the SA(s) is/are available, it then encrypts the packet, re-injects it at NF_IP_LOCAL_OUT (since the packet now appears to originate from this host) and sets nfmark to indicate what processing happened. The packet would then be routed and sent back to NF_IP_POST_ROUTING. If no new nfmark is generated, the IPSEC module would ACCEPT it.

DROP would drop the packet if previous attempts to do opportunistic encryption failed and the default policy was to block non-IPSEC packets.

A packet routed through an optional IPSEC virtual I/F simply gets assigned a specific source address and has the nfmark preloaded. Does this sound correct?

The way that nfmark is used is rather vague. It is presently only 32 bits. Ideally, I would like to be able to indicate exactly which SAs were processed on the way in, which would most easily be represented by as many as 4 SAs (AH, ESP, IPCOMP, IPIP), each having an 8-bit protocol field (absolute minimum of 2 bits), a 32-bit destination address field (for IPv4, IPv6 would be 128) and a 32-bit SPI. This is a potential maximum of 672 bits.
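
The 672-bit figure corresponds to four IPv6 SAIDs: 4 x (8 + 128 + 32). A minimal sketch of that arithmetic, and of one way a 32-bit mark could stand in for a whole bundle by indexing a table of SAIDs, is given here. It is plain userspace C with no locking; struct said, struct sa_bundle, bundle_to_mark(), mark_to_bundle() and the table size are hypothetical names chosen for illustration, not FreeS/WAN or NetFilter interfaces.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SAID_PROTO_BITS	8u
#define SAID_SPI_BITS	32u
#define MAX_SAS		4u	/* AH, ESP, IPCOMP, IPIP */
#define MAX_BUNDLES	64u	/* arbitrary table size for this sketch */

/* Bits needed to name every SA applied to one packet. */
static unsigned bundle_bits(unsigned addr_bits)
{
	return MAX_SAS * (SAID_PROTO_BITS + addr_bits + SAID_SPI_BITS);
}

struct said {			/* hypothetical SA identifier */
	uint8_t  proto;
	uint8_t  dst[16];	/* an IPv4 address uses the first 4 bytes */
	uint32_t spi;
};

struct sa_bundle {
	unsigned    count;	/* number of valid entries in sa[] */
	struct said sa[MAX_SAS];
};

static struct sa_bundle bundle_table[MAX_BUNDLES];
static unsigned bundles_used;

static int said_equal(const struct said *a, const struct said *b)
{
	return a->proto == b->proto && a->spi == b->spi &&
	       memcmp(a->dst, b->dst, sizeof a->dst) == 0;
}

static int bundle_equal(const struct sa_bundle *a, const struct sa_bundle *b)
{
	unsigned i;

	if (a->count != b->count)
		return 0;
	for (i = 0; i < a->count; i++)
		if (!said_equal(&a->sa[i], &b->sa[i]))
			return 0;
	return 1;
}

/* Return a nonzero 32-bit mark naming the bundle, adding it to the table
 * if it is new. Returns 0 if the bundle is invalid or the table is full. */
static uint32_t bundle_to_mark(const struct sa_bundle *b)
{
	unsigned i;

	if (b->count > MAX_SAS)
		return 0;
	for (i = 0; i < bundles_used; i++)
		if (bundle_equal(&bundle_table[i], b))
			return i + 1;
	if (bundles_used == MAX_BUNDLES)
		return 0;
	bundle_table[bundles_used] = *b;
	return ++bundles_used;
}

/* Map a mark back to its bundle of SAIDs, or NULL if the mark is unknown. */
static const struct sa_bundle *mark_to_bundle(uint32_t mark)
{
	if (mark == 0 || mark > bundles_used)
		return NULL;
	return &bundle_table[mark - 1];
}
```

The table stores identifiers rather than pointers to SAs, so a mark stays meaningful even if the SA it names has since been deleted; whoever resolves the mark still has to look the SA up and cope with it being gone.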
A way of mapping 672 bits on to the 32 bits available would be required to use this. A lookup table could be used to map nfmarks to SAIDs, not the SAs themselves, since the SAs could disappear at any time the tdb table is not locked. It should be able to represent a bundle of SAs where one SA could be used in more than one bundle. There could also be more than one right answer for the incoming SPDB.

I have an idea how to accomplish this by changing/extending nfmark, converting it to a list of nfmark structures that each contain a pointer to the next item on the list, a cookie for the specific netfilter function that owns the data and a pointer to a data structure.

nfmark may not be the right tool for this. Another possible solution is to add a member to the struct sk_buff to point to this information. This has the benefit of not depending on anyone else, but the drawback of needing to patch a header file *and recompile the entire kernel*.

The SADB would be managed via the PF_KEYv2 socket I/F. The SPDB would be managed via a combination of PF_KEYv2 socket I/F extensions and iptables. A separate NetFilter table called 'ipsec' (as opposed to 'filter' or 'nat') would have the first hook at NF_IP_PRE_ROUTING and the last hook at NF_IP_POST_ROUTING. iptables uses the AF_NETLINK socket family.

- -----------

--- 2 ---

Treat incoming IPSEC encapsulation as an enhancement of the layer 2 protocol and decapsulate it at the NF_IP_PRE_ROUTING hook. This option is less favourable as it stands since it involves creating our own SPDB engine.

An incoming packet starts off with a sanity check. It then goes through the NF_IP_PRE_ROUTING match hook for IPSEC, which would be the first in priority, matching every single packet to force it through a policy check. If it was an ESP or AH packet with a local destination address, it would then be sent to ipsec_rcv() and the first bundle would be processed, keeping state until that bundle is completely processed.
At this point the incoming SPDB would be checked to ensure that the proper policy had been applied to it. If there is another bundle inside with an ESP or AH header, that bundle is processed, storing the new and old state. This SPDB check would not be iptables-based since we have already gone through the match and target hooks and would have too much state to store in nfmark. The result of the SPDB check would be ACCEPT or DROP (it could also be STOLEN or QUEUEd at this point for opportunistic encryption). The SADB and SPDB entries would be managed via the extended PF_KEYv2 socket I/F.

The rest of the NF_IP_PRE_ROUTING hooks may cause it to be DNATed and defragmented. It then gets routed.

For local packets, inner IP options and defragmentation are processed. NF_IP_LOCAL_IN then gets to check filtering policy for layer 3 protocols.

For non-local packets, IP options and TTL are processed before being filtered at NF_IP_FORWARD then fragmented.

Packets then go through the NF_IP_POST_ROUTING hooks, potentially for SNAT, after which the last hook would force all packets to go through the IPSEC outgoing processing module. Here outgoing policy would be checked, again not necessarily by iptables. A result could be ACCEPT, DROP or STOLEN. The last would result in encryption and authentication being applied as available; the result would then be re-injected at NF_IP_LOCAL_IN, since it would now have a local address, a potentially different destination address and need to be re-routed. A mechanism would need to be used here to prevent recursion.

- ------------------

If there are any other directions we should be considering, please suggest...

	slainte mhath, RGB

- -- 
Richard Guy Briggs -- PGP key available Auto-Free Ottawa! Canada Prevent Internet Wiretapping! -- FreeS/WAN: Thanks for voting Green!
-- Marillion: -----BEGIN PGP SIGNATURE----- Version: 2.6.3i Charset: noconv iQCVAwUBOpNxS9+sBuIhFagtAQE8UAP/WF4OwXopq7HhJPSuK5a8XyiZSUJpQcbC IHefyFMFzswQAJDAu4JrRIWevwHPWTrm5PZ7zsALkQM0WwbcRCz8uueItcg2sKmS aMfp1dbbMlmgPk1HTwIDBaeHOEIf8yyyy4S6W0gIyb8x4mdI4nx0zbEbNPXkjG/H gB9G69Fod+M= =wdwQ -----END PGP SIGNATURE----- From owner-netdev@oss.sgi.com Wed Feb 21 07:23:56 2001 Received: by oss.sgi.com id ; Wed, 21 Feb 2001 07:23:36 -0800 Received: from coruscant.franken.de ([193.174.159.226]:22533 "EHLO coruscant.gnumonks.org") by oss.sgi.com with ESMTP id ; Wed, 21 Feb 2001 07:23:12 -0800 Received: from laforge by coruscant.gnumonks.org with local (Exim 3.22 #1) id 14Vb6T-0006T9-00; Wed, 21 Feb 2001 16:22:53 +0100 Date: Wed, 21 Feb 2001 16:22:53 +0100 From: Harald Welte To: Rusty Russell Cc: netfilter-devel@us5.samba.org, netdev@oss.sgi.com Subject: Re: [PATCH] SO_ORIGINAL_DST and sockaddr_in Message-ID: <20010221162253.B17431@coruscant.gnumonks.org> References: <20010215092140.Z27130@coruscant.gnumonks.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: ; from rusty@linuxcare.com.au on Wed, Feb 21, 2001 at 03:27:31PM +1100 X-Operating-System: 2.4.0-test11p4 X-Date: Today is Boomtime, the 47th day of Chaos in the YOLD 3167 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 1528 Lines: 36 On Wed, Feb 21, 2001 at 03:27:31PM +1100, Rusty Russell wrote: > In message <20010215092140.Z27130@coruscant.gnumonks.org> you write: > > On Wed, Feb 07, 2001 at 06:30:07PM +0100, Balazs Scheidler wrote: > > > Hi, > > > > > > SO_ORIGINAL_DST requires a sockaddr buffer with size equal to sizeof(struct > > > sockaddr_in)), this is broken in my opinion, a buffer with at least > > > sizeof(struct sockaddr_in)) bytes should be enough. Trivial patch is below: > > > > I think you're right. there's no point in rejecting a 'too big' buffer. 
> > Is there a point in allowing a too-big buffer? I know that > getpeername() and getsockname() do, but it's an indication of an error > on the user code, to me. Hm. This sounds like an issue of interpretation. I have the following opinion: As long as there's enough space for netfilter/iptables to write its data in: don't care. The reason of this check is to know we have enough space.. isn't it? > Is there some convincing argument I am missing? As somebody else pointed out: If you want to use one allocated buffer for several things (no, I don't want to talk about programming style). > Rusty. > -- > Premature optmztion is rt of all evl. --DK > -- Live long and prosper - Harald Welte / laforge@gnumonks.org http://www.gnumonks.org ============================================================================ GCS/E/IT d- s-: a-- C+++ UL++++$ P+++ L++++$ E--- W- N++ o? K- w--- O- M- V-- PS+ PE-- Y+ PGP++ t++ 5-- !X !R tv-- b+++ DI? !D G+ e* h+ r% y+(*) From owner-netdev@oss.sgi.com Wed Feb 21 12:28:16 2001 Received: by oss.sgi.com id ; Wed, 21 Feb 2001 12:27:56 -0800 Received: from minus.inr.ac.ru ([193.233.7.97]:23822 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Wed, 21 Feb 2001 12:27:44 -0800 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id XAA17266; Wed, 21 Feb 2001 23:27:19 +0300 From: kuznet@ms2.inr.ac.ru Message-Id: <200102212027.XAA17266@ms2.inr.ac.ru> Subject: Re: [PATCH] SO_ORIGINAL_DST and sockaddr_in To: rusty@linuxcare.COM.AU (Rusty Russell) Date: Wed, 21 Feb 2001 23:27:19 +0300 (MSK) Cc: netdev@oss.sgi.com In-Reply-To: from "Rusty Russell" at Feb 21, 1 07:45:00 am X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 444 Lines: 14 Hello! > Is there a point in allowing a too-big buffer? I know that > getpeername() and getsockname() do, but it's an indication of an error > on the user code, to me. Please, look into specs. F.e. 
unix98. Austin group has some draft too.

getsockopt() must work regardless of buffer size. If the buffer is too short, it must truncate. In the length field it must return the size of the object (not the size of the copied data), if I remember correctly.

Alexey

From owner-netdev@oss.sgi.com Wed Feb 21 13:43:47 2001 Received: by oss.sgi.com id ; Wed, 21 Feb 2001 13:43:27 -0800 Received: from laurin.munich.netsurf.de ([194.64.166.1]:12696 "EHLO laurin.munich.netsurf.de") by oss.sgi.com with ESMTP id ; Wed, 21 Feb 2001 13:43:09 -0800 Received: from fred.muc.de (noidentity@ns1097.munich.netsurf.de [195.180.235.97]) by laurin.munich.netsurf.de (8.9.3/8.9.3) with ESMTP id WAA00280; Wed, 21 Feb 2001 22:42:48 +0100 (MET) Received: by fred.muc.de (Postfix, from userid 500) id B4CCFE3BB9; Wed, 21 Feb 2001 22:35:32 +0100 (CET) Date: Wed, 21 Feb 2001 22:35:32 +0100 From: Andi Kleen To: Jens-Ulrik Petersen Cc: usagi-users@linux-ipv6.org, kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com Subject: Re: usagi ipv6 and linux source Message-ID: <20010221223532.C1758@fred.local> References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0.1i In-Reply-To: ; from juhp@users.sourceforge.net on Tue, Feb 20, 2001 at 06:56:36AM +0100 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 594 Lines: 14

On Tue, Feb 20, 2001 at 06:56:36AM +0100, Jens-Ulrik Petersen wrote:
> It is my understanding that there are some serious flaws in the IPv6
> implementation in linux-2.4. Perhaps I am mistaken, but from my own
> experience I find that Linux hosts seem to have problems with address
> autoconfiguration and don't seem to like router advertisements from
> FreeBSD-4 ipv6 routers. (Presumably there is a corresponding bug in
> the Linux "rtadvd"?)

When you see any specific problems you should send tcpdumps of the incidents to the list. Then they can be fixed if they are real bugs.
-Andi From owner-netdev@oss.sgi.com Wed Feb 21 13:43:47 2001 Received: by oss.sgi.com id ; Wed, 21 Feb 2001 13:43:28 -0800 Received: from laurin.munich.netsurf.de ([194.64.166.1]:43928 "EHLO laurin.munich.netsurf.de") by oss.sgi.com with ESMTP id ; Wed, 21 Feb 2001 13:43:17 -0800 Received: from fred.muc.de (noidentity@ns1097.munich.netsurf.de [195.180.235.97]) by laurin.munich.netsurf.de (8.9.3/8.9.3) with ESMTP id WAA00270; Wed, 21 Feb 2001 22:42:37 +0100 (MET) Received: by fred.muc.de (Postfix, from userid 500) id 6C62FE3BB8; Wed, 21 Feb 2001 22:33:38 +0100 (CET) Date: Wed, 21 Feb 2001 22:33:38 +0100 From: Andi Kleen To: Krzysztof Halasa Cc: netdev@oss.sgi.com Subject: Re: net packet queue scheduler, packet_type and proto handlers Message-ID: <20010221223338.B1758@fred.local> References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0.1i In-Reply-To: ; from khc@intrepid.pm.waw.pl on Tue, Feb 20, 2001 at 01:30:08PM +0100 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 685 Lines: 26 On Tue, Feb 20, 2001 at 01:30:08PM +0100, Krzysztof Halasa wrote: > > switch (skb->data[1]) { > #ifdef CONFIG_INET > case AX25_P_IP: [...] Ugh. Looks like a bug in the AX25 layer. > What do you think about this change? Does anything depend on the current > behavior? I think it's not nice to let the upper layer handle the retry in all cases, although it's only needed in a few special cases. Using netif_rx for the uncommon case of reexamining the packet is better, because you keep it out of the fast path. I don't see the twice examining by taps as a problem, it doesn't break anything as far as I know. -Andi -- This is like TV. I don't like TV. 
From owner-netdev@oss.sgi.com Wed Feb 21 15:49:58 2001 Received: by oss.sgi.com id ; Wed, 21 Feb 2001 15:49:48 -0800 Received: from foobar.napster.com ([64.124.41.10]:36111 "EHLO foobar.napster.com") by oss.sgi.com with ESMTP id ; Wed, 21 Feb 2001 15:49:19 -0800 Received: from wagner.napster.com (mail.napster.com [63.108.185.112]) by foobar.napster.com (8.9.3/8.9.3) with ESMTP id PAA28281; Wed, 21 Feb 2001 15:49:13 -0800 Received: from napster.com (gw.napster.com [63.108.185.120]) by wagner.napster.com (8.9.3/8.9.3) with ESMTP id PAA11827; Wed, 21 Feb 2001 15:49:09 -0800 Message-ID: <3A9453F4.993A9A74@napster.com> Date: Wed, 21 Feb 2001 15:49:08 -0800 From: Jordan Mendelson Organization: Napster, Inc. X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.4.1-ac17 i686) X-Accept-Language: en MIME-Version: 1.0 To: "David S. Miller" CC: ookhoi@dds.nl, Vibol Hou , Linux-Kernel , sim@stormix.com, netdev@oss.sgi.com Subject: Re: 2.4 tcp very slow under certain circumstances (Re: netdev issues (3c905B)) References: <20010221104723.C1714@humilis> <14995.40701.818777.181432@pizda.ninka.net> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 1101 Lines: 29 "David S. Miller" wrote: > > Ookhoi writes: > > We have exactly the same problem but in our case it depends on the > > following three conditions: 1, kernel 2.4 (2.2 is fine), 2, windows ip > > header compression turned on, 3, a free internet access provider in > > Holland called 'Wish' (which seemes to stand for 'I Wish I had a faster > > connection'). > > If we remove one of the three conditions, the connection is oke. It is > > only tcp which is affected. > > A packet on its way from linux server to windows client seems to get > > dropped once and retransmitted. This makes the connection _very_ slow. > > :-( I hate these buggy systems. 
> > Does this patch below fix the performance problem and are the windows > clients win2000 or win95? Just a note however... this patch did fix the problem we were seeing with retransmits and Win95 compressed PPP and dialup over earthlink in the bay area. Now, if it didn't have the side effect of dropping packets left and right after ~4000 open connections (simultaneously), I could finally move our production system to 2.4.x. Jordan From owner-netdev@oss.sgi.com Wed Feb 21 15:55:27 2001 Received: by oss.sgi.com id ; Wed, 21 Feb 2001 15:55:18 -0800 Received: from pizda.ninka.net ([216.101.162.242]:30080 "EHLO pizda.ninka.net") by oss.sgi.com with ESMTP id ; Wed, 21 Feb 2001 15:55:11 -0800 Received: (from davem@localhost) by pizda.ninka.net (8.9.3/8.9.3) id PAA01487; Wed, 21 Feb 2001 15:52:37 -0800 From: "David S. Miller" MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <14996.21701.542448.49413@pizda.ninka.net> Date: Wed, 21 Feb 2001 15:52:37 -0800 (PST) To: Jordan Mendelson Cc: ookhoi@dds.nl, Vibol Hou , Linux-Kernel , sim@stormix.com, netdev@oss.sgi.com Subject: Re: 2.4 tcp very slow under certain circumstances (Re: netdev issues (3c905B)) In-Reply-To: <3A9453F4.993A9A74@napster.com> References: <20010221104723.C1714@humilis> <14995.40701.818777.181432@pizda.ninka.net> <3A9453F4.993A9A74@napster.com> X-Mailer: VM 6.75 under 21.1 (patch 13) "Crater Lake" XEmacs Lucid Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 748 Lines: 21 Jordan Mendelson writes: > Now, if it didn't have the side effect of dropping packets left and > right after ~4000 open connections (simultaneously), I could finally > move our production system to 2.4.x. There is no reason my patch should have this effect. All of this is what appears to be a bug in Windows TCP header compression, if the ID field of the IPv4 header does not change then it drops every other packet. 
The change I posted as-is, is unacceptable because it adds unnecessary cost to a fast path. The final change I actually use will likely involve using the TCP sequence numbers to calculate an "always changing" ID number in the IPv4 headers to placate these broken windows machines. Later, David S. Miller davem@redhat.com From owner-netdev@oss.sgi.com Wed Feb 21 16:11:07 2001 Received: by oss.sgi.com id ; Wed, 21 Feb 2001 16:10:57 -0800 Received: from foobar.napster.com ([64.124.41.10]:48901 "EHLO foobar.napster.com") by oss.sgi.com with ESMTP id ; Wed, 21 Feb 2001 16:10:40 -0800 Received: from wagner.napster.com (mail.napster.com [63.108.185.112]) by foobar.napster.com (8.9.3/8.9.3) with ESMTP id QAA31544; Wed, 21 Feb 2001 16:10:39 -0800 Received: from napster.com (gw.napster.com [63.108.185.120]) by wagner.napster.com (8.9.3/8.9.3) with ESMTP id QAA13609; Wed, 21 Feb 2001 16:10:39 -0800 Message-ID: <3A9458FD.A87205FC@napster.com> Date: Wed, 21 Feb 2001 16:10:37 -0800 From: Jordan Mendelson Organization: Napster, Inc. X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.4.1-ac17 i686) X-Accept-Language: en MIME-Version: 1.0 To: "David S. Miller" CC: ookhoi@dds.nl, Vibol Hou , Linux-Kernel , sim@stormix.com, netdev@oss.sgi.com Subject: Re: 2.4 tcp very slow under certain circumstances (Re: netdev issues (3c905B)) References: <20010221104723.C1714@humilis> <14995.40701.818777.181432@pizda.ninka.net> <3A9453F4.993A9A74@napster.com> <14996.21701.542448.49413@pizda.ninka.net> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 459 Lines: 15 "David S. Miller" wrote: > > Jordan Mendelson writes: > > Now, if it didn't have the side effect of dropping packets left and > > right after ~4000 open connections (simultaneously), I could finally > > move our production system to 2.4.x. > > There is no reason my patch should have this effect. 
My guess is that the fast path prevented the need for looking up the destination in some structure which is limited to ~4K entries (route table?). Jordan From owner-netdev@oss.sgi.com Wed Feb 21 16:50:58 2001 Received: by oss.sgi.com id ; Wed, 21 Feb 2001 16:50:39 -0800 Received: from foobar.napster.com ([64.124.41.10]:17680 "EHLO foobar.napster.com") by oss.sgi.com with ESMTP id ; Wed, 21 Feb 2001 16:50:15 -0800 Received: from wagner.napster.com (mail.napster.com [63.108.185.112]) by foobar.napster.com (8.9.3/8.9.3) with ESMTP id QAA04522; Wed, 21 Feb 2001 16:50:09 -0800 Received: from napster.com (gw.napster.com [63.108.185.120]) by wagner.napster.com (8.9.3/8.9.3) with ESMTP id QAA16793; Wed, 21 Feb 2001 16:50:09 -0800 Message-ID: <3A94623F.FB65BDF5@napster.com> Date: Wed, 21 Feb 2001 16:50:07 -0800 From: Jordan Mendelson Organization: Napster, Inc. X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.4.1-ac17 i686) X-Accept-Language: en MIME-Version: 1.0 To: "David S. Miller" CC: ookhoi@dds.nl, Vibol Hou , Linux-Kernel , sim@stormix.com, netdev@oss.sgi.com Subject: Re: 2.4 tcp very slow under certain circumstances (Re: netdev issues (3c905B)) References: <20010221104723.C1714@humilis> <14995.40701.818777.181432@pizda.ninka.net> <3A9453F4.993A9A74@napster.com> <14996.21701.542448.49413@pizda.ninka.net> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 886 Lines: 24 "David S. Miller" wrote: > > Jordan Mendelson writes: > > Now, if it didn't have the side effect of dropping packets left and > > right after ~4000 open connections (simultaneously), I could finally > > move our production system to 2.4.x. > > The change I posted as-is, is unacceptable because it adds unnecessary > cost to a fast path. 
The final change I actually use will likely > involve using the TCP sequence numbers to calculate an "always > changing" ID number in the IPv4 headers to placate these broken > windows machines. Just for kicks I modified the fast path to use a globally incremented count to see if it would fix both Win9x problem and my 4K connection problem and it appears to be working just fine. What probably happened was the sheer number of packets at 4K connections without the fast path just slowed everything down to a crawl. Thanks Dave, Jordan From owner-netdev@oss.sgi.com Wed Feb 21 19:11:38 2001 Received: by oss.sgi.com id ; Wed, 21 Feb 2001 19:11:18 -0800 Received: from asbestos.linuxcare.com.au ([203.17.0.30]:1020 "EHLO halfway") by oss.sgi.com with ESMTP id ; Wed, 21 Feb 2001 19:11:04 -0800 Received: from halfway ([127.0.0.1] helo=linuxcare.com.au ident=rusty) by halfway with esmtp (Exim 3.22 #1 (Debian)) id 14Vm9L-0001iS-00; Thu, 22 Feb 2001 14:10:35 +1100 From: Rusty Russell To: Harald Welte Cc: netfilter-devel@us5.samba.org, netdev@oss.sgi.com Subject: Re: [PATCH] SO_ORIGINAL_DST and sockaddr_in In-reply-to: Your message of "Wed, 21 Feb 2001 16:22:53 BST." <20010221162253.B17431@coruscant.gnumonks.org> Date: Thu, 22 Feb 2001 14:10:35 +1100 Message-Id: Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 903 Lines: 22 In message <20010221162253.B17431@coruscant.gnumonks.org> you write: > > Is there a point in allowing a too-big buffer? I know that > > getpeername() and getsockname() do, but it's an indication of an error > > on the user code, to me. > > Hm. This sounds like an issue of interpretation. I have the following > opinion: As long as there's enough space for netfilter/iptables to write > its data in: don't care. > > The reason of this check is to know we have enough space.. isn't it? Not really. You could just copy, and if it fails, return -EFAULT. 
I feel the point of that argument is to indicate the size of the buffer. We have a chance to catch coding errors; I feel the getsockname/getpeername approach is wrong (truncate results if too short, don't care if too long). Unless someone can come up with a compelling reason, why change? Rusty. -- Premature optmztion is rt of all evl. --DK From owner-netdev@oss.sgi.com Wed Feb 21 20:46:59 2001 Received: by oss.sgi.com id ; Wed, 21 Feb 2001 20:46:39 -0800 Received: from gessami-r.puntoar.net.ar ([200.47.36.222]:28921 "HELO convert rfc822-to-8bit tinuviel.compendium.net.ar") by oss.sgi.com with SMTP id ; Wed, 21 Feb 2001 20:46:20 -0800 Received: by tinuviel.compendium.net.ar (Postfix, from userid 1000) id 0BA58196797; Thu, 22 Feb 2001 01:41:41 -0300 (ART) Date: Thu, 22 Feb 2001 01:41:41 -0300 To: Rusty Russell Cc: Harald Welte , netfilter-devel@us5.samba.org, netdev@oss.sgi.com Subject: Re: [PATCH] SO_ORIGINAL_DST and sockaddr_in Message-ID: <20010222014141.A7501@tinuviel.compendium.net.ar> References: <20010221162253.B17431@coruscant.gnumonks.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: 8BIT User-Agent: Mutt/1.3.12i In-Reply-To: ; from rusty@linuxcare.com.au on Thu, Feb 22, 2001 at 02:10:35PM +1100 x-attribution: HoraPe From: horape@tinuviel.compendium.net.ar Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 1135 Lines: 33 ¡Hola! > > > Is there a point in allowing a too-big buffer? I know that > > > getpeername() and getsockname() do, but it's an indication of an error > > > on the user code, to me. > > Hm. This sounds like an issue of interpretation. I have the following > > opinion: As long as there's enough space for netfilter/iptables to write > > its data in: don't care. > > The reason of this check is to know we have enough space.. isn't it? > Not really. You could just copy, and if it fails, return -EFAULT. 
> I feel the point of that argument is to indicate the size of the > buffer. We have a chance to catch coding errors; I feel the > getsockname/getpeername approach is wrong (truncate results if too > short, don't care if too long). Unless someone can come up with a > compelling reason, why change? About truncating, i think like you, but for longer than needed it's ok to don't care and set namelen, because how is else the user know how big it is beforehand? (ie, different PF == different lens) > Rusty. HoraPe --- Horacio J. Peña horape@compendium.com.ar horape@uninet.edu bofh@puntoar.net.ar horape@hcdn.gov.ar From owner-netdev@oss.sgi.com Wed Feb 21 22:04:19 2001 Received: by oss.sgi.com id ; Wed, 21 Feb 2001 22:04:09 -0800 Received: from pizda.ninka.net ([216.101.162.242]:11648 "EHLO pizda.ninka.net") by oss.sgi.com with ESMTP id ; Wed, 21 Feb 2001 22:03:56 -0800 Received: (from davem@localhost) by pizda.ninka.net (8.9.3/8.9.3) id WAA01411; Wed, 21 Feb 2001 22:01:53 -0800 From: "David S. Miller" MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <14996.43857.22794.280050@pizda.ninka.net> Date: Wed, 21 Feb 2001 22:01:53 -0800 (PST) To: linux-kernel@vger.kernel.org CC: netdev@oss.sgi.com Subject: [UPDATE] Zerocopy BETA 2 against 2.4.2 final. X-Mailer: VM 6.75 under 21.1 (patch 13) "Crater Lake" XEmacs Lucid Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 840 Lines: 24 Usual place: ftp://ftp.kernel.org/pub/linux/kernel/people/davem/zerocopy-2.4.2-1.diff.gz Besides merging to the 2.4.2-final release there are two bug fixes: 1) New TCP receive queue collapser could trigger assertion failures in tcp_recvmsg(), reason: uninitialized skb->used field in fresh SKB allocated for collapsing. 2) IP header IDs are generated differently on big vs. little endian systems, added htons() to fix. 
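To illustrate fix 2), here is a minimal user-space sketch of why a 16-bit ID has to pass through htons() before it is stored in a header field. It is not the code from the zerocopy patch; the helper names toy_store_ip_id() and toy_store_ip_id_raw() are made up for this example. Without the conversion, the same counter value produces different on-wire bytes on big-endian and little-endian hosts.

```c
#include <arpa/inet.h>  /* htons() */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: write a 16-bit ID counter into a 2-byte header
 * field in network (big-endian) byte order, whatever the host order is. */
void toy_store_ip_id(uint8_t field[2], uint16_t counter)
{
    uint16_t wire = htons(counter);      /* host order -> network order */
    memcpy(field, &wire, sizeof(wire));
}

/* The same store without htons(): the bytes land in host order, so the
 * on-wire value differs between big- and little-endian machines. */
void toy_store_ip_id_raw(uint8_t field[2], uint16_t counter)
{
    memcpy(field, &counter, sizeof(counter));
}
```

With toy_store_ip_id(), a counter of 0x1234 always ends up as the bytes 0x12, 0x34 in the field; with the raw variant the byte order depends on the host.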
Some have asked why this isn't pushed to Alan for his AC patches yet, the reason is that I want to fully resolve the final few performance issues that remain (1.5K mtu on gbit still has some warts). Once those are cleared and everyone involved is satisfied that there are no performance regressions against vanilla 2.4.2, I will ask Alan to consider including it. Later, David S. Miller davem@redhat.com From owner-netdev@oss.sgi.com Wed Feb 21 22:43:09 2001 Received: by oss.sgi.com id ; Wed, 21 Feb 2001 22:43:00 -0800 Received: from asbestos.linuxcare.com.au ([203.17.0.30]:49136 "EHLO halfway") by oss.sgi.com with ESMTP id ; Wed, 21 Feb 2001 22:42:55 -0800 Received: from halfway ([127.0.0.1] helo=linuxcare.com.au ident=rusty) by halfway with esmtp (Exim 3.22 #1 (Debian)) id 14VpST-0003TN-00; Thu, 22 Feb 2001 17:42:33 +1100 From: Rusty Russell To: horape@tinuviel.compendium.net.ar Subject: Re: [PATCH] SO_ORIGINAL_DST and sockaddr_in Cc: Harald Welte , netfilter-devel@us5.samba.org, netdev@oss.sgi.com In-reply-to: Your message of "Thu, 22 Feb 2001 01:41:41 -0300." <20010222014141.A7501@tinuviel.compendium.net.ar> Date: Thu, 22 Feb 2001 17:42:32 +1100 Message-Id: Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 720 Lines: 17 In message <20010222014141.A7501@tinuviel.compendium.net.ar> you write: > > I feel the point of that argument is to indicate the size of the > > buffer. We have a chance to catch coding errors; I feel the > > getsockname/getpeername approach is wrong (truncate results if too > > short, don't care if too long). Unless someone can come up with a > > compelling reason, why change? > > About truncating, i think like you, but for longer than needed it's > ok to don't care and set namelen, because how is else the user know > how big it is beforehand? (ie, different PF == different lens) If you don't know what PF the socket is, how do you interpret the result? Rusty. 
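The length-checking policies argued over in this thread can be modelled in user space roughly as follows. This is only a sketch under assumptions: toy_get_option() and the TOY_LEN_* names are hypothetical, the -EINVAL return for a rejected length is a choice made for the example, and none of this is the netfilter SO_ORIGINAL_DST code. "Exact" rejects any buffer whose length differs from the object size, "at least" accepts any buffer that is big enough, and "truncate" copies what fits, which is the getsockname()/getpeername() style described above. On success the length is set to the object size, as Alexey recalls from the specs.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

enum toy_len_policy {
    TOY_LEN_EXACT,     /* buffer length must equal the object size */
    TOY_LEN_AT_LEAST,  /* any buffer that is big enough is accepted */
    TOY_LEN_TRUNCATE   /* short buffers get a truncated copy */
};

/* Hypothetical model of an option getter. On success *buflen is set to
 * the size of the object, not to the number of bytes copied. On failure
 * *buflen is left untouched. */
int toy_get_option(enum toy_len_policy policy,
                   const void *obj, size_t objlen,
                   void *buf, size_t *buflen)
{
    size_t n = objlen;  /* number of bytes to copy */

    switch (policy) {
    case TOY_LEN_EXACT:
        if (*buflen != objlen)
            return -EINVAL;
        break;
    case TOY_LEN_AT_LEAST:
        if (*buflen < objlen)
            return -EINVAL;
        break;
    case TOY_LEN_TRUNCATE:
        if (*buflen < objlen)
            n = *buflen;
        break;
    }
    memcpy(buf, obj, n);
    *buflen = objlen;
    return 0;
}
```

A caller that reuses one large buffer for several options succeeds under the second and third policies and fails under the first, which is the trade-off being debated: catching a likely coding error versus tolerating oversized buffers.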
-- Premature optmztion is rt of all evl. --DK From owner-netdev@oss.sgi.com Wed Feb 21 22:47:39 2001 Received: by oss.sgi.com id ; Wed, 21 Feb 2001 22:47:20 -0800 Received: from gessami-r.puntoar.net.ar ([200.47.36.222]:64499 "HELO convert rfc822-to-8bit tinuviel.compendium.net.ar") by oss.sgi.com with SMTP id ; Wed, 21 Feb 2001 22:47:09 -0800 Received: by tinuviel.compendium.net.ar (Postfix, from userid 1000) id 53292196769; Thu, 22 Feb 2001 03:40:32 -0300 (ART) Date: Thu, 22 Feb 2001 03:40:32 -0300 To: Rusty Russell Cc: Harald Welte , netfilter-devel@us5.samba.org, netdev@oss.sgi.com Subject: Re: [PATCH] SO_ORIGINAL_DST and sockaddr_in Message-ID: <20010222034032.A9151@tinuviel.compendium.net.ar> References: <20010222014141.A7501@tinuviel.compendium.net.ar> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: 8BIT User-Agent: Mutt/1.3.12i In-Reply-To: ; from rusty@linuxcare.com.au on Thu, Feb 22, 2001 at 05:42:32PM +1100 x-attribution: HoraPe From: horape@tinuviel.compendium.net.ar Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 958 Lines: 26 ¡Hola! > > > I feel the point of that argument is to indicate the size of the > > > buffer. We have a chance to catch coding errors; I feel the > > > getsockname/getpeername approach is wrong (truncate results if too > > > short, don't care if too long). Unless someone can come up with a > > > compelling reason, why change? > > About truncating, i think like you, but for longer than needed it's > > ok to don't care and set namelen, because how is else the user know > > how big it is beforehand? (ie, different PF == different lens) > If you don't know what PF the socket is, how do you interpret the > result? getnameinfo... Normal user level code should not know what protocol it runs over. 
You should program AF independent code and let it run today on IPv4, tomorrow on IPv6 and in a remote time IPv7, CNLP+ or DecNet X. > Rusty. HoraPe --- Horacio J. Peña horape@compendium.com.ar horape@uninet.edu bofh@puntoar.net.ar horape@hcdn.gov.ar From owner-netdev@oss.sgi.com Thu Feb 22 00:04:11 2001 Received: by oss.sgi.com id ; Thu, 22 Feb 2001 00:04:02 -0800 Received: from gessami-r.puntoar.net.ar ([200.47.36.222]:2803 "HELO convert rfc822-to-8bit tinuviel.compendium.net.ar") by oss.sgi.com with SMTP id ; Thu, 22 Feb 2001 00:03:36 -0800 Received: by tinuviel.compendium.net.ar (Postfix, from userid 1000) id B6EC5196795; Thu, 22 Feb 2001 04:56:19 -0300 (ART) Date: Thu, 22 Feb 2001 04:56:19 -0300 To: Rusty Russell Cc: Harald Welte , netfilter-devel@us5.samba.org, netdev@oss.sgi.com Subject: Re: [PATCH] SO_ORIGINAL_DST and sockaddr_in Message-ID: <20010222045619.A10048@tinuviel.compendium.net.ar> References: <20010222014141.A7501@tinuviel.compendium.net.ar> <20010222034032.A9151@tinuviel.compendium.net.ar> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: 8BIT User-Agent: Mutt/1.3.12i In-Reply-To: <20010222034032.A9151@tinuviel.compendium.net.ar>; from horape on Thu, Feb 22, 2001 at 03:40:32AM -0300 x-attribution: HoraPe From: horape@tinuviel.compendium.net.ar Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 1161 Lines: 28 ¡Hola! > > > > I feel the point of that argument is to indicate the size of the > > > > buffer. We have a chance to catch coding errors; I feel the > > > > getsockname/getpeername approach is wrong (truncate results if too > > > > short, don't care if too long). Unless someone can come up with a > > > > compelling reason, why change? > > > About truncating, i think like you, but for longer than needed it's > > > ok to don't care and set namelen, because how is else the user know > > > how big it is beforehand? 
(ie, different PF == different lens) > > If you don't know what PF the socket is, how do you interpret the > > result? > getnameinfo... Normal user level code should not know what protocol it > runs over. You should program AF independent code and let it run today > on IPv4, tomorrow on IPv6 and in a remote time IPv7, CNLP+ or DecNet X. BTW, that's reasonable behaviour for get[sock/peer]name, but not for connect, sendto, etc... where it should check that namelen is the correct (linux doesn't do that, fbsd does) Saludos, HoraPe --- Horacio J. Peña horape@compendium.com.ar horape@uninet.edu bofh@puntoar.net.ar horape@hcdn.gov.ar From owner-netdev@oss.sgi.com Thu Feb 22 07:45:46 2001 Received: by oss.sgi.com id ; Thu, 22 Feb 2001 07:45:37 -0800 Received: from dns2.hardaker.davis.ca.us ([168.150.190.2]:522 "EHLO wanderer.hardakers.net") by oss.sgi.com with ESMTP id ; Thu, 22 Feb 2001 07:45:20 -0800 Received: (from hardaker@localhost) by wanderer.hardakers.net (8.9.3/8.9.3) id HAA06365; Thu, 22 Feb 2001 07:46:17 -0800 X-Authentication-Warning: wanderer.hardakers.net: hardaker set sender to wes@hardakers.net using -f To: Richard Guy Briggs Cc: Linux Ipsec mailing list , NetFilter mailing list , Linux Network Development mailing list , Hugh Daniel , John Gilmore , Hugh Redelmeier , Henry Spencer Subject: Re: FreeS/WAN redesign thoughts (KLIPS, IPSEC) References: <20010221024203.H9886@grendel.conscoop.ottawa.on.ca> From: Wes Hardaker X-URL: http://dcas.ucdavis.edu/~hardaker Organization: Network Associates - NAI Labs X-Face: #qW^}a%m*T^{A:Cp}$R\"38+d}41-Z}uU8,r%F#c#s:~Nzp0G9](s?,K49KJ]s"*7gvRgA SrAvQc4@/}L7Qc=w{)]ACO\R{LF@S{pXfojjjGg6c;q6{~C}CxC^^&~(F]`1W)%9j/iS/ IM",B1M.?{w8ckLTYD'`|kTr\i\cgY)P4 Date: 22 Feb 2001 07:46:17 -0800 In-Reply-To: <20010221024203.H9886@grendel.conscoop.ottawa.on.ca> (Richard Guy Briggs's message of "Wed, 21 Feb 2001 02:42:03 -0500") Message-ID: User-Agent: Gnus/5.090001 (Oort Gnus v0.01) XEmacs/21.2 (Terspichore) MIME-Version: 1.0 Content-Type: 
text/plain; charset=us-ascii Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 585 Lines: 16 [lots of stuff deleted] Richard> Treat incoming IPSEC encapsulation as an enhancement of the Richard> layer 2 protocol and decapsulate it at the NF_IP_PRE_ROUTING Richard> hook. This option is less favourable as it stands since it Richard> involves creating our own SPDB engine. As long as the filtering rules of the linux kernel meet the minimum requirements put forth in section 4.4.1 of RFC2401 (Which describes the SPDB), then reusing the existing kernel infrastructure is probably a very good thing from purely a reuse standpoint. -- Wes Hardaker NAI Labs Network Associates From owner-netdev@oss.sgi.com Thu Feb 22 07:47:06 2001 Received: by oss.sgi.com id ; Thu, 22 Feb 2001 07:46:46 -0800 Received: from hq.pm.waw.pl ([195.116.170.10]:45834 "EHLO hq.pm.waw.pl") by oss.sgi.com with ESMTP id ; Thu, 22 Feb 2001 07:46:44 -0800 Received: (from uucp@localhost) by hq.pm.waw.pl with UUCP id f1MFkLP32461; Thu, 22 Feb 2001 16:46:21 +0100 Received: (from khc@localhost) by intrepid.pm.waw.pl (8.11.0/8.11.0) id f1MFfhx04343; Thu, 22 Feb 2001 16:41:43 +0100 To: netdev@oss.sgi.com Cc: Andi Kleen Subject: Re: net packet queue scheduler, packet_type and proto handlers References: <20010221223338.B1758@fred.local> Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII From: Krzysztof Halasa Date: 22 Feb 2001 16:41:43 +0100 In-Reply-To: Andi Kleen's message of "Wed, 21 Feb 2001 22:33:38 +0100" Message-ID: Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 1677 Lines: 56 Andi Kleen writes: > > switch (skb->data[1]) { > > #ifdef CONFIG_INET > > case AX25_P_IP: > > Ugh. > > Looks like a bug in the AX25 layer. Well... Not a classic bug - misdesign probably. However, current packet scheduler just asks for things like that. 
> I think it's not nice to let the upper layer handle the retry in all cases, > although it's only needed in a few special cases. Using netif_rx for the > uncommon case of reexamining the packet is better, because you keep it out > of the fast path. Why is it better? It's slower, requires more operations as a whole (packets being handled twice). It probably decreases latency marginally for other devices. BTW: AX25 does exactly what I propose, except that it's done internally, and it's ugly. > I don't see the twice examining by taps as a problem, > it doesn't break anything as far as I know. Except taps which get packets twice, while there is just one packet - it takes time as well, doesn't it? However we could do that another way - probably better and certainly giving more options: inline void process_skb(skb) { ... look at ptype_base[] and call func(); } and then net_rx_action() would call that function to dispatch packets to individual handlers, which would have an option to: - netif_rx() the packet, or - call process_skb() It doesn't change net_rx_action() (except that we make a separate inline function from a part of it), and we could have a clean lower layer design. We could even change return value of packet_type->func() to 'void', and get rid of deliver_to_old_ones() to clean things up. Objections? 
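The proposal above can be sketched as a small user-space model. Everything here is hypothetical (struct toy_skb, toy_ptype_base[], the toy_* functions); it only mimics the shape of net_rx_action() and ptype_base[] and is not kernel code. The point it shows is that an encapsulating handler which calls the dispatch helper directly reaches the inner handler without the packet passing the taps a second time.

```c
#include <stddef.h>

enum { TOY_P_OUTER = 1, TOY_P_INNER = 2, TOY_P_MAX = 3 };

struct toy_skb {
    int protocol;      /* which handler should see the packet next */
    int taps_seen;     /* how many times the taps were shown the packet */
    int delivered_to;  /* protocol of the handler that consumed it */
};

typedef void (*toy_func)(struct toy_skb *skb);

/* Stand-in for ptype_base[]: one handler slot per protocol. */
static toy_func toy_ptype_base[TOY_P_MAX];

/* Stand-in for the proposed process_skb(): look up the handler for
 * skb->protocol and call it. Taps are not involved here. */
void toy_process_skb(struct toy_skb *skb)
{
    if (skb->protocol > 0 && skb->protocol < TOY_P_MAX &&
        toy_ptype_base[skb->protocol] != NULL)
        toy_ptype_base[skb->protocol](skb);
}

/* Final consumer (think of the IP handler). */
static void toy_inner_rcv(struct toy_skb *skb)
{
    skb->delivered_to = skb->protocol;
}

/* Encapsulating layer: strips its header (modelled as a protocol
 * change) and dispatches directly instead of requeueing the packet. */
static void toy_outer_rcv(struct toy_skb *skb)
{
    skb->protocol = TOY_P_INNER;
    toy_process_skb(skb);
}

/* Stand-in for net_rx_action(): show the packet to the taps once,
 * then hand it to the dispatch helper. */
void toy_net_rx_action(struct toy_skb *skb)
{
    skb->taps_seen++;
    toy_process_skb(skb);
}

void toy_register_handlers(void)
{
    toy_ptype_base[TOY_P_OUTER] = toy_outer_rcv;
    toy_ptype_base[TOY_P_INNER] = toy_inner_rcv;
}
```

A handler that prefers the requeue route would call the model's equivalent of netif_rx() instead, at the price of the taps seeing the packet again.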
-- Krzysztof Halasa Network Administrator From owner-netdev@oss.sgi.com Thu Feb 22 08:12:36 2001 Received: by oss.sgi.com id ; Thu, 22 Feb 2001 08:12:16 -0800 Received: from cerberus.nemoto.ecei.tohoku.ac.jp ([130.34.199.67]:27915 "EHLO cerberus.nemoto.ecei.tohoku.ac.jp") by oss.sgi.com with ESMTP id ; Thu, 22 Feb 2001 08:11:54 -0800 Received: from localhost (yoshfuji@localhost [127.0.0.1]) by cerberus.nemoto.ecei.tohoku.ac.jp (8.9.3+3.2W/8.9.3/Debian 8.9.3-21) with ESMTP id BAA10478; Fri, 23 Feb 2001 01:11:23 +0900 To: netdev@oss.sgi.com CC: itojun@iijlab.net Subject: [SECURITY] Overrun in ipv4 option parsing (Fw: (usagi-users 00222) IPv4 option handling) X-Mailer: Mew version 1.94 on Emacs 20.7 / Mule 4.1 (AOI) X-URL: http://www.ecei.tohoku.ac.jp/%7Eyoshfuji/ X-Fingerprint: F7 31 65 99 5E B2 BB A7 15 15 13 23 18 06 A9 6F 57 00 6B 25 X-Pgp5-Key-Url: http://cerberus.nemoto.ecei.tohoku.ac.jp/%7Eyoshfuji/yoshfuji@ecei.tohoku.ac.jp.asc Mime-Version: 1.0 Content-Type: Multipart/Mixed; boundary="--Next_Part(Fri_Feb_23_01:11:15_2001_563)--" Content-Transfer-Encoding: 7bit Message-Id: <20010223011122P.yoshfuji@ecei.tohoku.ac.jp> Date: Fri, 23 Feb 2001 01:11:22 +0900 From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= X-Dispatcher: imput version 990905(IM130) Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 3612 Lines: 99 ----Next_Part(Fri_Feb_23_01:11:15_2001_563)-- Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit Hi, itojun informed us that current linux 2.2.x and 2.4.x kernels have buffer-overrun bug in net/ipv4/ip_options.c. Here's the fix. 
Index: net/ipv4/ip_options.c =================================================================== RCS file: /cvsroot/usagi/usagi/kernel/linux24/net/ipv4/ip_options.c,v retrieving revision 1.1.1.3 diff -u -r1.1.1.3 ip_options.c --- net/ipv4/ip_options.c 2000/08/25 03:29:24 1.1.1.3 +++ net/ipv4/ip_options.c 2001/02/22 15:36:47 @@ -220,6 +220,8 @@ optptr++; continue; } + if (l < 2) + return; optlen = optptr[1]; if (optlen<2 || optlen>l) return; @@ -277,6 +279,10 @@ l--; optptr++; continue; + } + if (l < 2) { + pp_ptr = optptr; + goto error; } optlen = optptr[1]; if (optlen<2 || optlen>l) { -- Hideaki YOSHIFUJI @ USAGI Project PGP5i FP: F731 6599 5EB2 BBA7 1515 1323 1806 A96F 5700 6B25 ----Next_Part(Fri_Feb_23_01:11:15_2001_563)-- Content-Type: Message/Rfc822 Content-Transfer-Encoding: 7bit Return-Path: Return-Path: Received: from linux6.nezu.wide.ad.jp (linux6.nezu.wide.ad.jp [203.178.142.218]) by cerberus.nemoto.ecei.tohoku.ac.jp (8.9.3+3.2W/8.9.3/Debian 8.9.3-21) with ESMTP id MAA04961 for ; Wed, 21 Feb 2001 12:29:40 +0900 Received: from nezu.linux-ipv6.org (localhost [127.0.0.1]) by linux6.nezu.wide.ad.jp (8.9.3/8.9.3/Debian 8.9.3-21) with ESMTP id MAA12613; Wed, 21 Feb 2001 12:25:39 +0900 X-Authentication-Warning: linux6.nezu.wide.ad.jp: Host localhost [127.0.0.1] claimed to be nezu.linux-ipv6.org Received: from coconut.itojun.org (coconut.itojun.org [210.160.95.97]) by linux6.nezu.wide.ad.jp (8.9.3/8.9.3/Debian 8.9.3-21) with ESMTP id MAA12608 for ; Wed, 21 Feb 2001 12:25:38 +0900 Received: from kiwi.itojun.org (localhost.itojun.org [127.0.0.1]) by coconut.itojun.org (8.9.3+3.2W/3.7W) with ESMTP id MAA29800 for ; Wed, 21 Feb 2001 12:26:25 +0900 (JST) Date: Wed, 21 Feb 2001 12:26:25 +0900 From: itojun@iijlab.net Reply-To: usagi-users@linux-ipv6.org Subject: (usagi-users 00222) IPv4 option handling Sender: itojun@itojun.org To: usagi-users@linux-ipv6.org Message-Id: <29798.982725985@coconut.itojun.org> X-ML-Name: usagi-users X-Mail-Count: 00222 X-MLServer: fml [fml 
3.0pl#17]; post only (anyone can post) X-ML-Info: If you have a question, please contact usagi-users-admin@linux-ipv6.org; X-Template-Reply-To: itojun@itojun.org X-Template-Return-Receipt-To: itojun@itojun.org X-PGP-Fingerprint: F8 24 B4 2C 8C 98 57 FD 90 5F B4 60 79 54 16 E2 Precedence: bulk Lines: 17 MIME-Version: 1.0 i'm not sure if this is the right forum to raise this, but anyway I have almost no idea about how linux community works... so this is the best thing I can try. if necessary please forward it to someone more appropriate. due to the complexity of IPv4 option specification, lots of systems (even openbsd!) makes buffer overrun while parsing it. i've checked usagi cvs repository (for linux 2.4.0) and it has mistakes too. we should also check for AH logic, but i could not find where is it. itojun net/ipv4/ip_options.c:ip_options_compile() net/ipv4/ip_options.c:ip_options_fragment() need to make sure that l > 1 before touching optptr[1]. ----Next_Part(Fri_Feb_23_01:11:15_2001_563)---- From owner-netdev@oss.sgi.com Thu Feb 22 08:30:37 2001 Received: by oss.sgi.com id ; Thu, 22 Feb 2001 08:30:27 -0800 Received: from cerberus.nemoto.ecei.tohoku.ac.jp ([130.34.199.67]:29963 "EHLO cerberus.nemoto.ecei.tohoku.ac.jp") by oss.sgi.com with ESMTP id ; Thu, 22 Feb 2001 08:30:08 -0800 Received: from localhost (yoshfuji@localhost [127.0.0.1]) by cerberus.nemoto.ecei.tohoku.ac.jp (8.9.3+3.2W/8.9.3/Debian 8.9.3-21) with ESMTP id BAA10530 for ; Fri, 23 Feb 2001 01:29:56 +0900 To: netdev@oss.sgi.com Subject: [SECURITY] Overrun in net/ipv6/exthdrs.c X-Mailer: Mew version 1.94 on Emacs 20.7 / Mule 4.1 (AOI) X-URL: http://www.ecei.tohoku.ac.jp/%7Eyoshfuji/ X-Fingerprint: F7 31 65 99 5E B2 BB A7 15 15 13 23 18 06 A9 6F 57 00 6B 25 X-Pgp5-Key-Url: http://cerberus.nemoto.ecei.tohoku.ac.jp/%7Eyoshfuji/yoshfuji@ecei.tohoku.ac.jp.asc Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-Id: <20010223012955T.yoshfuji@ecei.tohoku.ac.jp> 
Date: Fri, 23 Feb 2001 01:29:55 +0900 From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= X-Dispatcher: imput version 990905(IM130) Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 1746 Lines: 76 Hi, We've found buffer overrun bug while parsing ipv6 extension headers in linux2{2,4}/net/ipv6/exthdrs.c. Here's the patch we've applied against our tree. Index: net/ipv6/exthdrs.c =================================================================== RCS file: /cvsroot/usagi/usagi/kernel/linux24/net/ipv6/exthdrs.c,v retrieving revision 1.6 retrieving revision 1.8 diff -u -r1.6 -r1.8 --- net/ipv6/exthdrs.c 2001/01/08 14:42:41 1.6 +++ net/ipv6/exthdrs.c 2001/01/10 23:45:34 1.8 @@ -1,4 +1,4 @@ -/* $USAGI: exthdrs.c,v 1.6 2001/01/08 14:42:41 yoshfuji Exp $ */ +/* $USAGI: exthdrs.c,v 1.8 2001/01/10 23:45:34 yoshfuji Exp $ */ /* * Extension Header handling for IPv6 @@ -106,6 +106,7 @@ struct tlvtype_proc *curr; u8 *ptr = skb->h.raw; int len = ((ptr[1]+1)<<3) - 2; + int optlen; ptr += 2; @@ -115,19 +116,31 @@ } while (len > 0) { - int optlen = ptr[1]+2; - switch (ptr[0]) { case IPV6_TLV_PAD0: optlen = 1; break; case IPV6_TLV_PADN: + if (len < 2) + goto bad; + optlen = ptr[1]+2; + if (len < optlen) + goto bad; break; - default: /* Other TLV code so scan list */ + default: + /* Other TLV code so scan list */ + if (len < 2) + goto bad; + optlen = ptr[1]+2; + if (len < optlen) + goto bad; for (curr=procs; curr->type >= 0; curr++) { if (curr->type == ptr[0]) { + /* type specific length/alignment + checks will be perfomed in the + func(). 
*/ if (curr->func(skb, ptr) == 0) return 0; break; @@ -144,6 +157,7 @@ } if (len == 0) return 1; +bad: kfree_skb(skb); return 0; } -- Hideaki YOSHIFUJI @ USAGI Project PGP5i FP: F731 6599 5EB2 BBA7 1515 1323 1806 A96F 5700 6B25 From owner-netdev@oss.sgi.com Thu Feb 22 08:43:56 2001 Received: by oss.sgi.com id ; Thu, 22 Feb 2001 08:43:37 -0800 Received: from cerberus.nemoto.ecei.tohoku.ac.jp ([130.34.199.67]:31755 "EHLO cerberus.nemoto.ecei.tohoku.ac.jp") by oss.sgi.com with ESMTP id ; Thu, 22 Feb 2001 08:43:13 -0800 Received: from localhost (yoshfuji@localhost [127.0.0.1]) by cerberus.nemoto.ecei.tohoku.ac.jp (8.9.3+3.2W/8.9.3/Debian 8.9.3-21) with ESMTP id BAA10567; Fri, 23 Feb 2001 01:43:01 +0900 To: netdev@oss.sgi.com CC: Hiroyuki YAMAMORI Subject: [OOPS] kernel panic due to bug in tcp_ipv6.c X-Mailer: Mew version 1.94 on Emacs 20.7 / Mule 4.1 (AOI) X-URL: http://www.ecei.tohoku.ac.jp/%7Eyoshfuji/ X-Fingerprint: F7 31 65 99 5E B2 BB A7 15 15 13 23 18 06 A9 6F 57 00 6B 25 X-Pgp5-Key-Url: http://cerberus.nemoto.ecei.tohoku.ac.jp/%7Eyoshfuji/yoshfuji@ecei.tohoku.ac.jp.asc Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-Id: <20010223014300W.yoshfuji@ecei.tohoku.ac.jp> Date: Fri, 23 Feb 2001 01:43:00 +0900 From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= X-Dispatcher: imput version 990905(IM130) Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 3866 Lines: 104 Hi, We've found a rather silly bug in net/ipv6/tcp_ipv6.c in linux-2.4.x that causes kernel panic (reporeted by Hiroyuki YAMAMORI ) like this (this ksymopps log was created by ): ***** cut here ***** ksymoops 2.3.4 on i686 2.4.0-bSTABLE200102. Options used -v /usr/src/linux24-bSTABLE200102/vmlinux (specified) -K (specified) -L (specified) -O (specified) -m /usr/src/linux24-bSTABLE200102/System.map (specified) Deactivating swap... done. 
invalid operand: 0000 CPU: 0 EIP: 0010:[] Using defaults from ksymoops -t elf32-i386 -a i386 EFLAGS: 00000282 eax: 0000001b ebx: c1eb2000 ecx: c2aba000 edx: 00000001 esi: c2577554 edi: 7fffffff ebp: c1eb3e7c esp: c1eb3e58 ds: 0018 es: 0018 ss: 0018 Process ftpd (pid: 271, stackpage=c1eb3000) Stack: c01c9f35 c01ca096 000002b0 7fffffff c2577554 7fffffff 00000000 c2577520 c2bfb280 c1eb3ea0 c010fc5f c2577520 c2577554 c2577600 c2577650 c1eb3f14 00000286 c2577520 c2313ae4 c019fc18 c2577520 ffffff8d 7fffffff 00000000 Call Trace: [] [] [] [] [] [] [] [] [] [] Code: 0f 0b 8d 65 e8 5b 5e 5f 89 ec 5d c3 55 89 e5 83 ec 18 57 56 >>EIP; c0110054 <===== Trace; c010fc5f Trace; c019fc18 Trace; c019fdb7 Trace; c017881e Trace; c011dbf2 Trace; c011dbfb Trace; c011ce22 Trace; c010d125 Trace; c0179144 Trace; c0108e0f Code; c0110054 00000000 <_EIP>: Code; c0110054 <===== 0: 0f 0b ud2a <===== Code; c0110056 2: 8d 65 e8 lea 0xffffffe8(%ebp),%esp Code; c0110059 5: 5b pop %ebx Code; c011005a 6: 5e pop %esi Code; c011005b 7: 5f pop %edi Code; c011005c 8: 89 ec mov %ebp,%esp Code; c011005e a: 5d pop %ebp Code; c011005f b: c3 ret Code; c0110060 <__wake_up+0/130> c: 55 push %ebp Code; c0110061 <__wake_up+1/130> d: 89 e5 mov %esp,%ebp Code; c0110063 <__wake_up+3/130> f: 83 ec 18 sub $0x18,%esp Code; c0110066 <__wake_up+6/130> 12: 57 push %edi Code; c0110067 <__wake_up+7/130> 13: 56 push %esi Kernel panic: Aiee, killing interrupt handler! 
***** cut here ***** Our fix was: Index: net/ipv6/tcp_ipv6.c =================================================================== RCS file: /cvsroot/usagi/usagi/kernel/linux24/net/ipv6/tcp_ipv6.c,v retrieving revision 1.22 retrieving revision 1.23 diff -u -r1.22 -r1.23 --- net/ipv6/tcp_ipv6.c 2001/01/05 06:00:59 1.22 +++ net/ipv6/tcp_ipv6.c 2001/02/18 01:44:46 1.23 @@ -1,4 +1,4 @@ -/* $USAGI: tcp_ipv6.c,v 1.22 2001/01/05 06:00:59 yoshfuji Exp $ */ +/* $USAGI: tcp_ipv6.c,v 1.23 2001/02/18 01:44:46 yoshfuji Exp $ */ /* * TCP over IPv6 @@ -458,7 +458,7 @@ struct sock *sk2, **skp; struct tcp_tw_bucket *tw; - write_lock(&head->lock); + write_lock_bh(&head->lock); for(skp = &(head + tcp_ehash_size)->chain; (sk2=*skp)!=NULL; skp = &sk2->next) { tw = (struct tcp_tw_bucket*)sk2; -- Hideaki YOSHIFUJI @ USAGI Project PGP5i FP: F731 6599 5EB2 BBA7 1515 1323 1806 A96F 5700 6B25 From owner-netdev@oss.sgi.com Thu Feb 22 08:56:16 2001 Received: by oss.sgi.com id ; Thu, 22 Feb 2001 08:55:56 -0800 Received: from ikar.t17.ds.pwr.wroc.pl ([156.17.210.253]:56591 "HELO ikar.t17.ds.pwr.wroc.pl") by oss.sgi.com with SMTP id ; Thu, 22 Feb 2001 08:55:35 -0800 Received: by ikar.t17.ds.pwr.wroc.pl (Postfix+IPv6, from userid 1002) id 60525C8003; Thu, 22 Feb 2001 17:55:23 +0100 (CET) Date: Thu, 22 Feb 2001 17:55:23 +0100 From: Arkadiusz Miskiewicz To: netdev@oss.sgi.com, usagi-users@linux-ipv6.org Subject: why autoconfiguration work's while shouldn't ? 
:) Message-ID: <20010222175523.A17144@ikar.t17.ds.pwr.wroc.pl> Mail-Followup-To: Arkadiusz Miskiewicz , netdev@oss.sgi.com, usagi-users@linux-ipv6.org Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-2 Content-Disposition: inline Content-Transfer-Encoding: 8bit User-Agent: Mutt/1.2.5i X-URL: http://www.t17.ds.pwr.wroc.pl/~misiek/ipv6/ X-Operating-System: Linux dark 4.0.20 #119 Tue Jan 16 12:21:53 MET 2001 i986 pld Organization: Polish(ed) Linux Distribution Team Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 1785 Lines: 56 I have small problem with linux kernels up to 2.4.1-pre1 (didn't tested never versions). My rc-scripts setup such routes: [root@arm conf]# /etc/rc.d/init.d/network start > /dev/null [root@arm conf]# ip -6 r 3ffe:8010:34:19::/64 dev eth0 proto kernel metric 256 mtu 1500 fe80::/10 dev eth0 proto kernel metric 256 mtu 1500 fe80::/10 via :: dev t17 proto kernel metric 256 mtu 1480 ff00::/8 dev eth0 proto kernel metric 256 mtu 1500 ff00::/8 dev t17 proto kernel metric 256 mtu 1480 default via fe80::9c11:d7f3 dev t17 metric 1024 mtu 1480 unreachable default dev lo metric -1 error -101 after a while when kernel tries autoconfiguration (Feb 22 17:51:30 arm kernel: eth0: no IPv6 routers present) [root@arm conf]# ip -6 r 3ffe:8010:34:19::/64 dev eth0 proto kernel metric 256 mtu 1500 fe80::/10 dev eth0 proto kernel metric 256 mtu 1500 fe80::/10 via :: dev t17 proto kernel metric 256 mtu 1480 ff00::/8 dev eth0 proto kernel metric 256 mtu 1500 ff00::/8 dev t17 proto kernel metric 256 mtu 1480 default dev eth0 proto kernel metric 256 mtu 1500 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ default via fe80::9c11:d7f3 dev t17 metric 1024 mtu 1480 unreachable default dev lo metric -1 error -101 [root@arm conf]# It adds few one default route which blows everything. Can someone explain me why such route is added? 
Another thing is that it tries autoconfiguration even when: [root@arm conf]# pwd; ls /proc/sys/net/ipv6/conf all default eth0 lo t17 [root@arm conf]# cat {all,default,eth0,t17}/{accept_ra,autoconf,router_solicitations} 0 0 0 0 0 0 0 0 0 0 0 0 [root@arm conf]# Bug or I'm missing something? -- Arkadiusz Miśkiewicz, AM2-6BONE [ PLD GNU/Linux IPv6 ] http://www.t17.ds.pwr.wroc.pl/~misiek/ipv6/ [ enabled ] From owner-netdev@oss.sgi.com Thu Feb 22 09:11:56 2001 Received: by oss.sgi.com id ; Thu, 22 Feb 2001 09:11:47 -0800 Received: from cerberus.nemoto.ecei.tohoku.ac.jp ([130.34.199.67]:35595 "EHLO cerberus.nemoto.ecei.tohoku.ac.jp") by oss.sgi.com with ESMTP id ; Thu, 22 Feb 2001 09:11:22 -0800 Received: from localhost (yoshfuji@localhost [127.0.0.1]) by cerberus.nemoto.ecei.tohoku.ac.jp (8.9.3+3.2W/8.9.3/Debian 8.9.3-21) with ESMTP id CAA10651; Fri, 23 Feb 2001 02:10:58 +0900 To: misiek@pld.ORG.PL Cc: netdev@oss.sgi.com, usagi-users@linux-ipv6.org Subject: Re: why autoconfiguration work's while shouldn't ?
:) In-Reply-To: <20010222175523.A17144@ikar.t17.ds.pwr.wroc.pl> References: <20010222175523.A17144@ikar.t17.ds.pwr.wroc.pl> X-Mailer: Mew version 1.94 on Emacs 20.7 / Mule 4.1 (AOI) X-URL: http://www.ecei.tohoku.ac.jp/%7Eyoshfuji/ X-Fingerprint: F7 31 65 99 5E B2 BB A7 15 15 13 23 18 06 A9 6F 57 00 6B 25 X-Pgp5-Key-Url: http://cerberus.nemoto.ecei.tohoku.ac.jp/%7Eyoshfuji/yoshfuji@ecei.tohoku.ac.jp.asc Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-Id: <20010223021057L.yoshfuji@ecei.tohoku.ac.jp> Date: Fri, 23 Feb 2001 02:10:57 +0900 From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= X-Dispatcher: imput version 990905(IM130) Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 743 Lines: 21 In article <20010222175523.A17144@ikar.t17.ds.pwr.wroc.pl> (at Thu, 22 Feb 2001 17:55:23 +0100), Arkadiusz Miskiewicz says: > after a while when kernel tries autoconfiguration > (Feb 22 17:51:30 arm kernel: eth0: no IPv6 routers present) : > default dev eth0 proto kernel metric 256 mtu 1500 > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > It adds few one default route which blows everything. > Can someone explain me why such route is added? : > Bug or I'm missing something? This is not a bug. If there're no routers, we must assume all destination are on-link. 
(RFC2461 5.2) -- Hideaki YOSHIFUJI @ USAGI Project PGP5i FP: F731 6599 5EB2 BBA7 1515 1323 1806 A96F 5700 6B25 From owner-netdev@oss.sgi.com Thu Feb 22 09:28:16 2001 Received: by oss.sgi.com id ; Thu, 22 Feb 2001 09:28:07 -0800 Received: from ikar.t17.ds.pwr.wroc.pl ([156.17.210.253]:16656 "HELO ikar.t17.ds.pwr.wroc.pl") by oss.sgi.com with SMTP id ; Thu, 22 Feb 2001 09:27:48 -0800 Received: by ikar.t17.ds.pwr.wroc.pl (Postfix+IPv6, from userid 1002) id 52E28C8003; Thu, 22 Feb 2001 18:27:32 +0100 (CET) Date: Thu, 22 Feb 2001 18:27:32 +0100 From: Arkadiusz Miskiewicz To: netdev@oss.sgi.com, usagi-users@linux-ipv6.org Subject: Re: why autoconfiguration work's while shouldn't ? :) Message-ID: <20010222182732.A17672@ikar.t17.ds.pwr.wroc.pl> Mail-Followup-To: Arkadiusz Miskiewicz , netdev@oss.sgi.com, usagi-users@linux-ipv6.org References: <20010222175523.A17144@ikar.t17.ds.pwr.wroc.pl> <20010223021057L.yoshfuji@ecei.tohoku.ac.jp> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-2 Content-Disposition: inline Content-Transfer-Encoding: 8bit User-Agent: Mutt/1.2.5i In-Reply-To: <20010223021057L.yoshfuji@ecei.tohoku.ac.jp>; from yoshfuji@ecei.tohoku.ac.jp on Fri, Feb 23, 2001 at 02:10:57AM +0900 X-URL: http://www.t17.ds.pwr.wroc.pl/~misiek/ipv6/ X-Operating-System: Linux dark 4.0.20 #119 Tue Jan 16 12:21:53 MET 2001 i986 pld Organization: Polish(ed) Linux Distribution Team Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 716 Lines: 17 On/Dnia Fri, Feb 23, 2001 at 02:10:57AM +0900, YOSHIFUJI Hideaki / 吉藤英明 wrote/napisał(a) > > default dev eth0 proto kernel metric 256 mtu 1500 > > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > > Bug or I'm missing something? > This is not a bug. > If there're no routers, we must assume all destination are on-link. > (RFC2461 5.2) Ah! :) And what about my second problem... it sends one RS even when conf/all/autoconf is 0 ?
(18:23:14.263265 fe80::2e0:7dff:fe8a:4c79 > ff02::2: icmp6: router solicitation) > Hideaki YOSHIFUJI @ USAGI Project -- Arkadiusz Mi¶kiewicz, AM2-6BONE [ PLD GNU/Linux IPv6 ] http://www.t17.ds.pwr.wroc.pl/~misiek/ipv6/ [ enabled ] From owner-netdev@oss.sgi.com Thu Feb 22 09:55:28 2001 Received: by oss.sgi.com id ; Thu, 22 Feb 2001 09:55:18 -0800 Received: from sith.mimuw.edu.pl ([193.0.97.1]:50948 "HELO sith.mimuw.edu.pl") by oss.sgi.com with SMTP id ; Thu, 22 Feb 2001 09:54:58 -0800 Received: (qmail 3724 invoked by uid 1645); 22 Feb 2001 17:58:41 -0000 Date: Thu, 22 Feb 2001 18:58:41 +0100 From: Jan Rekorajski To: "YOSHIFUJI Hideaki / ?$B5HF#1QL@?\(B" Cc: misiek@pld.ORG.PL, netdev@oss.sgi.com, usagi-users@linux-ipv6.org Subject: Re: why autoconfiguration work's while shouldn't ? :) Message-ID: <20010222185841.A2563@sith.mimuw.edu.pl> Mail-Followup-To: Jan Rekorajski , "YOSHIFUJI Hideaki / ?$B5HF#1QL@?(B" , misiek@pld.ORG.PL, netdev@oss.sgi.com, usagi-users@linux-ipv6.org References: <20010222175523.A17144@ikar.t17.ds.pwr.wroc.pl> <20010223021057L.yoshfuji@ecei.tohoku.ac.jp> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="fUYQa+Pmc3FrFX/N" Content-Disposition: inline Content-Transfer-Encoding: 8bit User-Agent: Mutt/1.2.5i In-Reply-To: <20010223021057L.yoshfuji@ecei.tohoku.ac.jp>; from yoshfuji@ecei.tohoku.ac.jp on Fri, Feb 23, 2001 at 02:10:57AM +0900 X-Operating-System: Linux 2.4.2-pre4 i686 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 5890 Lines: 185 --fUYQa+Pmc3FrFX/N Content-Type: text/plain; charset=iso-8859-2 Content-Disposition: inline Content-Transfer-Encoding: 8bit On Fri, 23 Feb 2001, YOSHIFUJI Hideaki / ?$B5HF#1QL@?(B wrote: > In article <20010222175523.A17144@ikar.t17.ds.pwr.wroc.pl> (at Thu, 22 Feb 2001 17:55:23 +0100), Arkadiusz Miskiewicz says: > > > after a while when kernel tries autoconfiguration > > (Feb 22 17:51:30 arm kernel: eth0: no IPv6 routers present) > : > > 
default dev eth0 proto kernel metric 256 mtu 1500 > > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > > > It adds few one default route which blows everything. > > Can someone explain me why such route is added? > : > > Bug or I'm missing something? > > This is not a bug. > > If there're no routers, we must assume all destination are on-link. > (RFC2461 5.2) As far as I remember the RFC and my discussion about this topic with ANK, there is no MUST. We should or may add this default route, but also we may give the ability to turn this behaviour off. Some time ago I made a patch that adds /proc/sys/net/ipv6/conf/*/autoconf_route sysctl just for this purpose. As the patch is small I attach it here. Jan -- Jan Rêkorajski | ALL SUSPECTS ARE GUILTY. PERIOD! bagginsmimuw.edu.pl | OTHERWISE THEY WOULDN'T BE SUSPECTS, WOULD THEY? BOFH, MANIAC | -- TROOPS by Kevin Rubio --fUYQa+Pmc3FrFX/N Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename="autoconf_route.patch" diff -ur linux.orig/include/linux/sysctl.h linux.v6/include/linux/sysctl.h --- linux.orig/include/linux/sysctl.h Fri Sep 22 23:21:22 2000 +++ linux.v6/include/linux/sysctl.h Sat Oct 14 21:41:18 2000 @@ -342,10 +342,11 @@ NET_IPV6_ACCEPT_RA=4, NET_IPV6_ACCEPT_REDIRECTS=5, NET_IPV6_AUTOCONF=6, - NET_IPV6_DAD_TRANSMITS=7, - NET_IPV6_RTR_SOLICITS=8, - NET_IPV6_RTR_SOLICIT_INTERVAL=9, - NET_IPV6_RTR_SOLICIT_DELAY=10 + NET_IPV6_AUTOCONF_ROUTE=7, + NET_IPV6_DAD_TRANSMITS=8, + NET_IPV6_RTR_SOLICITS=9, + NET_IPV6_RTR_SOLICIT_INTERVAL=10, + NET_IPV6_RTR_SOLICIT_DELAY=11 }; /* /proc/sys/net//neigh/ */ diff -ur linux.orig/include/net/if_inet6.h linux.v6/include/net/if_inet6.h --- linux.orig/include/net/if_inet6.h Tue Sep 19 00:04:13 2000 +++ linux.v6/include/net/if_inet6.h Sat Oct 14 21:38:40 2000 @@ -82,6 +82,7 @@ int accept_ra; int accept_redirects; int autoconf; + int autoconf_route; int dad_transmits; int rtr_solicits; int rtr_solicit_interval; diff -ur linux.orig/net/ipv6/addrconf.c 
linux.v6/net/ipv6/addrconf.c --- linux.orig/net/ipv6/addrconf.c Wed May 3 10:48:03 2000 +++ linux.v6/net/ipv6/addrconf.c Sun Oct 15 00:59:36 2000 @@ -21,6 +21,8 @@ * Andi Kleen : kill doube kfree on module * unload. * Maciej W. Rozycki : FDDI support + * Jan Rekorajski : added autoconf_route sysctl + * */ #include @@ -96,30 +98,32 @@ struct ipv6_devconf ipv6_devconf = { - 0, /* forwarding */ - IPV6_DEFAULT_HOPLIMIT, /* hop limit */ - IPV6_MIN_MTU, /* mtu */ - 1, /* accept RAs */ - 1, /* accept redirects */ - 1, /* autoconfiguration */ - 1, /* dad transmits */ - MAX_RTR_SOLICITATIONS, /* router solicits */ - RTR_SOLICITATION_INTERVAL, /* rtr solicit interval */ - MAX_RTR_SOLICITATION_DELAY, /* rtr solicit delay */ + forwarding: 0, + hop_limit: IPV6_DEFAULT_HOPLIMIT, + mtu6: IPV6_MIN_MTU, + accept_ra: 1, + accept_redirects: 1, + autoconf: 1, + autoconf_route: 1, + dad_transmits: 1, + rtr_solicits: MAX_RTR_SOLICITATIONS, + rtr_solicit_interval: RTR_SOLICITATION_INTERVAL, + rtr_solicit_delay: MAX_RTR_SOLICITATION_DELAY, }; static struct ipv6_devconf ipv6_devconf_dflt = { - 0, /* forwarding */ - IPV6_DEFAULT_HOPLIMIT, /* hop limit */ - IPV6_MIN_MTU, /* mtu */ - 1, /* accept RAs */ - 1, /* accept redirects */ - 1, /* autoconfiguration */ - 1, /* dad transmits */ - MAX_RTR_SOLICITATIONS, /* router solicits */ - RTR_SOLICITATION_INTERVAL, /* rtr solicit interval */ - MAX_RTR_SOLICITATION_DELAY, /* rtr solicit delay */ + forwarding: 0, + hop_limit: IPV6_DEFAULT_HOPLIMIT, + mtu6: IPV6_MIN_MTU, + accept_ra: 1, + accept_redirects: 1, + autoconf: 1, + autoconf_route: 1, + dad_transmits: 1, + rtr_solicits: MAX_RTR_SOLICITATIONS, + rtr_solicit_interval: RTR_SOLICITATION_INTERVAL, + rtr_solicit_delay: MAX_RTR_SOLICITATION_DELAY, }; int ipv6_addr_type(struct in6_addr *addr) @@ -1430,15 +1434,17 @@ printk(KERN_DEBUG "%s: no IPv6 routers present\n", ifp->idev->dev->name); - memset(&rtmsg, 0, sizeof(struct in6_rtmsg)); - rtmsg.rtmsg_type = RTMSG_NEWROUTE; - rtmsg.rtmsg_metric = 
IP6_RT_PRIO_ADDRCONF; - rtmsg.rtmsg_flags = (RTF_ALLONLINK | RTF_ADDRCONF | - RTF_DEFAULT | RTF_UP); + if (ifp->idev->cnf.autoconf_route) { + memset(&rtmsg, 0, sizeof(struct in6_rtmsg)); + rtmsg.rtmsg_type = RTMSG_NEWROUTE; + rtmsg.rtmsg_metric = IP6_RT_PRIO_ADDRCONF; + rtmsg.rtmsg_flags = (RTF_ALLONLINK | RTF_ADDRCONF | + RTF_DEFAULT | RTF_UP); - rtmsg.rtmsg_ifindex = ifp->idev->dev->ifindex; + rtmsg.rtmsg_ifindex = ifp->idev->dev->ifindex; - ip6_route_add(&rtmsg); + ip6_route_add(&rtmsg); + } } out: @@ -1883,7 +1889,7 @@ static struct addrconf_sysctl_table { struct ctl_table_header *sysctl_header; - ctl_table addrconf_vars[11]; + ctl_table addrconf_vars[12]; ctl_table addrconf_dev[2]; ctl_table addrconf_conf_dir[2]; ctl_table addrconf_proto_dir[2]; @@ -1912,6 +1918,10 @@ {NET_IPV6_AUTOCONF, "autoconf", &ipv6_devconf.autoconf, sizeof(int), 0644, NULL, + &proc_dointvec}, + + {NET_IPV6_AUTOCONF_ROUTE, "autoconf_route", + &ipv6_devconf.autoconf_route, sizeof(int), 0644, NULL, &proc_dointvec}, {NET_IPV6_DAD_TRANSMITS, "dad_transmits", --fUYQa+Pmc3FrFX/N-- From owner-netdev@oss.sgi.com Thu Feb 22 10:02:17 2001 Received: by oss.sgi.com id ; Thu, 22 Feb 2001 10:01:58 -0800 Received: from cerberus.nemoto.ecei.tohoku.ac.jp ([130.34.199.67]:15628 "EHLO cerberus.nemoto.ecei.tohoku.ac.jp") by oss.sgi.com with ESMTP id ; Thu, 22 Feb 2001 10:01:43 -0800 Received: from localhost (yoshfuji@localhost [127.0.0.1]) by cerberus.nemoto.ecei.tohoku.ac.jp (8.9.3+3.2W/8.9.3/Debian 8.9.3-21) with ESMTP id DAA11996; Fri, 23 Feb 2001 03:01:01 +0900 To: usagi-users@linux-ipv6.org, baggins@sith.mimuw.edu.pl Cc: misiek@pld.org.pl, netdev@oss.sgi.com Subject: Re: (usagi-users 00236) Re: why autoconfiguration work's while shouldn't ? 
:) In-Reply-To: <20010222185841.A2563@sith.mimuw.edu.pl> References: <20010222175523.A17144@ikar.t17.ds.pwr.wroc.pl> <20010223021057L.yoshfuji@ecei.tohoku.ac.jp> <20010222185841.A2563@sith.mimuw.edu.pl> X-Mailer: Mew version 1.94 on Emacs 20.7 / Mule 4.1 (AOI) X-URL: http://www.ecei.tohoku.ac.jp/%7Eyoshfuji/ X-Fingerprint: F7 31 65 99 5E B2 BB A7 15 15 13 23 18 06 A9 6F 57 00 6B 25 X-Pgp5-Key-Url: http://cerberus.nemoto.ecei.tohoku.ac.jp/%7Eyoshfuji/yoshfuji@ecei.tohoku.ac.jp.asc Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-Id: <20010223030100F.yoshfuji@linux-ipv6.org> Date: Fri, 23 Feb 2001 03:01:00 +0900 From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= X-Dispatcher: imput version 990905(IM130) Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 505 Lines: 11 In article <20010222185841.A2563@sith.mimuw.edu.pl> (at Thu, 22 Feb 2001 18:58:41 +0100), Jan Rekorajski says: > Some time ago I made a patch that adds /proc/sys/net/ipv6/conf/*/autoconf_route > sysctl just for this purpose. As the patch is small I attach it here. Don't change values. And we uses 11 for other purpose (Node Information Queries). -- Hideaki YOSHIFUJI @ USAGI Project PGP5i FP: F731 6599 5EB2 BBA7 1515 1323 1806 A96F 5700 6B25 From owner-netdev@oss.sgi.com Thu Feb 22 10:15:58 2001 Received: by oss.sgi.com id ; Thu, 22 Feb 2001 10:15:48 -0800 Received: from sith.mimuw.edu.pl ([193.0.97.1]:64772 "HELO sith.mimuw.edu.pl") by oss.sgi.com with SMTP id ; Thu, 22 Feb 2001 10:15:33 -0800 Received: (qmail 4682 invoked by uid 1645); 22 Feb 2001 18:19:15 -0000 Date: Thu, 22 Feb 2001 19:19:15 +0100 From: Jan Rekorajski To: "YOSHIFUJI Hideaki / ?$B5HF#1QL@?\(B" Cc: usagi-users@linux-ipv6.org, misiek@pld.org.pl, netdev@oss.sgi.com Subject: Re: (usagi-users 00236) Re: why autoconfiguration work's while shouldn't ? 
:) Message-ID: <20010222191915.B2563@sith.mimuw.edu.pl> Mail-Followup-To: Jan Rekorajski , "YOSHIFUJI Hideaki / ?$B5HF#1QL@?(B" , usagi-users@linux-ipv6.org, misiek@pld.org.pl, netdev@oss.sgi.com References: <20010222175523.A17144@ikar.t17.ds.pwr.wroc.pl> <20010223021057L.yoshfuji@ecei.tohoku.ac.jp> <20010222185841.A2563@sith.mimuw.edu.pl> <20010223030100F.yoshfuji@linux-ipv6.org> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-2 Content-Disposition: inline Content-Transfer-Encoding: 8bit User-Agent: Mutt/1.2.5i In-Reply-To: <20010223030100F.yoshfuji@linux-ipv6.org>; from yoshfuji@linux-ipv6.org on Fri, Feb 23, 2001 at 03:01:00AM +0900 X-Operating-System: Linux 2.4.2-pre4 i686 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 874 Lines: 23 On Fri, 23 Feb 2001, YOSHIFUJI Hideaki / ?$B5HF#1QL@?(B wrote: > In article <20010222185841.A2563@sith.mimuw.edu.pl> (at Thu, 22 Feb 2001 18:58:41 +0100), Jan Rekorajski says: > > > Some time ago I made a patch that adds /proc/sys/net/ipv6/conf/*/autoconf_route > > sysctl just for this purpose. As the patch is small I attach it here. > > Don't change values. Yes, that has been pointed to me. > And we uses 11 for other purpose (Node Information Queries). That's the problem of the order of patches :) I made mine against vanilla 2.4. Jan PS fixed patch at: ftp://sith.mimuw.edu.pl/pub/users/baggins/autoconf_route.patch -- Jan Rêkorajski | ALL SUSPECTS ARE GUILTY. PERIOD! bagginsmimuw.edu.pl | OTHERWISE THEY WOULDN'T BE SUSPECTS, WOULD THEY? 
BOFH, MANIAC | -- TROOPS by Kevin Rubio From owner-netdev@oss.sgi.com Thu Feb 22 10:20:48 2001 Received: by oss.sgi.com id ; Thu, 22 Feb 2001 10:20:28 -0800 Received: from do-smtp.nortel-dasa.de ([193.141.241.40]:49052 "EHLO convert rfc822-to-8bit do-smtp.nortel-dasa.de") by oss.sgi.com with ESMTP id ; Thu, 22 Feb 2001 10:20:27 -0800 Received: from cw-smtp.nortel-dasa.de (mail [193.141.76.175]) by do-smtp.nortel-dasa.de (8.9.3+Sun/8.9.3) with ESMTP id TAA07104 for ; Thu, 22 Feb 2001 19:20:20 +0100 (MET) From: BERND.STURM@NDSatcom.com Received: (from smap@localhost) by cw-smtp.nortel-dasa.de (8.9.3+Sun/8.9.3) id TAA28846 for ; Thu, 22 Feb 2001 19:20:17 +0100 (MET) Received: from localhost(127.0.0.1) by cw-smtp via smap (V2.1) id xma028829; Thu, 22 Feb 01 19:20:12 +0100 Received: by SATCOMNT11 with Internet Mail Service (5.5.2653.19) id ; Thu, 22 Feb 2001 19:20:12 +0100 Message-ID: To: netdev@oss.sgi.com Subject: Linux TCP over Satellite Date: Thu, 22 Feb 2001 19:20:09 +0100 MIME-Version: 1.0 X-Mailer: Internet Mail Service (5.5.2653.19) Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 8BIT Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 1751 Lines: 40 At the moment I'm doing my diploma thesis at Nortel Networks, and we're testing the performance of TCP over satellite with Linux kernel versions 2.2.16 and 2.4.1. After reading at length about which TCP parameters and extensions matter for satellite performance on Linux, I ran a test today with the following parameters enabled in the proc filesystem: tcp_sack, tcp_window_scaling and tcp_timestamps (I assume a value of 1 means that an option is enabled, right?). In addition, for 2.4.1: tcp_dsack, tcp_fack and tcp_ecn enabled.
The window sizes were set as follows. For 2.2.16: /proc/sys/net/core/rmem_default, rmem_max, wmem_default and wmem_max, all set to 262140. For 2.4.1: /proc/sys/net/ipv4/tcp_wmem and tcp_rmem, both set to 262140 262140 262140 (min, default, max) (with echo 262140 262140 262140 > /proc/sys...., I hope this is right), and /proc/sys/net/core/rmem_default, rmem_max, wmem_default and wmem_max, all set to 262140. Nevertheless, in FTP sessions between the two Linux machines we never achieved a transfer rate of much more than 40 kByte/s over a 2 Mbit/s satellite channel! A colleague of mine who did the same transfers with a Windows 2000 machine achieved transfer rates of almost 150 kByte/s. So what is wrong with my setup? With SACK and window scaling enabled and the huge TCP window sizes I've specified, performance should be much better. I would have expected about 1 to 1.5 Mbit/s instead of 300 kbit/s! I hope you can spot my mistake and help me with my problem. Thank you very much in advance!
Yours sincerely, Bernd Sturm Bernd Sturm ND Satcom phone: 0049-7545 / 96-8847 mailto:Bernd.Sturm@NDSatcom.com From owner-netdev@oss.sgi.com Thu Feb 22 11:09:39 2001 Received: by oss.sgi.com id ; Thu, 22 Feb 2001 11:09:30 -0800 Received: from mail.zmailer.org ([194.252.70.162]:35078 "EHLO zmailer.org") by oss.sgi.com with ESMTP id ; Thu, 22 Feb 2001 11:09:15 -0800 Received: (from localhost user: 'mea' uid#500 fake: STDIN (mea@zmailer.org)) by mail.zmailer.org id ; Thu, 22 Feb 2001 21:09:02 +0200 Date: Thu, 22 Feb 2001 21:09:02 +0200 From: Matti Aarnio To: BERND.STURM@NDSatcom.com Cc: netdev@oss.sgi.com Subject: Re: Linux TCP over Satellite Message-ID: <20010222210902.V15688@mea-ext.zmailer.org> References: Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: ; from BERND.STURM@NDSatcom.com on Thu, Feb 22, 2001 at 07:20:09PM +0100 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 3473 Lines: 83 On Thu, Feb 22, 2001 at 07:20:09PM +0100, BERND.STURM@NDSatcom.com wrote: > At the moment i´m doing my diploma thesis at Nortel Networks and we´re > testing the performance of TCP over Satellite with both Linux Kernel Version > 2.2.16 and 2.4.1. May I suggest that you pick up the latest issue of USENIX ;login: magazine (2001, number 1) and read its TCP-tuning article. > After long reading which TCP parameters and extensions are significant for > the satellite performance of Linux i did a test today with the following > parameters enabled in the proc-filesystem. > tcp_sack enabled, tcp_window_scaling enabled, tcp_timestamps enabled (i > suppose a value equal to 1 denotes that an option is enabled, right ?) > Furthermore for 2.4.1: tcp_dsack enabled, tcp_fack enabled, tcp_ecn enabled. You don't need ECN there. "1" is "enabled", "0" is "disabled".
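As a rough aid for the buffer sizing discussed in this thread: the socket buffers need to cover about twice the bandwidth-delay product. The sketch below works that out in C for a 2 000 000 bit/s link and an assumed 300 ms round trip. The function names are illustrative only (they are not part of any kernel or libc API), and the exact division gives 75 000 and 150 000 bytes, slightly under the rounded 77 000 and 154 000 figures given further down in this reply.

```c
#include <assert.h>

/*
 * Bandwidth-delay product in bytes: how much data is in flight when a
 * link of `bits_per_sec` is kept full over a round trip of `rtt_ms`.
 * Hypothetical helper for illustration.
 */
static long long bdp_bytes(long long bits_per_sec, long long rtt_ms)
{
    assert(bits_per_sec >= 0 && rtt_ms >= 0);
    /* bits/s -> bytes/s, then scale by the round-trip time in seconds */
    return (bits_per_sec / 8) * rtt_ms / 1000;
}

/*
 * Rule of thumb from this thread: request twice the bandwidth-delay
 * product per socket to allow for socket space accounting overhead.
 */
static long long socket_buffer_bytes(long long bits_per_sec, long long rtt_ms)
{
    return 2 * bdp_bytes(bits_per_sec, rtt_ms);
}
```

With these inputs, bdp_bytes(2000000, 300) evaluates to 75000 and socket_buffer_bytes(2000000, 300) to 150000; a longer measured round trip raises both proportionally.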
> The window sizes were set as following: > for 2.2.16: /proc/sys/net/core/rmem-default, rmem_max, wmem_default, > wmem_max all of them : 262140 > for 2.4.1: /proc/sys/net/ipv4/tcp_wmem, tcp_rmem all of them: 262140 > 262140 262140 (min, default, max) (with echo 262140 262140 262140 > > /proc/sys...., i hope this is right) and /proc/sys/net/core/rmem-default, > rmem_max, wmem_default, wmem_max all of them : 262140 > > Nevertheless when we did ftp-sessions between the 2 Linux-machines we never > achieved a transfer rate of considerably more than 40 kByte over a 2 > Mbit-satellite channel !!! 2 Mbits over how long a delay? You need buffering approaching 2 * bandwidth * delay BYTES, where delay is the round-trip time (as measured by e.g. ping). Bandwidth is given as 2 000 000 bits/sec, roughly 256 000 B/sec. Let's presume a 300 ms delay (of which some 250 ms is the light-speed delay from earth to geosync orbit and back). The delay-bandwidth product is thus some 77 000 bytes. Buffering at the sending _and_at_receiving_ systems must thus be at least some 77 kB per socket at the TCP level, and taking Linux socket space accounting rules into account, 154 000 is the minimum value. Now *both* of those values are over 64 kB, which is the original TCP's maximum window of outstanding unacknowledged data, and indeed there are reasons why Linux conservatively limits it to a mere 32 kB. It may be that the software you used set the window too low with an explicit int sndsize = 8192; setsockopt(skt, SOL_SOCKET, SO_SNDBUF, &sndsize, sizeof(sndsize)); call -- or that the defaults really didn't take hold that easily. If you run tcpdump on the traffic at each end of the link, you might get some additional insight into what is going on. > A colleague of mine who did the same transfers with a Windows2000-machine > was able to achieve transfer rates of allmost 150kByte. > So what is wrong with my setup ?
Actually with SACK enabled and Window > Scaling enabled and the huge TCP window sizes I´ve specified performance > should be much better. I would have expected about 1...1.5 MBit instead of > 300 kBit ! With a 2 Mbit link and Linux at both ends I would expect 2 Mbit/sec speeds, presuming WSCALE really is active and the sender and receiver are in agreement about the modes. > I hope you find a failure of mine and will be able to help me with my > problem. > Thank you very much in advance ! > > Yours sincerely, > > Bernd Sturm > ND Satcom > phone: 0049-7545 / 96-8847 > mailto:Bernd.Sturm@NDSatcom.com /Matti Aarnio From owner-netdev@oss.sgi.com Thu Feb 22 11:17:09 2001 Received: by oss.sgi.com id ; Thu, 22 Feb 2001 11:16:49 -0800 Received: from minus.inr.ac.ru ([193.233.7.97]:42765 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Thu, 22 Feb 2001 11:16:38 -0800 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id WAA03888; Thu, 22 Feb 2001 22:16:09 +0300 From: kuznet@ms2.inr.ac.ru Message-Id: <200102221916.WAA03888@ms2.inr.ac.ru> Subject: Re: [SECURITY] Overrun in ipv4 option parsing (Fw: (usagi-users 00222) To: yoshfuji@ecei.tohoku.ac.JP (YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?=) Date: Thu, 22 Feb 2001 22:16:09 +0300 (MSK) Cc: netdev@oss.sgi.com In-Reply-To: <20010223011122P.yoshfuji@ecei.tohoku.ac.jp> from "YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?=" at Feb 22, 1 07:45:04 pm X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 239 Lines: 9 Hello! > itojun informed us that current > linux 2.2.x and 2.4.x kernels have buffer-overrun bug in > net/ipv4/ip_options.c. Here's the fix. The fix apparently adds a useless check. I see no buffer overrun.
Alexey

From owner-netdev@oss.sgi.com Thu Feb 22 11:24:20 2001
Received: by oss.sgi.com id ; Thu, 22 Feb 2001 11:24:10 -0800
Received: from minus.inr.ac.ru ([193.233.7.97]:48397 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Thu, 22 Feb 2001 11:23:55 -0800
Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id WAA03944; Thu, 22 Feb 2001 22:23:41 +0300
From: kuznet@ms2.inr.ac.ru
Message-Id: <200102221923.WAA03944@ms2.inr.ac.ru>
Subject: Re: [SECURITY] Overrun in net/ipv6/exthdrs.c
To: yoshfuji@ecei.tohoku.ac.JP (YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?=)
Date: Thu, 22 Feb 2001 22:23:41 +0300 (MSK)
Cc: netdev@oss.sgi.com
In-Reply-To: <20010223012955T.yoshfuji@ecei.tohoku.ac.jp> from "YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?=" at Feb 22, 1 07:45:03 pm
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing
Content-Length: 780
Lines: 34

Hello!

> We've found buffer overrun bug while parsing ipv6 extension headers
> in linux2{2,4}/net/ipv6/exthdrs.c.

The patch which you sent some time ago (it also contained some
fixes to mld etc.) has been merged. (Sorry, it is still not in the main
2.4.2 tree.)

Does this new patch differ from the older one?

Alexey

PS:

> +       if (len < 2)
> +               goto bad;
> +       optlen = ptr[1]+2;
> +       if (len < optlen)
> +               goto bad;

The first check is useless, it is an identity.

We use the trick that each skb has 16 bytes of space behind its tail,
and we allow references beyond the end of the packet to simplify parsing
of objects whose length is encoded in the first octets.

        objlen = ptr[N];
        if (objlen < MIN_OBJLEN || objlen > TRUE_LEN)
                parse_error;

is legal.
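The argument in the PS above can be illustrated outside the kernel. The following is a minimal userspace sketch, not the code from exthdrs.c: TAIL_SLACK and option_length() are hypothetical names, and the 16 readable bytes past the end of the data are simply assumed as a precondition, mirroring the skb tail space described in the message. Because optlen is always at least 2, the single comparison also rejects len < 2, which is why a separate first check adds nothing.

```c
#include <stddef.h>

/* Assumption taken from the message: this many bytes past the logical
 * end of the data are always readable. */
#define TAIL_SLACK 16

/*
 * Return the total length of the option at ptr (type octet, length octet,
 * payload), or -1 if it does not fit into the len bytes that remain.
 * Precondition: ptr[0 .. len + TAIL_SLACK - 1] is readable memory.
 */
int option_length(const unsigned char *ptr, int len)
{
    int optlen = ptr[1] + 2;    /* may read into the slack when len < 2 */

    if (len < optlen)           /* optlen >= 2, so this covers len < 2 too */
        return -1;
    return optlen;
}
```

Without the guaranteed slack the read of ptr[1] would itself be out of bounds for len < 2, so the shortcut is only valid where that guarantee really holds.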
Alexey From owner-netdev@oss.sgi.com Thu Feb 22 12:12:10 2001 Received: by oss.sgi.com id ; Thu, 22 Feb 2001 12:11:50 -0800 Received: from minus.inr.ac.ru ([193.233.7.97]:6670 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Thu, 22 Feb 2001 12:11:24 -0800 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id XAA04263; Thu, 22 Feb 2001 23:10:59 +0300 From: kuznet@ms2.inr.ac.ru Message-Id: <200102222010.XAA04263@ms2.inr.ac.ru> Subject: Re: [OOPS] kernel panic due to bug in tcp_ipv6.c To: yoshfuji@ecei.tohoku.ac.JP (YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?=) Date: Thu, 22 Feb 2001 23:10:59 +0300 (MSK) Cc: netdev@oss.sgi.com In-Reply-To: <20010223014300W.yoshfuji@ecei.tohoku.ac.jp> from "YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?=" at Feb 22, 1 07:45:04 pm X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 197 Lines: 11 Hello! > - write_lock(&head->lock); > + write_lock_bh(&head->lock); Mama mia! The whole branch was never tested! Apparently, hash collisions never happened with Ipv6 here... Thank you! 
Alexey From owner-netdev@oss.sgi.com Thu Feb 22 13:39:40 2001 Received: by oss.sgi.com id ; Thu, 22 Feb 2001 13:39:22 -0800 Received: from cpu2747.adsl.bellglobal.com ([207.236.55.216]:13564 "EHLO grendel.conscoop.ottawa.on.ca") by oss.sgi.com with ESMTP id ; Thu, 22 Feb 2001 13:39:13 -0800 Received: (from rgb@localhost) by grendel.conscoop.ottawa.on.ca (8.11.1/8.11.1) id f1MLdPK21423; Thu, 22 Feb 2001 16:39:25 -0500 Date: Thu, 22 Feb 2001 16:39:25 -0500 From: Richard Guy Briggs To: Wes Hardaker Cc: Richard Guy Briggs , Linux Ipsec mailing list , NetFilter mailing list , Linux Network Development mailing list , Hugh Daniel , John Gilmore , Hugh Redelmeier , Henry Spencer Subject: Re: FreeS/WAN redesign thoughts (KLIPS, IPSEC) Message-ID: <20010222163925.A21378@grendel.conscoop.ottawa.on.ca> References: <20010221024203.H9886@grendel.conscoop.ottawa.on.ca> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: ; from wes@hardakers.net on Thu, Feb 22, 2001 at 07:46:17AM -0800 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 1510 Lines: 43 -----BEGIN PGP SIGNED MESSAGE----- On Thu, Feb 22, 2001 at 07:46:17AM -0800, Wes Hardaker wrote: > > [lots of stuff deleted] > > Richard> Treat incoming IPSEC encapsulation as an enhancement of the > Richard> layer 2 protocol and decapsulate it at the NF_IP_PRE_ROUTING > Richard> hook. This option is less favourable as it stands since it > Richard> involves creating our own SPDB engine. > > As long as the filtering rules of the linux kernel meet the minimum > requirements put forth in section 4.4.1 of RFC2401 (Which describes > the SPDB), then reusing the existing kernel infrastructure is probably > a very good thing from purely a reuse standpoint. The only matcher which is not yet implemented is 'security level', which is easy to do as a separate module when Linux actually understands the concept. 
Thanks! > Wes Hardaker > NAI Labs > Network Associates slainte mhath, RGB - -- Richard Guy Briggs -- PGP key available Auto-Free Ottawa! Canada Prevent Internet Wiretapping! -- FreeS/WAN: Thanks for voting Green! -- Marillion: -----BEGIN PGP SIGNATURE----- Version: 2.6.3i Charset: noconv iQCVAwUBOpWHDd+sBuIhFagtAQFkYQQAia2F2XdshYMo+w9xx/J/RAWeymwkic+u 2f7nPVUWDAutkh+t49ok0+IqA4ImChjuYGMBTVViXE0U/0RyOFceSiknnZL3QbXa RFGFXKxgbHEZgmt6Yqj5DlqbR8LA+rK9tERYWZOO2/LtJvcCAqROVBhxJJBzTz2z TOyqlfF1odo= =yCD0 -----END PGP SIGNATURE----- From owner-netdev@oss.sgi.com Thu Feb 22 13:58:51 2001 Received: by oss.sgi.com id ; Thu, 22 Feb 2001 13:58:41 -0800 Received: from msgbas1tx.cos.agilent.com ([192.6.9.34]:23291 "HELO msgbas1t.cos.agilent.com") by oss.sgi.com with SMTP id ; Thu, 22 Feb 2001 13:58:32 -0800 Received: from msgrel1.cos.agilent.com (msgrel1.cos.agilent.com [130.29.152.77]) by msgbas1t.cos.agilent.com (Postfix) with ESMTP id 2B422116 for ; Thu, 22 Feb 2001 14:58:31 -0700 (MST) Received: from axcsbh4.cos.agilent.com (axcsbh4.cos.agilent.com [130.29.152.145]) by msgrel1.cos.agilent.com (Postfix) with SMTP id EBD5FF2 for ; Thu, 22 Feb 2001 14:58:30 -0700 (MST) Received: from 130.29.152.145 by axcsbh4.cos.agilent.com (InterScan E-Mail VirusWall NT); Thu, 22 Feb 2001 14:58:30 -0700 (Mountain Standard Time) Received: by axcsbh4.cos.agilent.com with Internet Mail Service (5.5.2653.19) id ; Thu, 22 Feb 2001 14:58:30 -0700 Message-ID: From: yiding_wang@agilent.com To: netdev@oss.sgi.com Subject: FW: Some questions regarding Linux kernel part of ethernet Date: Thu, 22 Feb 2001 14:58:29 -0700 MIME-Version: 1.0 X-Mailer: Internet Mail Service (5.5.2653.19) Content-Type: text/plain; charset="iso-8859-1" Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 1368 Lines: 39 Hi Dave, Alan has recommended to get your advise on the possible additional path for TCP/IP under Linux to support our TCP/IP offload engine iSCSI 
approach. The following two questions are what I am looking for answers
to, and I hope you can help me with the issue.

Basically, we have designed a TCP/IP offload engine which will bypass the
standard OS TCP/IP layer. The engine is supposed to do all tasks from TCP
down to the NIC, including copy/checksum/sending/receiving, etc. My
questions are specific to the Linux environment:

1, At what point can we plug in a module to let standard TCP/IP
applications go through our special MAC driver instead of going through
the OS TCP/IP layer? What should I do to accomplish this?

2, If there is a way to add an additional path to make our offload TCP/IP
engine solution work, please advise what I should do and what the packet
path flow should be (compared with the traditional user application ->
socket -> TCP/IP -> MAC driver -> NIC)?

Following is Alan's suggestion:
> If your engine can do the copy/checksum on receive/send that will be a win with
> Dave Millers zero copy patches. Best place to ask about this lot is
>

Many thanks!
Eddie (Yiding) Wang SND, Agilent Technologies 3175 Bowers Ave., MS 88C Santa Clara, Ca 95054 Tel: (408) 970-3059 Fax: (408) 970-3099 E-Mail: yiding_wang@agilent.com From owner-netdev@oss.sgi.com Thu Feb 22 20:14:13 2001 Received: by oss.sgi.com id ; Thu, 22 Feb 2001 20:14:04 -0800 Received: from mgw-x2.nokia.com ([131.228.20.22]:36529 "EHLO mgw-x2.nokia.com") by oss.sgi.com with ESMTP id ; Thu, 22 Feb 2001 20:13:52 -0800 Received: from esvir06nok.ntc.nokia.com (esvir06nokt.ntc.nokia.com [172.21.143.38]) by mgw-x2.nokia.com (Switch-2.1.0/Switch-2.1.0) with ESMTP id f1N4Dnf23710 for ; Fri, 23 Feb 2001 06:13:49 +0200 (EET) Received: from esebh03nok.ntc.nokia.com (unverified) by esvir06nok.ntc.nokia.com (Content Technologies SMTPRS 4.2.1) with ESMTP id ; Fri, 23 Feb 2001 06:13:34 +0200 Received: from tolnx04.europe.nokia.com ([172.24.106.61]) by esebh03nok.ntc.nokia.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2652.78) id FPK93557; Fri, 23 Feb 2001 06:13:32 +0200 Received: (from jepeters@localhost) by tolnx04.europe.nokia.com (8.11.2/8.11.2) id f1N4DPn01855; Fri, 23 Feb 2001 13:13:25 +0900 X-Authentication-Warning: tolnx04.europe.nokia.com: jepeters set sender to jens-ulrik.petersen@nokia.com using -f To: usagi-users@linux-ipv6.org Cc: Jens-Ulrik Petersen , kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com Subject: Re: usagi ipv6 and linux source References: <20010221223532.C1758@fred.local> From: Jens-Ulrik Petersen Date: 23 Feb 2001 13:13:24 +0900 Message-ID: User-Agent: Gnus/5.0807 (Gnus v5.8.7) XEmacs/21.2 =?ISO-8859-1?Q?(Peisino=1B,Ak=1B(B?=) MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 2082 Lines: 49 Hi Andi, How you read the report of the TAHI conformance tests ? 
(It is apparently for linux-2.2, but probably some of the failures still persist in 2.4) "Andi Kleen" writes: > On Tue, Feb 20, 2001 at 06:56:36AM +0100, Jens-Ulrik Petersen wrote: > > It is my understanding that there are some serious flaws in the IPv6 > > implementation in linux-2.4. Perhaps I am mistaken, but from my own > > experience I find that Linux hosts seem to have problems with address > > autoconfiguration and don't seem to like router advertisements from > > FreeBSD-4 ipv6 routers. (Presumably there is a corresponding bug in > > the Linux "rtadvd"?) > When you see any specific problems you should send tcpdumps of the > incidents to the list. Then they can be fixed if they are real bugs. [I don't know if my own problems are related to the problems mentioned in the above document.] Hmmm, my comments above refer to my experience with earlier 2.4.0-test kernels (built by RH for their rawhide devel distribution). With the current Red Hat rawhide-20010206 2.4.0 kernel, I just see this in dmesg's output: IPv6 v0.8 for NET4.0 IPv6 over IPv4 tunneling driver IPv6 addrconf: prefix with wrong length 64 eth0: 1 multicast blocks dropped. inet6_ifa_finish_destroy My linux host is connected to a FreeBSD-4.1R IPv6 router: ping6 works fine between them with link-local addresses and they exchange neighbour solicitations and advertisements fine. However when I "ping ff02::1" from the router, there is no response from the linux host. Seems ipv6 multicast is not working/configured? (I have to check RH's kernel build configuration.) The Intel network card in the linux box is using the eepro100 driver. I can only comment that with Usagi's patch on the standard linux-2.4.0 source, router advertisements are accepted fine. (Is it possible to get an rpm of a built vanilla (unpatched) linux-2.4 kernel with working ipv6 support from somewhere? -- it would save some time.) 
Jens From owner-netdev@oss.sgi.com Thu Feb 22 23:02:14 2001 Received: by oss.sgi.com id ; Thu, 22 Feb 2001 23:02:04 -0800 Received: from pizda.ninka.net ([216.101.162.242]:16000 "EHLO pizda.ninka.net") by oss.sgi.com with ESMTP id ; Thu, 22 Feb 2001 23:01:41 -0800 Received: (from davem@localhost) by pizda.ninka.net (8.9.3/8.9.3) id WAA01278; Thu, 22 Feb 2001 22:59:16 -0800 From: "David S. Miller" MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <14998.2628.144784.585248@pizda.ninka.net> Date: Thu, 22 Feb 2001 22:59:16 -0800 (PST) To: linux-kernel@vger.kernel.org CC: netdev@oss.sgi.com Subject: [UPDATE] zerocopy BETA 3 X-Mailer: VM 6.75 under 21.1 (patch 13) "Crater Lake" XEmacs Lucid Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 673 Lines: 24 Usual spot: ftp://ftp.kernel.org/pub/linux/kernel/people/davem/zerocopy-2.4.2-2.diff.gz Changes since last installment: 1) More errors in TCP receive queue collapser are discovered and fixed. 2) Several URG handling details on receive side are made more consistent and sane. 3) Workaround for win2000/95 VJ header compression bugs is implemented. 4) Update to latest 3c59x driver from Andrew, this should cure some link type detection problems. 5) IP conntrack fix from Rusty. Please test, to my knowledge the only issue remaining now are the gbit performance issues, which are being discussed by Pekka and Alexey. Later, David S. Miller davem@redhat.com From owner-netdev@oss.sgi.com Fri Feb 23 02:39:45 2001 Received: by oss.sgi.com id ; Fri, 23 Feb 2001 02:39:25 -0800 Received: from sith.mimuw.edu.pl ([193.0.97.1]:11783 "HELO sith.mimuw.edu.pl") by oss.sgi.com with SMTP id ; Fri, 23 Feb 2001 02:39:05 -0800 Received: (qmail 27628 invoked by uid 1645); 23 Feb 2001 10:42:49 -0000 Date: Fri, 23 Feb 2001 11:42:49 +0100 From: Jan Rekorajski To: "David S. 
Miller" Cc: linux-kernel@vger.kernel.org, netdev@oss.sgi.com Subject: Re: [UPDATE] zerocopy BETA 3 Message-ID: <20010223114249.A27608@sith.mimuw.edu.pl> Mail-Followup-To: Jan Rekorajski , "David S. Miller" , linux-kernel@vger.kernel.org, netdev@oss.sgi.com References: <14998.2628.144784.585248@pizda.ninka.net> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-2 Content-Disposition: inline Content-Transfer-Encoding: 8bit User-Agent: Mutt/1.2.5i In-Reply-To: <14998.2628.144784.585248@pizda.ninka.net>; from davem@redhat.com on Thu, Feb 22, 2001 at 10:59:16PM -0800 X-Operating-System: Linux 2.4.2-pre4 i686 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 563 Lines: 20 On Thu, 22 Feb 2001, David S. Miller wrote: > > Usual spot: > > ftp://ftp.kernel.org/pub/linux/kernel/people/davem/zerocopy-2.4.2-2.diff.gz > > Changes since last installment: > > 3) Workaround for win2000/95 VJ header compression bugs is > implemented. Could you please make a patch with this fix only? Or is it available somewhere? Jan -- Jan Rêkorajski | ALL SUSPECTS ARE GUILTY. PERIOD! bagginsmimuw.edu.pl | OTHERWISE THEY WOULDN'T BE SUSPECTS, WOULD THEY? 
BOFH, MANIAC | -- TROOPS by Kevin Rubio From owner-netdev@oss.sgi.com Fri Feb 23 15:00:01 2001 Received: by oss.sgi.com id ; Fri, 23 Feb 2001 14:59:41 -0800 Received: from citadel.myri.com ([199.120.212.1]:9856 "EHLO myri.com") by oss.sgi.com with ESMTP id ; Fri, 23 Feb 2001 14:59:28 -0800 Received: from frisbee.myri.com (frisbee.myri.com [199.120.212.209]) by myri.com (8.9.3+Sun/8.9.1) with ESMTP id OAA21424 for ; Fri, 23 Feb 2001 14:59:27 -0800 (PST) Received: (from feldy@localhost) by frisbee.myri.com (8.9.3/8.9.1) id OAA20943 for netdev@oss.sgi.com; Fri, 23 Feb 2001 14:59:25 -0800 Date: Fri, 23 Feb 2001 14:59:25 -0800 From: Bob Felderman Message-Id: <200102232259.OAA20943@frisbee.myri.com> To: netdev@oss.sgi.com Subject: possible bug x86 2.4.2 SMP in IP receive stack Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 3843 Lines: 151 To: linux-kernel@vger.kernel.org Subject: possible bug x86 2.4.2 SMP in IP receive stack With dual x86 processors running 2.4.2, if I blast a UDP stream at the machine using netperf, I can easily cause the kernel to panic with the message below. Feb 23 12:42:30 rcc2 kernel: Warning: kfree_skb passed an skb still on a list (from c01f58dc). I'm going to pop out one processor on the receiver and see if that makes the problem go away. [I have since done this and I can not make a single processor system fail. It is still running exactly the same kernel.] Note that this is using a Myrinet network that is able to get more than 1.5Gigabit/sec UDP transfers on single-processor x86 2.4.0 linux. Perhaps this is reproducible with good GigE cards with jumbo MTU turned on. 
I'm also upping the socket limits echo "1048576" > /proc/sys/net/core/rmem_max echo "1048576" > /proc/sys/net/core/wmem_max echo "1048576" > /proc/sys/net/core/wmem_default echo "1048576" > /proc/sys/net/core/rmem_default echo "1048576" > /proc/sys/net/core/optmem_max Feb 23 12:42:30 rcc2 kernel: Warning: kfree_skb passed an skb still on a list (from c01f58dc). Looking up the "from c01f58dc" in the ksyms shows that ip_rcv is the caller. c01f3d38 ip_route_input_Rsmp_0a99f032 c01f44f8 ip_route_output_key_Rsmp_4ce6fe49 c01f5170 inet_add_protocol_Rsmp_a27098bd c01f51f0 inet_del_protocol_Rsmp_0c8ae503 c01f5538 ip_rcv_Rsmp_587335e5 c01f58dc ERROR LOCATION (kfree_skb passed an skb still on a list (from c01f58dc)) c01f61dc ip_defrag_Rsmp_5532f3a2 c01f6b34 ip_options_compile_Rsmp_b8621391 c01f70ec ip_options_undo_Rsmp_9721f12f c01f8650 ip_fragment_Rsmp_41bc67d3 c01f8bb0 ip_send_check_Rsmp_a37b7441 c01f8bf8 ip_finish_output_Rsmp_5b565e28 On a different machine I have seen this. Feb 23 12:32:20 rcc kernel: KERNEL: assertion (del_timer(&qp->timer) == 0) failed at ip_fragment.c(163):ip_frag_destroy CONFIG_X86=y CONFIG_ISA=y CONFIG_UID16=y CONFIG_EXPERIMENTAL=y CONFIG_MODULES=y CONFIG_MODVERSIONS=y CONFIG_KMOD=y CONFIG_MPENTIUMIII=y CONFIG_X86_WP_WORKS_OK=y CONFIG_X86_INVLPG=y CONFIG_X86_CMPXCHG=y CONFIG_X86_BSWAP=y CONFIG_X86_POPAD_OK=y CONFIG_X86_TSC=y CONFIG_X86_GOOD_APIC=y CONFIG_X86_PGE=y CONFIG_X86_USE_PPRO_CHECKSUM=y CONFIG_NOHIGHMEM=y CONFIG_SMP=y CONFIG_HAVE_DEC_LOCK=y CONFIG_NET=y CONFIG_X86_IO_APIC=y CONFIG_X86_LOCAL_APIC=y CONFIG_PCI=y CONFIG_PCI_GOANY=y CONFIG_PCI_BIOS=y CONFIG_PCI_DIRECT=y CONFIG_PCI_NAMES=y CONFIG_HOTPLUG=y CONFIG_SYSVIPC=y CONFIG_SYSCTL=y CONFIG_KCORE_ELF=y CONFIG_BINFMT_AOUT=y CONFIG_BINFMT_ELF=y CONFIG_BINFMT_MISC=y CONFIG_PM=y CONFIG_ACPI=y CONFIG_PNP=y CONFIG_ISAPNP=y CONFIG_BLK_DEV_FD=y CONFIG_PACKET=y CONFIG_PACKET_MMAP=y CONFIG_UNIX=y CONFIG_INET=y CONFIG_IDE=y CONFIG_BLK_DEV_IDE=y CONFIG_BLK_DEV_IDEDISK=y CONFIG_BLK_DEV_IDECD=y 
CONFIG_BLK_DEV_CMD640=y CONFIG_BLK_DEV_RZ1000=y CONFIG_BLK_DEV_IDEPCI=y CONFIG_IDEPCI_SHARE_IRQ=y CONFIG_BLK_DEV_IDE_MODES=y CONFIG_NETDEVICES=y CONFIG_NET_ETHERNET=y CONFIG_NET_PCI=y CONFIG_EEPRO100=y CONFIG_VT=y CONFIG_VT_CONSOLE=y CONFIG_SERIAL=y CONFIG_UNIX98_PTYS=y CONFIG_MOUSE=y CONFIG_PSMOUSE=y CONFIG_DRM=y CONFIG_DRM_TDFX=y CONFIG_AUTOFS_FS=y CONFIG_AUTOFS4_FS=y CONFIG_FAT_FS=y CONFIG_MSDOS_FS=y CONFIG_ISO9660_FS=y CONFIG_PROC_FS=y CONFIG_DEVPTS_FS=y CONFIG_EXT2_FS=y CONFIG_NFS_FS=y CONFIG_NFS_V3=y CONFIG_NFSD=y CONFIG_NFSD_V3=y CONFIG_SUNRPC=y CONFIG_LOCKD=y CONFIG_LOCKD_V4=y CONFIG_MSDOS_PARTITION=y CONFIG_NLS=y CONFIG_VGA_CONSOLE=y ------------------------------------------------------------------ Bob Felderman (626) 821-5555 Director of Software Development (626) 821-5316 fax Myricom Inc. feldy@myri.com 325 N. Santa Anita Ave. http://www.myri.com Arcadia, CA 91006 ------------------------------------------------------------------ From owner-netdev@oss.sgi.com Sat Feb 24 15:25:50 2001 Received: by oss.sgi.com id ; Sat, 24 Feb 2001 15:25:40 -0800 Received: from mail3.atl.bellsouth.net ([205.152.0.38]:59623 "EHLO mail3.atl.bellsouth.net") by oss.sgi.com with ESMTP id ; Sat, 24 Feb 2001 15:25:18 -0800 Received: from mandrakesoft.com (adsl-20-73-169.asm.bellsouth.net [66.20.73.169]) by mail3.atl.bellsouth.net (3.3.5alt/0.75.2) with ESMTP id SAA24455; Sat, 24 Feb 2001 18:25:17 -0500 (EST) Message-ID: <3A9842DC.B42ECD7A@mandrakesoft.com> Date: Sat, 24 Feb 2001 18:25:16 -0500 From: Jeff Garzik Organization: MandrakeSoft X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.4.2 i686) X-Accept-Language: en MIME-Version: 1.0 To: netdev@oss.sgi.com CC: Linux Knernel Mailing List Subject: New net features for added performance Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 2692 Lines: 72 Disclaimer: This is 2.5, repeat, 2.5 material. 
I've talked about the following items with a couple people on this list in private. I wanted to bring these up again, to see if anyone has comments on the following suggested netdevice changes for the upcoming 2.5 development series of kernels. 1) Rx Skb recycling. It would be nice to have skbs returned to the driver after the net core is done with them, rather than have netif_rx free the skb. Many drivers pre-allocate a number of maximum-sized skbs into which the net card DMA's data. If netif_rx returned the SKB instead of freeing it, the driver could simply flip the DescriptorOwned bit for that buffer, giving it immediately back to the net card. Advantages: A de-allocation immediately followed by a reallocation is eliminated, less L1 cache pollution during interrupt handling. Potentially less DMA traffic between card and host. Disadvantages? 2) Tx packet grouping. If the net core has knowledge that more packets will be following the current one being sent to dev->hard_start_xmit(), it should pass that knowledge on to dev->hard_start_xmit(), either as an estimated number yet-to-be-sent, or just as a flag that "more is coming." Advantages: This lets the net driver make smarter decisions about Tx interrupt mitigation, Tx buffer queueing, etc. Disadvantages? Can this sort of knowledge be obtained by a netdevice right now, without any kernel modifications? 3) Slabbier packet allocation. Even though skb allocation is decently fast, you are still looking at an skb buffer head grab and a kmalloc, for each [dev_]alloc_skb call. 
I was wondering if it would be possible to create a helper function for
drivers which would improve the hot-path considerably:

        static struct sk_buff *ether_alloc_skb (int size)
        {
                struct sk_buff *skb;

                /* fast path: request fits a preallocated skb */
                if (size <= preallocated_skb_list->skb->size) {
                        skb = dequeue_skb_from_list();
                        if (preallocate_size < low_water_limit)
                                schedule_tasklet(refill_skb_list);
                        return skb;
                }
                return dev_alloc_skb(size);
        }

The skbs from this list would be allocated by a tasklet in the
background to the maximum size requested by the ethernet driver. If you
wanted to waste even more memory, you could allocate from per-CPU
lists..

Disadvantages? Doing this might increase cache pollution due to
increased code and data size, but I think the hot path is much improved
(dequeue a properly sized, initialized, skb_reserve()'d skb off a list)
and would help mitigate the impact of sudden bursts of traffic.

--
Jeff Garzik       | "You see, in this world there's two kinds of
Building 1024     |  people, my friend: Those with loaded guns
MandrakeSoft      |  and those who dig. You dig."
--Blondie From owner-netdev@oss.sgi.com Sat Feb 24 16:13:00 2001 Received: by oss.sgi.com id ; Sat, 24 Feb 2001 16:12:40 -0800 Received: from ns.suse.de ([213.95.15.193]:58893 "HELO Cantor.suse.de") by oss.sgi.com with SMTP id ; Sat, 24 Feb 2001 16:12:20 -0800 Received: from Hermes.suse.de (Hermes.suse.de [213.95.15.136]) by Cantor.suse.de (Postfix) with ESMTP id 8D4F81E111; Sun, 25 Feb 2001 01:12:15 +0100 (MET) Date: Sun, 25 Feb 2001 01:12:11 +0100 From: Andi Kleen To: Jeff Garzik Cc: Andi Kleen , linux-kernel@vger.kernel.org, netdev@oss.sgi.com Subject: Re: New net features for added performance Message-ID: <20010225011211.A23853@gruyere.muc.suse.de> References: <3A9842DC.B42ECD7A@mandrakesoft.com> <3A984BDA.190B4D8E@mandrakesoft.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <3A984BDA.190B4D8E@mandrakesoft.com>; from jgarzik@mandrakesoft.com on Sat, Feb 24, 2001 at 07:03:38PM -0500 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 2049 Lines: 48 On Sat, Feb 24, 2001 at 07:03:38PM -0500, Jeff Garzik wrote: > Andi Kleen wrote: > > > > Jeff Garzik writes: > > > > > Advantages: A de-allocation immediately followed by a reallocation is > > > eliminated, less L1 cache pollution during interrupt handling. > > > Potentially less DMA traffic between card and host. > > > > > > Disadvantages? > > > > You need a new mechanism to cope with low memory situations because the > > drivers can tie up quite a bit of memory (in fact you gave up unified > > memory management). > > I think you misunderstand.. netif_rx frees the skb. In this example: > > netif_rx(skb); /* free skb of size PKT_BUF_SZ */ > skb = dev_alloc_skb(PKT_BUF_SZ) > > an alloc of a PKT_BUF_SZ'd skb immediately follows a free of a > same-sized skb. 100% of the time. 
Free/Alloc gives the mm the chance to throttle it by failing, and also to
recover from fragmentation by packing the slabs. If you don't do it you need
to add a hook somewhere that gets triggered on low memory situations and
frees the buffers.

> > 4) Better support for aligned RX by only copying the header, not the whole
> > packet, to end up with an aligned IP header. Unless the driver knows about
> > all protocol lengths this means the stack needs to support "parse header
> > in this buffer, then switch to other buffer with computed offset for data"
>
> This requires scatter-gather hardware support, right? If so, would this
> support only exist for checksumming hardware -- like the current
> zerocopy -- or would non-checksumming SG hardware like tulip be
> supported too?

It doesn't need any hardware support. In fact it is especially helpful for
the tulip. The idea is that instead of copying the whole packet to get an
aligned header (e.g. on the alpha or other boxes where unaligned accesses
are very expensive) you just copy the first 128 bytes that probably contain
the header. For the data it doesn't matter much if it's unaligned;
copy_to_user and csum_copy_to_user can deal with that fine.
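The header-only copy described above can be sketched in userspace C. This is an illustration under assumptions, not driver code: struct split_frame, split_rx_frame() and frame_hdr() are hypothetical names, the frame is assumed to start with a 14-byte Ethernet header, and a 2-byte pad is used so that the IP header behind it lands on a 4-byte boundary. Only the first 128 bytes are copied; the rest of the frame is referenced where it already sits.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HDR_COPY_LEN 128   /* from the message: probably covers the headers */
#define ETH_HDR_LEN   14   /* Ethernet header length */
#define IP_ALIGN_PAD   2   /* assumed pad: 2 + 14 = 16, a multiple of 4 */

/* Hypothetical two-part view of one received frame. */
struct split_frame {
    /* uint32_t storage guarantees 4-byte alignment of the copy area */
    uint32_t hdr_store[(IP_ALIGN_PAD + HDR_COPY_LEN + 3) / 4];
    size_t hdr_len;              /* bytes copied into hdr_store */
    const unsigned char *data;   /* remainder, left in the rx buffer */
    size_t data_len;
};

/* Start of the copied headers (the Ethernet header). */
unsigned char *frame_hdr(struct split_frame *f)
{
    return (unsigned char *)f->hdr_store + IP_ALIGN_PAD;
}

/* Copy at most HDR_COPY_LEN bytes; leave the payload where it is. */
void split_rx_frame(struct split_frame *f, const unsigned char *rx, size_t len)
{
    f->hdr_len = len < HDR_COPY_LEN ? len : HDR_COPY_LEN;
    memcpy(frame_hdr(f), rx, f->hdr_len);
    f->data = rx + f->hdr_len;
    f->data_len = len - f->hdr_len;
}
```

The parser would then read the IP header at frame_hdr(f) + ETH_HDR_LEN and continue into f->data at a computed offset once the copied area runs out, which is the buffer switch mentioned in point 4 of the quoted text.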
-Andi From owner-netdev@oss.sgi.com Sat Feb 24 16:17:00 2001 Received: by oss.sgi.com id ; Sat, 24 Feb 2001 16:16:50 -0800 Received: from ns.suse.de ([213.95.15.193]:25870 "HELO Cantor.suse.de") by oss.sgi.com with SMTP id ; Sat, 24 Feb 2001 16:16:43 -0800 Received: from Hermes.suse.de (Hermes.suse.de [213.95.15.136]) by Cantor.suse.de (Postfix) with ESMTP id C27C61E111; Sun, 25 Feb 2001 01:16:41 +0100 (MET) Date: Sun, 25 Feb 2001 01:16:40 +0100 From: Andi Kleen To: Jeff Garzik Cc: Andi Kleen , linux-kernel@vger.kernel.org, netdev@oss.sgi.com Subject: Re: New net features for added performance Message-ID: <20010225011640.A23953@gruyere.muc.suse.de> References: <3A9842DC.B42ECD7A@mandrakesoft.com> <3A984BDA.190B4D8E@mandrakesoft.com> <3A984E1A.DF67E730@mandrakesoft.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <3A984E1A.DF67E730@mandrakesoft.com>; from jgarzik@mandrakesoft.com on Sat, Feb 24, 2001 at 07:13:14PM -0500 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 744 Lines: 17 On Sat, Feb 24, 2001 at 07:13:14PM -0500, Jeff Garzik wrote: > Sorry... I should also point out that I was thinking of tulip > architecture and similar architectures, where you have a fixed number of > Skbs allocated at all times, and that number doesn't change for the > lifetime of the driver. > > Clearly not all cases would benefit from skb recycling, but there are a > number of rx-ring-based systems where this would be useful, and (AFAICS) > reduce the work needed to be done by the system, and reduce the amount > of overall DMA traffic by a bit. A simple way to do it currently is just to compare the new skb with the old one. If it is the same, do a shortcut. That should usually work out when the system has enough memory. 
-Andi

From owner-netdev@oss.sgi.com Sat Feb 24 18:37:44 2001
Received: by oss.sgi.com id ; Sat, 24 Feb 2001 18:37:35 -0800
Received: from tomts7.bellnexxia.net ([209.226.175.40]:59576 "EHLO tomts7-srv.bellnexxia.net") by oss.sgi.com with ESMTP id ; Sat, 24 Feb 2001 18:37:17 -0800
Received: from coplanar.net ([64.230.144.142]) by tomts7-srv.bellnexxia.net (InterMail vM.4.01.03.16 201-229-121-116-20010115) with ESMTP id <20010225023711.KBNR757.tomts7-srv.bellnexxia.net@coplanar.net>; Sat, 24 Feb 2001 21:37:11 -0500
Message-ID: <3A986EDB.363639E7@coplanar.net>
Date: Sat, 24 Feb 2001 21:32:59 -0500
From: Jeremy Jackson
X-Mailer: Mozilla 4.72 [en] (X11; U; Linux 2.2.14-5.0 i586)
X-Accept-Language: en
MIME-Version: 1.0
To: Jeff Garzik
CC: netdev@oss.sgi.com, Linux Knernel Mailing List
Subject: Re: New net features for added performance
References: <3A9842DC.B42ECD7A@mandrakesoft.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing
Content-Length: 580
Lines: 14

Jeff Garzik wrote:

(about optimizing kernel network code for busmastering NICs)

> Disclaimer: This is 2.5, repeat, 2.5 material.

Related question: are there any 100Mbit NICs with CPUs onboard?
Something mainstream/affordable? (i.e. not 1G ethernet)
Just recently someone posted asking some technical question about
ARMlinux for an Intel card with 2 1G ports, 8 100M ports,
an onboard ARM cpu and 4 other uControllers...
seems to me that ultimately the networking code should go in that direction:
imagine having the *NIC* do most of this... no cache pollution problems...
From owner-netdev@oss.sgi.com Sat Feb 24 18:38:45 2001 Received: by oss.sgi.com id ; Sat, 24 Feb 2001 18:38:34 -0800 Received: from mx1.eskimo.com ([204.122.16.48]:45319 "EHLO mx1.eskimo.com") by oss.sgi.com with ESMTP id ; Sat, 24 Feb 2001 18:38:18 -0800 Received: from eskimo.com (klevin@eskimo.com [204.122.16.13]) by mx1.eskimo.com (8.9.1a/8.8.8) with ESMTP id SAA24670; Sat, 24 Feb 2001 18:38:12 -0800 Received: from localhost (klevin@localhost) by eskimo.com (8.9.1a/8.9.1) with SMTP id SAA03720; Sat, 24 Feb 2001 18:38:12 -0800 (PST) X-Authentication-Warning: eskimo.com: klevin owned process doing -bs Date: Sat, 24 Feb 2001 18:38:12 -0800 (PST) From: Noah Romer To: Jeff Garzik cc: netdev@oss.sgi.com, Linux Knernel Mailing List Subject: Re: New net features for added performance In-Reply-To: <3A9842DC.B42ECD7A@mandrakesoft.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 2812 Lines: 56 On Sat, 24 Feb 2001, Jeff Garzik wrote: > Disclaimer: This is 2.5, repeat, 2.5 material. [snip] > 1) Rx Skb recycling. It would be nice to have skbs returned to the > driver after the net core is done with them, rather than have netif_rx > free the skb. Many drivers pre-allocate a number of maximum-sized skbs > into which the net card DMA's data. If netif_rx returned the SKB > instead of freeing it, the driver could simply flip the DescriptorOwned > bit for that buffer, giving it immediately back to the net card. > > Advantages: A de-allocation immediately followed by a reallocation is > eliminated, less L1 cache pollution during interrupt handling. > Potentially less DMA traffic between card and host. This could be quite useful for the network driver I maintain (it's made it to the -ac patch set for 2.4, but not yet into the main kernel tarball). 
At the moment, it allocates 127 "buckets" (skb's under linux) at start of
day and posts them to the card. After that, it maintains a minimum of 80
data buffers available to the card at any one time. There's a noticeable
performance hit when the driver has to reallocate new skbs to keep above
the threshold. I try to recycle as much as possible w/in the driver (i.e.
really small incoming packets get a new skb allocated for them and the
original buffer is put back on the queue), but it would be nice to be able
to recycle even more of the skbs.

> Disadvantages?

As has been pointed out, there's a certain loss of control over allocation
of memory (could check for low memory conditions before sending the skb
back to the driver, but . . .). I do see a failure to allocate all 127
skbs, occasionally, when the driver is first loaded (only way to get
around this is to reboot the system).

> 2) Tx packet grouping. If the net core has knowledge that more packets
> will be following the current one being sent to dev->hard_start_xmit(),
> it should pass that knowledge on to dev->hard_start_xmit(), either as an
> estimated number yet-to-be-sent, or just as a flag that "more is
> coming."
>
> Advantages: This lets the net driver make smarter decisions about Tx
> interrupt mitigation, Tx buffer queueing, etc.
>
> Disadvantages? Can this sort of knowledge be obtained by a netdevice
> right now, without any kernel modifications?

In my experience, Tx interrupt mitigation is of little benefit. I actually
saw a performance increase of ~20% when I turned off Tx interrupt
mitigation in my driver (could have been poor implementation on my part).

--
Noah Romer              |"Calm down, it's only ones and zeros." - this message
klevin@eskimo.com       |brought to you by The Network
PGP key available       |"Time will have its say, it always does."
- Celltrex
by finger or email |from Flying to Valhalla by Charles Pellegrino

From owner-netdev@oss.sgi.com Sat Feb 24 19:24:55 2001 Received: by oss.sgi.com id ; Sat, 24 Feb 2001 19:24:45 -0800 Received: from f00f.stub.clear.net.nz ([203.167.224.51]:57101 "HELO metastasis.f00f.org") by oss.sgi.com with SMTP id ; Sat, 24 Feb 2001 19:24:27 -0800 Received: by metastasis.f00f.org (Postfix, from userid 1000) id EB9B7A59F; Sun, 25 Feb 2001 16:23:57 +1300 (NZDT) Date: Sun, 25 Feb 2001 16:23:57 +1300 From: Chris Wedgwood To: Jeremy Jackson Cc: Jeff Garzik , netdev@oss.sgi.com, Linux Knernel Mailing List Subject: Re: New net features for added performance Message-ID: <20010225162357.A12123@metastasis.f00f.org> References: <3A9842DC.B42ECD7A@mandrakesoft.com> <3A986EDB.363639E7@coplanar.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <3A986EDB.363639E7@coplanar.net>; from jerj@coplanar.net on Sat, Feb 24, 2001 at 09:32:59PM -0500 X-No-Archive: Yes Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 857 Lines: 23

On Sat, Feb 24, 2001 at 09:32:59PM -0500, Jeremy Jackson wrote:

    Related question: are there any 100Mbit NICs with cpu's onboard?

Yes, but the only ones I've seen to date are magic and do special things (like VPN or hardware crypto). I'm not sure that, without 'magic' requirements, there is much point for 100M on modern hardware.

Not affordable, and moving some of the IP stack onto the card (I think this is what you are alluding to) would be extremely non-trivial, especially if you want all the components (host OS, multiple network cards) to talk to each other asynchronously; you would also have to deal with buggy hardware that doesn't like doing PCI-PCI transfers and suchlike.

That said, it would be an extremely neat thing to do from a technical perspective, but I don't know if you would ever get really good performance from it.
--cw From owner-netdev@oss.sgi.com Sat Feb 24 19:39:25 2001 Received: by oss.sgi.com id ; Sat, 24 Feb 2001 19:39:15 -0800 Received: from f00f.stub.clear.net.nz ([203.167.224.51]:57869 "HELO metastasis.f00f.org") by oss.sgi.com with SMTP id ; Sat, 24 Feb 2001 19:38:49 -0800 Received: by metastasis.f00f.org (Postfix, from userid 1000) id 5F97DA59F; Sun, 25 Feb 2001 16:38:36 +1300 (NZDT) Date: Sun, 25 Feb 2001 16:38:36 +1300 From: Chris Wedgwood To: Jan Rekorajski , "David S. Miller" , linux-kernel@vger.kernel.org, netdev@oss.sgi.com Subject: Re: [UPDATE] zerocopy BETA 3 Message-ID: <20010225163836.A12173@metastasis.f00f.org> References: <14998.2628.144784.585248@pizda.ninka.net> <20010223114249.A27608@sith.mimuw.edu.pl> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <20010223114249.A27608@sith.mimuw.edu.pl>; from baggins@sith.mimuw.edu.pl on Fri, Feb 23, 2001 at 11:42:49AM +0100 X-No-Archive: Yes Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 1283 Lines: 38 On Fri, Feb 23, 2001 at 11:42:49AM +0100, Jan Rekorajski wrote: Could you please make a patch with this fix only? Or is it available somewhere? --- linux-2.4.2/include/net/ip.h Sun Feb 25 01:15:19 2001 +++ linux-2.4.2+zc-2/include/net/ip.h Sun Feb 25 01:53:52 2001 @@ -188,11 +188,16 @@ extern void __ip_select_ident(struct iphdr *iph, struct dst_entry *dst); -static inline void ip_select_ident(struct iphdr *iph, struct dst_entry *dst) +static inline void ip_select_ident(struct iphdr *iph, struct dst_entry *dst, struct sock *sk) { - if (iph->frag_off&__constant_htons(IP_DF)) - iph->id = 0; - else + if (iph->frag_off&__constant_htons(IP_DF)) { + /* This is only to work around buggy Windows95/2000 + * VJ compression implementations. If the ID field + * does not change, they drop every other packet in + * a TCP stream using header compression. + */ + iph->id = (sk ? 
sk->protinfo.af_inet.id++ : 0); + } else __ip_select_ident(iph, dst); } FWIW; I am still seeing _really_ bad throughput on a 10M ethernet segment between 2.4.2+zc-2 and Windows98 SE. Nobody else has complained so I guess it is something local (mii-tool for Windows wouldn't be a bad idea), but if the above doesn't work for you I'd been keen to know about it. --cw From owner-netdev@oss.sgi.com Sat Feb 24 19:50:55 2001 Received: by oss.sgi.com id ; Sat, 24 Feb 2001 19:50:46 -0800 Received: from sith.mimuw.edu.pl ([193.0.97.1]:1542 "HELO sith.mimuw.edu.pl") by oss.sgi.com with SMTP id ; Sat, 24 Feb 2001 19:50:33 -0800 Received: (qmail 10354 invoked by uid 1645); 25 Feb 2001 03:54:20 -0000 Date: Sun, 25 Feb 2001 04:54:20 +0100 From: Jan Rekorajski To: Chris Wedgwood Cc: "David S. Miller" , linux-kernel@vger.kernel.org, netdev@oss.sgi.com Subject: Re: [UPDATE] zerocopy BETA 3 Message-ID: <20010225045420.B10281@sith.mimuw.edu.pl> Mail-Followup-To: Jan Rekorajski , Chris Wedgwood , "David S. Miller" , linux-kernel@vger.kernel.org, netdev@oss.sgi.com References: <14998.2628.144784.585248@pizda.ninka.net> <20010223114249.A27608@sith.mimuw.edu.pl> <20010225163836.A12173@metastasis.f00f.org> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-2 Content-Disposition: inline Content-Transfer-Encoding: 8bit User-Agent: Mutt/1.2.5i In-Reply-To: <20010225163836.A12173@metastasis.f00f.org>; from cw@f00f.org on Sun, Feb 25, 2001 at 04:38:36PM +1300 X-Operating-System: Linux 2.4.2-pre4 i686 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 5787 Lines: 178 On Sun, 25 Feb 2001, Chris Wedgwood wrote: > On Fri, Feb 23, 2001 at 11:42:49AM +0100, Jan Rekorajski wrote: > > Could you please make a patch with this fix only? Or is it > available somewhere? 
> [cut incomplete patch ;)] There are more changes, I hacked'em out of vger CVS: diff -urN linux/include/net/ip.h linux.fixed/include/net/ip.h --- linux/include/net/ip.h Thu Feb 22 01:10:38 2001 +++ linux.fixed/include/net/ip.h Fri Feb 23 14:40:40 2001 @@ -188,11 +188,16 @@ extern void __ip_select_ident(struct iphdr *iph, struct dst_entry *dst); -static inline void ip_select_ident(struct iphdr *iph, struct dst_entry *dst) +static inline void ip_select_ident(struct iphdr *iph, struct dst_entry *dst, struct sock *sk) { - if (iph->frag_off&__constant_htons(IP_DF)) - iph->id = 0; - else + if (iph->frag_off&__constant_htons(IP_DF)) { + /* This is only to work around buggy Windows95/2000 + * VJ compression implementations. If the ID field + * does not change, they drop every other packet in + * a TCP stream using header compression. + */ + iph->id = (sk ? sk->protinfo.af_inet.id++ : 0); + } else __ip_select_ident(iph, dst); } diff -urN linux/include/net/ipip.h linux.fixed/include/net/ipip.h --- linux/include/net/ipip.h Sat Aug 5 03:18:49 2000 +++ linux.fixed/include/net/ipip.h Fri Feb 23 14:40:43 2001 @@ -30,7 +30,7 @@ int pkt_len = skb->len; \ \ iph->tot_len = htons(skb->len); \ - ip_select_ident(iph, &rt->u.dst); \ + ip_select_ident(iph, &rt->u.dst, NULL); \ ip_send_check(iph); \ \ err = NF_HOOK(PF_INET, NF_IP_LOCAL_OUT, skb, NULL, rt->u.dst.dev, do_ip_send); \ diff -urN linux/include/net/sock.h linux.fixed/include/net/sock.h --- linux/include/net/sock.h Thu Feb 22 01:10:24 2001 +++ linux.fixed/include/net/sock.h Fri Feb 23 14:40:49 2001 @@ -204,6 +204,7 @@ __u8 mc_loop; /* Loopback */ unsigned recverr : 1, freebind : 1; + __u16 id; /* ID counter for DF pkts */ __u8 pmtudisc; int mc_index; /* Multicast device index */ __u32 mc_addr; diff -urN linux/net/ipv4/af_inet.c linux.fixed/net/ipv4/af_inet.c --- linux/net/ipv4/af_inet.c Fri Dec 29 23:07:24 2000 +++ linux.fixed/net/ipv4/af_inet.c Fri Feb 23 14:40:34 2001 @@ -355,6 +355,8 @@ else sk->protinfo.af_inet.pmtudisc = 
IP_PMTUDISC_WANT; + sk->protinfo.af_inet.id = 0; + sock_init_data(sock,sk); sk->destruct = inet_sock_destruct; diff -urN linux/net/ipv4/igmp.c linux.fixed/net/ipv4/igmp.c --- linux/net/ipv4/igmp.c Tue Jan 9 19:54:57 2001 +++ linux.fixed/net/ipv4/igmp.c Fri Feb 23 14:40:38 2001 @@ -235,7 +235,7 @@ iph->saddr = rt->rt_src; iph->protocol = IPPROTO_IGMP; iph->tot_len = htons(IGMP_SIZE); - ip_select_ident(iph, &rt->u.dst); + ip_select_ident(iph, &rt->u.dst, NULL); ((u8*)&iph[1])[0] = IPOPT_RA; ((u8*)&iph[1])[1] = 4; ((u8*)&iph[1])[2] = 0; diff -urN linux/net/ipv4/ip_output.c linux.fixed/net/ipv4/ip_output.c --- linux/net/ipv4/ip_output.c Fri Oct 27 20:03:14 2000 +++ linux.fixed/net/ipv4/ip_output.c Fri Feb 23 14:54:17 2001 @@ -141,7 +141,7 @@ iph->saddr = rt->rt_src; iph->protocol = sk->protocol; iph->tot_len = htons(skb->len); - ip_select_ident(iph, &rt->u.dst); + ip_select_ident(iph, &rt->u.dst, sk); skb->nh.iph = iph; if (opt && opt->optlen) { @@ -307,7 +307,7 @@ if (ip_dont_fragment(sk, &rt->u.dst)) iph->frag_off |= __constant_htons(IP_DF); - ip_select_ident(iph, &rt->u.dst); + ip_select_ident(iph, &rt->u.dst, sk); /* Add an IP checksum. */ ip_send_check(iph); @@ -328,7 +328,7 @@ kfree_skb(skb); return -EMSGSIZE; } - ip_select_ident(iph, &rt->u.dst); + ip_select_ident(iph, &rt->u.dst, sk); return ip_fragment(skb, skb->dst->output); } @@ -425,7 +425,7 @@ int err; int offset, mf; int mtu; - u16 id = 0; + u16 id; int hh_len = (rt->u.dst.dev->hard_header_len + 15)&~15; int nfrags=0; @@ -495,6 +495,8 @@ * Begin outputting the bytes. */ + id = (sk ? 
sk->protinfo.af_inet.id++ : 0); + do { char *data; struct sk_buff * skb; @@ -677,7 +679,7 @@ iph->tot_len = htons(length); iph->frag_off = df; iph->ttl=sk->protinfo.af_inet.mc_ttl; - ip_select_ident(iph, &rt->u.dst); + ip_select_ident(iph, &rt->u.dst, sk); if (rt->rt_type != RTN_MULTICAST) iph->ttl=sk->protinfo.af_inet.ttl; iph->protocol=sk->protocol; diff -urN linux/net/ipv4/ipmr.c linux.fixed/net/ipv4/ipmr.c --- linux/net/ipv4/ipmr.c Wed Nov 29 06:53:45 2000 +++ linux.fixed/net/ipv4/ipmr.c Fri Feb 23 14:40:45 2001 @@ -1092,7 +1092,7 @@ iph->protocol = IPPROTO_IPIP; iph->ihl = 5; iph->tot_len = htons(skb->len); - ip_select_ident(iph, skb->dst); + ip_select_ident(iph, skb->dst, NULL); ip_send_check(iph); skb->h.ipiph = skb->nh.iph; diff -urN linux/net/ipv4/raw.c linux.fixed/net/ipv4/raw.c --- linux/net/ipv4/raw.c Fri Feb 9 20:29:44 2001 +++ linux.fixed/net/ipv4/raw.c Fri Feb 23 14:40:47 2001 @@ -296,7 +296,7 @@ * ip_build_xmit clean (well less messy). */ if (!iph->id) - ip_select_ident(iph, rfh->dst); + ip_select_ident(iph, rfh->dst, NULL); iph->check=ip_fast_csum((unsigned char *)iph, iph->ihl); } return 0; > FWIW; I am still seeing _really_ bad throughput on a 10M ethernet > segment between 2.4.2+zc-2 and Windows98 SE. Nobody else has > complained so I guess it is something local (mii-tool for Windows > wouldn't be a bad idea), but if the above doesn't work for you I'd > been keen to know about it. I hadn't the time to test it fully yet, but DaveM's quick and dirty patch for this cured my problems. Jan -- Jan Rêkorajski | ALL SUSPECTS ARE GUILTY. PERIOD! bagginsmimuw.edu.pl | OTHERWISE THEY WOULDN'T BE SUSPECTS, WOULD THEY? 
BOFH, MANIAC | -- TROOPS by Kevin Rubio

From owner-netdev@oss.sgi.com Sun Feb 25 04:02:18 2001 Received: by oss.sgi.com id ; Sun, 25 Feb 2001 04:01:59 -0800 Received: from horus.its.uow.edu.au ([130.130.68.25]:9195 "EHLO horus.its.uow.edu.au") by oss.sgi.com with ESMTP id ; Sun, 25 Feb 2001 04:01:51 -0800 Received: from uow.edu.au (wumpus.its.uow.edu.au [130.130.68.12]) by horus.its.uow.edu.au (8.9.3/8.9.3) with ESMTP id XAA00067; Sun, 25 Feb 2001 23:01:24 +1100 (EST) Message-ID: <3A98F417.C38A67BE@uow.edu.au> Date: Sun, 25 Feb 2001 23:01:27 +1100 From: Andrew Morton X-Mailer: Mozilla 4.7 [en] (X11; I; Linux 2.4.2-pre2 i586) X-Accept-Language: en MIME-Version: 1.0 To: Jeff Garzik CC: netdev@oss.sgi.com, Linux Knernel Mailing List Subject: Re: New net features for added performance References: <3A9842DC.B42ECD7A@mandrakesoft.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 3367 Lines: 85

Jeff Garzik wrote:
>
>...
> 1) Rx Skb recycling.
>...
> 2) Tx packet grouping.
>...
> 3) Slabbier packet allocation.

Let's see what the profiler says. 10 seconds of TCP xmit followed by 10 seconds of TCP receive. 100 mbits/sec. Kernel 2.4.2+ZC.

c0119470 do_softirq                     97     0.7132
c020e718 ip_output                      99     0.3694
c020a2c8 ip_route_input                103     0.2893
c01fdc4c skb_release_data              113     1.0089
c021312c tcp_sendmsg                   113     0.0252
c0129c64 kmalloc                       117     0.3953
c0112efc __wake_up_sync                128     0.6667
c01fdd24 __kfree_skb                   153     0.6071
c020e824 ip_queue_xmit                 154     0.1149
c011be80 del_timer                     163     2.2639
c0222fac tcp_v4_rcv                    173     0.1022
c010a778 handle_IRQ_event              178     1.4833
c01127fc schedule                      200     0.1259
c01d39f8 boomerang_rx                  332     0.2823
c024284c csum_partial_copy_generic     564     2.2742
c01d2c84 boomerang_start_xmit          654     0.9033
c0242b3c __generic_copy_from_user      733    12.2167
c01d329c boomerang_interrupt           910     0.8818
c01071f4 poll_idle                   41813  1306.6562
00000000 total                       48901     0.0367

7088 non-idle ticks.
153+117+113 ticks in skb/memory type functions.

So, naively, the most which can be saved here by optimising the skb and memory usage is 5% of networking load. (1% of system load @100 mbps)

Total device driver cost is 27% of the networking load.

All the meat is in the interrupt load. The 3com driver transfers about three packets per interrupt. Here's the system load (dual CPU):

    Doing 100mbps TCP send with netperf:    14.9%
    Doing 100mbps TCP receive with netperf: 23.3%

When tx interrupt mitigation is disabled we get 1.5 packets per interrupt doing transmit:

    Doing 100mbps TCP send with netperf:    16.1%
    Doing 100mbps TCP receive with netperf: 24.0%

So a 2x reduction in interrupt frequency on TCP transmit has saved 1.2% of system load. That's 8% of networking load, and, presumably, 30% of the driver load. That all seems to make sense.

The moral?

- Tuning skb allocation isn't likely to make much difference.

- At the device-driver level the most effective thing is to reduce the number of interrupts.

- If we can reduce the driver cost to *zero*, we improve TCP efficiency by 27%.

- At the system level the most important thing is to rewrite applications to use sendfile(). (But Rx is more expensive than Tx, so even this ain't the main game).

I agree that batching skbs into hard_start_xmit() may allow some driver optimisations. Pass it a vector of skbs rather than one, and let it return an indication of how many were actually consumed. But we'd need to go through an exercise like the above beforehand - it may not be worth the protocol-level trauma.

I suspect that a thorough analysis of the best way to use Linux networking, and then a rewrite of important applications so they use the result of that analysis would pay dividends.
- From owner-netdev@oss.sgi.com Sun Feb 25 04:23:38 2001 Received: by oss.sgi.com id ; Sun, 25 Feb 2001 04:23:29 -0800 Received: from lsb-catv-1-p021.vtxnet.ch ([212.147.5.21]:40204 "EHLO almesberger.net") by oss.sgi.com with ESMTP id ; Sun, 25 Feb 2001 04:23:10 -0800 Received: (from almesber@localhost) by almesberger.net (8.9.3/8.9.3) id NAA29940; Sun, 25 Feb 2001 13:22:49 +0100 Date: Sun, 25 Feb 2001 13:22:49 +0100 From: Werner Almesberger To: Jeff Garzik Cc: netdev@oss.sgi.com, Linux Knernel Mailing List Subject: Re: New net features for added performance Message-ID: <20010225132249.J18271@almesberger.net> References: <3A9842DC.B42ECD7A@mandrakesoft.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <3A9842DC.B42ECD7A@mandrakesoft.com>; from jgarzik@mandrakesoft.com on Sat, Feb 24, 2001 at 06:25:16PM -0500 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 1302 Lines: 34 Jeff Garzik wrote: > 1) Rx Skb recycling. Sounds like a potentially useful idea. To solve the most immediate memory pressure problems, maybe VM could provide some function that does a kfree in cases of memory shortage, and that does nothing otherwise, so the driver could offer to free the skb after netif_rx. You still need to go over the list in idle periods, though. > 2) Tx packet grouping. Hmm, I think we need an estimate of how long a packet train you'd usually get. A flag looks reasonably inexpensive. Estimated numbers sound like over-engineering. > Disadvantages? Can this sort of knowledge be obtained by a netdevice > right now, without any kernel modifications? Question is what the hardware really needs. If you can change the interrupt point easily, it's probably cheapest to do all the work in hard_start_xmit. > 3) Slabbier packet allocation. 
Hmm, this may actually be worse during bursts: if your burst exceeds the preallocated size, you have to perform more expensive/slower operations (e.g. running a tasklet) to refill your cache.

- Werner

--
 _________________________________________________________________________
/ Werner Almesberger, ICA, EPFL, CH Werner.Almesberger@epfl.ch /
/_IN_N_032__Tel_+41_21_693_6621__Fax_+41_21_693_6610_____________________/

From owner-netdev@oss.sgi.com Sun Feb 25 04:42:49 2001 Received: by oss.sgi.com id ; Sun, 25 Feb 2001 04:42:39 -0800 Received: from lsb-catv-1-p021.vtxnet.ch ([212.147.5.21]:41996 "EHLO almesberger.net") by oss.sgi.com with ESMTP id ; Sun, 25 Feb 2001 04:42:17 -0800 Received: (from almesber@localhost) by almesberger.net (8.9.3/8.9.3) id NAA29969; Sun, 25 Feb 2001 13:41:56 +0100 Date: Sun, 25 Feb 2001 13:41:56 +0100 From: Werner Almesberger To: Chris Wedgwood Cc: netdev@oss.sgi.com, Linux Knernel Mailing List Subject: Re: New net features for added performance Message-ID: <20010225134156.K18271@almesberger.net> References: <3A9842DC.B42ECD7A@mandrakesoft.com> <3A986EDB.363639E7@coplanar.net> <20010225162357.A12123@metastasis.f00f.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20010225162357.A12123@metastasis.f00f.org>; from cw@f00f.org on Sun, Feb 25, 2001 at 04:23:57PM +1300 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 859 Lines: 20

Chris Wedgwood wrote:
> That said, it would be an extremely neat thing to do from a technical
> perspective, but I don't know if you would ever get really good
> performance from it.

Well, you'd have to re-design the networking code to support NUMA architectures, with a fairly fine granularity. I'm not sure you'd gain anything except possibly for the forwarding fast path.

A cheaper, and probably more useful possibility is hardware assistance for specific operations. E.g.
hardware-accelerated packet classification looks interesting. I'd also like to see hardware-assistance for shaping on other media than ATM. - Werner -- _________________________________________________________________________ / Werner Almesberger, ICA, EPFL, CH Werner.Almesberger@epfl.ch / /_IN_N_032__Tel_+41_21_693_6621__Fax_+41_21_693_6610_____________________/ From owner-netdev@oss.sgi.com Sun Feb 25 05:20:50 2001 Received: by oss.sgi.com id ; Sun, 25 Feb 2001 05:20:40 -0800 Received: from zooty.lancs.ac.uk ([148.88.16.231]:31419 "EHLO zooty.lancs.ac.uk") by oss.sgi.com with ESMTP id ; Sun, 25 Feb 2001 05:20:23 -0800 Received: from mail.lancs.ac.uk ([148.88.1.10] helo=marl.lancs.ac.uk) by zooty.lancs.ac.uk with esmtp (Exim 3.16 #1) id 14X163-0005hx-00; Sun, 25 Feb 2001 13:20:19 +0000 Received: from dynb1bf.pg.local ([10.38.9.191] helo=helium.chromatix.org.uk) by marl.lancs.ac.uk with esmtp (Exim 3.16 #4) id 14X160-00029n-00; Sun, 25 Feb 2001 13:20:16 +0000 Received: from magpie.chromatix.org.uk ([192.168.239.101]) by helium.chromatix.org.uk with esmtp (Exim 3.15 #5) id 14X160-0005SF-00; Sun, 25 Feb 2001 13:20:16 +0000 X-Sender: chromi@helium Message-Id: In-Reply-To: <3A986EDB.363639E7@coplanar.net> References: <3A9842DC.B42ECD7A@mandrakesoft.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Date: Sun, 25 Feb 2001 13:08:50 +0000 To: Jeremy Jackson , Jeff Garzik From: Jonathan Morton Subject: Re: New net features for added performance Cc: netdev@oss.sgi.com, Linux Knernel Mailing List Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 1354 Lines: 35 At 2:32 am +0000 25/2/2001, Jeremy Jackson wrote: >Jeff Garzik wrote: > >(about optimizing kernel network code for busmastering NIC's) > >> Disclaimer: This is 2.5, repeat, 2.5 material. > >Related question: are there any 100Mbit NICs with cpu's onboard? >Something mainstream/affordable?(i.e. 
not 1G ethernet) >Just recently someone posted asking some technical question about >ARMlinux for and intel card with 2 1G ports, 8 100M ports, >an onboard ARM cpu and 4 other uControllers... seems to me >that ultimately the networking code should go in that direction: >immagine having the *NIC* do most of this... no cache pollution problems... Dunno, but the latest Motorola ColdFire microcontroller has Ethernet built in. I think it's even 100baseTX, but I could be mistaken. -------------------------------------------------------------- from: Jonathan "Chromatix" Morton mail: chromi@cyberspace.org (not for attachments) big-mail: chromatix@penguinpowered.com uni-mail: j.d.morton@lancaster.ac.uk The key to knowledge is not to rely on people to teach you it. Get VNC Server for Macintosh from http://www.chromatix.uklinux.net/vnc/ -----BEGIN GEEK CODE BLOCK----- Version 3.12 GCS$/E/S dpu(!) s:- a20 C+++ UL++ P L+++ E W+ N- o? K? w--- O-- M++$ V? PS PE- Y+ PGP++ t- 5- X- R !tv b++ DI+++ D G e+ h+ r- y+ -----END GEEK CODE BLOCK----- From owner-netdev@oss.sgi.com Sun Feb 25 05:58:09 2001 Received: by oss.sgi.com id ; Sun, 25 Feb 2001 05:58:00 -0800 Received: from f00f.stub.clear.net.nz ([203.167.224.51]:64525 "HELO metastasis.f00f.org") by oss.sgi.com with SMTP id ; Sun, 25 Feb 2001 05:57:39 -0800 Received: by metastasis.f00f.org (Postfix, from userid 1000) id C39E59E1F; Mon, 26 Feb 2001 02:57:36 +1300 (NZDT) Date: Mon, 26 Feb 2001 02:57:36 +1300 From: Chris Wedgwood To: Werner Almesberger Cc: netdev@oss.sgi.com, Linux Knernel Mailing List Subject: Re: New net features for added performance Message-ID: <20010226025736.A13227@metastasis.f00f.org> References: <3A9842DC.B42ECD7A@mandrakesoft.com> <3A986EDB.363639E7@coplanar.net> <20010225162357.A12123@metastasis.f00f.org> <20010225134156.K18271@almesberger.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <20010225134156.K18271@almesberger.net>; 
from Werner.Almesberger@epfl.ch on Sun, Feb 25, 2001 at 01:41:56PM +0100 X-No-Archive: Yes Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 444 Lines: 14

On Sun, Feb 25, 2001 at 01:41:56PM +0100, Werner Almesberger wrote:

    Well, you'd have to re-design the networking code to support NUMA architectures, with a fairly fine granularity. I'm not sure you'd gain anything except possibly for the forwarding fast path.

I'm not convinced that for a general purpose OS you would gain anything at all; but as an intellectual exercise it's a fascinating idea. It'd make a good PhD thesis.

--cw

From owner-netdev@oss.sgi.com Sun Feb 25 07:16:49 2001 Received: by oss.sgi.com id ; Sun, 25 Feb 2001 07:16:29 -0800 Received: from tomts7.bellnexxia.net ([209.226.175.40]:54404 "EHLO tomts7-srv.bellnexxia.net") by oss.sgi.com with ESMTP id ; Sun, 25 Feb 2001 07:16:12 -0800 Received: from coplanar.net ([64.230.144.142]) by tomts7-srv.bellnexxia.net (InterMail vM.4.01.03.16 201-229-121-116-20010115) with ESMTP id <20010225151605.NPQD757.tomts7-srv.bellnexxia.net@coplanar.net>; Sun, 25 Feb 2001 10:16:05 -0500 Message-ID: <3A9920B6.393E63B2@coplanar.net> Date: Sun, 25 Feb 2001 10:11:50 -0500 From: Jeremy Jackson X-Mailer: Mozilla 4.72 [en] (X11; U; Linux 2.2.14-5.0 i586) X-Accept-Language: en MIME-Version: 1.0 To: Andrew Morton CC: Jeff Garzik , netdev@oss.sgi.com, Linux Knernel Mailing List Subject: Re: New net features for added performance References: <3A9842DC.B42ECD7A@mandrakesoft.com> <3A98F417.C38A67BE@uow.edu.au> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 936 Lines: 29

Andrew Morton wrote:

(kernel profile of TCP tx/rx)

> So, naively, the most which can be saved here by optimising
> the skb and memory usage is 5% of networking load. (1% of
> system load @100 mbps)
>

For a local tx/rx.
(open question) What happens with a router box with netfilter and queueing? Perhaps this type of optimisation will help more in that case? Think about a box with 4 1G NICs being able to route AND do QoS per conntrack connection (ala RSVP and such).

Really what I'm looking for is something like SGI's STP (Scheduled Transfer Protocol). mmap your tcp receive buffer, and have a card smart enough to figure out header alignment (i.e. know header size based on protocol number), transfer only that, let the kernel process it, then tell the card to DMA the data from the buffer right into process memory (or other NIC).

Make it possible to have the performance of a Juniper network processor + flexibility of Linux.

From owner-netdev@oss.sgi.com Sun Feb 25 11:00:00 2001 Received: by oss.sgi.com id ; Sun, 25 Feb 2001 10:59:50 -0800 Received: from minus.inr.ac.ru ([193.233.7.97]:22277 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Sun, 25 Feb 2001 10:59:34 -0800 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id VAA00976; Sun, 25 Feb 2001 21:59:23 +0300 From: kuznet@ms2.inr.ac.ru Message-Id: <200102251859.VAA00976@ms2.inr.ac.ru> Subject: Re: New net features for added performance To: jgarzik@mandrakesoft.COM (Jeff Garzik) Date: Sun, 25 Feb 2001 21:59:23 +0300 (MSK) Cc: netdev@oss.sgi.com In-Reply-To: <3A9842DC.B42ECD7A@mandrakesoft.com> from "Jeff Garzik" at Feb 25, 1 02:45:01 am X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 747 Lines: 28

Hello!

> into which the net card DMA's data. If netif_rx returned the SKB
> instead of freeing it,

But it is always queued by stack. It is rule. Ability to return is a rare exception and it happens only when and exactly when driver _must_ shrink its rx ring to stop congestion.

> 2) Tx packet grouping.
If the net core has knowledge that more packets > will be following the current one being sent to dev->hard_start_xmit(), It does not have such knowledge. > 3) Slabbier packet allocation. Probably. Note that driver can do this now and you need not to wait for 2.5 to evaluate this. In fact, Jes has made this for hippi ages ago. To summarize: look better to this maillist for Jamal's patch. It is really interesting thing. Alexey From owner-netdev@oss.sgi.com Sun Feb 25 11:31:30 2001 Received: by oss.sgi.com id ; Sun, 25 Feb 2001 11:31:11 -0800 Received: from minus.inr.ac.ru ([193.233.7.97]:519 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Sun, 25 Feb 2001 11:30:53 -0800 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id WAA01764; Sun, 25 Feb 2001 22:23:57 +0300 From: kuznet@ms2.inr.ac.ru Message-Id: <200102251923.WAA01764@ms2.inr.ac.ru> Subject: Re: possible bug x86 2.4.2 SMP in IP receive stack To: feldy@myri.COM (Bob Felderman) Date: Sun, 25 Feb 2001 22:23:57 +0300 (MSK) Cc: netdev@oss.sgi.com In-Reply-To: <200102232259.OAA20943@frisbee.myri.com> from "Bob Felderman" at Feb 24, 1 02:15:01 am X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 352 Lines: 13 Hello! > Note that this is using a Myrinet network What driver do you use? Such things are almost 100% explained by double dma on one skb or by some other kind of memory corruption. > Feb 23 12:42:30 rcc2 kernel: Warning: kfree_skb passed an skb still on a list (from c01f58dc). Oops backtrace is better source of information, by the way. 
Alexey From owner-netdev@oss.sgi.com Sun Feb 25 14:17:42 2001 Received: by oss.sgi.com id ; Sun, 25 Feb 2001 14:17:32 -0800 Received: from u-220-10.karlsruhe.ipdial.viaginterkom.de ([62.180.10.220]:17654 "EHLO dea.waldorf-gmbh.de") by oss.sgi.com with ESMTP id ; Sun, 25 Feb 2001 14:17:18 -0800 Received: (from ralf@localhost) by dea.waldorf-gmbh.de (8.11.1/8.11.1) id f1PGCRD11961 for netdev@oss.sgi.com; Sun, 25 Feb 2001 17:12:27 +0100 Date: Sun, 25 Feb 2001 17:12:27 +0100 From: Ralf Baechle To: netdev@oss.sgi.com Subject: Failed assertion Message-ID: <20010225171226.A11945@bacchus.dhis.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i X-Accept-Language: de,en,fr Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 239 Lines: 8 During heavy outgoing traffic while benchmarking a network driver I received this message: KERNEL: assertion (atomic_read(&sk->wmem_alloc) == 0) failed at af_inet.c(164):inet_sock_destruct This was with (almost ...) stock 2.4.1. Ralf From owner-netdev@oss.sgi.com Sun Feb 25 21:31:46 2001 Received: by oss.sgi.com id ; Sun, 25 Feb 2001 21:31:37 -0800 Received: from pizda.ninka.net ([216.101.162.242]:41624 "EHLO pizda.ninka.net") by oss.sgi.com with ESMTP id ; Sun, 25 Feb 2001 21:31:16 -0800 Received: (from davem@localhost) by pizda.ninka.net (8.9.3/8.9.3) id VAA13657; Sun, 25 Feb 2001 21:28:34 -0800 From: "David S. 
Miller" MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <15001.59778.748224.602042@pizda.ninka.net> Date: Sun, 25 Feb 2001 21:28:34 -0800 (PST) To: Chris Wedgwood Cc: Jan Rekorajski , linux-kernel@vger.kernel.org, netdev@oss.sgi.com Subject: Re: [UPDATE] zerocopy BETA 3 In-Reply-To: <20010225163836.A12173@metastasis.f00f.org> References: <14998.2628.144784.585248@pizda.ninka.net> <20010223114249.A27608@sith.mimuw.edu.pl> <20010225163836.A12173@metastasis.f00f.org> X-Mailer: VM 6.75 under 21.1 (patch 13) "Crater Lake" XEmacs Lucid Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 387 Lines: 14 Chris Wedgwood writes: > --- linux-2.4.2/include/net/ip.h Sun Feb 25 01:15:19 2001 > +++ linux-2.4.2+zc-2/include/net/ip.h Sun Feb 25 01:53:52 2001 You need to part that adds "id" to the sock struct too. This won't build "as-is". Besides, I'd like people to have to test the zerocopy stuff for me, they'll get the ID fix if they do that :-) Later, David S. Miller davem@redhat.com From owner-netdev@oss.sgi.com Sun Feb 25 21:40:36 2001 Received: by oss.sgi.com id ; Sun, 25 Feb 2001 21:40:16 -0800 Received: from pizda.ninka.net ([216.101.162.242]:48536 "EHLO pizda.ninka.net") by oss.sgi.com with ESMTP id ; Sun, 25 Feb 2001 21:39:58 -0800 Received: (from davem@localhost) by pizda.ninka.net (8.9.3/8.9.3) id VAA13736; Sun, 25 Feb 2001 21:37:21 -0800 From: "David S. 
Miller" MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <15001.60305.529430.679039@pizda.ninka.net> Date: Sun, 25 Feb 2001 21:37:21 -0800 (PST) To: Ralf Baechle Cc: netdev@oss.sgi.com Subject: Re: Failed assertion In-Reply-To: <20010225171226.A11945@bacchus.dhis.org> References: <20010225171226.A11945@bacchus.dhis.org> X-Mailer: VM 6.75 under 21.1 (patch 13) "Crater Lake" XEmacs Lucid Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 424 Lines: 15 Ralf Baechle writes: > During heavy outgoing traffic while benchmarking a network driver I > received this message: > > KERNEL: assertion (atomic_read(&sk->wmem_alloc) == 0) failed at af_inet.c(164):inet_sock_destruct > > This was with (almost ...) stock 2.4.1. What platform and what driver? It is the first time that any such instance of this message has been reported. Later, David S. Miller davem@redhat.com From owner-netdev@oss.sgi.com Mon Feb 26 03:23:37 2001 Received: by oss.sgi.com id ; Mon, 26 Feb 2001 03:23:17 -0800 Received: from zikova.cvut.cz ([147.32.235.100]:13841 "EHLO zikova.cvut.cz") by oss.sgi.com with ESMTP id ; Mon, 26 Feb 2001 03:23:09 -0800 Received: from vcnet.vc.cvut.cz (vcnet.vc.cvut.cz [147.32.240.61]) by zikova.cvut.cz (8.9.0.Beta5/8.9.0.Beta5) with ESMTP id MAA13310; Mon, 26 Feb 2001 12:21:49 +0100 Received: from VCNET/SpoolDir by vcnet.vc.cvut.cz (Mercury 1.21); 26 Feb 101 12:21:51 MET-1MEST Received: from SpoolDir by VCNET (Mercury 1.30); 26 Feb 101 12:21:42 MET-1MEST From: "Petr Vandrovec" Organization: CC CTU Prague To: "David S. 
Miller" Date: Mon, 26 Feb 2001 12:21:40 MET-1 MIME-Version: 1.0 Content-type: text/plain; charset=US-ASCII Content-transfer-encoding: 7BIT Subject: Re: Failed assertion CC: netdev@oss.sgi.com, ralf@oss.sgi.com X-mailer: Pegasus Mail v3.40 Message-ID: <86C68935F9@vcnet.vc.cvut.cz> Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 3271 Lines: 101 On 25 Feb 01 at 21:37, David S. Miller wrote: > > During heavy outgoing traffic while benchmarking a network driver I > > received this message: > > > > KERNEL: assertion (atomic_read(&sk->wmem_alloc) == 0) failed at af_inet.c(164):inet_sock_destruct > > > > This was with (almost ...) stock 2.4.1. > > What platform and what driver? It is the first time that any such > instance of this message has been reported. Dave, it is not first time. Unfortunately, I do not have my original message anymore, only Alexey's reply... It happened on dual PIII/800, with tulip driver at: Nov 23, 13:30:37 Nov 24, 22:24:02 - and whoa at 22:24:10: Unable to handle kernel NULL pointer dereference at virtual address 00000000 *pde = 0 Oops: 0002 CPU: 1 EIP: remove_wait_queue+6/36 EFLAGS: 00010086 ... Process ping (pid: 3948, stackpage=c712d000) ... trace: wait_for_packet+230/300 skb_recv_datagram+208/240 raw_recvmsg+104/308 inet_recvmsg+88/112 sock_recvmsg+61/172 sys_recvfrom+173/264 do_page_fault+323/1000 ?? [] ?? do_page_fault+0/1000 ?? do_mmap_pgoff+895/1044 ?? do_getitimer+156/164 ?? sys_socketcall+383/508 system_call+51/56 It did not happened since that time, although I was running 2.4.0-test11 up to 28th November. Machine is running arpwatch, and periodically pings all hosts on my local C subnet. It is not perfect, but: while true; do Q=1 while [ $Q -ne 255 ]; do IP=147.32.240.$Q if [ ! 
-f $IP ]; then if [ `ping -c 1 $IP | wc -l` -ne 6 ]; then echo $IP is dead else Y=`cat /proc/net/arp | grep "^147.32.240.$Q " | cut -c42-58` echo $IP is alive, ethernet node $Y echo $IP $Y > $IP host -t any $IP | tee /dev/tty >> $IP fi fi Q=`expr $Q + 1` done done I hope this helps. Petr Vandrovec vandrove@vc.cvut.cz Reply from Alexey follows: From: kuznet@ms2.inr.ac.ru Message-Id: <200011232008.XAA06575@ms2.inr.ac.ru> Subject: Re: 2.4.0-test11 and af_inet.c(164) To: VANDROVE@vc.cvut.cz (Petr Vandrovec) Date: Thu, 23 Nov 2000 23:08:16 +0300 (MSK) Cc: netdev@oss.sgi.com In-Reply-To: from "Petr Vandrovec" at Nov 23, 0 07:45:43 pm X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 X-Orcpt: rfc822;netdev@oss.sgi.com Sender: owner-netdev@oss.sgi.com Precedence: bulk Hello! > KERNEL: assertion (atomic_read(&sk->wmem_alloc) == 0) failed at af_inet.c(164):inet_sock_destruct .... > Everything still works and I have no idea how to find whether some 100 bytes > were leaked somewhere or not... (of course, error is non-repeatable) Well, it is not plain 100 bytes leaked. It is rather an absolutely impossible event, which cannot happen even if all the bits leaked. 8) OK... Audit is started. Alexey From owner-netdev@oss.sgi.com Mon Feb 26 14:37:12 2001 Received: by oss.sgi.com id ; Mon, 26 Feb 2001 14:36:53 -0800 Received: from [204.244.205.25] ([204.244.205.25]:11076 "HELO post.gateone.com") by oss.sgi.com with SMTP id ; Mon, 26 Feb 2001 14:36:38 -0800 Received: (qmail 31563 invoked from network); 26 Feb 2001 22:36:33 -0000 Received: from mystery.wizard.ca (HELO mistress) (204.244.205.8) by mail.wizard.ca with SMTP; 26 Feb 2001 22:36:33 -0000 From: Michael Peddemors Reply-To: michael@linuxmagic.com Organization: LinuxMagic Inc. To: Jan Rekorajski , Chris Wedgwood Subject: Re: [UPDATE] zerocopy.. While working on ip.h stuff Date: Mon, 26 Feb 2001 15:46:56 -0800 X-Mailer: KMail [version 1.1.95.0] Content-Type: text/plain Cc: "David S. 
Miller" , linux-kernel@vger.kernel.org, netdev@oss.sgi.com, waltje@uWalt.NL.Mugnet.ORG References: <14998.2628.144784.585248@pizda.ninka.net> <20010225163836.A12173@metastasis.f00f.org> <20010225045420.B10281@sith.mimuw.edu.pl> In-Reply-To: <20010225045420.B10281@sith.mimuw.edu.pl> MIME-Version: 1.0 Message-Id: <0102261546570H.02007@mistress> Content-Transfer-Encoding: 8bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 1694 Lines: 48 While doing some work on some ip options stuff, I have noticed a bunch of unused entries in linux/include/linux/ip.h A few things.. why is ip.h not part of the linux/include/net rather than linux/include/linux hierarchy? Defined items that are not used anywhere in the source.. Can any of them be deleted now? Also, I was looking into some RFC 1812 stuff. (Thanks for nothing Dave :) and was looking at 4.2.2.6 where it mentions that a router MUST implement the End of Option List option.. Haven't figured out where that is implemented yet.. Also was trying to figure out some things. I want to create a new ip_option for use in some DOS protection experiments. I have a whole 40 bytes (+/-) to share... Now although I don't see anything explicitly prohibiting the use of unused IP Header option space, I know that it really was designed for use by the sending parties, and not routers in between.. Has anyone seen any RFC that explicitly says I MUST NOT? 
IPTOS_PREC_NETCONTROL IPTOS_PREC_FLASHOVERRIDE IPTOS_PREC_FLASH IPTOS_PREC_IMMEDIATE IPTOS_PREC_PRIORITY IPTOS_PREC_ROUTINE IPOPT_RESERVED1 IPOPT_RESERVED2 IPOPT_OPTVAL IPOPT_OLEN IPOPT_MINOFF MAX_IPOPTLEN IPOPT_EOL > diff -urN linux/include/net/ip.h linux.fixed/include/net/ip.h -------------------------------------------------------- Michael Peddemors - Senior Consultant Unix Administration - WebSite Hosting Network Services - Programming Wizard Internet Services http://www.wizard.ca Linux Support Specialist - http://www.linuxmagic.com -------------------------------------------------------- (604) 589-0037 Beautiful British Columbia, Canada -------------------------------------------------------- From owner-netdev@oss.sgi.com Mon Feb 26 15:30:02 2001 Received: by oss.sgi.com id ; Mon, 26 Feb 2001 15:29:53 -0800 Received: from pizda.ninka.net ([216.101.162.242]:18079 "EHLO pizda.ninka.net") by oss.sgi.com with ESMTP id ; Mon, 26 Feb 2001 15:29:36 -0800 Received: (from davem@localhost) by pizda.ninka.net (8.9.3/8.9.3) id PAA20224; Mon, 26 Feb 2001 15:25:26 -0800 From: "David S. Miller" MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <15002.58854.215318.882641@pizda.ninka.net> Date: Mon, 26 Feb 2001 15:25:26 -0800 (PST) To: michael@linuxmagic.com Cc: Jan Rekorajski , Chris Wedgwood , linux-kernel@vger.kernel.org, netdev@oss.sgi.com, waltje@uWalt.NL.Mugnet.ORG Subject: Re: [UPDATE] zerocopy.. While working on ip.h stuff In-Reply-To: <0102261546570H.02007@mistress> References: <14998.2628.144784.585248@pizda.ninka.net> <20010225163836.A12173@metastasis.f00f.org> <20010225045420.B10281@sith.mimuw.edu.pl> <0102261546570H.02007@mistress> X-Mailer: VM 6.75 under 21.1 (patch 13) "Crater Lake" XEmacs Lucid Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 1311 Lines: 35 Michael Peddemors writes: > A few things.. 
why is ip.h not part of the linux/include/net rather than > linux/include/linux hierarchy? Exported to older userlands... > Defined items that are not used anywhere in the source.. > Can any of them be deleted now? > So what, userland makes use of them :-) > Also, I was looking into some RFC 1812 stuff. (Thanks for nothing Dave :) and > was looking at 4.2.2.6 where it mentions that a router MUST implement the End > of Option List option.. Haven't figured out where that is implemented yet.. egrep "IPOPT_END" net/ipv4/ip_options.c You just aren't looking hard enough. > Also was trying to figure out some things. > I want to create a new ip_option for use in some DOS protection experiments. > I have a whole 40 bytes (+/-) to share... Now although I don't see anything > explicitly prohibiting the use of unused IP Header option space, I know that > it really was designed for use by the sending parties, and not routers in > between.. Has anyone seen any RFC that explicitly says I MUST NOT? Not to my knowledge. Routers already change the time to live field, so I see no reason why they can't do smart things with special IP options either (besides efficiency concerns :-). Later, David S. Miller davem@redhat.com From owner-netdev@oss.sgi.com Mon Feb 26 15:35:33 2001 Received: by oss.sgi.com id ; Mon, 26 Feb 2001 15:35:12 -0800 Received: from pizda.ninka.net ([216.101.162.242]:23455 "EHLO pizda.ninka.net") by oss.sgi.com with ESMTP id ; Mon, 26 Feb 2001 15:34:59 -0800 Received: (from davem@localhost) by pizda.ninka.net (8.9.3/8.9.3) id PAA20281; Mon, 26 Feb 2001 15:32:07 -0800 From: "David S. 
Miller" MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <15002.59255.720093.731878@pizda.ninka.net> Date: Mon, 26 Feb 2001 15:32:07 -0800 (PST) To: "Petr Vandrovec" Cc: netdev@oss.sgi.com, ralf@oss.sgi.com Subject: Re: Failed assertion In-Reply-To: <86C68935F9@vcnet.vc.cvut.cz> References: <86C68935F9@vcnet.vc.cvut.cz> X-Mailer: VM 6.75 under 21.1 (patch 13) "Crater Lake" XEmacs Lucid Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 756 Lines: 22 Petr Vandrovec writes: > On 25 Feb 01 at 21:37, David S. Miller wrote: > > > During heavy outgoing traffic while benchmarking a network driver I > > > received this message: > > > > > > KERNEL: assertion (atomic_read(&sk->wmem_alloc) == 0) failed at af_inet.c(164):inet_sock_destruct > > > > > > This was with (almost ...) stock 2.4.1. > > > > What platform and what driver? It is the first time that any such > > instance of this message has been reported. > > Dave, it is not first time. Unfortunately, I do not have my original > message anymore, only Alexey's reply... Ok, Ralf please get the information I asked for and also the backtrace printed after the kernel assertion failure. Later, David S. Miller davem@redhat.com From owner-netdev@oss.sgi.com Mon Feb 26 15:50:32 2001 Received: by oss.sgi.com id ; Mon, 26 Feb 2001 15:50:12 -0800 Received: from pizda.ninka.net ([216.101.162.242]:30367 "EHLO pizda.ninka.net") by oss.sgi.com with ESMTP id ; Mon, 26 Feb 2001 15:50:08 -0800 Received: (from davem@localhost) by pizda.ninka.net (8.9.3/8.9.3) id PAA20322; Mon, 26 Feb 2001 15:46:16 -0800 From: "David S. 
Miller" MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <15002.60104.350394.893905@pizda.ninka.net> Date: Mon, 26 Feb 2001 15:46:16 -0800 (PST) To: Jeff Garzik Cc: netdev@oss.sgi.com, Linux Knernel Mailing List Subject: Re: New net features for added performance In-Reply-To: <3A9842DC.B42ECD7A@mandrakesoft.com> References: <3A9842DC.B42ECD7A@mandrakesoft.com> X-Mailer: VM 6.75 under 21.1 (patch 13) "Crater Lake" XEmacs Lucid Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 1865 Lines: 50 Jeff Garzik writes: > 1) Rx Skb recycling. ... > Advantages: A de-allocation immediately followed by a reallocation is > eliminated, less L1 cache pollution during interrupt handling. > Potentially less DMA traffic between card and host. ... > Disadvantages? It simply cannot work, as Alexey stated, in normal circumstances netif_rx() queues until the user reads the data. This is the whole basis of our receive packet processing model within softint/user context. Secondly, I can argue that skb recycling can give _worse_ cache performance. If the next use and access by the card to the skb data is deferred, this gives the cpu a chance to displace those lines in its cache naturally via displacement instead of being forced quickly to do so when the device touches that data. If the device forces the cache displacement, those cache lines become empty until filled with something later (smaller utilization of total cache contents) whereas natural displacement puts useful data into the cache at the time of the displacement (larger utilization of total cache contents). It is an NT/windows driver API rubbish idea, and it is full crap. > 2) Tx packet grouping. ... > Disadvantages? See Torvalds vs. world discussion on this list about API entry points which pass multiple pages at a time versus simpler ones which pass only a single page at a time. 
:-) > 3) Slabbier packet allocation. ... > Disadvantages? Doing this might increase cache pollution due to > increased code and data size, but I think the hot path is much improved > (dequeue a properly sized, initialized, skb-reserved'd skb off a list) > and would help mitigate the impact of sudden bursts of traffic. I don't know what I think about this one, but my hunch is that it will lead to worse data packing via such an allocator. Later, David S. Miller davem@redhat.com From owner-netdev@oss.sgi.com Mon Feb 26 15:52:02 2001 Received: by oss.sgi.com id ; Mon, 26 Feb 2001 15:51:43 -0800 Received: from kanga.kvack.org ([216.129.200.3]:50705 "EHLO kanga.kvack.org") by oss.sgi.com with ESMTP id ; Mon, 26 Feb 2001 15:51:31 -0800 Received: (from localhost user: 'blah', uid#63042) by kanga.kvack.org with SMTP id ; Mon, 26 Feb 2001 18:47:34 -0500 Date: Mon, 26 Feb 2001 18:47:33 -0500 (EST) From: "Benjamin C.R. LaHaise" To: "David S. Miller" cc: michael@linuxmagic.com, Jan Rekorajski , Chris Wedgwood , linux-kernel@vger.kernel.org, netdev@oss.sgi.com, waltje@uWalt.NL.Mugnet.ORG Subject: Re: [UPDATE] zerocopy.. While working on ip.h stuff In-Reply-To: <15002.58854.215318.882641@pizda.ninka.net> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 474 Lines: 13 On Mon, 26 Feb 2001, David S. Miller wrote: > Not to my knowledge. Routers already change the time to live field, > so I see no reason why they can't do smart things with special IP > options either (besides efficiency concerns :-). A number of ISPs patch the MSS value to 1492 due to the ridiculous number of PMTU black holes out on the net. Since the ip header fits in the cache of some CPUs (like the P4), this is becoming a cheaper operation than ever before. 
-ben From owner-netdev@oss.sgi.com Mon Feb 26 16:08:53 2001 Received: by oss.sgi.com id ; Mon, 26 Feb 2001 16:08:42 -0800 Received: from mail5.atl.bellsouth.net ([205.152.0.93]:14242 "EHLO mail5.atl.bellsouth.net") by oss.sgi.com with ESMTP id ; Mon, 26 Feb 2001 16:08:14 -0800 Received: from mandrakesoft.com (adsl-20-73-169.asm.bellsouth.net [66.20.73.169]) by mail5.atl.bellsouth.net (3.3.5alt/0.75.2) with ESMTP id TAA21130; Mon, 26 Feb 2001 19:10:11 -0500 (EST) Message-ID: <3A9AEFAF.1DC89A8A@mandrakesoft.com> Date: Mon, 26 Feb 2001 19:07:11 -0500 From: Jeff Garzik Organization: MandrakeSoft X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.4.2 i686) X-Accept-Language: en MIME-Version: 1.0 To: "David S. Miller" CC: netdev@oss.sgi.com, Linux Knernel Mailing List Subject: Re: New net features for added performance References: <3A9842DC.B42ECD7A@mandrakesoft.com> <15002.60104.350394.893905@pizda.ninka.net> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 556 Lines: 20 "David S. Miller" wrote: > Jeff Garzik writes: > > 2) Tx packet grouping. > ... > > Disadvantages? > > See Torvalds vs. world discussion on this list about API entry points > which pass multiple pages at a time versus simpler ones which pass > only a single page at a time. :-) I only want to know if more are coming, not actually pass multiples.. Jeff -- Jeff Garzik | "You see, in this world there's two kinds of Building 1024 | people, my friend: Those with loaded guns MandrakeSoft | and those who dig. You dig." 
--Blondie From owner-netdev@oss.sgi.com Mon Feb 26 16:11:22 2001 Received: by oss.sgi.com id ; Mon, 26 Feb 2001 16:11:02 -0800 Received: from pizda.ninka.net ([216.101.162.242]:44959 "EHLO pizda.ninka.net") by oss.sgi.com with ESMTP id ; Mon, 26 Feb 2001 16:10:58 -0800 Received: (from davem@localhost) by pizda.ninka.net (8.9.3/8.9.3) id QAA20442; Mon, 26 Feb 2001 16:05:22 -0800 From: "David S. Miller" MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <15002.61250.224811.987948@pizda.ninka.net> Date: Mon, 26 Feb 2001 16:05:22 -0800 (PST) To: "Benjamin C.R. LaHaise" Cc: michael@linuxmagic.com, Jan Rekorajski , Chris Wedgwood , linux-kernel@vger.kernel.org, netdev@oss.sgi.com, waltje@uWalt.NL.Mugnet.ORG Subject: Re: [UPDATE] zerocopy.. While working on ip.h stuff In-Reply-To: References: <15002.58854.215318.882641@pizda.ninka.net> X-Mailer: VM 6.75 under 21.1 (patch 13) "Crater Lake" XEmacs Lucid Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 460 Lines: 13 Benjamin C.R. LaHaise writes: > Since the ip header fits in the cache of some CPUs (like the P4), > this is becoming a cheaper operation than ever before. At gigapacket rates, it becomes an issue. This guy is talking about tinkering with new IP _options_, not just the header. So even if the IP header itself fits totally in a cache line, the options afterwards likely will not and thus require another cache miss. Later, David S. Miller davem@redhat.com From owner-netdev@oss.sgi.com Mon Feb 26 16:11:52 2001 Received: by oss.sgi.com id ; Mon, 26 Feb 2001 16:11:32 -0800 Received: from boreas.isi.edu ([128.9.160.161]:5018 "EHLO boreas.isi.edu") by oss.sgi.com with ESMTP id ; Mon, 26 Feb 2001 16:11:19 -0800 Received: from ISI.EDU (boreas.isi.edu [128.9.160.161]) by boreas.isi.edu (8.11.2/8.11.2) with ESMTP id f1R0AuH02140; Mon, 26 Feb 2001 16:10:56 -0800 (PST) To: "David S. 
Miller" , michael@linuxmagic.com, Jan Rekorajski , Chris Wedgwood , netdev@oss.sgi.com, waltje@uWalt.NL.Mugnet.ORG Subject: Re: [UPDATE] zerocopy.. While working on ip.h stuff In-reply-to: Your message of "Mon, 26 Feb 2001 15:25:26 PST." <15002.58854.215318.882641@pizda.ninka.net> Date: Mon, 26 Feb 2001 16:10:56 -0800 Message-ID: <2137.983232656@ISI.EDU> From: Craig Milo Rogers Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 1600 Lines: 33 > > I want to create a new ip_option for use in some DOS protection experiments. > > I have a whole 40 bytes (+/-) to share... Now although I don't see anything > > explicitly prohibiting the use of unused IP Header option space, I know that > > it really was designed for use by the sending parties, and not routers in > > between.. Has anyone seen any RFC that explicitly says I MUST NOT? > >Not to my knowledge. Routers already change the time to live field, >so I see no reason why they can't do smart things with special IP >options either (besides efficiency concerns :-). FWIW, the LSRR, SSRR, Record Route, and Internet Timestamp options all require routers to change the contents of their IP option as the packets bearing the option pass through. Of course, the source routing ones are out of favor now... I've forgotten how the Stream ID option was implemented, but I won't be surprised if a router inserted it on the fly (but it was probably inserted by end systems). On the other hand, there was also a competing philosophy that said that the IP checksum must be recomputed incrementally at routers to catch hardware problems in the routers, and an incremental recomputation when changing the size of the header would be more work. The one thing I would worry about is unleashing mutant IP packets upon the world at large. I hope the proposed experiments have a very good firewall. 
It would be very nice to attempt to acquire an officially blessed IP option number for such experiments before unleashing these packets upon an unprepared world. Craig Milo Rogers From owner-netdev@oss.sgi.com Mon Feb 26 16:14:32 2001 Received: by oss.sgi.com id ; Mon, 26 Feb 2001 16:14:23 -0800 Received: from pizda.ninka.net ([216.101.162.242]:49055 "EHLO pizda.ninka.net") by oss.sgi.com with ESMTP id ; Mon, 26 Feb 2001 16:14:06 -0800 Received: (from davem@localhost) by pizda.ninka.net (8.9.3/8.9.3) id QAA20481; Mon, 26 Feb 2001 16:10:16 -0800 From: "David S. Miller" MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <15002.61544.121517.514618@pizda.ninka.net> Date: Mon, 26 Feb 2001 16:10:16 -0800 (PST) To: Jeff Garzik Cc: netdev@oss.sgi.com, Linux Knernel Mailing List Subject: Re: New net features for added performance In-Reply-To: <3A9AEFAF.1DC89A8A@mandrakesoft.com> References: <3A9842DC.B42ECD7A@mandrakesoft.com> <15002.60104.350394.893905@pizda.ninka.net> <3A9AEFAF.1DC89A8A@mandrakesoft.com> X-Mailer: VM 6.75 under 21.1 (patch 13) "Crater Lake" XEmacs Lucid Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 355 Lines: 12 Jeff Garzik writes: > I only want to know if more are coming, not actually pass multiples.. Ok, then my only concern is that the path from "I know more is coming" down to hard_start_xmit invocation is long. It would mean passing a new piece of state a long distance inside the stack from SKB origin to device. Later, David S. 
Miller davem@redhat.com From owner-netdev@oss.sgi.com Mon Feb 26 16:15:32 2001 Received: by oss.sgi.com id ; Mon, 26 Feb 2001 16:15:22 -0800 Received: from kanga.kvack.org ([216.129.200.3]:59153 "EHLO kanga.kvack.org") by oss.sgi.com with ESMTP id ; Mon, 26 Feb 2001 16:15:12 -0800 Received: (from localhost user: 'blah', uid#63042) by kanga.kvack.org with SMTP id ; Mon, 26 Feb 2001 19:11:20 -0500 Date: Mon, 26 Feb 2001 19:11:20 -0500 (EST) From: "Benjamin C.R. LaHaise" To: "David S. Miller" cc: michael@linuxmagic.com, Jan Rekorajski , Chris Wedgwood , linux-kernel@vger.kernel.org, netdev@oss.sgi.com, waltje@uWalt.NL.Mugnet.ORG Subject: Re: [UPDATE] zerocopy.. While working on ip.h stuff In-Reply-To: <15002.61250.224811.987948@pizda.ninka.net> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 735 Lines: 20 On Mon, 26 Feb 2001, David S. Miller wrote: > At gigapacket rates, it becomes an issue. This guy is talking about > tinkering with new IP _options_, not just the header. So even if the > IP header itself fits totally in a cache line, the options afterwards > likely will not and thus require another cache miss. Hmmm, one way around this is to have the packet queue store things in a linear array of pointers to data areas, then process things in bursts, ie:
 - find packet data areas for queued packets
 - walk list doing prefetches of ip header and options
 - then actually do the packet processing (save output for later)
That will require a number of new hooks for pipelining operations, though. Just a thought. 
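A minimal user-space sketch of the burst-and-prefetch idea outlined in the three steps above. Everything here is an assumption made for illustration: struct pkt, process_burst() and the BURST size are hypothetical names rather than kernel interfaces, and __builtin_prefetch is the GCC/Clang builtin, not a kernel helper. The "processing" step only counts IPv4 option bytes, as a stand-in for real work.

```c
#include <stddef.h>
#include <stdint.h>

#define BURST 8   /* hypothetical burst size, chosen arbitrarily */

/* Hypothetical descriptor: points at the start of an IPv4 header. */
struct pkt {
    const uint8_t *hdr;
    size_t len;
};

/* IPv4 header length in bytes, taken from the IHL nibble of byte 0. */
static size_t ip_hdr_len(const uint8_t *hdr)
{
    return (size_t)(hdr[0] & 0x0f) * 4;
}

/*
 * Handle up to BURST packets in two passes:
 *   pass 1 issues prefetches for every header (and the region where
 *          options would sit), so the loads overlap;
 *   pass 2 does the per-packet work once the data is likely cached.
 * Returns the total number of IP option bytes seen in the burst.
 */
static size_t process_burst(const struct pkt *q, size_t n)
{
    size_t i, opt_bytes = 0;

    if (n > BURST)
        n = BURST;

    for (i = 0; i < n; i++) {
        __builtin_prefetch(q[i].hdr, 0, 3);
        if (q[i].len > 32)
            __builtin_prefetch(q[i].hdr + 32, 0, 3);
    }

    for (i = 0; i < n; i++) {
        size_t hl = ip_hdr_len(q[i].hdr);

        /* Skip malformed headers; anything past 20 bytes is options. */
        if (hl >= 20 && hl <= q[i].len)
            opt_bytes += hl - 20;
    }
    return opt_bytes;
}
```

Whether this helps in practice depends on burst size and cache-line size; the sketch only shows the shape of the two-pass loop, not a measured win.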
-ben From owner-netdev@oss.sgi.com Mon Feb 26 16:21:53 2001 Received: by oss.sgi.com id ; Mon, 26 Feb 2001 16:21:33 -0800 Received: from peace.netnation.com ([204.174.223.2]:12049 "EHLO peace.netnation.com") by oss.sgi.com with ESMTP id ; Mon, 26 Feb 2001 16:21:18 -0800 Received: from sim by peace.netnation.com with local (Exim 3.13 #5) id 14XXt5-0008F0-00; Mon, 26 Feb 2001 16:21:07 -0800 Date: Mon, 26 Feb 2001 16:21:07 -0800 From: Simon Kirby To: "David S. Miller" Cc: Jordan Mendelson , ookhoi@dds.nl, Vibol Hou , Linux-Kernel , netdev@oss.sgi.com Subject: Re: 2.4 tcp very slow under certain circumstances (Re: netdev issues (3c905B)) Message-ID: <20010226162107.A31575@netnation.com> References: <20010221104723.C1714@humilis> <14995.40701.818777.181432@pizda.ninka.net> <3A9453F4.993A9A74@napster.com> <14996.21701.542448.49413@pizda.ninka.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0i In-Reply-To: <14996.21701.542448.49413@pizda.ninka.net>; from davem@redhat.com on Wed, Feb 21, 2001 at 03:52:37PM -0800 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 837 Lines: 21 On Wed, Feb 21, 2001 at 03:52:37PM -0800, David S. Miller wrote: > There is no reason my patch should have this effect. > > All of this is what appears to be a bug in Windows TCP header > compression, if the ID field of the IPv4 header does not change then > it drops every other packet. > > The change I posted as-is, is unacceptable because it adds unnecessary > cost to a fast path. The final change I actually use will likely > involve using the TCP sequence numbers to calculate an "always > changing" ID number in the IPv4 headers to placate these broken > windows machines. Has such a patch gone in to the kernel yet? Simon- [ Stormix Technologies Inc. ][ NetNation Communications Inc. ] [ sim@stormix.com ][ sim@netnation.com ] [ Opinions expressed are not necessarily those of my employers. 
] From owner-netdev@oss.sgi.com Mon Feb 26 16:30:24 2001 Received: by oss.sgi.com id ; Mon, 26 Feb 2001 16:30:03 -0800 Received: from pizda.ninka.net ([216.101.162.242]:59551 "EHLO pizda.ninka.net") by oss.sgi.com with ESMTP id ; Mon, 26 Feb 2001 16:29:47 -0800 Received: (from davem@localhost) by pizda.ninka.net (8.9.3/8.9.3) id QAA23795; Mon, 26 Feb 2001 16:26:51 -0800 From: "David S. Miller" MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <15002.62538.794351.930198@pizda.ninka.net> Date: Mon, 26 Feb 2001 16:26:50 -0800 (PST) To: Simon Kirby Cc: Jordan Mendelson , ookhoi@dds.nl, Vibol Hou , Linux-Kernel , netdev@oss.sgi.com Subject: Re: 2.4 tcp very slow under certain circumstances (Re: netdev issues (3c905B)) In-Reply-To: <20010226162107.A31575@netnation.com> References: <20010221104723.C1714@humilis> <14995.40701.818777.181432@pizda.ninka.net> <3A9453F4.993A9A74@napster.com> <14996.21701.542448.49413@pizda.ninka.net> <20010226162107.A31575@netnation.com> X-Mailer: VM 6.75 under 21.1 (patch 13) "Crater Lake" XEmacs Lucid Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 188 Lines: 10 Simon Kirby writes: > Has such a patch gone in to the kernel yet? Yep, it is in both the zerocopy and AC patches. (Linus is away at the moment) Later, David S. Miller davem@redhat.com From owner-netdev@oss.sgi.com Mon Feb 26 16:39:53 2001 Received: by oss.sgi.com id ; Mon, 26 Feb 2001 16:39:43 -0800 Received: from u-217-21.karlsruhe.ipdial.viaginterkom.de ([62.180.21.217]:16370 "EHLO dea.waldorf-gmbh.de") by oss.sgi.com with ESMTP id ; Mon, 26 Feb 2001 16:39:22 -0800 Received: (from ralf@localhost) by dea.waldorf-gmbh.de (8.11.1/8.11.1) id f1R0coP26178; Tue, 27 Feb 2001 01:38:50 +0100 Date: Tue, 27 Feb 2001 01:38:50 +0100 From: Ralf Baechle To: "David S. 
Miller" Cc: "Petr Vandrovec" , netdev@oss.sgi.com Subject: Re: Failed assertion Message-ID: <20010227013850.B17836@bacchus.dhis.org> References: <86C68935F9@vcnet.vc.cvut.cz> <15002.59255.720093.731878@pizda.ninka.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <15002.59255.720093.731878@pizda.ninka.net>; from davem@redhat.com on Mon, Feb 26, 2001 at 03:32:07PM -0800 X-Accept-Language: de,en,fr Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 632 Lines: 17 On Mon, Feb 26, 2001 at 03:32:07PM -0800, David S. Miller wrote: > > > > This was with (almost ...) stock 2.4.1. > > > > > > What platform and what driver? It is the first time that any such > > > instance of this message has been reported. > > > > Dave, it is not first time. Unfortunately, I do not have my original > > message anymore, only Alexey's reply... > > Ok, Ralf please get the information I asked for and also the > backtrace printed after the kernel assertion failure. No backtrace, the machine did continue as you'd suspect after a print. The machine is a dual CPU Origin 200 with an IOC3 NIC. Ralf From owner-netdev@oss.sgi.com Mon Feb 26 16:42:53 2001 Received: by oss.sgi.com id ; Mon, 26 Feb 2001 16:42:34 -0800 Received: from pizda.ninka.net ([216.101.162.242]:4000 "EHLO pizda.ninka.net") by oss.sgi.com with ESMTP id ; Mon, 26 Feb 2001 16:42:26 -0800 Received: (from davem@localhost) by pizda.ninka.net (8.9.3/8.9.3) id QAA27128; Mon, 26 Feb 2001 16:39:13 -0800 From: "David S. 
Miller" MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <15002.63280.997712.635900@pizda.ninka.net> Date: Mon, 26 Feb 2001 16:39:12 -0800 (PST) To: Ralf Baechle Cc: "Petr Vandrovec" , netdev@oss.sgi.com Subject: Re: Failed assertion In-Reply-To: <20010227013850.B17836@bacchus.dhis.org> References: <86C68935F9@vcnet.vc.cvut.cz> <15002.59255.720093.731878@pizda.ninka.net> <20010227013850.B17836@bacchus.dhis.org> X-Mailer: VM 6.75 under 21.1 (patch 13) "Crater Lake" XEmacs Lucid Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 305 Lines: 11 Ralf Baechle writes: > No backtrace, the machine did continue as you'd suspect after a print. > The machine is a dual CPU Origin 200 with an IOC3 NIC. What is your current kernel based upon, some older 2.4.x or even 2.3.x variant? Or is it sync'd to current? Later, David S. Miller davem@redhat.com From owner-netdev@oss.sgi.com Mon Feb 26 16:49:53 2001 Received: by oss.sgi.com id ; Mon, 26 Feb 2001 16:49:43 -0800 Received: from u-217-21.karlsruhe.ipdial.viaginterkom.de ([62.180.21.217]:20722 "EHLO dea.waldorf-gmbh.de") by oss.sgi.com with ESMTP id ; Mon, 26 Feb 2001 16:49:34 -0800 Received: (from ralf@localhost) by dea.waldorf-gmbh.de (8.11.1/8.11.1) id f1R0nCY26277; Tue, 27 Feb 2001 01:49:12 +0100 Date: Tue, 27 Feb 2001 01:49:12 +0100 From: Ralf Baechle To: "David S. 
Miller" Cc: "Petr Vandrovec" , netdev@oss.sgi.com Subject: Re: Failed assertion Message-ID: <20010227014912.C17836@bacchus.dhis.org> References: <86C68935F9@vcnet.vc.cvut.cz> <15002.59255.720093.731878@pizda.ninka.net> <20010227013850.B17836@bacchus.dhis.org> <15002.63280.997712.635900@pizda.ninka.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <15002.63280.997712.635900@pizda.ninka.net>; from davem@redhat.com on Mon, Feb 26, 2001 at 04:39:12PM -0800 X-Accept-Language: de,en,fr Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 391 Lines: 12 On Mon, Feb 26, 2001 at 04:39:12PM -0800, David S. Miller wrote: > Ralf Baechle writes: > > No backtrace, the machine did continue as you'd suspect after a print. > > The machine is a dual CPU Origin 200 with an IOC3 NIC. > > What is your current kernel based upon, some older 2.4.x or > even 2.3.x variant? Or is it sync'd to current? 2.4.1 based; network code is unmodified. Ralf From owner-netdev@oss.sgi.com Mon Feb 26 16:53:13 2001 Received: by oss.sgi.com id ; Mon, 26 Feb 2001 16:52:54 -0800 Received: from isis.its.uow.edu.au ([130.130.68.21]:36844 "EHLO isis.its.uow.edu.au") by oss.sgi.com with ESMTP id ; Mon, 26 Feb 2001 16:52:48 -0800 Received: from uow.edu.au (wumpus.its.uow.edu.au [130.130.68.12]) by isis.its.uow.edu.au (8.9.3/8.9.3) with ESMTP id LAA17924; Tue, 27 Feb 2001 11:52:33 +1100 (EST) Message-ID: <3A9AFA50.ABF4FB17@uow.edu.au> Date: Tue, 27 Feb 2001 00:52:32 +0000 From: Andrew Morton X-Mailer: Mozilla 4.61 [en] (X11; I; Linux 2.4.1-pre10 i686) X-Accept-Language: en MIME-Version: 1.0 To: "David S. 
Miller" CC: Ralf Baechle , Petr Vandrovec , netdev@oss.sgi.com Subject: Re: Failed assertion References: <86C68935F9@vcnet.vc.cvut.cz> <15002.59255.720093.731878@pizda.ninka.net> <20010227013850.B17836@bacchus.dhis.org> <15002.63280.997712.635900@pizda.ninka.net> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 1038 Lines: 31 "David S. Miller" wrote: > > Ralf Baechle writes: > > No backtrace, the machine did continue as you'd suspect after a print. > > The machine is a dual CPU Origin 200 with an IOC3 NIC. > > What is your current kernel based upon, some older 2.4.x or > even 2.3.x variant? Or is it sync'd to current? Could this be a driver problem? This code:

    netif_rx(skb);

    ip->rx_skbs[rx_entry] = NULL;  /* Poison */

    new_skb = ioc3_alloc_skb(RX_BUF_ALLOC_SIZE, GFP_ATOMIC);
    if (!new_skb) {
        /* Ouch, drop packet and just recycle packet
           to keep the ring filled. */
        ip->stats.rx_dropped++;
        new_skb = skb;
        goto next;
    }

looks scary. We've passed an skb to the network stack, but we can continue to make it available to the device driver at the same time. I'd suggest a printk() in there, plus perhaps do the alloc_skb _before_ the netif_rx(). Don't pass the skb to the stack if it is to be recycled. From owner-netdev@oss.sgi.com Mon Feb 26 16:57:55 2001 Received: by oss.sgi.com id ; Mon, 26 Feb 2001 16:57:44 -0800 Received: from pizda.ninka.net ([216.101.162.242]:14752 "EHLO pizda.ninka.net") by oss.sgi.com with ESMTP id ; Mon, 26 Feb 2001 16:57:38 -0800 Received: (from davem@localhost) by pizda.ninka.net (8.9.3/8.9.3) id QAA29680; Mon, 26 Feb 2001 16:53:50 -0800 From: "David S. 
Miller" MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <15002.64158.467357.343401@pizda.ninka.net> Date: Mon, 26 Feb 2001 16:53:50 -0800 (PST) To: Andrew Morton Cc: Ralf Baechle , Petr Vandrovec , netdev@oss.sgi.com Subject: Re: Failed assertion In-Reply-To: <3A9AFA50.ABF4FB17@uow.edu.au> References: <86C68935F9@vcnet.vc.cvut.cz> <15002.59255.720093.731878@pizda.ninka.net> <20010227013850.B17836@bacchus.dhis.org> <15002.63280.997712.635900@pizda.ninka.net> <3A9AFA50.ABF4FB17@uow.edu.au> X-Mailer: VM 6.75 under 21.1 (patch 13) "Crater Lake" XEmacs Lucid Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 295 Lines: 16 Andrew Morton writes: > Could this be a driver problem? > netif_rx(skb); ... > new_skb = skb; This is illegal and broken and will never work. Once you give an skb to netif_rx() it is not yours to reference any longer. Later, David S. Miller davem@redhat.com From owner-netdev@oss.sgi.com Mon Feb 26 17:00:54 2001 Received: by oss.sgi.com id ; Mon, 26 Feb 2001 17:00:45 -0800 Received: from u-217-21.karlsruhe.ipdial.viaginterkom.de ([62.180.21.217]:26610 "EHLO dea.waldorf-gmbh.de") by oss.sgi.com with ESMTP id ; Mon, 26 Feb 2001 17:00:32 -0800 Received: (from ralf@localhost) by dea.waldorf-gmbh.de (8.11.1/8.11.1) id f1R0xul26378; Tue, 27 Feb 2001 01:59:56 +0100 Date: Tue, 27 Feb 2001 01:59:56 +0100 From: Ralf Baechle To: Andrew Morton Cc: "David S. 
Miller" , Petr Vandrovec , netdev@oss.sgi.com Subject: Re: Failed assertion Message-ID: <20010227015956.D17836@bacchus.dhis.org> References: <86C68935F9@vcnet.vc.cvut.cz> <15002.59255.720093.731878@pizda.ninka.net> <20010227013850.B17836@bacchus.dhis.org> <15002.63280.997712.635900@pizda.ninka.net> <3A9AFA50.ABF4FB17@uow.edu.au> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <3A9AFA50.ABF4FB17@uow.edu.au>; from andrewm@uow.edu.au on Tue, Feb 27, 2001 at 12:52:32AM +0000 X-Accept-Language: de,en,fr Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 1517 Lines: 43 On Tue, Feb 27, 2001 at 12:52:32AM +0000, Andrew Morton wrote: > Date: Tue, 27 Feb 2001 00:52:32 +0000 > From: Andrew Morton > To: "David S. Miller" > CC: Ralf Baechle , Petr Vandrovec , > netdev@oss.sgi.com > Subject: Re: Failed assertion > > "David S. Miller" wrote: > > > > Ralf Baechle writes: > > > No backtrace, the machine did continue as you'd suspect after a print. > > > The machine is a dual CPU Origin 200 with an IOC3 NIC. > > > > What is your current kernel based upon, some older 2.4.x or > > even 2.3.x variant? Or is it sync'd to current? > > Could this be a driver problem? This code: > > netif_rx(skb); > > ip->rx_skbs[rx_entry] = NULL; /* Poison */ > > new_skb = ioc3_alloc_skb(RX_BUF_ALLOC_SIZE, GFP_ATOMIC); > if (!new_skb) { > /* Ouch, drop packet and just recycle packet > to keep the ring filled. */ > ip->stats.rx_dropped++; > new_skb = skb; > goto next; > } > > looks scary. We've passed an skb to the network stack, > but we can continue to make it available to the device > driver at the same time. > > I'd suggest a printk() in there, plus perhaps do the > alloc_skb _before_ the netif_rx(). Don't pass the skb > to the stack if it is to be recycled. Now that you point my eyes at it the crime is more than obvious, thanks ... 
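A minimal user-space sketch of the reordering suggested above: allocate the replacement buffer first, and hand the old one to the stack only if that allocation succeeded. Everything named fake_* is a hypothetical stand-in and not kernel API; fake_netif_rx() simply frees the buffer to model the rule that a buffer given to netif_rx() is no longer the driver's to touch. This is an illustration under those assumptions, not the actual ioc3 fix.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define RING_SIZE 4

/* Hypothetical stand-in for struct sk_buff; not the kernel type. */
struct fake_skb {
    unsigned char data[64];
};

/* Hypothetical stand-in for the driver's private RX state. */
struct fake_ring {
    struct fake_skb *rx_skbs[RING_SIZE];
    unsigned long rx_dropped;
    unsigned long rx_delivered;
};

/* Stand-in allocator; 'fail' simulates an out-of-memory condition. */
static struct fake_skb *fake_alloc_skb(int fail)
{
    return fail ? NULL : calloc(1, sizeof(struct fake_skb));
}

/*
 * Stand-in for netif_rx(): the stack now owns the buffer. Freeing it
 * here models the fact that the driver must not reference it again.
 */
static void fake_netif_rx(struct fake_ring *ring, struct fake_skb *skb)
{
    ring->rx_delivered++;
    free(skb);
}

/*
 * Process one ring entry. Returns 1 if the packet was delivered,
 * 0 if it was dropped and its buffer kept in the ring.
 */
static int rx_one(struct fake_ring *ring, size_t rx_entry, int simulate_oom)
{
    struct fake_skb *skb, *new_skb;

    assert(rx_entry < RING_SIZE);
    skb = ring->rx_skbs[rx_entry];

    /* Allocate the replacement BEFORE giving anything to the stack. */
    new_skb = fake_alloc_skb(simulate_oom);
    if (!new_skb) {
        /* Drop the packet; skb was never handed over, so reuse is safe. */
        ring->rx_dropped++;
        return 0;
    }

    /* Refill the slot first, then pass the old buffer up. */
    ring->rx_skbs[rx_entry] = new_skb;
    fake_netif_rx(ring, skb);
    return 1;
}
```

With simulate_oom set, rx_one() returns 0, bumps rx_dropped and leaves the same buffer in the ring slot; otherwise the slot already holds the fresh buffer by the time the old one is handed over.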
Ralf From owner-netdev@oss.sgi.com Mon Feb 26 18:31:54 2001 Received: by oss.sgi.com id ; Mon, 26 Feb 2001 18:31:45 -0800 Received: from [204.244.205.25] ([204.244.205.25]:22088 "HELO post.gateone.com") by oss.sgi.com with SMTP id ; Mon, 26 Feb 2001 18:31:33 -0800 Received: (qmail 407 invoked from network); 27 Feb 2001 02:31:28 -0000 Received: from mystery.wizard.ca (HELO mistress) (204.244.205.8) by mail.wizard.ca with SMTP; 27 Feb 2001 02:31:28 -0000 From: Michael Peddemors Reply-To: michael@linuxmagic.com Organization: Wizard Internet Services To: "Benjamin C.R. LaHaise" , "David S. Miller" Subject: Re: [UPDATE] zerocopy.. While working on ip.h stuff Date: Mon, 26 Feb 2001 19:41:54 -0800 X-Mailer: KMail [version 1.1.95.0] Content-Type: text/plain Cc: linux-kernel@vger.kernel.org, netdev@oss.sgi.com References: In-Reply-To: MIME-Version: 1.0 Message-Id: <0102261941540L.02007@mistress> Content-Transfer-Encoding: 8bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 1640 Lines: 38 On Mon, 26 Feb 2001, Benjamin C.R. LaHaise wrote: > On Mon, 26 Feb 2001, David S. Miller wrote: > > At gigapacket rates, it becomes an issue. This guy is talking about > > tinkering with new IP _options_, not just the header. So even if the > > IP header itself fits totally in a cache line, the options afterwards > > likely will not and thus require another cache miss. Yes, I expect to use the whole of the allowed size :) So instead of the more common IP Header length of 20 bytes, I will be using 25-60 bytes for a header, (But so does source routing) and the router RFC says that we should handle it... Now, of course, you have raised the question of whether that would be handled efficiently with the current kernel code.. 
> Hmmm, one way around this is to have the packet queue store things in > in a linear array of pointers to data areas, then process things in > bursts, ie: > > - find packet data areas for queued packets > - walk list doing prefetches of ip header and options > - then actually do the packet processing (save output for later) > > That will require a number of new hooks for pipelining operations, though. > Just a thought. > > -ben -- "Catch the magic of Linux...." -------------------------------------------------------- Michael Peddemors - Senior Consultant Unix Administration - WebSite Hosting Network Services - Programming Wizard Internet Services http://www.wizard.ca Linux Support Specialist - http://www.linuxmagic.com -------------------------------------------------------- (604) 589-0037 Beautiful British Columbia, Canada -------------------------------------------------------- From owner-netdev@oss.sgi.com Tue Feb 27 11:16:02 2001 Received: by oss.sgi.com id ; Tue, 27 Feb 2001 11:15:52 -0800 Received: from minus.inr.ac.ru ([193.233.7.97]:7428 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Tue, 27 Feb 2001 11:15:42 -0800 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id WAA20842; Tue, 27 Feb 2001 22:14:55 +0300 From: kuznet@ms2.inr.ac.ru Message-Id: <200102271914.WAA20842@ms2.inr.ac.ru> Subject: Re: New net features for added performance To: jgarzik@mandrakesoft.COM (Jeff Garzik) Date: Tue, 27 Feb 2001 22:14:55 +0300 (MSK) Cc: netdev@oss.sgi.com In-Reply-To: <3A9AEFAF.1DC89A8A@mandrakesoft.com> from "Jeff Garzik" at Feb 27, 1 03:15:07 am X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 657 Lines: 20 Hello! > I only want to know if more are coming, not actually pass multiples.. BTW trying to strain brains how could driver to use this... 
TX irq mitigation is maximally aggressive even without this; such information is simply useless for mitigation. Merging index updates? Well, it is too easy to do again (for drivers using tx spinlock). Just do not tell the driver about the new index until at least one of two conditions holds: 1. >N frames are queued at driver (in hard_start_xmit) 2. ; Wed, 28 Feb 2001 06:49:29 -0800 Received: from rhubarb.arl.qwestip.net ([208.47.0.250]:1551 "EHLO rhubarb.arl.qwestip.net") by oss.sgi.com with ESMTP id ; Wed, 28 Feb 2001 06:49:05 -0800 Received: from localhost (jason@localhost) by rhubarb.arl.qwestip.net (8.11.2/8.11.2) with ESMTP id f1SEmr106323 for ; Wed, 28 Feb 2001 09:48:53 -0500 Date: Wed, 28 Feb 2001 09:48:52 -0500 (EST) From: Jason Duerstock cc: Subject: netdevice->interrupt and netdevice->tbusy from 2.2 to 2.4 In-Reply-To: <200102271914.WAA20842@ms2.inr.ac.ru> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII To: unlisted-recipients:; (no To-header on input) Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 233 Lines: 8 I am trying to get some old 2.2 mac68k network drivers to compile under Linux 2.4. Is there a short explanation somewhere of what happened to these fields and the proper fields/routines I need to fix this properly? 
Thanks Jason From owner-netdev@oss.sgi.com Wed Feb 28 07:31:39 2001 Received: by oss.sgi.com id ; Wed, 28 Feb 2001 07:31:19 -0800 Received: from 2-031.cwb-adsl.telepar.net.br ([200.193.161.31]:38386 "HELO brinquedo.distro.conectiva") by oss.sgi.com with SMTP id ; Wed, 28 Feb 2001 07:31:12 -0800 Received: by brinquedo.distro.conectiva (Postfix, from userid 501) id 64F502739; Wed, 28 Feb 2001 10:51:49 -0300 (EST) Date: Wed, 28 Feb 2001 10:51:49 -0300 From: Arnaldo Carvalho de Melo To: Jason Duerstock Cc: Subject: Re: netdevice->interrupt and netdevice->tbusy from 2.2 to 2.4 Message-ID: <20010228105149.L24856@conectiva.com.br> References: <200102271914.WAA20842@ms2.inr.ac.ru> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.3.14i In-Reply-To: ; from jason@rhubarb.arl.qwestip.net on Wed, Feb 28, 2001 at 09:48:52AM -0500 X-Url: http://advogato.org/person/acme Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 439 Lines: 12 On Wed, Feb 28, 2001 at 09:48:52AM -0500, Jason Duerstock wrote: > I am trying to get some old 2.2 mac68k network drivers to compile under > Linux 2.4. Is there a short explanation somewhere of what happened to > these fields and the proper fields/routines I need to fix this properly? 
try this, from Andi: Try http://www.firstfloor.org/~andi/softnet Unfortunately some minor things have changed again since these mails - Arnaldo From owner-netdev@oss.sgi.com Wed Feb 28 18:43:57 2001 Received: by oss.sgi.com id ; Wed, 28 Feb 2001 18:43:47 -0800 Received: from panic.ohr.gatech.edu ([130.207.47.194]:1191 "HELO havoc.gtf.org") by oss.sgi.com with SMTP id ; Wed, 28 Feb 2001 18:43:22 -0800 Received: from mandrakesoft.com (adsl-20-73-169.asm.bellsouth.net [66.20.73.169]) by havoc.gtf.org (Postfix) with ESMTP id 3E5711F6A; Wed, 28 Feb 2001 21:21:24 -0500 (EST) Message-ID: <3A9DB224.A603846D@mandrakesoft.com> Date: Wed, 28 Feb 2001 21:21:24 -0500 From: Jeff Garzik Organization: MandrakeSoft X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.4.2 i686) X-Accept-Language: en MIME-Version: 1.0 To: linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: rx_copybreak... Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 439 Lines: 12 Instead of unconditionally copying packet sizes over rx_copybreak, in many ethernet drivers... is it worth it for the driver to check and see if the packet is already aligned? Or is that such a rare case that it shouldn't be worth it? Jeff -- Jeff Garzik | "You see, in this world there's two kinds of Building 1024 | people, my friend: Those with loaded guns MandrakeSoft | and those who dig. You dig." 
--Blondie

From owner-netdev@oss.sgi.com Wed Feb 28 20:45:17 2001 Received: by oss.sgi.com id ; Wed, 28 Feb 2001 20:45:07 -0800 Received: from asbestos.linuxcare.com.au ([203.17.0.30]:45302 "EHLO halfway") by oss.sgi.com with ESMTP id ; Wed, 28 Feb 2001 20:44:43 -0800 Received: from halfway ([127.0.0.1] helo=linuxcare.com.au ident=rusty) by halfway with esmtp (Exim 3.22 #1 (Debian)) id 14YKx2-0007bp-00; Thu, 01 Mar 2001 15:44:28 +1100 From: Rusty Russell To: Richard Guy Briggs Cc: netfilter-devel@us5.samba.org, linux-ipsec@freeswan.org, netdev@oss.sgi.com Subject: Re: On Extending NFMark... In-reply-to: Your message of "Tue, 20 Feb 2001 11:01:13 CDT." <20010220110113.C3910@grendel.conscoop.ottawa.on.ca> Date: Thu, 01 Mar 2001 15:44:28 +1100 Message-Id: Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 2011 Lines: 68

In message <20010220110113.C3910@grendel.conscoop.ottawa.on.ca> you write:
> > sorry, but this doesn't make sense to me. adding a single pointer to
> > struct sk_buff is a _way_ smaller patch to the kernel than changing
> > huge amounts of code in netfilter, iptables, iproute2 and tc.
>
> I agree it is a way smaller patch, but it is still a patch that we
> want to avoid.
>
> ...that could be useful to others who want to mark packets without
> guessing if any other netfilter module is going to inadvertently use
> or modify it.

The classic nfmark field problem; that there is only one. See also,
nfct.

For a generic linked-list of blobs approach, there are several
problems:

1) How do I tell which one is mine?
2) What happens when packet is copied?
3) What happens when packet is cloned?
4) What happens when packet is destroyed?

For nfmark, the answers are:
2) Copy it.
3) Copy it.
4) Free it.

For nfct, the answers are:
2) Copy it, bump refcnt.
3) Copy it, bump refcnt.
4) Dec refcnt, if zero call destructor.

Some other stuff has similar properties.

struct nf_info
{
    struct nf_info *next;

    atomic_t refcnt;

    /* Unique identifier, suggest a valid unique pointer */
    void *identifier;

    /* Called on skb_copy */
    int (*copy)(struct sk_buff *oldskb, struct sk_buff *newskb,
                struct nf_info *me);

    /* Called on skb_clone */
    int (*clone)(struct sk_buff *oldskb, struct sk_buff *newskb,
                 struct nf_info *me);

    /* Called in __kfree_skb when atomic_dec_and_test(&refcnt)
       returns true, i.e. the count has reached zero */
    int (*destroy)(struct sk_buff *dyingskb, struct nf_info *me);

    /* Your data goes here... */
};

/* Walk the skb's list and return the entry matching id, or NULL.
   Assumes the proposed nf_info list head has been added to struct sk_buff. */
static inline struct nf_info *skb_get_nfinfo(struct sk_buff *skb, void *id)
{
    struct nf_info *i;

    for (i = skb->nf_info; i && i->identifier != id; i = i->next)
        ;
    return i;
}

My gut reaction is against the heaviness of this proposal, but maybe
the networking gurus think it's worth the pain, as there *is* other
cruft in the skb which may benefit...

Rusty.
--
Premature optmztion is rt of all evl. --DK

From owner-netdev@oss.sgi.com Wed Feb 28 22:24:19 2001 Received: by oss.sgi.com id ; Wed, 28 Feb 2001 22:24:10 -0800 Received: from mail.zmailer.org ([194.252.70.162]:61196 "EHLO zmailer.org") by oss.sgi.com with ESMTP id ; Wed, 28 Feb 2001 22:23:51 -0800 Received: (from localhost user: 'mea' uid#500 fake: STDIN (mea@zmailer.org)) by mail.zmailer.org id ; Thu, 1 Mar 2001 08:23:35 +0200 Date: Thu, 1 Mar 2001 08:23:35 +0200 From: Matti Aarnio To: Jeff Garzik Cc: linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: rx_copybreak... 
Message-ID: <20010301082335.Q15688@mea-ext.zmailer.org> References: <3A9DB224.A603846D@mandrakesoft.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <3A9DB224.A603846D@mandrakesoft.com>; from jgarzik@mandrakesoft.com on Wed, Feb 28, 2001 at 09:21:24PM -0500 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 1146 Lines: 29 On Wed, Feb 28, 2001 at 09:21:24PM -0500, Jeff Garzik wrote: > Instead of unconditionally copying packet sizes over rx_copybreak, in > many ethernet drivers... is it worth it for the driver to check and see > if the packet is already aligned? Or is that such a rare case that it > shouldn't be worth it? To know if packet is aligned or not, one must look into the packet to analyze protocols. IMO it is far better to setup the card to do RX DMA into DIX aligned IP frame (the most common case), but hardware which is inherently unable to do that has drivers knowing it. My nonexhaustive list of cards says: - Tulip: RX DMA alignment: 4 bytes (buffer sizes have same alignment, one can't do tricks like receive 12 bytes to the first buffer, rest to the next) - 3c59x: RX DMA alignment: 1 byte (all 3c90X cards) - eepro100: RX DMA alignment: unknown ! (for lack of certain part of intel documents) > Jeff > -- > Jeff Garzik | "You see, in this world there's two kinds of > Building 1024 | people, my friend: Those with loaded guns > MandrakeSoft | and those who dig. You dig." 
--Blondie

/Matti Aarnio

From owner-netdev@oss.sgi.com Wed Feb 28 22:54:19 2001 Received: by oss.sgi.com id ; Wed, 28 Feb 2001 22:54:10 -0800 Received: from asbestos.linuxcare.com.au ([203.17.0.30]:2300 "EHLO halfway") by oss.sgi.com with ESMTP id ; Wed, 28 Feb 2001 22:53:47 -0800 Received: from halfway ([127.0.0.1] helo=linuxcare.com.au ident=rusty) by halfway with esmtp (Exim 3.22 #1 (Debian)) id 14YMy1-00086n-00; Thu, 01 Mar 2001 17:53:37 +1100 From: Rusty Russell To: suchots@thaicom.net Cc: netdev@oss.sgi.com Subject: Re: Error with mrouted 3.9b3+ios12 In-reply-to: Your message of "Fri, 23 Feb 2001 16:54:29 +0700." <000101c09d7e$9e06f4b0$e7d0b7ca@pegasus> Date: Thu, 01 Mar 2001 17:53:37 +1100 Message-Id: Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 747 Lines: 21

In message <000101c09d7e$9e06f4b0$e7d0b7ca@pegasus> you write:
>
> Feb 23 09:58:14 ns kernel: ip_dev_loopback_xmit: bad owned skb = cfc80f00: PRE_ROUTING FORWARD POST_ROUTING
> Feb 23 09:58:14 ns kernel: skb: pf=2 (unowned) dev=eth1 len=1500
> Feb 23 09:58:14 ns kernel: PROTO=17 202.183.208.204:1449 239.1.1.2:26911 L=1500 S=0x00 I=54455 F=0x2000 T=4

Hmm, this means that a packet went through PRE_ROUTING, FORWARD and
POST_ROUTING, then hit loopback.

It's kinda weird behaviour: multicast packets will pass through
PRE_ROUTING, FORWARD, POST_ROUTING (ip_mc_output), then net_rx,
net_rx_action, ip_rcv, then ???.

I'm surprised you don't get more messages, but they're fairly
harmless.

Rusty.
--
Premature optmztion is rt of all evl. --DK