From owner-netdev@oss.sgi.com Wed Aug 2 08:57:33 2000
Received: by oss.sgi.com id ; Wed, 2 Aug 2000 08:57:14 -0700
Received: from cerberus.nemoto.ecei.tohoku.ac.jp ([130.34.199.67]:13572 "EHLO cerberus.nemoto.ecei.tohoku.ac.jp") by oss.sgi.com with ESMTP id ; Wed, 2 Aug 2000 08:56:52 -0700
Received: from localhost (yoshfuji@localhost [127.0.0.1]) by cerberus.nemoto.ecei.tohoku.ac.jp (8.9.3+3.2W/8.9.3/Debian 8.9.3-21) with ESMTP id AAA00820; Thu, 3 Aug 2000 00:55:59 +0900
To: netdev@oss.sgi.com, linux-ipv6@inner.net
CC: usagi@v6.linux.or.jp
Subject: Fw: IPv6 Implementation Reports
From: Hideaki YOSHIFUJI
X-Mailer: Mew version 1.94 on Emacs 20.7 / Mule 4.1 (AOI)
X-URL: http://www.ecei.tohoku.ac.jp/%7Eyoshfuji/
X-Fingerprint: F7 31 65 99 5E B2 BB A7 15 15 13 23 18 06 A9 6F 57 00 6B 25
X-Pgp5-Key-Url: http://cerberus.nemoto.ecei.tohoku.ac.jp/%7Eyoshfuji/yoshfuji@ecei.tohoku.ac.jp.asc
Mime-Version: 1.0
Content-Type: Multipart/Mixed; boundary="--Next_Part(Thu_Aug__3_00:55:54_2000_700)--"
Content-Transfer-Encoding: 7bit
Message-Id: <20000803005559Y.yoshfuji@cerberus.nemoto.ecei.tohoku.ac.jp>
Date: Thu, 03 Aug 2000 00:55:59 +0900
X-Dispatcher: imput version 990905(IM130)
Lines: 101
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing

----Next_Part(Thu_Aug__3_00:55:54_2000_700)--
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

Hi,

I cannot find any reports from Linux.
Could anyone fill out those forms?

--
Hideaki YOSHIFUJI @ USAGI Project
Web Page: http://www.ecei.tohoku.ac.jp/%7Eyoshfuji/
PGP5i FP: F731 6599 5EB2 BBA7 1515 1323 1806 A96F 5700 6B25

----Next_Part(Thu_Aug__3_00:55:54_2000_700)--
Content-Type: Message/Rfc822
Content-Transfer-Encoding: 7bit

Return-Path:
Return-Path:
Received: from ecei.tohoku.ac.jp (eimail.ecei.tohoku.ac.jp [130.34.195.2]) by cerberus.nemoto.ecei.tohoku.ac.jp (8.9.3+3.2W/8.9.3/Debian 8.9.3-21) with ESMTP id AAA00760 for ; Thu, 3 Aug 2000 00:33:25 +0900
X-Authentication-Warning: cerberus.nemoto.ecei.tohoku.ac.jp: Host eimail.ecei.tohoku.ac.jp [130.34.195.2] claimed to be ecei.tohoku.ac.jp
Received: from mercury.Sun.COM (mercury.Sun.COM [192.9.25.1]) by ecei.tohoku.ac.jp (8.9.3/3.7W) with ESMTP id AAA13477 for ; Thu, 3 Aug 2000 00:33:24 +0900 (JST)
Received: from engmail2.Eng.Sun.COM ([129.146.1.25]) by mercury.Sun.COM (8.9.3+Sun/8.9.3) with ESMTP id IAA10923; Wed, 2 Aug 2000 08:32:53 -0700 (PDT)
Received: from sunroof.eng.sun.com (sunroof.Eng.Sun.COM [129.146.168.88]) by engmail2.Eng.Sun.COM (8.9.3+Sun/8.9.3/ENSMAIL,v1.7) with ESMTP id IAA27823; Wed, 2 Aug 2000 08:30:24 -0700 (PDT)
Received: (from majordomo@localhost) by sunroof.eng.sun.com (8.10.2+Sun/8.10.2) id e72FRk425798 for ipng-dist; Wed, 2 Aug 2000 08:27:46 -0700 (PDT)
Received: from engmail2.Eng.Sun.COM (engmail2 [129.146.1.25]) by sunroof.eng.sun.com (8.10.2+Sun/8.10.2) with ESMTP id e72FRc625791 for ; Wed, 2 Aug 2000 08:27:38 -0700 (PDT)
Received: from lukla.Sun.COM (lukla.Central.Sun.COM [129.147.5.31]) by engmail2.Eng.Sun.COM (8.9.3+Sun/8.9.3/ENSMAIL,v1.7) with ESMTP id IAA27095 for ; Wed, 2 Aug 2000 08:27:38 -0700 (PDT)
Received: from mailhost.iprg.nokia.com (mailhost.iprg.nokia.com [205.226.5.12]) by lukla.Sun.COM (8.9.3+Sun/8.9.3) with ESMTP id JAA29126 for ; Wed, 2 Aug 2000 09:27:37 -0600 (MDT)
Received: from darkstar.iprg.nokia.com (darkstar.iprg.nokia.com [205.226.5.69]) by mailhost.iprg.nokia.com (8.9.3/8.9.3-GLGS) with ESMTP id IAA03263; Wed, 2 Aug 2000 08:27:36 -0700 (PDT)
Received: (from root@localhost) by darkstar.iprg.nokia.com (8.9.3/8.9.3-VIRSCAN) id IAA07209; Wed, 2 Aug 2000 08:27:32 -0700
X-Virus-Scanned: Wed, 2 Aug 2000 08:27:32 -0700 Nokia Silicon Valley Email Exploit Scanner
Received: from wired-129-36.ietf.marconi.com (147.73.129.36, claiming to be "spruce.iprg.nokia.com") by darkstar.iprg.nokia.com(WTS.12.69) smtpdH0x5rT; Wed, 02 Aug 2000 08:27:29 PDT
Message-Id: <4.3.2.7.2.20000802082247.025a7ae8@mailhost.iprg.nokia.com>
X-Sender: hinden@mailhost.iprg.nokia.com
X-Mailer: QUALCOMM Windows Eudora Version 4.3.2
Date: Wed, 02 Aug 2000 08:27:14 -0700
To: ipng@sunroof.eng.sun.com
From: Bob Hinden
Subject: IPv6 Implementation Reports
Cc: hinden@iprg.nokia.com
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; format=flowed
Sender: owner-ipng@sunroof.eng.sun.com
Precedence: bulk

As part of moving the IPv6 over documents to Draft Standard we need to collect implementation reports. With considerable help from Matt Crawford I have created templates for an implementation report. These can be found at:

   http://playground.sun.com/pub/ipng/implementation-reports/templates/

There are templates for Ethernet, Token Ring, and FDDI. The file names are of the form ipv6-over-.txt. If you have an implementation, please fill out the template and send it to me.

Note: I am missing a template for IPv6 over ARCNET. I will announce it when it is available.

At the same location there are also templates for IPv6, Addressing Architecture, Address Aggregation format, Path MTU, and ICMP. If you have not submitted one previously, please do so.

The current implementation reports can be found at:

   http://playground.sun.com/pub/ipng/implementation-reports/

Thanks,
Bob

--------------------------------------------------------------------
IETF IPng Working Group Mailing List
IPng Home Page: http://playground.sun.com/ipng
FTP archive: ftp://playground.sun.com/pub/ipng
Direct all administrative requests to majordomo@sunroof.eng.sun.com
--------------------------------------------------------------------

----Next_Part(Thu_Aug__3_00:55:54_2000_700)----

From owner-netdev@oss.sgi.com Thu Aug 3 17:05:24 2000
Received: by oss.sgi.com id ; Thu, 3 Aug 2000 17:05:04 -0700
Received: from mail.phillipsmotorsports.com ([208.129.192.28]:63757 "EHLO imail.ipns.com") by oss.sgi.com with ESMTP id ; Thu, 3 Aug 2000 17:04:24 -0700
Received: from danb [209.210.132.93] by imail.ipns.com (SMTPD32-6.00) id A8FFAED02A0; Thu, 03 Aug 2000 17:06:23 -0700
From: "Dan Browning"
To: , , ,
Subject: Development status on bthernet bonding?
Date: Thu, 3 Aug 2000 17:01:24 -0700
Message-ID: <000001bffda7$21fc4590$1500000a@danb>
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
X-Priority: 3 (Normal)
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook CWS, Build 9.0.2416 (9.0.2911.0)
Importance: Normal
X-MimeOLE: Produced By Microsoft MimeOLE V5.50.4133.2400
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing

I haven't found Documentation/networking/bonding.txt in any of the new (2.3.x) kernels. Is bonding (esp. F.E.C.) a feature that will only be in 2.2.x? Is there anyone actively maintaining bonding support? If not, can someone open a sourceforge.net project and start collecting information on it?

I think FEC is pretty great functionality for servers that need more than 100 Mbps but can't afford 1 Gbps, and I'm surprised that there isn't more clamoring for its inclusion.

Additionally, bonding.txt mentions that HA isn't possible due to the design of the network card drivers. Is this limitation being worked on?

I really appreciate your communication,

Dan Browning
Network Administrator
Cyclone Computer Systems

From owner-netdev@oss.sgi.com Thu Aug 3 20:24:25 2000
Received: by oss.sgi.com id ; Thu, 3 Aug 2000 20:24:15 -0700
Received: from cerberus.nemoto.ecei.tohoku.ac.jp ([130.34.199.67]:6148 "EHLO cerberus.nemoto.ecei.tohoku.ac.jp") by oss.sgi.com with ESMTP id ; Thu, 3 Aug 2000 20:23:47 -0700
Received: from localhost (yoshfuji@localhost [127.0.0.1]) by cerberus.nemoto.ecei.tohoku.ac.jp (8.9.3+3.2W/8.9.3/Debian 8.9.3-21) with ESMTP id MAA00622; Fri, 4 Aug 2000 12:22:57 +0900
To: netdev@oss.sgi.com, linux-kernel@vger.rutgers.edu
CC: linux-ipv6-jp@linux.or.jp
Subject: Don't allow mapped address after binding to ipv4.
From: Hideaki YOSHIFUJI
X-Mailer: Mew version 1.94 on XEmacs 21.1 (Capitol Reef)
X-URL: http://www.ecei.tohoku.ac.jp/%7Eyoshfuji/
X-Fingerprint: F7 31 65 99 5E B2 BB A7 15 15 13 23 18 06 A9 6F 57 00 6B 25
X-Pgp5-Key-Url: http://cerberus.nemoto.ecei.tohoku.ac.jp/%7Eyoshfuji/yoshfuji@ecei.tohoku.ac.jp.asc
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Message-Id: <20000804122257N.yoshfuji@cerberus.nemoto.ecei.tohoku.ac.jp>
Date: Fri, 04 Aug 2000 12:22:57 +0900
X-Dispatcher: imput version 990905(IM130)
Lines: 48
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing

Hi folks,

linux-2.2.16 and linux-2.4.0-test5 allow an application to bind an inet6 socket sd6 to an address / port that is already bound by an inet socket sd4.
For example, the 2nd step below should fail.

  1. bind sd4 to 127.0.0.1
  2. bind sd6 to ::ffff:127.0.0.1

Here is a patch to fix this problem (for 2.2.16; maybe for 2.4.0-test5).

Thanks in advance.

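[Editorial sketch, not part of the original mail.] The two bind steps described above can be tried from user space with a small C helper. This is only an illustration under stated assumptions: the function name mapped_bind_accepted() is hypothetical, the port is whatever the kernel hands out for the first bind, and the outcome depends on the kernel version and on settings such as IPV6_V6ONLY, so no particular result is promised.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper (not taken from the patch).
 * Step 1: bind an AF_INET socket to 127.0.0.1 on a kernel-chosen port.
 * Step 2: try to bind an AF_INET6 socket to ::ffff:127.0.0.1, same port.
 *
 * Returns  1 if the second bind was accepted (the behaviour reported above),
 *          0 if the second bind was refused,
 *         -1 if the experiment could not be set up (e.g. no IPv6 support).
 */
int mapped_bind_accepted(int type)
{
    struct sockaddr_in a4;
    struct sockaddr_in6 a6;
    socklen_t len = sizeof(a4);
    int sd4, sd6, result = -1;

    sd4 = socket(AF_INET, type, 0);
    if (sd4 < 0)
        return -1;

    memset(&a4, 0, sizeof(a4));
    a4.sin_family = AF_INET;
    a4.sin_port = 0;                              /* let the kernel pick a port */
    a4.sin_addr.s_addr = htonl(INADDR_LOOPBACK);  /* 127.0.0.1 */
    if (bind(sd4, (struct sockaddr *)&a4, sizeof(a4)) < 0 ||
        getsockname(sd4, (struct sockaddr *)&a4, &len) < 0) {
        close(sd4);
        return -1;
    }

    sd6 = socket(AF_INET6, type, 0);
    if (sd6 < 0) {
        close(sd4);
        return -1;
    }

    memset(&a6, 0, sizeof(a6));
    a6.sin6_family = AF_INET6;
    a6.sin6_port = a4.sin_port;                   /* same port, network order */
    if (inet_pton(AF_INET6, "::ffff:127.0.0.1", &a6.sin6_addr) == 1)
        result = (bind(sd6, (struct sockaddr *)&a6, sizeof(a6)) == 0) ? 1 : 0;

    close(sd6);
    close(sd4);
    return result;
}
```

Calling mapped_bind_accepted(SOCK_STREAM) exercises the TCP case and mapped_bind_accepted(SOCK_DGRAM) the UDP case, matching the two files the patch touches. A return value of 1 would correspond to the behaviour the mail reports.
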
diff -u linux-2.2.16/net/ipv6/tcp_ipv6.c linux-2.2.16-fix/net/ipv6/tcp_ipv6.c
--- linux-2.2.16/net/ipv6/tcp_ipv6.c Thu May 4 09:16:53 2000
+++ linux-2.2.16-fix/net/ipv6/tcp_ipv6.c Fri Aug 4 01:08:18 2000
@@ -140,7 +140,12 @@
 	!ipv6_addr_cmp(&sk->net_pinfo.af_inet6.rcv_saddr,
 		sk2->state != TCP_TIME_WAIT ?
 		&sk2->net_pinfo.af_inet6.rcv_saddr :
-		&((struct tcp_tw_bucket*)sk)->v6_rcv_saddr))
+		&((struct tcp_tw_bucket*)sk)->v6_rcv_saddr) ||
+	(addr_type == IPV6_ADDR_MAPPED && sk2->family == AF_INET &&
+	 sk->rcv_saddr == (sk2->state != TCP_TIME_WAIT ?
+		sk2->rcv_saddr : ((struct tcp_tw_bucket*)sk)->rcv_saddr)
+	)
+	)
 	break;
 }
 }
diff -u linux-2.2.16/net/ipv6/udp.c linux-2.2.16-fix/net/ipv6/udp.c
--- linux-2.2.16/net/ipv6/udp.c Tue Aug 10 04:04:41 1999
+++ linux-2.2.16-fix/net/ipv6/udp.c Fri Aug 4 11:29:24 2000
@@ -105,7 +105,9 @@
 	(!sk2->rcv_saddr ||
 	 addr_type == IPV6_ADDR_ANY ||
 	 !ipv6_addr_cmp(&sk->net_pinfo.af_inet6.rcv_saddr,
-		&sk2->net_pinfo.af_inet6.rcv_saddr)) &&
+		&sk2->net_pinfo.af_inet6.rcv_saddr) ||
+	 (addr_type == IPV6_ADDR_MAPPED && sk2->family == AF_INET &&
+	  sk->rcv_saddr == sk2->rcv_saddr)) &&
 	(!sk2->reuse || !sk->reuse))
 	goto fail;
 }

--
Hideaki YOSHIFUJI @ USAGI Project
Web Page: http://www.ecei.tohoku.ac.jp/%7Eyoshfuji/
PGP5i FP: F731 6599 5EB2 BBA7 1515 1323 1806 A96F 5700 6B25

From owner-netdev@oss.sgi.com Thu Aug 3 22:07:28 2000
Received: by oss.sgi.com id ; Thu, 3 Aug 2000 22:07:18 -0700
Received: from cx97923-a.phnx3.az.home.com ([24.9.112.194]:14602 "EHLO grok.yi.org") by oss.sgi.com with ESMTP id ; Thu, 3 Aug 2000 22:06:47 -0700
Received: from candelatech.com (IDENT:greear@localhost [127.0.0.1]) by grok.yi.org (8.9.3/8.9.3) with ESMTP id WAA06588; Thu, 3 Aug 2000 22:50:22 -0700
Message-ID: <398A599E.CDAA7DFE@candelatech.com>
Date: Thu, 03 Aug 2000 22:50:22 -0700
From: Ben Greear
Organization: Candela Technologies
X-Mailer: Mozilla 4.72 [en] (X11; U; Linux 2.2.16 i586)
X-Accept-Language: en
MIME-Version: 1.0
To: Wallace Davis , "netdev@oss.sgi.com" ,
	linux-net
Subject: Strange TCP/IP problem?
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing

I've been running 8-hour+ traffic runs on RTL-8139 10/100 ethernet cards connected
by a 10/100 etherswitch and linux-kernel 2.2.14 <-> 2.2.16. I'm using three cards in each
machine, one for mgt, and two others for data generation/testing.
I use the ip command to set up source-based routing policies to
direct the ethernet traffic out over the correct port.

I'm running 4 Mbps of raw ethernet traffic, full duplex, 500Kbps of UDP traffic,
and two 56Kbps TCP/IP connections.

After about 6-8 hours, the IP connections stop transmitting traffic.
I'm not completely sure whether the connection is actually broken or
whether it just stops passing traffic.... I'm still looking into that...
Sometimes the receive queue, as shown by netstat is very full, but the TX isn't
(My program should be select'ing and pulling off that incoming data...)

I find it interesting that when the TCP/IP is locked up, UDP and raw ethernet continue
to flow just fine. I also telnet'ed into the LISTENing port, and it connected
me just fine.

I was able to get traffic running again by bouncing the Ethernet ports up/down
with ifconfig and ip (to set up the source-based routing).

Anyone know of any significant kernel problems with 2.2.14 or 2.2.16 that
might cause this? Any other ideas??

When the system is in its hosed-up state, I see this on the side that is
trying to connect to the other:

[lanforge@card1 lanforge]$ netstat -an | grep 200
tcp        0      1 192.168.10.111:4730     192.168.10.211:20011    SYN_SENT
tcp        0      1 192.168.10.111:4729     192.168.10.211:20010    SYN_SENT
tcp        0      1 192.168.10.111:4725     192.168.10.211:20006    SYN_SENT

Here is what the server (accept) side looks like:

[lanforge@candle lanforge]$ netstat -an | grep 200
tcp        0      0 192.168.10.211:20011    0.0.0.0:*               LISTEN
tcp        0      0 192.168.10.211:20010    192.168.10.111:4729     SYN_RECV
tcp        0      0 192.168.10.211:20010    0.0.0.0:*               LISTEN
tcp        0      0 192.168.10.211:20006    0.0.0.0:*               LISTEN

(I've seen where all three were in the SYN_RECV state, just didn't get a trace..)

Thanks in advance for any ideas!
Ben

--
Ben Greear (greearb@candelatech.com) http://www.candelatech.com
Author of ScryMUD: scry.wanfear.com 4444 (Released under GPL)
http://scry.wanfear.com http://scry.wanfear.com/~greear

From owner-netdev@oss.sgi.com Thu Aug 3 22:22:08 2000
Received: by oss.sgi.com id ; Thu, 3 Aug 2000 22:21:48 -0700
Received: from panic.ohr.gatech.edu ([130.207.47.194]:26130 "EHLO havoc.gtf.org") by oss.sgi.com with ESMTP id ; Thu, 3 Aug 2000 22:21:32 -0700
Received: from mandrakesoft.com (adsl-77-228-135.atl.bellsouth.net [216.77.228.135]) by havoc.gtf.org (8.9.3/8.9.3) with ESMTP id BAA30153; Fri, 4 Aug 2000 01:20:47 -0400
Message-ID: <398A52AF.A5CB4468@mandrakesoft.com>
Date: Fri, 04 Aug 2000 01:20:47 -0400
From: Jeff Garzik
Organization: MandrakeSoft
X-Mailer: Mozilla 4.73 [en] (X11; I; Linux 2.2.17pre13 i686)
X-Accept-Language: en
MIME-Version: 1.0
To: Ben Greear
CC: Wallace Davis , "netdev@oss.sgi.com" , linux-net
Subject: Re: Strange TCP/IP problem?
References: <398A599E.CDAA7DFE@candelatech.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing

Ben Greear wrote:
>
> I've been running 8-hour+ traffic runs on RTL-8139 10/100 ethernet cards connected
> by a 10/100 etherswitch and linux-kernel 2.2.14 <-> 2.2.16. I'm using three cards in each
> machine, one for mgt, and two others for data generation/testing.
> I use the ip command to set up source-based routing policies to
> direct the ethernet traffic out over the correct port.

Does "ifconfig ethX down ; ifconfig ethX up" fix the problem?

If so, it's probably a bug in the rtl8139 driver... Working on solving
it, but it's not solved yet in any of the available drivers.

	Jeff

--
Jeff Garzik        |
Building 1024      | Yossarian lives.
MandrakeSoft, Inc. |

From owner-netdev@oss.sgi.com Thu Aug 3 23:05:17 2000
Received: by oss.sgi.com id ; Thu, 3 Aug 2000 23:04:57 -0700
Received: from zmamail04.zma.compaq.com ([161.114.64.104]:10763 "HELO zmamail04.zma.compaq.com") by oss.sgi.com with SMTP id ; Thu, 3 Aug 2000 23:04:25 -0700
Received: by zmamail04.zma.compaq.com (Postfix, from userid 12345) id 61A1B93D; Fri, 4 Aug 2000 02:03:50 -0400 (EDT)
Received: from excsin-gh01.asia.compaq.com (excsin-gh01.asia.compaq.com [16.177.2.7]) by zmamail04.zma.compaq.com (Postfix) with ESMTP id 391C1AD7; Fri, 4 Aug 2000 02:03:49 -0400 (EDT)
Received: by excsin-gh01.asia.compaq.com with Internet Mail Service (5.5.2650.21) id ; Fri, 4 Aug 2000 14:03:47 +0800
Message-ID: <3C97F5A0910AD21193120000F8230C2405B8757C@zpoexc1.zpo.dec.com>
From: "Cham, LP"
To: netdev@oss.sgi.com, linux-net
Subject: TCP Management
Date: Fri, 4 Aug 2000 14:03:44 +0800
MIME-Version: 1.0
X-Mailer: Internet Mail Service (5.5.2650.21)
Content-Type: text/plain; charset="iso-8859-1"
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing

Hi,

I have 2 machines that act as a
client-server.

When a connection has been established, the server
will keep sending msg to the client whereby the client
will simply process and send back the response.

The question is, will the response be sent out to the
server while the server is still continuously sending
the request? How is this managed and by which level - the kernel?

How long and how much can the server hold the outgoing response
before it has a chance to send out the response?

Thanks and regards,
Cham

From owner-netdev@oss.sgi.com Thu Aug 3 23:28:47 2000
Received: by oss.sgi.com id ; Thu, 3 Aug 2000 23:28:37 -0700
Received: from m201-3-p21.warwick.net ([208.242.201.126]:31494 "EHLO circuit.moureaux.com") by oss.sgi.com with ESMTP id ; Thu, 3 Aug 2000 23:28:12 -0700
Received: from circuit.moureaux.com (IDENT:statux@localhost [127.0.0.1]) by circuit.moureaux.com (8.9.3/8.9.3) with SMTP id CAA02842; Fri, 4 Aug 2000 02:29:50 -0400
Date: Fri, 4 Aug 2000 02:29:50 -0400 (EDT)
From: Statux
X-Sender: statux@circuit.moureaux.com
To: "Cham, LP"
cc: netdev@oss.sgi.com, linux-net
Subject: Re: TCP Management
In-Reply-To: <3C97F5A0910AD21193120000F8230C2405B8757C@zpoexc1.zpo.dec.com>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing

(tries to glue your words back together)

Define message and response, etc. Are you talking packets? or something
like an application level reply?

On Fri, 4 Aug 2000, Cham, LP wrote:

>
> Hi,
>
> I have 2 machines that act as a client-server.
>
> When a connection has been established, the server
> will keep sending msg to the client whereby the client
> will simply process and send back the response.
>
> The question is, will the response be sent out to the
> server while the server is still continuously sending
> the request? How is this managed and by which level - the kernel?
>
> How long and how much can the server hold the outgoing response
> before it has a chance to send out the response?
>
> Thanks and regards,
> Cham
>

From owner-netdev@oss.sgi.com Thu Aug 3 23:32:08 2000
Received: by oss.sgi.com id ; Thu, 3 Aug 2000 23:31:47 -0700
Received: from ztxmail03.ztx.compaq.com ([161.114.1.207]:37394 "HELO ztxmail03.ztx.compaq.com") by oss.sgi.com with SMTP id ; Thu, 3 Aug 2000 23:31:23 -0700
Received: by ztxmail03.ztx.compaq.com (Postfix, from userid 12345) id 9EA2EB4C; Fri, 4 Aug 2000 01:30:47 -0500 (CDT)
Received: from excsin-gh02.asia.compaq.com (excsin-gh02.asia.compaq.com [16.177.2.8]) by ztxmail03.ztx.compaq.com (Postfix) with ESMTP id 576A346F; Fri, 4 Aug 2000 01:30:46 -0500 (CDT)
Received: by excsin-gh02.asia.compaq.com with Internet Mail Service (5.5.2650.21) id ; Fri, 4 Aug 2000 14:30:44 +0800
Message-ID: <3C97F5A0910AD21193120000F8230C2405B87581@zpoexc1.zpo.dec.com>
From: "Cham, LP"
To: Statux
Cc: netdev@oss.sgi.com, linux-net
Subject: RE: TCP Management
Date: Fri, 4 Aug 2000 14:30:42 +0800
MIME-Version: 1.0
X-Mailer: Internet Mail Service (5.5.2650.21)
Content-Type: text/plain; charset="iso-8859-1"
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing

The message and response are at the application layer.

>-----Original Message-----
>From: Statux [mailto:statux@bigfoot.com]
>Sent: Friday, August 04, 2000 2:30 PM
>To: Cham, LP
>Cc: netdev@oss.sgi.com; linux-net
>Subject: Re: TCP Management
>
>
>(tries to glue your words back together)
>
>Define message and response, etc. Are you talking packets? or something
>like an application level reply?
>
>On Fri, 4 Aug 2000, Cham, LP wrote:
>
>>
>> Hi,
>>
>> I have 2 machines that act as a client-server.
>>
>> When a connection has been established, the server
>> will keep sending msg to the client whereby the client
>> will simply process and send back the response.
>>
>> The question is, will the response be sent out to the
>> server while the server is still continuously sending
>> the request? How is this managed and by which level - the kernel?
>>
>> How long and how much can the server hold the outgoing response
>> before it has a chance to send out the response?
>>
>> Thanks and regards,
>> Cham
>>
>

From owner-netdev@oss.sgi.com Thu Aug 3 23:44:27 2000
Received: by oss.sgi.com id ; Thu, 3 Aug 2000 23:44:17 -0700
Received: from m201-3-p21.warwick.net ([208.242.201.126]:45318 "EHLO circuit.moureaux.com") by oss.sgi.com with ESMTP id ; Thu, 3 Aug 2000 23:43:45 -0700
Received: from circuit.moureaux.com (IDENT:statux@localhost [127.0.0.1]) by circuit.moureaux.com (8.9.3/8.9.3) with SMTP id CAA02890; Fri, 4 Aug 2000 02:45:27 -0400
Date: Fri, 4 Aug 2000 02:45:27 -0400 (EDT)
From: Statux
X-Sender: statux@circuit.moureaux.com
To: "Cham, LP"
cc: netdev@oss.sgi.com, linux-net
Subject: RE: TCP Management
In-Reply-To: <3C97F5A0910AD21193120000F8230C2405B87581@zpoexc1.zpo.dec.com>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing

I think you answered most of your own question... if you're working in the
application layer.. then most of the TCP details are already taken care
of.

> >> The question is, will the response be sent out to the
> >> server while the server is still continuously sending
> >> the request? How is this managed and by which level - the kernel?

The client and server don't know exactly what each other are doing until
data is received... provided there is no corruption in the data. Packets
are handled by TCP.. which is controlled by the kernel, but if you're
doing application level replies, etc.. then it's the application.. which
then runs through TCP, then IP (v4 or 6), then the datalink, etc.

> >> How long and how much can the server hold the outgoing response
> >> before it has a chance to send out the response?

This isn't written too well. Hard to understand :)

From owner-netdev@oss.sgi.com Thu Aug 3 23:54:47 2000
Received: by oss.sgi.com id ; Thu, 3 Aug 2000 23:54:27 -0700
Received: from ztxmail03.ztx.compaq.com ([161.114.1.207]:8457 "HELO ztxmail03.ztx.compaq.com") by oss.sgi.com with SMTP id ; Thu, 3 Aug 2000 23:53:58 -0700
Received: by ztxmail03.ztx.compaq.com (Postfix, from userid 12345) id 57EE84A6; Fri, 4 Aug 2000 01:53:23 -0500 (CDT)
Received: from excsin-gh02.asia.compaq.com (excsin-gh02.asia.compaq.com [16.177.2.8]) by ztxmail03.ztx.compaq.com (Postfix) with ESMTP id 1547B41B; Fri, 4 Aug 2000 01:53:22 -0500 (CDT)
Received: by excsin-gh02.asia.compaq.com with Internet Mail Service (5.5.2650.21) id ; Fri, 4 Aug 2000 14:53:20 +0800
Message-ID: <3C97F5A0910AD21193120000F8230C2405B87582@zpoexc1.zpo.dec.com>
From: "Cham, LP"
To: Statux
Cc: netdev@oss.sgi.com, linux-net
Subject: RE: TCP Management
Date: Fri, 4 Aug 2000 14:53:18 +0800
MIME-Version: 1.0
X-Mailer: Internet Mail Service (5.5.2650.21)
Content-Type: text/plain; charset="iso-8859-1"
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing

Yes, when I am working on the application level,
most of the TCP details are taken care of.
But, I want to know the underlying architecture.
That is, all these application messages will be broken
down into packets at the TCP layer.

And, are there two separate buffers at that layer to handle
incoming and outgoing packets? If the server continuously
sends packets to the client, does the client have a chance
to send back the processed packet?

If not, outgoing packets must be kept in the buffer
till they have a chance to be sent out. But, there must be a
limit as to how much the buffer can hold. Is this
implemented at the kernel layer and does it vary
between OS?

What I have seen at the application layer is that
all return responses will ONLY be received after all
requests have been sent.

regards,
Cham

>-----Original Message-----
>From: Statux [mailto:statux@bigfoot.com]
>Sent: Friday, August 04, 2000 2:45 PM
>To: Cham, LP
>Cc: netdev@oss.sgi.com; linux-net
>Subject: RE: TCP Management
>
>
>I think you answered most of your own question... if you're working in the
>application layer.. then most of the TCP details are already taken care
>of.
>
>> >> The question is, will the response be sent out to the
>> >> server while the server is still continuously sending
>> >> the request? How is this managed and by which level - the kernel?
>
>The client and server don't know exactly what each other are doing until
>data is received... provided there is no corruption in the data. Packets
>are handled by TCP.. which is controlled by the kernel, but if you're
>doing application level replies, etc.. then it's the application.. which
>then runs through TCP, then IP (v4 or 6), then the datalink, etc.
>
>> >> How long and how much can the server hold the outgoing response
>> >> before it has a chance to send out the response?
>
>This isn't written too well.
>Hard to understand :)
>

From owner-netdev@oss.sgi.com Fri Aug 4 00:09:18 2000
Received: by oss.sgi.com id ; Fri, 4 Aug 2000 00:09:08 -0700
Received: from m201-3-p21.warwick.net ([208.242.201.126]:53510 "EHLO circuit.moureaux.com") by oss.sgi.com with ESMTP id ; Fri, 4 Aug 2000 00:08:56 -0700
Received: from circuit.moureaux.com (IDENT:statux@localhost [127.0.0.1]) by circuit.moureaux.com (8.9.3/8.9.3) with SMTP id DAA02968; Fri, 4 Aug 2000 03:10:41 -0400
Date: Fri, 4 Aug 2000 03:10:41 -0400 (EDT)
From: Statux
X-Sender: statux@circuit.moureaux.com
To: "Cham, LP"
cc: netdev@oss.sgi.com, linux-net
Subject: RE: TCP Management
In-Reply-To: <3C97F5A0910AD21193120000F8230C2405B87582@zpoexc1.zpo.dec.com>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing

> Yes, when I am working on the application level,
> most of the TCP details are taken care of.

Right.

> But, I want to know the underlying architecture.
> That is, all these application messages will be broken
> down into packets at the TCP layer.

Right.

> And, are there two separate buffers at that layer to handle
> incoming and outgoing packets? If the server continuously
> sends packets to the client, does the client have a chance
> to send back the processed packet?

There are two buffers: one for sending and one for receiving.

> If not, outgoing packets must be kept in the buffer
> till they have a chance to be sent out. But, there must be a
> limit as to how much the buffer can hold. Is this
> implemented at the kernel layer and does it vary
> between OS?

There are limitations to every buffer. Different kernel implementations
and different operating systems are roughly the same thing, unless
something's been tweaked, etc. The TCP/IP defaults are defined by the
implementation and do often vary from OS to OS... but these defaults can
usually be changed at runtime using different functions.
High and Low water marks for sending and receiving are changeable. A book or some other
TCP/IP info resource is helpful. I have my UNIX Network Programming book
from W. Richard Stevens :) Very good book.

> What I have seen at the application layer is that
> all return responses will ONLY be received after all
> requests have been sent.

It depends on how it's implemented. If you're doing batch work.. like send
everything.. then receive everything.. then you'll be stuck with what
you're describing, for example.

From owner-netdev@oss.sgi.com Fri Aug 4 00:21:48 2000
Received: by oss.sgi.com id ; Fri, 4 Aug 2000 00:21:37 -0700
Received: from ztxmail03.ztx.compaq.com ([161.114.1.207]:59408 "HELO ztxmail03.ztx.compaq.com") by oss.sgi.com with SMTP id ; Fri, 4 Aug 2000 00:21:23 -0700
Received: by ztxmail03.ztx.compaq.com (Postfix, from userid 12345) id 0A010963; Fri, 4 Aug 2000 02:20:48 -0500 (CDT)
Received: from excsin-gh02.asia.compaq.com (excsin-gh02.asia.compaq.com [16.177.2.8]) by ztxmail03.ztx.compaq.com (Postfix) with ESMTP id C8429B66; Fri, 4 Aug 2000 02:20:46 -0500 (CDT)
Received: by excsin-gh02.asia.compaq.com with Internet Mail Service (5.5.2650.21) id ; Fri, 4 Aug 2000 15:20:45 +0800
Message-ID: <3C97F5A0910AD21193120000F8230C2405B87585@zpoexc1.zpo.dec.com>
From: "Cham, LP"
To: Statux
Cc: netdev@oss.sgi.com, linux-net
Subject: RE: TCP Management
Date: Fri, 4 Aug 2000 15:20:38 +0800
MIME-Version: 1.0
X-Mailer: Internet Mail Service (5.5.2650.21)
Content-Type: text/plain; charset="iso-8859-1"
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing

How is the water mark set? Can you provide
any pointers to sites that contain this information?

Thanks,
Cham

>-----Original Message-----
>From: Statux [mailto:statux@bigfoot.com]
>Sent: Friday, August 04, 2000 3:11 PM
>To: Cham, LP
>Cc: netdev@oss.sgi.com; linux-net
>Subject: RE: TCP Management
>
>
>> Yes, when I am working on the application level,
>> most of the TCP details are taken care of.
>
>Right.
>
>> But, I want to know the underlying architecture.
>> That is, all these application messages will be broken
>> down into packets at the TCP layer.
>
>Right.
>
>> And, are there two separate buffers at that layer to handle
>> incoming and outgoing packets? If the server continuously
>> sends packets to the client, does the client have a chance
>> to send back the processed packet?
>
>There are two buffers: one for sending and one for receiving.
>
>> If not, outgoing packets must be kept in the buffer
>> till they have a chance to be sent out. But, there must be a
>> limit as to how much the buffer can hold. Is this
>> implemented at the kernel layer and does it vary
>> between OS?
>
>There are limitations to every buffer. Different kernel implementations
>and different operating systems are roughly the same thing, unless
>something's been tweaked, etc. The TCP/IP defaults are defined by the
>implementation and do often vary from OS to OS... but these defaults can
>usually be changed at runtime using different functions. High and Low
>water marks for sending and receiving are changeable. A book or some other
>TCP/IP info resource is helpful. I have my UNIX Network Programming book
>from W. Richard Stevens :) Very good book.
>
>> What I have seen at the application layer is that
>> all return responses will ONLY be received after all
>> requests have been sent.
>
>It depends on how it's implemented. If you're doing batch work.. like send
>everything.. then receive everything.. then you'll be stuck with what
>you're describing, for example.
>

From owner-netdev@oss.sgi.com Fri Aug 4 00:36:38 2000
Received: by oss.sgi.com id ; Fri, 4 Aug 2000 00:36:27 -0700
Received: from m201-3-p21.warwick.net ([208.242.201.126]:59910 "EHLO circuit.moureaux.com") by oss.sgi.com with ESMTP id ; Fri, 4 Aug 2000 00:35:54 -0700
Received: from circuit.moureaux.com (IDENT:statux@localhost [127.0.0.1]) by circuit.moureaux.com (8.9.3/8.9.3) with SMTP id DAA03036; Fri, 4 Aug 2000 03:37:41 -0400
Date: Fri, 4 Aug 2000 03:37:41 -0400 (EDT)
From: Statux
X-Sender: statux@circuit.moureaux.com
To: "Cham, LP"
cc: netdev@oss.sgi.com, linux-net
Subject: RE: TCP Management
In-Reply-To: <3C97F5A0910AD21193120000F8230C2405B87585@zpoexc1.zpo.dec.com>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing

> How is the water mark set? Can you provide
> any pointers to sites that contain this information?

The water marks are the amount of data that must be in the buffers before
data will be sent/received, etc.

check out getsockopt

From owner-netdev@oss.sgi.com Fri Aug 4 00:41:08 2000
Received: by oss.sgi.com id ; Fri, 4 Aug 2000 00:40:58 -0700
Received: from pandora.cs.kun.nl ([131.174.33.4]:3241 "EHLO pandora.cs.kun.nl") by oss.sgi.com with ESMTP id ; Fri, 4 Aug 2000 00:40:44 -0700
Received: from hera.cs.kun.nl by pandora.cs.kun.nl via hera.cs.kun.nl [131.174.33.2] with ESMTP id JAA25766 (8.8.8/3.12); Fri, 4 Aug 2000 09:40:08 +0200 (MET DST)
Received: by hera.cs.kun.nl via ejv@localhost id JAA02238 (8.8.8/3.1); Fri, 4 Aug 2000 09:40:08 +0200 (MET DST)
Date: Fri, 4 Aug 2000 09:40:07 +0200
From: Erik Verbruggen
To: "Cham, LP"
Cc: Statux , netdev@oss.sgi.com, linux-net
Subject: Re: TCP Management
Message-ID: <20000804094006.D15266@hera.cs.kun.nl>
Reply-To: ejv@cs.kun.nl
References: <3C97F5A0910AD21193120000F8230C2405B87582@zpoexc1.zpo.dec.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 1.0.1i
In-Reply-To: <3C97F5A0910AD21193120000F8230C2405B87582@zpoexc1.zpo.dec.com>; from Lp.Cham@compaq.com on Fri, Aug 04, 2000 at 02:53:18PM +0800
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing

> What I have seen at the application layer is that
> all return responses will ONLY be received after all
> requests have been sent.

What network device are you using? Ethernet? If so, if it is not in
full-duplex mode and the server is continuously sending, the client ethernet
card just can't get onto the cable because the cable is in use. If you stop
sending for a small amount of time, the client ethernet card sees that the
cable is unused and can send its data. The other way is to put the ethernet
cards in full-duplex mode (not with coax, and only some cards can do this; if
you're using a hub, chances are the hub does not support it).

Erik.

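[Editorial sketch, not part of the original thread.] The getsockopt pointer given earlier in this thread can be illustrated with a few lines of C that read the per-socket send and receive buffer sizes. The helper names get_sock_int() and tcp_default_buffers() are hypothetical, and the values returned depend entirely on the operating system and its configuration, so none are shown here.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: read one integer-valued SOL_SOCKET option.
 * Returns the option value, or -1 if getsockopt() fails. */
int get_sock_int(int fd, int optname)
{
    int value = 0;
    socklen_t len = sizeof(value);

    if (getsockopt(fd, SOL_SOCKET, optname, &value, &len) < 0)
        return -1;
    return value;
}

/* Hypothetical helper: report the default send and receive buffer
 * sizes (in bytes) of a freshly created TCP socket.
 * Returns 0 on success, -1 on any error. */
int tcp_default_buffers(int *sndbuf, int *rcvbuf)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    *sndbuf = get_sock_int(fd, SO_SNDBUF);   /* outgoing (send) buffer */
    *rcvbuf = get_sock_int(fd, SO_RCVBUF);   /* incoming (receive) buffer */
    close(fd);
    return (*sndbuf < 0 || *rcvbuf < 0) ? -1 : 0;
}
```

The same get_sock_int() call can be pointed at SO_SNDLOWAT and SO_RCVLOWAT to read the low-water marks mentioned above. Whether those (or the buffer sizes) can be changed with setsockopt() varies between systems, so treat any write path as something to check against the local documentation.
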
From owner-netdev@oss.sgi.com Fri Aug 4 01:49:38 2000 Received: by oss.sgi.com id ; Fri, 4 Aug 2000 01:49:19 -0700 Received: from colin.muc.de ([193.149.48.1]:39688 "HELO colin.muc.de") by oss.sgi.com with SMTP id ; Fri, 4 Aug 2000 01:48:45 -0700 Received: by colin.muc.de id <140556-3>; Fri, 4 Aug 2000 10:48:06 +0200 Message-ID: <20000804104758.31394@colin.muc.de> From: Andi Kleen To: Dan Browning Cc: tadavis@lbl.gov, netdev@oss.sgi.com, linux-net@vger.rutgers.edu, davem@redhat.com Subject: Re: Development status on bthernet bonding? References: <000001bffda7$21fc4590$1500000a@danb> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 0.88e In-Reply-To: <000001bffda7$21fc4590$1500000a@danb>; from Dan Browning on Fri, Aug 04, 2000 at 02:09:17AM +0200 Date: Fri, 4 Aug 2000 10:47:58 +0200 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Fri, Aug 04, 2000 at 02:09:17AM +0200, Dan Browning wrote: > > I think FEC is a pretty great functionality for servers that need more than > 100mbps but can't afford 1gbps, and I'm supprised that there isn't more > clamoring for it's inclusion. TEQL and route based load balancing works for near all people, so there is not much demand for just another channel bundling system in addition to the three(five if you count ppp) Linux already has. 
-Andi From owner-netdev@oss.sgi.com Fri Aug 4 03:42:39 2000 Received: by oss.sgi.com id ; Fri, 4 Aug 2000 03:42:19 -0700 Received: from raven.ecs.soton.ac.uk ([152.78.70.1]:12439 "EHLO raven.ecs.soton.ac.uk") by oss.sgi.com with ESMTP id ; Fri, 4 Aug 2000 03:41:50 -0700 Received: from hawk.ecs.soton.ac.uk (hawk.ecs.soton.ac.uk [152.78.68.142]) by raven.ecs.soton.ac.uk (8.9.3/8.9.3) with ESMTP id LAA13794 for ; Fri, 4 Aug 2000 11:41:17 +0100 (BST) Received: from mofo.ecs.soton.ac.uk (IDENT:root@mofo.ecs.soton.ac.uk [152.78.65.197]) by hawk.ecs.soton.ac.uk (8.9.3/8.9.3) with ESMTP id LAA29826 for ; Fri, 4 Aug 2000 11:41:12 +0100 (BST) Received: (from mkt@localhost) by mofo.ecs.soton.ac.uk (8.9.3/8.9.3) id LAA03642 for netdev@oss.sgi.com; Fri, 4 Aug 2000 11:47:13 +0100 Date: Fri, 4 Aug 2000 11:47:12 +0100 From: Mark Thompson To: netdev@oss.sgi.com Subject: Re: ipv6 implementation docs Message-ID: <20000804114712.A3586@ecs.soton.ac.uk> References: <200006081438.KAA19828@ferret.cs.fiu.edu> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0i In-Reply-To: <200006081438.KAA19828@ferret.cs.fiu.edu>; from esj@cs.fiu.edu on Thu, Jun 08, 2000 at 10:38:01AM -0400 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On 10:38(GMT) 08-06-00, Eric S. Johnson wrote: > An example of a question I have: how do you force autoconfiguration > of the interfaces, again. When you first bring up an interface with > ifconfig eth0 inet6 it does stateless autoconfig. But if you manually > clear things and/or change the prefixes that the router radvd advertises, > I would like to force it to do autoconfig again, I haven't seen how > to do this yet. Yes - I also need this too! Any ideas, people? 
TIA, Mark/ -- iam: networks and distributed systems ``It's all been such fun -- and I was there'' - David Barron, May 2000 From owner-netdev@oss.sgi.com Fri Aug 4 06:17:20 2000 Received: by oss.sgi.com id ; Fri, 4 Aug 2000 06:17:01 -0700 Received: from [131.94.125.231] ([131.94.125.231]:3346 "EHLO ferret.cs.fiu.edu") by oss.sgi.com with ESMTP id ; Fri, 4 Aug 2000 06:16:31 -0700 Received: from cs.fiu.edu (IDENT:esj@heaven.cs.fiu.edu [131.94.133.12]) by ferret.cs.fiu.edu (8.9.2/FIU-CS-1.2) with ESMTP id JAA22691; Fri, 4 Aug 2000 09:15:12 -0400 (EDT) Message-Id: <200008041315.JAA22691@ferret.cs.fiu.edu> X-Mailer: exmh version 2.1.1 10/15/1999 To: Mark Thompson cc: netdev@oss.sgi.com Subject: Re: ipv6 implementation docs In-Reply-To: Message from Mark Thompson of "Fri, 04 Aug 2000 11:47:12 +0100." <20000804114712.A3586@ecs.soton.ac.uk> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Fri, 04 Aug 2000 13:15:12 +0000 From: "Eric S. Johnson" Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing >From a source code analysis I have found the answer is no. Kind of. Upon module init it assigns each interface its Link local EUI (and the older "provider based addresses if so configured). It will do a router solicitation at that point. >From then on it will accept any advertised prefixes from neighboring routers. I have not found a way to coax it into automatically generating the link local again, short of module removal and reinstall (which can be problematic at times..) It also only seems to send router solicits when configured, no way I have seen to coax them. So I found I would often have to manually re-add the LL EUI addresses as I was playing around. Below is a verbose perl script I used to build the 64 bit EUI address given a mac addr. 
E

#!/usr/local/bin/perl
# Build the 64-bit EUI interface identifier from a 48-bit MAC address.
if ( !defined $ARGV[0] || $ARGV[0] eq "" ) {
    printf("Usage: eui MACADDR\n");
    exit 1;
}
# Convert each colon-separated octet from hex text to a number.
@mac = map { hex($_) } split(/:/, $ARGV[0]);
# Invert the universal/local bit of the first octet.
$mac[0] ^= 2;
# Zero-pad every octet and insert ff:fe between the two halves of the MAC.
printf("%02x%02x:%02xff:fe%02x:%02x%02x\n", @mac[0..5]);
exit 0;

From owner-netdev@oss.sgi.com Fri Aug 4 08:46:51 2000 Received: by oss.sgi.com id ; Fri, 4 Aug 2000 08:46:32 -0700 Received: from sargasso.cse.msu.edu ([35.9.20.14]:41109 "EHLO sargasso.cse.msu.edu") by oss.sgi.com with ESMTP id ; Fri, 4 Aug 2000 08:46:05 -0700 Received: from peloton.cse.msu.edu (peloton.cse.msu.edu [35.9.24.160]) by sargasso.cse.msu.edu (8.8.8/8.8.8) with SMTP id LAA04447; Fri, 4 Aug 2000 11:45:28 -0400 (EDT) Date: Fri, 4 Aug 2000 11:45:26 -0400 (EDT) From: Hariharan L Thantry To: Statux cc: "Cham, LP" , netdev@oss.sgi.com, linux-net Subject: RE: TCP Management In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing

Hi,

Whenever the accept() call succeeds, it creates a new sock data structure which is added to the linked list of existing socks. However, I doubt that any settings made by setsockopt() for the original socket remain in the new inherited socket (2.0.34 kernel).

This is my impression.

Thanks
Hari

On Fri, 4 Aug 2000, Statux wrote:

> > How's the water mark being set? Can you provide
> > any pointer to sites that contains these information.
>
> The water marks are the amount of data in the buffers before data will be
> sent/received, etc.
> > check out getsockopt > > - > To unsubscribe from this list: send the line "unsubscribe linux-net" in > the body of a message to majordomo@vger.rutgers.edu > From owner-netdev@oss.sgi.com Fri Aug 4 08:50:51 2000 Received: by oss.sgi.com id ; Fri, 4 Aug 2000 08:50:41 -0700 Received: from sargasso.cse.msu.edu ([35.9.20.14]:17559 "EHLO sargasso.cse.msu.edu") by oss.sgi.com with ESMTP id ; Fri, 4 Aug 2000 08:50:17 -0700 Received: from peloton.cse.msu.edu (peloton.cse.msu.edu [35.9.24.160]) by sargasso.cse.msu.edu (8.8.8/8.8.8) with SMTP id LAA05107; Fri, 4 Aug 2000 11:49:42 -0400 (EDT) Date: Fri, 4 Aug 2000 11:49:41 -0400 (EDT) From: Hariharan L Thantry To: Erik Verbruggen cc: "Cham, LP" , Statux , netdev@oss.sgi.com, linux-net Subject: Re: TCP Management In-Reply-To: <20000804094006.D15266@hera.cs.kun.nl> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing I think you have a problem with your application code. Is your server in a loop which includes accept()?? Are U spawning a thread for every client request that you get? If it is multithreaded is your server waiting for all clients to end. As somebody else has suggested, check out the bible Unix Network Programming--W. Richard Stevens. If you wanna know about the gory implementation details of TCP, read TCP/IP Illustrated Vol 2 by the same author. Good luck! Hari ------------------------------------------------------------------------------ Hariharan L. Thantry thantryh@cse.msu.edu 4642, S. 
Hagadorn Road, #E6 Dept of Computer Science and Engg East Lansing, MI 48823 Michigan State University East Lansing, MI 48824-1226 Ph(res): 1-517-332-2645 Ph(off): 1-517-353-6646 ----------------------------------------------------------------------------- On Fri, 4 Aug 2000, Erik Verbruggen wrote: > > What I have seen from at the application layer is that > > all return response will ONLY be received until all > > requests have been sent. > > What network device are you using? Ethernet? If so, if it is not in > full-duplex mode and the server is continuously sending, the client > ethernet card just can't get onto the cable because the cable is used. > If you stop sending for a small amount of time, the client ethernet card > sees that the cable is unused and can send it's data. The other way is > to put the ethernet cards in full-duplex mode (not with coax and only > some cards can do this, and if you're using a hub, big chance the hub > does not support it). > > Erik. > - > To unsubscribe from this list: send the line "unsubscribe linux-net" in > the body of a message to majordomo@vger.rutgers.edu > From owner-netdev@oss.sgi.com Fri Aug 4 13:21:42 2000 Received: by oss.sgi.com id ; Fri, 4 Aug 2000 13:21:13 -0700 Received: from m201-4-p22.warwick.net ([208.242.201.177]:5892 "EHLO circuit.moureaux.com") by oss.sgi.com with ESMTP id ; Fri, 4 Aug 2000 13:21:02 -0700 Received: from circuit.moureaux.com (IDENT:statux@localhost [127.0.0.1]) by circuit.moureaux.com (8.9.3/8.9.3) with SMTP id QAA00989; Fri, 4 Aug 2000 16:22:24 -0400 Date: Fri, 4 Aug 2000 16:22:24 -0400 (EDT) From: Statux X-Sender: statux@circuit.moureaux.com To: Hariharan L Thantry cc: "Cham, LP" , netdev@oss.sgi.com, linux-net Subject: RE: TCP Management In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing 2.0.34 is very old.. upgrade! :) Not that it's relevant to this stuff.. 
but 2.4.0 is almost out, ya know :) On Fri, 4 Aug 2000, Hariharan L Thantry wrote: > > Hi, > > Whenever, the accept() call succeeds, it creates a new sock data structure > which is added to the linked list of existing socks. However, I doubt that > any settings made by setsockopt() for the original socket remain in the > new inherited socket (2.0.34 kernel). > > This is my impression. > > Thanks > Hari > > > On Fri, 4 Aug 2000, Statux wrote: > > > > How's the water mark being set? Can you provide > > > any pointer to sites that contains these information. > > > > The water marks are the amount of data in the buffers before data will be > > sent/received, etc. > > > > check out getsockopt > > > > - > > To unsubscribe from this list: send the line "unsubscribe linux-net" in > > the body of a message to majordomo@vger.rutgers.edu > > > From owner-netdev@oss.sgi.com Fri Aug 4 13:31:13 2000 Received: by oss.sgi.com id ; Fri, 4 Aug 2000 13:30:52 -0700 Received: from sargasso.cse.msu.edu ([35.9.20.14]:24827 "EHLO sargasso.cse.msu.edu") by oss.sgi.com with ESMTP id ; Fri, 4 Aug 2000 13:30:42 -0700 Received: from peloton.cse.msu.edu (peloton.cse.msu.edu [35.9.24.160]) by sargasso.cse.msu.edu (8.8.8/8.8.8) with SMTP id QAA27508; Fri, 4 Aug 2000 16:30:10 -0400 (EDT) Date: Fri, 4 Aug 2000 16:30:04 -0400 (EDT) From: Hariharan L Thantry To: Statux cc: "Cham, LP" , netdev@oss.sgi.com, linux-net Subject: RE: TCP Management In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing ' 2.0.34 is the best documented linux kernel out there... the entire core commentary is based on it...and as you yourself admitted, probably has no relevance to this discussion.. have a great day. hari ------------------------------------------------------------------------------ Hariharan L. Thantry thantryh@cse.msu.edu 4642, S. 
Hagadorn Road, #E6 Dept of Computer Science and Engg East Lansing, MI 48823 Michigan State University East Lansing, MI 48824-1226 Ph(res): 1-517-332-2645 Ph(off): 1-517-353-6646 ----------------------------------------------------------------------------- On Fri, 4 Aug 2000, Statux wrote: > 2.0.34 is very old.. upgrade! :) Not that it's relevant to this stuff.. > but 2.4.0 is almost out, ya know :) > > On Fri, 4 Aug 2000, Hariharan L Thantry wrote: > > > > > Hi, > > > > Whenever, the accept() call succeeds, it creates a new sock data structure > > which is added to the linked list of existing socks. However, I doubt that > > any settings made by setsockopt() for the original socket remain in the > > new inherited socket (2.0.34 kernel). > > > > This is my impression. > > > > Thanks > > Hari > > > > > > On Fri, 4 Aug 2000, Statux wrote: > > > > > > How's the water mark being set? Can you provide > > > > any pointer to sites that contains these information. > > > > > > The water marks are the amount of data in the buffers before data will be > > > sent/received, etc. 
> > > > > > check out getsockopt > > > > > > - > > > To unsubscribe from this list: send the line "unsubscribe linux-net" in > > > the body of a message to majordomo@vger.rutgers.edu > > > > > > From owner-netdev@oss.sgi.com Fri Aug 4 13:35:13 2000 Received: by oss.sgi.com id ; Fri, 4 Aug 2000 13:34:52 -0700 Received: from m201-4-p22.warwick.net ([208.242.201.177]:27908 "EHLO circuit.moureaux.com") by oss.sgi.com with ESMTP id ; Fri, 4 Aug 2000 13:34:36 -0700 Received: from circuit.moureaux.com (IDENT:statux@localhost [127.0.0.1]) by circuit.moureaux.com (8.9.3/8.9.3) with SMTP id QAA01042; Fri, 4 Aug 2000 16:35:24 -0400 Date: Fri, 4 Aug 2000 16:35:24 -0400 (EDT) From: Statux X-Sender: statux@circuit.moureaux.com To: Hariharan L Thantry cc: "Cham, LP" , netdev@oss.sgi.com, linux-net Subject: RE: TCP Management In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing best documented, maybe so.. up-to-date, think again :) On Fri, 4 Aug 2000, Hariharan L Thantry wrote: > ' > > > 2.0.34 is the best documented linux kernel out there... > the entire core commentary is based on it...and as you yourself admitted, > probably has no relevance to this discussion.. > > have a great day. > hari > > > ------------------------------------------------------------------------------ > Hariharan L. Thantry > thantryh@cse.msu.edu > > 4642, S. Hagadorn Road, #E6 Dept of Computer Science and Engg > East Lansing, MI 48823 Michigan State University > East Lansing, MI 48824-1226 > Ph(res): 1-517-332-2645 Ph(off): 1-517-353-6646 > ----------------------------------------------------------------------------- > > > On Fri, 4 Aug 2000, Statux wrote: > > > 2.0.34 is very old.. upgrade! :) Not that it's relevant to this stuff.. 
> > but 2.4.0 is almost out, ya know :) > > > > On Fri, 4 Aug 2000, Hariharan L Thantry wrote: > > > > > > > > Hi, > > > > > > Whenever, the accept() call succeeds, it creates a new sock data structure > > > which is added to the linked list of existing socks. However, I doubt that > > > any settings made by setsockopt() for the original socket remain in the > > > new inherited socket (2.0.34 kernel). > > > > > > This is my impression. > > > > > > Thanks > > > Hari > > > > > > > > > On Fri, 4 Aug 2000, Statux wrote: > > > > > > > > How's the water mark being set? Can you provide > > > > > any pointer to sites that contains these information. > > > > > > > > The water marks are the amount of data in the buffers before data will be > > > > sent/received, etc. > > > > > > > > check out getsockopt > > > > > > > > - > > > > To unsubscribe from this list: send the line "unsubscribe linux-net" in > > > > the body of a message to majordomo@vger.rutgers.edu > > > > > > > > > > From owner-netdev@oss.sgi.com Sat Aug 5 08:40:50 2000 Received: by oss.sgi.com id ; Sat, 5 Aug 2000 08:40:40 -0700 Received: from minus.inr.ac.ru ([193.233.7.97]:19979 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Sat, 5 Aug 2000 08:40:14 -0700 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id TAA09059; Sat, 5 Aug 2000 19:38:04 +0400 From: kuznet@ms2.inr.ac.ru Message-Id: <200008051538.TAA09059@ms2.inr.ac.ru> Subject: Re: Don't allow mapped address after binding to ipv4. To: yoshfuji@v6.linux.or.JP (Hideaki YOSHIFUJI) Date: Sat, 5 Aug 2000 19:38:04 +0400 (MSK DST) Cc: netdev@oss.sgi.com, linux-kernel@vger.rutgers.edu In-Reply-To: <20000804122257N.yoshfuji@cerberus.nemoto.ecei.tohoku.ac.jp> from "Hideaki YOSHIFUJI" at Aug 4, 0 05:14:09 pm X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Content-Length: 229 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hello! 
> linux-2.2.16 and linux-2.4.0-test5 allow to let an application bind an > inet6 socket sd6 to an address / a port that is already bound to inet > socket sd4. Yeah... Ugly. 8) But we have no choice. Thank you. Alexey From owner-netdev@oss.sgi.com Sat Aug 5 10:47:22 2000 Received: by oss.sgi.com id ; Sat, 5 Aug 2000 10:47:03 -0700 Received: from lightning.swansea.uk.linux.org ([194.168.151.1]:10251 "EHLO the-village.bc.nu") by oss.sgi.com with ESMTP id ; Sat, 5 Aug 2000 10:46:44 -0700 Received: from alan by the-village.bc.nu with local (Exim 2.12 #1) id 13L7vq-00086X-00; Sat, 5 Aug 2000 18:40:22 +0100 Subject: Re: Don't allow mapped address after binding to ipv4. To: kuznet@ms2.inr.ac.ru Date: Sat, 5 Aug 2000 18:40:20 +0100 (BST) Cc: yoshfuji@v6.linux.or.JP (Hideaki YOSHIFUJI), netdev@oss.sgi.com, linux-kernel@vger.rutgers.edu In-Reply-To: <200008051538.TAA09059@ms2.inr.ac.ru> from "kuznet@ms2.inr.ac.ru" at Aug 05, 2000 07:38:04 PM X-Mailer: ELM [version 2.5 PL1] MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-Id: From: Alan Cox Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing > > linux-2.2.16 and linux-2.4.0-test5 allow to let an application bind an > > inet6 socket sd6 to an address / a port that is already bound to inet > > socket sd4. > > Yeah... Ugly. 8) But we have no choice. Thank you. Does this not leave us open to 'binding closer' type attacks like NFS packet theft ? From owner-netdev@oss.sgi.com Sat Aug 5 11:28:23 2000 Received: by oss.sgi.com id ; Sat, 5 Aug 2000 11:28:13 -0700 Received: from minus.inr.ac.ru ([193.233.7.97]:60427 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Sat, 5 Aug 2000 11:27:57 -0700 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id WAA13399; Sat, 5 Aug 2000 22:26:44 +0400 From: kuznet@ms2.inr.ac.ru Message-Id: <200008051826.WAA13399@ms2.inr.ac.ru> Subject: Re: Don't allow mapped address after binding to ipv4. 
To: alan@lxorguk.ukuu.org.uk (Alan Cox) Date: Sat, 5 Aug 2000 22:26:43 +0400 (MSK DST) Cc: yoshfuji@v6.linux.or.JP, netdev@oss.sgi.com, linux-kernel@vger.rutgers.edu In-Reply-To: from "Alan Cox" at Aug 5, 0 06:40:20 pm X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Content-Length: 967 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hello! > > > linux-2.2.16 and linux-2.4.0-test5 allow to let an application bind an > > > inet6 socket sd6 to an address / a port that is already bound to inet > > > socket sd4. > > > > Yeah... Ugly. 8) But we have no choice. Thank you. > > Does this not leave us open to 'binding closer' type attacks like NFS packet > theft ? Seems, with this patch theft is impossible. Without this patch IPv6 can steal sockets used by IP, indeed. Test !sk2->rcv_saddr prevents binding to place used both by IPv6 and IP wildcard. The problem is that native IP sockets do not initialize IPv6 rcv_saddr and it is always ::, so that we cannot check for net_pinfo.af_inet6.rcv_saddr==:: instead of !sk2->rcv_saddr (and it is OK), but unfortunately we also forgot to check for coincidence of real IPv4 identities. The patch fixes this and problems disappear. Seems. 8) Though, I am not 100% sure. You frightened me. 8) This place need to be analyzed more carefully yet. 
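The case under discussion can be probed from user space. What follows is a rough Python sketch, assuming a Linux host with IPv6 enabled; mapped_bind_conflicts is a hypothetical helper and not part of the patch, and the answer depends entirely on the running kernel.

```python
import errno
import socket


def mapped_bind_conflicts(addr4="127.0.0.1"):
    """Bind an IPv4 TCP socket, then try the IPv4-mapped IPv6 form of the
    same address and port.

    Returns True if the kernel refuses the second bind with EADDRINUSE,
    False if it allows it, and None if IPv6 is not usable here.
    """
    sd4 = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sd4.bind((addr4, 0))          # port 0: let the kernel pick a free port
        port = sd4.getsockname()[1]
        try:
            sd6 = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
        except OSError:
            return None               # no IPv6 support at all
        try:
            # Clear IPV6_V6ONLY so the socket may use mapped addresses.
            sd6.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_V6ONLY, 0)
            sd6.bind(("::ffff:" + addr4, port))
            return False              # both binds succeeded
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return True           # second bind was refused
            return None               # some other failure, e.g. IPv6 disabled
        finally:
            sd6.close()
    finally:
        sd4.close()


if __name__ == "__main__":
    print("mapped bind refused:", mapped_bind_conflicts())
```

A True result means the inet6 socket could not take over the address and port already held by the inet socket, which is the behaviour the patch is meant to enforce.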
Alexey From owner-netdev@oss.sgi.com Tue Aug 8 17:53:48 2000 Received: by oss.sgi.com id ; Tue, 8 Aug 2000 17:53:28 -0700 Received: from mx.sprintlabs.com ([208.30.174.2]:39940 "EHLO mailman.sprintlabs.com") by oss.sgi.com with ESMTP id ; Tue, 8 Aug 2000 17:53:01 -0700 Received: from landy (ip199-2-53-58.sprintlabs.com [199.2.53.58]) by mailman.sprintlabs.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21) id P3BFR4ZZ; Tue, 8 Aug 2000 16:56:37 -0700 Message-Id: <4.2.0.58.20000808162429.01a14510@mailman> X-Sender: troscoe@mailman X-Mailer: QUALCOMM Windows Eudora Pro Version 4.2.0.58 Date: Tue, 08 Aug 2000 16:57:06 -0700 To: netdev@oss.sgi.com, linux-kernel@vger.rutgers.edu From: Timothy Roscoe Subject: [PATCH] First-cut IGMPv3 implementation, big, experimental Cc: linux-igmpv3@sprintlabs.com Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; format=flowed Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing This is a replacement of the IGMPv2 code in net/ipv4/ with IGMPv3. This is a big and complex patch: actually two, one for 2.4.x and one for 2.2.x, plus other changes to user-space include files, so I'm just posting a URL (hope this is OK): http://www.sprintlabs.com/Department/IP-Interworking/multicast/linux-igmpv3/ index.html If you don't know what IGMPv3 is, you're probably not interested this patch just yet. At this stage, this code should be regarded very much as 'experimental', but we've tested it here with real CISCO routers (running both IGMPv2 and IGMPv3 loads of IOS). Briefly, IGMPv3 extends IP multicast with source-based filtering both in the end system and local router (hence the API extensions). More details (and links to the relevant IETF group) are on the web page. 
-- Timothy Roscoe, Christos Gkantsidis, Supratik Bhattacharyya From owner-netdev@oss.sgi.com Tue Aug 8 19:11:58 2000 Received: by oss.sgi.com id ; Tue, 8 Aug 2000 19:11:48 -0700 Received: from Cantor.suse.de ([194.112.123.193]:5129 "HELO Cantor.suse.de") by oss.sgi.com with SMTP id ; Tue, 8 Aug 2000 19:11:22 -0700 Received: from Hermes.suse.de (Hermes.suse.de [194.112.123.136]) by Cantor.suse.de (Postfix) with ESMTP id 434081E2C1; Wed, 9 Aug 2000 04:10:51 +0200 (MEST) Received: from gruyere.muc.suse.de (unknown [10.23.1.2]) by Hermes.suse.de (Postfix) with ESMTP id 2F31910A034; Wed, 9 Aug 2000 04:10:50 +0200 (MEST) Received: by gruyere.muc.suse.de (Postfix, from userid 14446) id 32E7B2F300; Wed, 9 Aug 2000 04:09:05 +0200 (MEST) Date: Wed, 9 Aug 2000 04:09:05 +0200 From: "Andi Kleen" To: Timothy Roscoe Cc: netdev@oss.sgi.com, linux-igmpv3@sprintlabs.com Subject: Re: [PATCH] First-cut IGMPv3 implementation, big, experimental Message-ID: <20000809040905.A15485@gruyere.muc.suse.de> References: <4.2.0.58.20000808162429.01a14510@mailman> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0i In-Reply-To: <4.2.0.58.20000808162429.01a14510@mailman>; from troscoe@sprintlabs.com on Tue, Aug 08, 2000 at 04:57:06PM -0700 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Tue, Aug 08, 2000 at 04:57:06PM -0700, Timothy Roscoe wrote: > This is a replacement of the IGMPv2 code in net/ipv4/ with IGMPv3. [...] I started to look over your code. First it is a lot harder to review than necessary, because you were changing white space of unchanged code all over (please do not do that). Also the 2.4 patch seems to recreate net/ipv4/af_inet.c which does not look correct. More comments on the 2.4 version: You do not seem to have any SMP locks on the srclist (you're actually removing the existing locking). This could lead to SMP data corruption. 
Please re-add the im->lock spinlocking and the reference counting for the object. Other data structures seem to have the same problem.

igmp_send_state_report is called from a timer, but uses a GFP_KERNEL allocation, which may sleep; timer context needs GFP_ATOMIC.

IP_MULTICAST_FILTER: you do not check that mfilt.imsf_numsrc is reasonable, so a user could allocate 128K of kernel memory. SIOCSIPMSFILTER seems to have the same problem. The memory should probably be accounted to the socket buffer anyway (sock_kmalloc).

UDP: the linear list search looks slow, can't you use a simple closed hash table for that?

Only small parts of e.g. the filter code seem to be covered by CONFIG_IP_MULTICAST; all of it should be.

The robustness variable should probably be per-device, not global. It also needs atomic updates or a lock.

im->tm_running looks very SMP racy.

-Andi
From owner-netdev@oss.sgi.com Tue Aug 8 21:09:10 2000 Received: by oss.sgi.com id ; Tue, 8 Aug 2000 21:08:50 -0700 Received: from brutus.conectiva.com.br ([200.250.58.146]:64244 "HELO brinquedo.distro.conectiva") by oss.sgi.com with SMTP id ; Tue, 8 Aug 2000 21:08:30 -0700 Received: by brinquedo.distro.conectiva (Postfix, from userid 0) id 4D7ED273B; Wed, 9 Aug 2000 01:17:32 -0300 (BRT) Date: Wed, 9 Aug 2000 01:17:32 -0300 From: Arnaldo Carvalho de Melo To: Jeff Garzik Cc: netdev@oss.sgi.com Subject: unneeded checks/code in some device probes Message-ID: <20000809011732.B3477@conectiva.com.br> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.2i X-Url: http://advogato.org/person/acme Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing

Hi,

I'm looking at why drivers test whether they need to allocate a new dev in their probe routines, like ac3200_probe in ac3200.c. As I understand it, the probe is assigned to dev->init and then called thru register_netdev, explicitly or thru init_etherdev, when creating a new net_device; otherwise it can be called thru net_dev_init in net/core/dev.c.
In the second case (the ugly duck Space.c) the dev parameter for the probe will always be non-NULL. If it is called thru register_netdev it will be non-NULL as well, so the only remaining case would be driver code calling the probe directly and passing NULL as the dev argument. That does not happen in, for example, ac3200.c, so why check if dev == NULL? Here is the excerpt:

	/* We should have a "dev" from Space.c or the static module table. */
	if (dev == NULL) {
		printk("ac3200.c: Passed a NULL device.\n");
		dev = init_etherdev(0, 0);

		if (!dev)
			return -ENOMEM;
	}

There are other drivers with this kind of code. Is it just paranoia? I think that we may well do this:

	if (dev == NULL)
		panic();

8) Am I completely wrong? If so, could any good soul just say something like "think about this ...", not much more than that, so that I can go ahead and try to help in cleaning the drivers of what seems to be historic code that is no longer needed. If I'm right, then now I know why all those drivers seem to have never oopsed despite not checking the init_etherdev results...
it was never called in this situation 8)

- Arnaldo
From owner-netdev@oss.sgi.com Wed Aug 9 18:04:57 2000 Received: by oss.sgi.com id ; Wed, 9 Aug 2000 18:04:37 -0700 Received: from [212.87.0.35] ([212.87.0.35]:1041 "EHLO atol.icm.edu.pl") by oss.sgi.com with ESMTP id ; Wed, 9 Aug 2000 18:04:06 -0700 Received: from burza.icm.edu.pl ([148.81.208.198]:1692 "EHLO burza.icm.edu.pl" ident: "IDENT-NONSENSE") by atol.icm.edu.pl with ESMTP id ; Thu, 10 Aug 2000 03:00:13 +0200 Received: (from rzm@localhost) by burza.icm.edu.pl (8.9.3/8.9.3/rzm-2.6/icm) id DAA11043; Thu, 10 Aug 2000 03:01:19 +0200 (MET DST) Date: Thu, 10 Aug 2000 03:01:18 +0200 From: Rafal Maszkowski To: kuznet@ms2.inr.ac.ru Cc: netdev@oss.sgi.com Subject: batch mode tc patch Message-ID: <20000810030118.A10224@burza.icm.edu.pl> References: <19991005041006.S8664@burza.icm.edu.pl> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-2 Content-Transfer-Encoding: 8bit User-Agent: Mutt/1.1i In-Reply-To: <19991005041006.S8664@burza.icm.edu.pl>; from rzm@burza.icm.edu.pl on Tue, Oct 05, 1999 at 04:10:06AM +0200 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing

I needed to speed up /sbin/tc and to be able to pass commands to it easily from another program, so I added a batch mode. It reads commands from a file:

    tc -f file

where '-' means stdin. Each command is the same as everything given after tc when called normally, e.g.

    echo qdisc add dev eth0 root handle 10: cbq bandwidth 10Mbit avpkt 1000 cell 8 | tc -f -

The patch and RPMs are in ftp://SunSITE.icm.edu.pl/private/rzm/cbq.init/

Maybe it could be integrated? I am using this tc version in my version of cbq.init, also in the above directory. It also has additional options, peak rate support and bug fixes. The total setup speedup for a huge number of limits (how much sense that makes is another question) is some 30 times with these cbq.init and tc versions.

Cc: netdev

R.
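The batch input described above (one command per line, without the leading tc) can be generated from another program. A small Python sketch follows; to_batch is a hypothetical helper, and it assumes that no argument contains whitespace and that the patched tc accepts `-f -` as described in this message.

```python
def to_batch(commands):
    """Turn full `tc ...` command lines into batch lines (everything after `tc`)."""
    lines = []
    for command in commands:
        words = command.split()
        if not words:
            continue                  # skip blank lines
        if words[0] in ("tc", "/sbin/tc"):
            words = words[1:]         # batch input omits the program name
        lines.append(" ".join(words))
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    batch = to_batch([
        "tc qdisc add dev eth0 root handle 10: cbq bandwidth 10Mbit avpkt 1000 cell 8",
    ])
    # With the patched tc installed, this text could be piped to `tc -f -`,
    # for example via subprocess.run(["tc", "-f", "-"], input=batch, text=True).
    print(batch, end="")
```

Building the whole batch first and starting tc once is where the speedup would come from, since each separate tc invocation costs a fork and exec.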
--
W iskier krzesaniu żywem/Materiał to rzecz główna
From owner-netdev@oss.sgi.com Fri Aug 11 00:06:11 2000 Received: by oss.sgi.com id ; Fri, 11 Aug 2000 00:05:51 -0700 Received: from panic.ohr.gatech.edu ([130.207.47.194]:27144 "EHLO havoc.gtf.org") by oss.sgi.com with ESMTP id ; Fri, 11 Aug 2000 00:05:13 -0700 Received: from mandrakesoft.com (adsl-77-228-135.atl.bellsouth.net [216.77.228.135]) by havoc.gtf.org (8.9.3/8.9.3) with ESMTP id DAA11584; Fri, 11 Aug 2000 03:04:24 -0400 Message-ID: <3993A578.D1889928@mandrakesoft.com> Date: Fri, 11 Aug 2000 03:04:24 -0400 From: Jeff Garzik Organization: MandrakeSoft X-Mailer: Mozilla 4.73 [en] (X11; I; Linux 2.2.17pre15 i686) X-Accept-Language: en MIME-Version: 1.0 To: Arnaldo Carvalho de Melo CC: netdev@oss.sgi.com Subject: Re: unneeded checks/code in some device probes References: <20000809011732.B3477@conectiva.com.br> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing

Arnaldo Carvalho de Melo wrote:
> Here is the excerpt:
>
> 	/* We should have a "dev" from Space.c or the static module table. */
> 	if (dev == NULL) {
> 		printk("ac3200.c: Passed a NULL device.\n");
> 		dev = init_etherdev(0, 0);
>
> 		if (!dev)
> 			return -ENOMEM;
> 	}
>
> there are other drivers with this kind of code, being paranoid? I think that we
> may well do this:
>
> 	if (dev == NULL)
> 		panic();

Agreed. It looks like that is debugging code that wound up being propagated to a few drivers.

For the modular case, the driver includes its own net_device(s), which are passed to the probe/init routine. For the non-modular case, one of the entries in the Space.c device table is passed to the probe/init routine.

The older net drivers' initialization could be made a lot smarter, and smaller, actually...

	Jeff

--
Jeff Garzik |
Building 1024 | Andre the Giant has a posse.
MandrakeSoft, Inc.
| From owner-netdev@oss.sgi.com Fri Aug 11 03:00:53 2000 Received: by oss.sgi.com id ; Fri, 11 Aug 2000 03:00:32 -0700 Received: from [203.126.247.144] ([203.126.247.144]:30714 "EHLO zsngs001") by oss.sgi.com with ESMTP id ; Fri, 11 Aug 2000 03:00:19 -0700 Received: from zsngd101.asiapac.nortel.com (actually znsgd101) by zsngs001; Fri, 11 Aug 2000 17:29:58 +0800 Received: from zctwb003.asiapac.nortel.com ([47.152.32.111]) by zsngd101.asiapac.nortel.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2652.39) id QBHV190K; Fri, 11 Aug 2000 17:29:58 +0800 Received: from uow.edu.au (47.181.194.143 [47.181.194.143]) by zctwb003.asiapac.nortel.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2652.39) id QTTPS27J; Fri, 11 Aug 2000 19:30:00 +1000 Message-ID: <3993C7B3.5A50FA4F@uow.edu.au> Date: Fri, 11 Aug 2000 19:30:27 +1000 X-Sybari-Space: 00000000 00000000 00000000 From: Andrew Morton X-Mailer: Mozilla 4.7 [en] (X11; I; Linux 2.2.14-15mdk i586) X-Accept-Language: en MIME-Version: 1.0 To: jn@it.swin.edu.au CC: "netdev@oss.sgi.com" , xenon@granch.ru Subject: Re: more on the bonding driver References: <39939BC9.4AFEBAFE@it.swin.edu.au> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-Orig: Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing John Newbigin wrote: > > I have found a solution to another problem with the bonding driver. > > If you ifconfig down an enslaved eth device, the bonding driver is not > notified and if there is an attempt to use the device then a crash > results. I have added a one line fix to dev_close in net/core/dev.c > > /* if the device is a slave we should not touch it*/ > if(dev->flags&IFF_SLAVE) > return -EBUSY; > > I put it after the check to see if the interface is up. 
>
> This may not be the best solution but it prevents the kernel crashes and
> as bond_release is not implemented there is not much point making the
> device release itself.
>
> Since I made this change and the one in my last message I have not been
> able to produce a crash.
>
> (here is a script which will cause the crash on 2.2.16)
> #!/bin/bash
> insmod eepro100
> insmod bonding
> ifconfig bond0 192.168.1.2 up
> ifenslave bond0 eth0
> ifconfig eth0 down
> ifconfig bond0 down
> ifconfig bond0 192.168.1.2 up
> ifconfig bond0 down
>
> Again, if someone could confirm this fix it would be good.
>
> John.

I agree. I've looked at the other users of IFF_SLAVE (eql.c and
sbni.c). They appear to be OK with this change. In fact I wonder
if they're vulnerable to the same problem in kernel 2.2.

From owner-netdev@oss.sgi.com Fri Aug 11 03:01:12 2000
Received: by oss.sgi.com id ; Fri, 11 Aug 2000 03:00:53 -0700
Received: from [203.126.247.144] ([203.126.247.144]:30202 "EHLO zsngs001") by oss.sgi.com with ESMTP id ; Fri, 11 Aug 2000 03:00:31 -0700
Received: from zsngd101.asiapac.nortel.com (actually znsgd101) by zsngs001; Fri, 11 Aug 2000 17:29:46 +0800
Received: from zctwb003.asiapac.nortel.com ([47.152.32.111]) by zsngd101.asiapac.nortel.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2652.39) id QBHV190H; Fri, 11 Aug 2000 17:29:47 +0800
Received: from uow.edu.au (47.181.194.143 [47.181.194.143]) by zctwb003.asiapac.nortel.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2652.39) id QTTPS27F; Fri, 11 Aug 2000 19:29:48 +1000
Message-ID: <3993C7A6.8E8EC7E6@uow.edu.au>
Date: Fri, 11 Aug 2000 19:30:14 +1000
X-Sybari-Space: 00000000 00000000 00000000
From: Andrew Morton
X-Mailer: Mozilla 4.7 [en] (X11; I; Linux 2.2.14-15mdk i586)
X-Accept-Language: en
MIME-Version: 1.0
To: John Newbigin
CC: "netdev@oss.sgi.com"
Subject: Re: bonding.c fixes
References: <39929416.8D17BFD3@it.swin.edu.au>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Orig:
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing

[ switched mailing lists ]

John Newbigin wrote:
>
> I have examined the bonding code and found a few things which seem to
> improve the stability.
>
> in bond_close, as each device is removed, queue->num_slaves is
> incremented, this should be decremented.
>
> After the loop I set queue->head & queue->tail to NULL. I don't know if
> the queue gets reused but if it does it will need these set to NULL.
>
> It would be good if someone could confirm these changes.
>
> John.

Kernel 2.2, I assume?

The queue->num_slaves fix has already made it into 2.2.17-pre14.

I agree that queue->head should be zeroed in bond_close(). Otherwise
bad things will happen in bond_enslave() next time the bonding device
is opened and bonded to. May as well zero queue->tail as well.

From owner-netdev@oss.sgi.com Sat Aug 12 10:35:08 2000
Received: by oss.sgi.com id ; Sat, 12 Aug 2000 10:34:58 -0700
Received: from [213.255.48.130] ([213.255.48.130]:33774 "HELO halfway.linuxcare.com.au") by oss.sgi.com with SMTP id ; Sat, 12 Aug 2000 10:34:39 -0700
Received: from linuxcare.com.au (localhost [127.0.0.1]) by halfway.linuxcare.com.au (Postfix) with ESMTP id 285B58172; Sun, 13 Aug 2000 03:34:39 +1000 (EST)
From: Rusty Russell
To: kuznet@ms2.inr.ac.ru
Cc: netdev@oss.sgi.com
Subject: nfmark routing in ip_route_output()
Date: Sun, 13 Aug 2000 03:34:39 +1000
Message-Id: <20000812173439.285B58172@halfway.linuxcare.com.au>
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing

Forgot this one.

Can we add an argument for nfmark field in ip_route_output()? Needed
so local traffic can be routed in complex ways (mark altered in
NF_IP_LOCAL_OUT).

The other option is less invasive: make route_me_harder call
ip_route_output_slow() directly, and simply add the parameter there.

Trivial patch follows,
Rusty.
diff -ur -X /tmp/filej6zXhl --minimal linux-2.4.0-test6-uml-cvs/include/net/route.h working-2.4.0-test6-uml-cvs/include/net/route.h --- linux-2.4.0-test6-uml-cvs/include/net/route.h Sat Aug 12 00:49:28 2000 +++ working-2.4.0-test6-uml-cvs/include/net/route.h Sun Aug 13 03:24:50 2000 @@ -99,7 +99,7 @@ u32 src, u8 tos, struct net_device *dev); extern void ip_rt_advice(struct rtable **rp, int advice); extern void rt_cache_flush(int how); -extern int ip_route_output(struct rtable **, u32 dst, u32 src, u32 tos, int oif); +extern int ip_route_output(struct rtable **, u32 dst, u32 src, u32 tos, int oif, unsigned long nfmark); extern int ip_route_input(struct sk_buff*, u32 dst, u32 src, u8 tos, struct net_device *devin); extern unsigned short ip_rt_frag_needed(struct iphdr *iph, unsigned short new_mtu); extern void ip_rt_update_pmtu(struct dst_entry *dst, unsigned mtu); @@ -135,14 +135,14 @@ extern __inline__ int ip_route_connect(struct rtable **rp, u32 dst, u32 src, u32 tos, int oif) { int err; - err = ip_route_output(rp, dst, src, tos, oif); + err = ip_route_output(rp, dst, src, tos, oif, 0); if (err || (dst && src)) return err; dst = (*rp)->rt_dst; src = (*rp)->rt_src; ip_rt_put(*rp); *rp = NULL; - return ip_route_output(rp, dst, src, tos, oif); + return ip_route_output(rp, dst, src, tos, oif, 0); } extern void rt_bind_peer(struct rtable *rt, int create); Only in working-2.4.0-test6-uml-cvs/: linux diff -ur -X /tmp/filej6zXhl --minimal linux-2.4.0-test6-uml-cvs/net/atm/clip.c working-2.4.0-test6-uml-cvs/net/atm/clip.c --- linux-2.4.0-test6-uml-cvs/net/atm/clip.c Wed Jul 12 17:52:22 2000 +++ working-2.4.0-test6-uml-cvs/net/atm/clip.c Sun Aug 13 03:31:50 2000 @@ -525,7 +525,7 @@ unlink_clip_vcc(clip_vcc); return 0; } - error = ip_route_output(&rt,ip,0,1,0); + error = ip_route_output(&rt,ip,0,1,0,0); if (error) return error; neigh = __neigh_lookup(&clip_tbl,&ip,rt->u.dst.dev,1); ip_rt_put(rt); diff -ur -X /tmp/filej6zXhl --minimal linux-2.4.0-test6-uml-cvs/net/ipv4/arp.c 
working-2.4.0-test6-uml-cvs/net/ipv4/arp.c --- linux-2.4.0-test6-uml-cvs/net/ipv4/arp.c Sat Aug 12 00:23:39 2000 +++ working-2.4.0-test6-uml-cvs/net/ipv4/arp.c Sun Aug 13 03:25:02 2000 @@ -838,7 +838,7 @@ r->arp_flags |= ATF_COM; if (dev == NULL) { struct rtable * rt; - if ((err = ip_route_output(&rt, ip, 0, RTO_ONLINK, 0)) != 0) + if ((err = ip_route_output(&rt, ip, 0, RTO_ONLINK, 0, 0)) != 0) return err; dev = rt->u.dst.dev; ip_rt_put(rt); @@ -921,7 +921,7 @@ if (dev == NULL) { struct rtable * rt; - if ((err = ip_route_output(&rt, ip, 0, RTO_ONLINK, 0)) != 0) + if ((err = ip_route_output(&rt, ip, 0, RTO_ONLINK, 0, 0)) != 0) return err; dev = rt->u.dst.dev; ip_rt_put(rt); diff -ur -X /tmp/filej6zXhl --minimal linux-2.4.0-test6-uml-cvs/net/ipv4/icmp.c working-2.4.0-test6-uml-cvs/net/ipv4/icmp.c --- linux-2.4.0-test6-uml-cvs/net/ipv4/icmp.c Sat Aug 12 00:23:39 2000 +++ working-2.4.0-test6-uml-cvs/net/ipv4/icmp.c Sun Aug 13 03:25:15 2000 @@ -519,7 +519,7 @@ if (ipc.opt->srr) daddr = icmp_param->replyopts.faddr; } - if (ip_route_output(&rt, daddr, rt->rt_spec_dst, RT_TOS(skb->nh.iph->tos), 0)) + if (ip_route_output(&rt, daddr, rt->rt_spec_dst, RT_TOS(skb->nh.iph->tos), 0, 0)) goto out; if (icmpv4_xrlim_allow(rt, icmp_param->icmph.type, icmp_param->icmph.code)) { @@ -631,7 +631,7 @@ * fast routing cache at first. Otherwise an attacker can * grow the routing table. 
*/ - if (ip_route_output(&rt, iph->saddr, saddr, RT_TOS(tos), 0)) + if (ip_route_output(&rt, iph->saddr, saddr, RT_TOS(tos), 0, 0)) goto out; if (ip_options_echo(&icmp_param.replyopts, skb_in)) @@ -654,7 +654,7 @@ ipc.opt = &icmp_param.replyopts; if (icmp_param.replyopts.srr) { ip_rt_put(rt); - if (ip_route_output(&rt, icmp_param.replyopts.faddr, saddr, RT_TOS(tos), 0)) + if (ip_route_output(&rt, icmp_param.replyopts.faddr, saddr, RT_TOS(tos), 0, 0)) goto out; } diff -ur -X /tmp/filej6zXhl --minimal linux-2.4.0-test6-uml-cvs/net/ipv4/igmp.c working-2.4.0-test6-uml-cvs/net/ipv4/igmp.c --- linux-2.4.0-test6-uml-cvs/net/ipv4/igmp.c Sat Aug 12 00:23:39 2000 +++ working-2.4.0-test6-uml-cvs/net/ipv4/igmp.c Sun Aug 13 03:28:28 2000 @@ -204,7 +204,7 @@ if (type == IGMP_HOST_LEAVE_MESSAGE) dst = IGMP_ALL_ROUTER; - if (ip_route_output(&rt, dst, 0, 0, dev->ifindex)) + if (ip_route_output(&rt, dst, 0, 0, dev->ifindex, 0)) return -1; if (rt->rt_src == 0) { ip_rt_put(rt); @@ -610,7 +610,7 @@ __dev_put(dev); } - if (!dev && !ip_route_output(&rt, imr->imr_multiaddr.s_addr, 0, 0, 0)) { + if (!dev && !ip_route_output(&rt, imr->imr_multiaddr.s_addr, 0, 0, 0, 0)) { dev = rt->u.dst.dev; ip_rt_put(rt); } diff -ur -X /tmp/filej6zXhl --minimal linux-2.4.0-test6-uml-cvs/net/ipv4/ip_gre.c working-2.4.0-test6-uml-cvs/net/ipv4/ip_gre.c --- linux-2.4.0-test6-uml-cvs/net/ipv4/ip_gre.c Sat Aug 12 00:23:39 2000 +++ working-2.4.0-test6-uml-cvs/net/ipv4/ip_gre.c Sun Aug 13 03:29:08 2000 @@ -485,7 +485,7 @@ skb2->nh.raw = skb2->data; /* Try to guess incoming interface */ - if (ip_route_output(&rt, eiph->saddr, 0, RT_TOS(eiph->tos), 0)) { + if (ip_route_output(&rt, eiph->saddr, 0, RT_TOS(eiph->tos), 0, 0)) { kfree_skb(skb2); return; } @@ -495,7 +495,7 @@ if (rt->rt_flags&RTCF_LOCAL) { ip_rt_put(rt); rt = NULL; - if (ip_route_output(&rt, eiph->daddr, eiph->saddr, eiph->tos, 0) || + if (ip_route_output(&rt, eiph->daddr, eiph->saddr, eiph->tos, 0, 0) || rt->u.dst.dev->type != ARPHRD_IPGRE) { 
ip_rt_put(rt); kfree_skb(skb2); @@ -729,7 +729,7 @@ tos &= ~1; } - if (ip_route_output(&rt, dst, tiph->saddr, RT_TOS(tos), tunnel->parms.link)) { + if (ip_route_output(&rt, dst, tiph->saddr, RT_TOS(tos), tunnel->parms.link, 0)) { tunnel->stat.tx_carrier_errors++; goto tx_error; } @@ -1080,7 +1080,7 @@ struct rtable *rt; if (ip_route_output(&rt, t->parms.iph.daddr, t->parms.iph.saddr, RT_TOS(t->parms.iph.tos), - t->parms.link)) { + t->parms.link, 0)) { MOD_DEC_USE_COUNT; return -EADDRNOTAVAIL; } @@ -1153,7 +1153,7 @@ if (iph->daddr) { struct rtable *rt; - if (!ip_route_output(&rt, iph->daddr, iph->saddr, RT_TOS(iph->tos), tunnel->parms.link)) { + if (!ip_route_output(&rt, iph->daddr, iph->saddr, RT_TOS(iph->tos), tunnel->parms.link, 0)) { tdev = rt->u.dst.dev; ip_rt_put(rt); } diff -ur -X /tmp/filej6zXhl --minimal linux-2.4.0-test6-uml-cvs/net/ipv4/ip_output.c working-2.4.0-test6-uml-cvs/net/ipv4/ip_output.c --- linux-2.4.0-test6-uml-cvs/net/ipv4/ip_output.c Wed Apr 12 17:43:07 2000 +++ working-2.4.0-test6-uml-cvs/net/ipv4/ip_output.c Sun Aug 13 03:29:42 2000 @@ -118,7 +118,8 @@ if (ip_route_output(&rt, iph->daddr, iph->saddr, RT_TOS(iph->tos) | RTO_CONN, - skb->sk ? skb->sk->bound_dev_if : 0)) { + skb->sk ? skb->sk->bound_dev_if : 0, + skb->nfmark)) { printk("route_me_harder: No more route.\n"); return -EINVAL; } @@ -404,7 +405,7 @@ */ if (ip_route_output(&rt, daddr, sk->saddr, RT_TOS(sk->protinfo.af_inet.tos) | RTO_CONN | sk->localroute, - sk->bound_dev_if)) + sk->bound_dev_if, 0)) goto no_route; __sk_dst_set(sk, &rt->u.dst); } @@ -987,7 +988,7 @@ daddr = replyopts.opt.faddr; } - if (ip_route_output(&rt, daddr, rt->rt_spec_dst, RT_TOS(skb->nh.iph->tos), 0)) + if (ip_route_output(&rt, daddr, rt->rt_spec_dst, RT_TOS(skb->nh.iph->tos), 0, 0)) return; /* And let IP do all the hard work. 
diff -ur -X /tmp/filej6zXhl --minimal linux-2.4.0-test6-uml-cvs/net/ipv4/ipip.c working-2.4.0-test6-uml-cvs/net/ipv4/ipip.c --- linux-2.4.0-test6-uml-cvs/net/ipv4/ipip.c Sat Aug 12 00:23:39 2000 +++ working-2.4.0-test6-uml-cvs/net/ipv4/ipip.c Sun Aug 13 03:28:49 2000 @@ -419,7 +419,7 @@ skb2->nh.raw = skb2->data; /* Try to guess incoming interface */ - if (ip_route_output(&rt, eiph->saddr, 0, RT_TOS(eiph->tos), 0)) { + if (ip_route_output(&rt, eiph->saddr, 0, RT_TOS(eiph->tos), 0, 0)) { kfree_skb(skb2); return; } @@ -429,7 +429,7 @@ if (rt->rt_flags&RTCF_LOCAL) { ip_rt_put(rt); rt = NULL; - if (ip_route_output(&rt, eiph->daddr, eiph->saddr, eiph->tos, 0) || + if (ip_route_output(&rt, eiph->daddr, eiph->saddr, eiph->tos, 0, 0) || rt->u.dst.dev->type != ARPHRD_IPGRE) { ip_rt_put(rt); kfree_skb(skb2); @@ -556,7 +556,7 @@ goto tx_error_icmp; } - if (ip_route_output(&rt, dst, tiph->saddr, RT_TOS(tos), tunnel->parms.link)) { + if (ip_route_output(&rt, dst, tiph->saddr, RT_TOS(tos), tunnel->parms.link, 0)) { tunnel->stat.tx_carrier_errors++; goto tx_error_icmp; } @@ -813,7 +813,7 @@ if (iph->daddr) { struct rtable *rt; - if (!ip_route_output(&rt, iph->daddr, iph->saddr, RT_TOS(iph->tos), tunnel->parms.link)) { + if (!ip_route_output(&rt, iph->daddr, iph->saddr, RT_TOS(iph->tos), tunnel->parms.link, 0)) { tdev = rt->u.dst.dev; ip_rt_put(rt); } diff -ur -X /tmp/filej6zXhl --minimal linux-2.4.0-test6-uml-cvs/net/ipv4/ipmr.c working-2.4.0-test6-uml-cvs/net/ipv4/ipmr.c --- linux-2.4.0-test6-uml-cvs/net/ipv4/ipmr.c Sat Aug 12 00:23:39 2000 +++ working-2.4.0-test6-uml-cvs/net/ipv4/ipmr.c Sun Aug 13 03:29:23 2000 @@ -1142,11 +1142,11 @@ #endif if (vif->flags&VIFF_TUNNEL) { - if (ip_route_output(&rt, vif->remote, vif->local, RT_TOS(iph->tos), vif->link)) + if (ip_route_output(&rt, vif->remote, vif->local, RT_TOS(iph->tos), vif->link, 0)) return; encap = sizeof(struct iphdr); } else { - if (ip_route_output(&rt, iph->daddr, 0, RT_TOS(iph->tos), vif->link)) + if (ip_route_output(&rt, 
iph->daddr, 0, RT_TOS(iph->tos), vif->link, 0)) return; } diff -ur -X /tmp/filej6zXhl --minimal linux-2.4.0-test6-uml-cvs/net/ipv4/netfilter/ip_fw_compat_masq.c working-2.4.0-test6-uml-cvs/net/ipv4/netfilter/ip_fw_compat_masq.c --- linux-2.4.0-test6-uml-cvs/net/ipv4/netfilter/ip_fw_compat_masq.c Fri Jul 28 21:36:46 2000 +++ working-2.4.0-test6-uml-cvs/net/ipv4/netfilter/ip_fw_compat_masq.c Sun Aug 13 03:30:35 2000 @@ -72,7 +72,8 @@ /* Pass 0 instead of saddr, since it's going to be changed anyway. */ - if (ip_route_output(&rt, iph->daddr, 0, 0, 0) != 0) { + if (ip_route_output(&rt, iph->daddr, 0, 0, 0, (*pskb)->nfmark) + != 0) { DEBUGP("ipnat_rule_masquerade: Can't reroute.\n"); return NF_DROP; } diff -ur -X /tmp/filej6zXhl --minimal linux-2.4.0-test6-uml-cvs/net/ipv4/netfilter/ip_nat_core.c working-2.4.0-test6-uml-cvs/net/ipv4/netfilter/ip_nat_core.c --- linux-2.4.0-test6-uml-cvs/net/ipv4/netfilter/ip_nat_core.c Sat Aug 12 00:23:40 2000 +++ working-2.4.0-test6-uml-cvs/net/ipv4/netfilter/ip_nat_core.c Sun Aug 13 03:30:41 2000 @@ -204,7 +204,7 @@ struct rtable *rt; /* FIXME: IPTOS_TOS(iph->tos) --RR */ - if (ip_route_output(&rt, var_ip, 0, 0, 0) != 0) { + if (ip_route_output(&rt, var_ip, 0, 0, 0, 0) != 0) { DEBUGP("do_extra_mangle: Can't get route to %u.%u.%u.%u\n", IP_PARTS(var_ip)); return 0; diff -ur -X /tmp/filej6zXhl --minimal linux-2.4.0-test6-uml-cvs/net/ipv4/netfilter/ipt_MASQUERADE.c working-2.4.0-test6-uml-cvs/net/ipv4/netfilter/ipt_MASQUERADE.c --- linux-2.4.0-test6-uml-cvs/net/ipv4/netfilter/ipt_MASQUERADE.c Wed Jul 12 17:52:23 2000 +++ working-2.4.0-test6-uml-cvs/net/ipv4/netfilter/ipt_MASQUERADE.c Sun Aug 13 03:31:04 2000 @@ -85,7 +85,7 @@ if (ip_route_output(&rt, (*pskb)->nh.iph->daddr, 0, RT_TOS((*pskb)->nh.iph->tos)|RTO_CONN, - out->ifindex) != 0) { + out->ifindex, (*pskb)->nfmark) != 0) { /* Shouldn't happen */ printk("MASQUERADE: No route: Rusty's brain broke!\n"); return NF_DROP; diff -ur -X /tmp/filej6zXhl --minimal 
linux-2.4.0-test6-uml-cvs/net/ipv4/netfilter/ipt_MIRROR.c working-2.4.0-test6-uml-cvs/net/ipv4/netfilter/ipt_MIRROR.c --- linux-2.4.0-test6-uml-cvs/net/ipv4/netfilter/ipt_MIRROR.c Wed Jul 12 17:52:23 2000 +++ working-2.4.0-test6-uml-cvs/net/ipv4/netfilter/ipt_MIRROR.c Sun Aug 13 03:31:13 2000 @@ -44,7 +44,7 @@ /* Backwards */ if (ip_route_output(&rt, iph->saddr, iph->daddr, RT_TOS(iph->tos) | RTO_CONN, - 0)) { + 0, 0)) { return 0; } diff -ur -X /tmp/filej6zXhl --minimal linux-2.4.0-test6-uml-cvs/net/ipv4/netfilter/ipt_REJECT.c working-2.4.0-test6-uml-cvs/net/ipv4/netfilter/ipt_REJECT.c --- linux-2.4.0-test6-uml-cvs/net/ipv4/netfilter/ipt_REJECT.c Fri Jul 28 21:36:46 2000 +++ working-2.4.0-test6-uml-cvs/net/ipv4/netfilter/ipt_REJECT.c Sun Aug 13 03:31:19 2000 @@ -113,7 +113,7 @@ /* Routing */ if (ip_route_output(&rt, nskb->nh.iph->daddr, nskb->nh.iph->saddr, RT_TOS(nskb->nh.iph->tos) | RTO_CONN, - 0) != 0) + 0, 0) != 0) goto free_nskb; dst_release(nskb->dst); diff -ur -X /tmp/filej6zXhl --minimal linux-2.4.0-test6-uml-cvs/net/ipv4/raw.c working-2.4.0-test6-uml-cvs/net/ipv4/raw.c --- linux-2.4.0-test6-uml-cvs/net/ipv4/raw.c Wed Jul 12 17:52:23 2000 +++ working-2.4.0-test6-uml-cvs/net/ipv4/raw.c Sun Aug 13 03:29:15 2000 @@ -400,7 +400,7 @@ rfh.saddr = sk->protinfo.af_inet.mc_addr; } - err = ip_route_output(&rt, daddr, rfh.saddr, tos, ipc.oif); + err = ip_route_output(&rt, daddr, rfh.saddr, tos, ipc.oif, 0); if (err) goto done; diff -ur -X /tmp/filej6zXhl --minimal linux-2.4.0-test6-uml-cvs/net/ipv4/syncookies.c working-2.4.0-test6-uml-cvs/net/ipv4/syncookies.c --- linux-2.4.0-test6-uml-cvs/net/ipv4/syncookies.c Sat Aug 12 00:23:40 2000 +++ working-2.4.0-test6-uml-cvs/net/ipv4/syncookies.c Sun Aug 13 03:30:00 2000 @@ -181,7 +181,7 @@ opt->srr ? 
opt->faddr : req->af.v4_req.rmt_addr, req->af.v4_req.loc_addr, sk->protinfo.af_inet.tos | RTO_CONN, - 0)) { + 0, 0)) { tcp_openreq_free(req); return NULL; } diff -ur -X /tmp/filej6zXhl --minimal linux-2.4.0-test6-uml-cvs/net/ipv4/tcp_ipv4.c working-2.4.0-test6-uml-cvs/net/ipv4/tcp_ipv4.c --- linux-2.4.0-test6-uml-cvs/net/ipv4/tcp_ipv4.c Sat Aug 12 00:23:40 2000 +++ working-2.4.0-test6-uml-cvs/net/ipv4/tcp_ipv4.c Sun Aug 13 03:29:54 2000 @@ -1175,10 +1175,10 @@ opt = req->af.v4_req.opt; if(ip_route_output(&rt, ((opt && opt->srr) ? opt->faddr : - req->af.v4_req.rmt_addr), + req->af.v4_req.rmt_addr), req->af.v4_req.loc_addr, RT_TOS(sk->protinfo.af_inet.tos) | RTO_CONN | sk->localroute, - sk->bound_dev_if)) { + sk->bound_dev_if, 0)) { IP_INC_STATS_BH(IpOutNoRoutes); return NULL; } @@ -1746,7 +1746,7 @@ err = ip_route_output(&rt, daddr, sk->saddr, RT_TOS(sk->protinfo.af_inet.tos) | RTO_CONN | sk->localroute, - sk->bound_dev_if); + sk->bound_dev_if, 0); if (err) { sk->err_soft=-err; sk->error_report(sk); diff -ur -X /tmp/filej6zXhl --minimal linux-2.4.0-test6-uml-cvs/net/ipv6/sit.c working-2.4.0-test6-uml-cvs/net/ipv6/sit.c --- linux-2.4.0-test6-uml-cvs/net/ipv6/sit.c Sat Aug 12 00:23:40 2000 +++ working-2.4.0-test6-uml-cvs/net/ipv6/sit.c Sun Aug 13 03:31:46 2000 @@ -474,7 +474,7 @@ dst = addr6->s6_addr32[3]; } - if (ip_route_output(&rt, dst, tiph->saddr, RT_TOS(tos), tunnel->parms.link)) { + if (ip_route_output(&rt, dst, tiph->saddr, RT_TOS(tos), tunnel->parms.link, 0)) { tunnel->stat.tx_carrier_errors++; goto tx_error_icmp; } @@ -740,7 +740,7 @@ if (iph->daddr) { struct rtable *rt; - if (!ip_route_output(&rt, iph->daddr, iph->saddr, RT_TOS(iph->tos), tunnel->parms.link)) { + if (!ip_route_output(&rt, iph->daddr, iph->saddr, RT_TOS(iph->tos), tunnel->parms.link, 0)) { tdev = rt->u.dst.dev; ip_rt_put(rt); } -- Hacking time.
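As a rough user-space illustration of the idea in the message above (this is not kernel code), the sketch below models a route lookup that takes the packet mark as an extra argument, so that callers with no packet at hand pass 0 and only a rerouting path passes a real mark. The names `toy_rule`, `toy_rules` and `toy_route_output`, and the rule table itself, are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical policy rule: a destination prefix plus an optional mark. */
struct toy_rule {
    uint32_t dst;     /* network address, host byte order */
    uint32_t mask;    /* netmask, host byte order */
    uint32_t nfmark;  /* 0 means "match any mark" */
    int      oif;     /* output interface index chosen by this rule */
};

/* First match wins, so mark-specific rules come before generic ones. */
static const struct toy_rule toy_rules[] = {
    { 0x0a000000u, 0xff000000u, 1, 3 },  /* 10/8 with mark 1 -> interface 3 */
    { 0x0a000000u, 0xff000000u, 0, 2 },  /* 10/8, any mark   -> interface 2 */
    { 0x00000000u, 0x00000000u, 0, 1 },  /* default route    -> interface 1 */
};

/*
 * Toy stand-in for a mark-aware route lookup (an assumption for this
 * sketch, not the kernel's ip_route_output()). Returns 0 and stores the
 * chosen interface in *oif, or -1 if no rule matches.
 */
int toy_route_output(int *oif, uint32_t dst, uint32_t nfmark)
{
    size_t i;

    assert(oif != NULL);
    for (i = 0; i < sizeof(toy_rules) / sizeof(toy_rules[0]); i++) {
        const struct toy_rule *r = &toy_rules[i];

        if ((dst & r->mask) != r->dst)
            continue;               /* destination outside this prefix */
        if (r->nfmark != 0 && r->nfmark != nfmark)
            continue;               /* rule wants a mark we do not carry */
        *oif = r->oif;
        return 0;
    }
    return -1;
}
```

In this model, a caller without a marked packet would call `toy_route_output(&oif, dst, 0)`, which mirrors the way most call sites in the patch simply append a trailing 0.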
From owner-netdev@oss.sgi.com Sat Aug 12 19:59:03 2000
Received: by oss.sgi.com id ; Sat, 12 Aug 2000 19:58:53 -0700
Received: from ns1121.munich.netsurf.de ([195.180.235.121]:18692 "HELO fred.muc.de") by oss.sgi.com with SMTP id ; Sat, 12 Aug 2000 19:58:38 -0700
Received: by fred.muc.de (Postfix, from userid 500) id 0ED87E3911; Sun, 13 Aug 2000 04:30:35 +0200 (CEST)
Date: Sun, 13 Aug 2000 04:30:35 +0200
From: Andi Kleen
To: Rusty Russell
Cc: kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com
Subject: Re: nfmark routing in ip_route_output()
Message-ID: <20000813043035.A4195@fred.muc.de>
References: <20000812173439.285B58172@halfway.linuxcare.com.au>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 1.0.1i
In-Reply-To: <20000812173439.285B58172@halfway.linuxcare.com.au>; from rusty@linuxcare.com.au on Sat, Aug 12, 2000 at 07:36:28PM +0200
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing

On Sat, Aug 12, 2000 at 07:36:28PM +0200, Rusty Russell wrote:
> Forgot this one.
>
> Can we add an argument for nfmark field in ip_route_output()? Needed
> so local traffic can be routed in complex ways (mark altered in
> NF_IP_LOCAL_OUT).
>
> The other option is less invasive: make route_me_harder call
> ip_route_output_slow() directly, and simply add the parameter there.
>
> Trivial patch follows,

It seems it is missing the patch to route.c

I also do not understand why you cannot simply change skb->nfmark before
calling ip_route_output.
-Andi

From owner-netdev@oss.sgi.com Sun Aug 13 09:31:25 2000
Received: by oss.sgi.com id ; Sun, 13 Aug 2000 09:31:15 -0700
Received: from minus.inr.ac.ru ([193.233.7.97]:6149 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Sun, 13 Aug 2000 09:31:00 -0700
Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id UAA04346; Sun, 13 Aug 2000 20:30:21 +0400
From: kuznet@ms2.inr.ac.ru
Message-Id: <200008131630.UAA04346@ms2.inr.ac.ru>
Subject: Re: nfmark routing in ip_route_output()
To: rusty@linuxcare.com.au (Rusty Russell)
Date: Sun, 13 Aug 2000 20:30:21 +0400 (MSK DST)
Cc: netdev@oss.sgi.com
In-Reply-To: <20000812173439.285B58172@halfway.linuxcare.com.au> from "Rusty Russell" at Aug 13, 0 03:34:39 am
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Content-Length: 997
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing

Hello!

> Can we add an argument for nfmark field in ip_route_output()? Needed
> so local traffic can be routed in complex ways (mark altered in
> NF_IP_LOCAL_OUT).

To say that I dislike this is to say nothing.

If you have an skb to reroute, reroute the skb. If you do not want to
depend on the skb, add a new function using rt_key as its argument.
You may even replace ip_route_output() with this new function
everywhere; it will be a bit slower, but it is worth doing, because it
has lots of useful applications not bound to nfmark.

But adding a new argument (and with such a name, which cries to be
#ifdef'd) to a function used mainly in contexts where there is no way
to set anything but zero is a strange idea.

> The other option is less invasive: make route_me_harder call
> ip_route_output_slow() directly, and simply add the parameter there.

But why did you select the _more_ invasive way then? 8)

I can answer that myself. It is easy to see from the name of this
function, look at it more carefully.
Alexey

From owner-netdev@oss.sgi.com Sun Aug 13 13:27:48 2000
Received: by oss.sgi.com id ; Sun, 13 Aug 2000 13:27:39 -0700
Received: from ldn52-113.Leiden.NL.net ([212.206.213.114]:5383 "EHLO ida.2y.net") by oss.sgi.com with ESMTP id ; Sun, 13 Aug 2000 13:27:26 -0700
Received: from freeler.nl (IDENT:jorg@localhost [127.0.0.1]) by ida.2y.net (8.9.3/8.9.3) with ESMTP id WAA02737 for ; Sun, 13 Aug 2000 22:27:16 +0200
Message-ID: <399704A4.80CF1BD1@freeler.nl>
Date: Sun, 13 Aug 2000 22:27:16 +0200
From: Jorg de Jong
X-Mailer: Mozilla 4.7 [en] (X11; I; Linux 2.4.0-test6 i686)
X-Accept-Language: en
MIME-Version: 1.0
To: netdev@oss.sgi.com
Subject: problems with creating tunnel
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing

Hi,

I am just starting to fool around with ipv6.
(Don't you just love mails like this ;-) )

And of course I have a problem:

I requested a tunnel from freenet6 and the commands in the file they
sent are:

ifconfig sit0 up
ifconfig eth0 add 3ffe:b00:c18:1fff:0:0:0:559
ifconfig sit0 tunnel ::206.123.31.102
ifconfig sit1 up
route -A inet6 add default gw fe80::206.123.31.102 dev sit1

during the creation of the tunnel I get :
[root] /root > ifconfig sit0 tunnel ::206.123.31.102
SIOCSIFDSTADDR: No buffer space available

Not good I assume.

On the various web pages I tried to look for pointers on where to find
information on how to solve this problem, but did not find any useful
info.

My box is running 2.4.0-test6

If someone needs more info, say the word and I will try to get it....
Thanks for your help,

Jorg de Jong

[root] /root > ifconfig
eth0      Link encap:Ethernet  HWaddr 52:54:AB:16:55:8F
          inet addr:192.168.1.1  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::5254:ab16:558f/10 Scope:Link
          inet6 addr: 3ffe:b00:c18:1fff::559/0 Scope:Global
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:73 dropped:0 overruns:0 carrier:73
          collisions:1241 txqueuelen:100
          Interrupt:11 Base address:0x1820

ippp0     Link encap:Point-to-Point Protocol
          inet addr:193.79.255.151  P-t-P:193.79.255.3  Mask:255.255.255.0
          UP POINTOPOINT RUNNING NOARP  MTU:1500  Metric:1
          RX packets:4806 errors:0 dropped:0 overruns:0 frame:0
          TX packets:4846 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:30

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16144  Metric:1
          RX packets:10168 errors:0 dropped:0 overruns:0 frame:0
          TX packets:10168 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0

sit0      Link encap:IPv6-in-IPv4
          inet6 addr: ::193.79.255.151/96 Scope:Compat
          inet6 addr: ::127.0.0.1/96 Scope:Unknown
          inet6 addr: ::192.168.1.1/96 Scope:Compat
          UP RUNNING NOARP  MTU:1480  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:62 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0

sit1      Link encap:IPv6-in-IPv4
          inet6 addr: fe80::c14f:ff97/10 Scope:Link
          inet6 addr: fe80::c0a8:101/10 Scope:Link
          UP POINTOPOINT RUNNING NOARP  MTU:1480  Metric:1
          RX packets:2 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0

--
Jorg de Jong
Work : mailto:jorg.de.jong@ict.nl
Play : mailto:j.e.s.de.jong@freeler.nl

From owner-netdev@oss.sgi.com Sun Aug 13 14:49:39 2000
Received: by oss.sgi.com id ; Sun, 13 Aug 2000 14:49:30 -0700
Received: from sabre-wulf.nvg.ntnu.no ([129.241.210.67]:29701 "EHLO sabre-wulf.nvg.ntnu.no") by oss.sgi.com with ESMTP id ; Sun, 13 Aug
2000 14:49:07 -0700
Received: from tyrell.nvg.ntnu.no ([IPv6:::ffff:129.241.210.70]:54276 "EHLO tyrell.nvg.ntnu.no" ident: "TIMEDOUT2" whoson: "-unregistered-") by sabre-wulf.nvg.ntnu.no with ESMTP id ; Sun, 13 Aug 2000 23:48:45 +0200
Received: (from venaas@localhost) by tyrell.nvg.ntnu.no (8.9.3/8.8.4) id XAA16738; Sun, 13 Aug 2000 23:48:34 +0200
Date: Sun, 13 Aug 2000 23:48:33 +0200
From: Stig Venaas
To: Jorg de Jong
Cc: netdev@oss.sgi.com
Subject: Re: problems with creating tunnel
Message-ID: <20000813234833.A16654@nvg.ntnu.no>
References: <399704A4.80CF1BD1@freeler.nl>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 1.0i
In-Reply-To: <399704A4.80CF1BD1@freeler.nl>; from j.e.s.de.jong@freeler.nl on Sun, Aug 13, 2000 at 10:27:16PM +0200
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing

On Sun, Aug 13, 2000 at 10:27:16PM +0200, Jorg de Jong wrote:
> Hi,
>
> I am just starting to fool around with ipv6.
> (Don't you just love mails like this ;-) )

You're interested in IPv6, that's good!

> during the creation of the tunnel I get :
> [root] /root > ifconfig sit0 tunnel ::206.123.31.102
> SIOCSIFDSTADDR: No buffer space available

Yes, ifconfig doesn't work well with recent kernels, while iptunnel,
which comes together with ifconfig in the net-tools package, does work.
I've been planning to look into this... It should be possible to fix
ifconfig. Perhaps someone else here knows more.
I think this should work:

iptunnel add freenet6 mode sit remote 206.123.31.102 ttl 64

Stig

From owner-netdev@oss.sgi.com Mon Aug 14 06:42:26 2000
Received: by oss.sgi.com id ; Mon, 14 Aug 2000 06:42:06 -0700
Received: from logina.lt ([195.22.177.68]:1547 "EHLO lolo.logina.lt") by oss.sgi.com with ESMTP id ; Mon, 14 Aug 2000 06:42:02 -0700
Received: from localhost (and@localhost) by lolo.logina.lt (8.9.3/8.9.3/Debian 8.9.3-21) with ESMTP id PAA08303; Mon, 14 Aug 2000 15:43:47 -0200
Date: Mon, 14 Aug 2000 15:43:47 -0200 (GMT+2)
From: Andrius Kasparavicius
X-Sender: and@lolo.logina.lt
cc: Jorg de Jong , netdev@oss.sgi.com
Subject: Re: problems with creating tunnel
In-Reply-To: <20000813234833.A16654@nvg.ntnu.no>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
To: unlisted-recipients:; (no To-header on input)
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing
Content-Length: 606
Lines: 16

> > during the creation of the tunnel I get :
> > [root] /root > ifconfig sit0 tunnel ::206.123.31.102
> > SIOCSIFDSTADDR: No buffer space available

I think this message means that on this tunnel you have already made a
tunnel to something. For making IPv6 tunnels via IPv4 it is better to
use the iproute2 tool.
-------------------------
Kasparavicius Andrius
________________________________________________________________________
http://www.andrius.org ICQ:17701001 tel.: +370 87 25630
nick: Casper AND-RIPE AND-6BONE

From owner-netdev@oss.sgi.com Mon Aug 14 07:28:17 2000
Received: by oss.sgi.com id ; Mon, 14 Aug 2000 07:28:07 -0700
Received: from magnus.cordef.net.pl ([212.160.102.222]:10759 "HELO kepler.agaran.6bone.pl") by oss.sgi.com with SMTP id ; Mon, 14 Aug 2000 07:27:40 -0700
Received: by kepler.agaran.6bone.pl (Postfix+IPv6, from userid 500) id BE4F4BF9C; Mon, 14 Aug 2000 15:49:51 +0200 (CEST)
Received: from localhost (localhost [127.0.0.1]) by kepler.agaran.6bone.pl (Postfix+IPv6) with ESMTP id 15711BF91 for ; Mon, 14 Aug 2000 15:49:50 +0200 (CEST)
Date: Mon, 14 Aug 2000 15:49:50 +0200 (CEST)
From: Maciej 'Agaran' Pijanka
To: NetDevel List
Subject: IPv4 in IPv6
In-Reply-To:
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing
Content-Length: 369
Lines: 11

Hello
Could anyone tell me if exist that possibility for linux 2.2.x ?
something like reversed sit..
i looked at sources..but all things bases on ipv4..hashing and so on..

--
Maciej 'Agaran' Pijanka MAP2-6BONE
i386, Linux 2.2, Pine, Slrn, Vi(m), IPv6, Gdb,
I do not fear computers. I fear the lack of them.
-- Isaac Asimov

From owner-netdev@oss.sgi.com Mon Aug 14 09:03:37 2000
Received: by oss.sgi.com id ; Mon, 14 Aug 2000 09:03:17 -0700
Received: from lightning.swansea.uk.linux.org ([194.168.151.1]:17274 "EHLO the-village.bc.nu") by oss.sgi.com with ESMTP id ; Mon, 14 Aug 2000 09:03:05 -0700
Received: from alan by the-village.bc.nu with local (Exim 2.12 #1) id 13OMc7-0004Ie-00 for netdev@oss.sgi.com; Mon, 14 Aug 2000 16:57:23 +0100
Received: from mh2dmz3.bloomberg.net ([206.156.53.152]) by the-village.bc.nu with esmtp (Exim 2.12 #1) id 13OM1w-0004DZ-00 for alan@lxorguk.ukuu.org.uk; Mon, 14 Aug 2000 16:20:02 +0100
Received: from mh2ny.bloomberg.com by mh2dmz3.bloomberg.net with ESMTP for alan@lxorguk.ukuu.org.uk; Mon, 14 Aug 2000 11:25:25 -0400
Received: from [172.20.58.193] by mh2ny.bloomberg.com with ESMTP; Mon, 14 Aug 2000 11:25:18 -0400
Received: from localhost (localhost [[UNIX: localhost]]) by anton.bloomberg.com (8.9.3/8.9.3) id LAA28547; Mon, 14 Aug 2000 11:22:19 -0400
From: Anton Ghiugan
Organization: BLOOMBERG L.P.
To: linux-kernel@vger.rutgers.edu, linux-net@vger.rutgers.edu
Subject: About TCP_MAXSEG and getsockopt()
Date: Mon, 14 Aug 2000 11:04:34 -0400
X-Mailer: KMail [version 1.0.29]
Cc: alan@lxorguk.ukuu.org.uk
Message-Id: <00081411221909.17546@anton>
MIME-Version: 1.0
Content-Type: text/plain
Content-Transfer-Encoding: 8bit
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing
Content-Length: 3689
Lines: 117

It looks like the getsockopt() function call always returns a default
value of zero for the MSS parameter (maximum segment size) unless the
user has preset it to some different value by calling setsockopt().
Checking with the source code (net/ipv4/tcp_output.c) I have discovered
that Linux uses three different variables for MSS (listed below are
fragments from comments found in lines 254-274 in the above mentioned
source file):

- "user_mss is mss set by user by TCP_MAXSEG"
- "mss_cache is current effective sending mss"
- "mss_clamp is mss negotiated at connection setup. It is minimum of
  user_mss and mss received with SYN"

As you may guess, getsockopt()/setsockopt() when called w/ TCP_MAXSEG
operate *only* on user_mss. While I find it perfectly normal for
setsockopt() to operate on (that is: modify) user_mss, since the user
should be provided w/ a means to express what he/she wants the mss to
be, I don't find any good reason why getsockopt() should return the
same value. At least two reasons make me believe this is an error:

1. The user program can always remember what value it passed to
   setsockopt(), if any. Besides that, any system call, including
   getsockopt(), takes longer than an in-memory access in user space.

2. Other implementations of the TCP/IP protocol (notably SunOS) return
   the correct value. See the source code at the end of this email of a
   small test application that has been used to verify this assumption.

IMHO getsockopt() should return the value of the mss_cache variable.
Enclosed is a patch that should modify the kernel in this matter.

---------> CUT HERE <----------------
*** linux/net/ipv4/tcp.c Mon Aug 14 10:52:52 2000
--- - Mon Aug 14 11:01:58 2000
***************
*** 1764,1772 ****
  	len = min(len, sizeof(int));

  	switch(optname) {
  	case TCP_MAXSEG:
! 		val = tp->user_mss;
  		break;
  	case TCP_NODELAY:
  		val = (sk->nonagle == 1);
  		break;
--- 1764,1772 ----
  	len = min(len, sizeof(int));

  	switch(optname) {
  	case TCP_MAXSEG:
! 		val = tp->mss_cache;
  		break;
  	case TCP_NODELAY:
  		val = (sk->nonagle == 1);
  		break;
---------> CUT HERE <----------------

As promised, here is the source code of the test application that
helped me to detect the above bug.
The program has been tested on several Linux machines, including:

2.2.14-12smp /i686
2.2.16-3 /alpha

---------> CUT HERE <----------------
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <arpa/inet.h>

int main(int argc, char **argv)
{
	struct sockaddr_in dest ;
	unsigned int mss ;
	socklen_t sl;
	int s ;

	if(argc != 3) {
		fputs("USAGE: getmss.c \n", stderr) ;
		return 0 ;
	}

	s = socket(AF_INET, SOCK_STREAM, 0) ;
	if(s == -1) {
		perror("socket");
		return -1 ;
	}

	/* argv[1] is a dotted-quad IPv4 address, argv[2] a TCP port */
	memset(&dest, 0, sizeof(dest)) ;
	dest.sin_family = AF_INET ;
	dest.sin_addr.s_addr = inet_addr(argv[1]) ;
	dest.sin_port = htons(atoi(argv[2]));

	if(connect(s, (struct sockaddr *)&dest, sizeof(dest)) == -1) {
		perror("connect");
		close(s) ;
		return -1 ;
	}
	fprintf(stderr, "Connected to %s on port %s !\n", argv[1], argv[2]);

	/* ask the kernel for the MSS of the connected socket */
	mss = 0 ;
	sl = sizeof(mss) ;
	if(getsockopt(s, IPPROTO_TCP, TCP_MAXSEG, &mss, &sl) == -1) {
		perror("getsockopt");
		close(s) ;
		return -1 ;
	}
	fprintf(stdout, "MSS = %u (sizeof(MSS) = %d)\n", mss, (int)sl) ;
	fflush(stdout) ;
	close(s) ;
	return 0 ;
}
---------> CUT HERE <----------------

From owner-netdev@oss.sgi.com Mon Aug 14 09:58:38 2000 Received: by oss.sgi.com id ; Mon, 14 Aug 2000 09:58:28 -0700 Received: from colin.muc.de ([193.149.48.1]:22790 "HELO colin.muc.de") by oss.sgi.com with SMTP id ; Mon, 14 Aug 2000 09:58:13 -0700 Received: by colin.muc.de id <140559-2>; Mon, 14 Aug 2000 18:58:01 +0200 Message-ID: <20000814185756.43609@colin.muc.de> From: Andi Kleen To: Maciej 'Agaran' Pijanka Cc: NetDevel List Subject: Re: IPv4 in IPv6 References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 0.88e In-Reply-To: ; from Maciej 'Agaran' Pijanka on Mon, Aug 14, 2000 at 04:29:05PM +0200 Date: Mon, 14 Aug 2000 18:57:56 +0200 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 386 Lines: 10

On Mon, Aug 14, 2000 at 04:29:05PM +0200, Maciej 'Agaran' Pijanka wrote:
> Hello
> Could anyone tell me if exist that possibility for linux 2.2.x ?
> something like reversed sit.. > i looked at sources..but all things bases on ipv4..hashing and so on.. ftp.suse.com:/pub/people/ak/tunnel/fourtun-2.tgz is a simple v4-in-v6 tunnel for Linux 2.2. It requires static tunnel setup. -Andi From owner-netdev@oss.sgi.com Mon Aug 14 14:08:43 2000 Received: by oss.sgi.com id ; Mon, 14 Aug 2000 14:08:33 -0700 Received: from m202-2-p45.warwick.net ([208.242.202.100]:3332 "EHLO circuit.moureaux.com") by oss.sgi.com with ESMTP id ; Mon, 14 Aug 2000 14:08:17 -0700 Received: from circuit.moureaux.com (IDENT:statux@localhost [127.0.0.1]) by circuit.moureaux.com (8.9.3/8.9.3) with SMTP id RAA01008; Mon, 14 Aug 2000 17:11:06 -0400 Date: Mon, 14 Aug 2000 17:11:06 -0400 (EDT) From: Statux X-Sender: statux@circuit.moureaux.com To: "Maciej 'Agaran' Pijanka" cc: NetDevel List Subject: Re: IPv4 in IPv6 In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 682 Lines: 19 this question seems very unclear (hard to understand). Are you talking IPv4 mapped IPv6 addresses, or... ? or are you talking the other way around like have a v6 mapped as v4 (which is impossible). please restate the original question. On Mon, 14 Aug 2000, Maciej 'Agaran' Pijanka wrote: > Hello > Could anyone tell me if exist that possibility for linux 2.2.x ? > something like reversed sit.. > i looked at sources..but all things bases on ipv4..hashing and so on.. > > -- > Maciej 'Agaran' Pijanka MAP2-6BONE > i386, Linux 2.2, Pine, Slrn, Vi(m), IPv6, Gdb, > I do not fear computers. I fear the lack of them. 
> -- Isaac Asimov > From owner-netdev@oss.sgi.com Mon Aug 14 20:07:16 2000 Received: by oss.sgi.com id ; Mon, 14 Aug 2000 20:07:07 -0700 Received: from magnus.cordef.net.pl ([212.160.102.222]:11527 "HELO kepler.agaran.6bone.pl") by oss.sgi.com with SMTP id ; Mon, 14 Aug 2000 20:06:46 -0700 Received: by kepler.agaran.6bone.pl (Postfix+IPv6, from userid 500) id 328D7BF9C; Tue, 15 Aug 2000 05:05:56 +0200 (CEST) Received: from localhost (localhost [127.0.0.1]) by kepler.agaran.6bone.pl (Postfix+IPv6) with ESMTP id 1ECCBBF91 for ; Tue, 15 Aug 2000 05:05:55 +0200 (CEST) Date: Tue, 15 Aug 2000 05:05:52 +0200 (CEST) From: Maciej 'Agaran' Pijanka To: NetDevel List Subject: Re: IPv4 in IPv6 In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Mon, 14 Aug 2000, Statux wrote: > this question seems very unclear (hard to understand). Are you talking > IPv4 mapped IPv6 addresses, or... ? or are you talking the other way > around like have a v6 mapped as v4 (which is impossible). please restate > the original question. i asked for something like gif on bsd which can make tunnel binded to ipv6 endpoints and inside with ipv4 i looked at sources of sit,gre,ipip tunnels but all things in it are based on ipv4 addresses. some kernel structs bases on that too.. and if i modify ip_tunnel struct, there is problem with tools which depends on that struct (ex iproute2) > > On Mon, 14 Aug 2000, Maciej 'Agaran' Pijanka wrote: > > > Hello > > Could anyone tell me if exist that possibility for linux 2.2.x ? > > something like reversed sit.. > > i looked at sources..but all things bases on ipv4..hashing and so on.. > > > > -- > > Maciej 'Agaran' Pijanka MAP2-6BONE > > i386, Linux 2.2, Pine, Slrn, Vi(m), IPv6, Gdb, > > I do not fear computers. I fear the lack of them. 
> > -- Isaac Asimov > > > > -- Maciej 'Agaran' Pijanka MAP2-6BONE i386, Linux 2.2, Pine, Slrn, Vi(m), IPv6, Gdb, I do not fear computers. I fear the lack of them. -- Isaac Asimov From owner-netdev@oss.sgi.com Tue Aug 15 04:06:08 2000 Received: by oss.sgi.com id ; Tue, 15 Aug 2000 04:05:48 -0700 Received: from tml.hut.fi ([130.233.44.1]:37382 "EHLO tml-gw.tml.hut.fi") by oss.sgi.com with ESMTP id ; Tue, 15 Aug 2000 04:05:27 -0700 Received: (from smap@localhost) by tml-gw.tml.hut.fi (8.8.7/8.8.7) id OAA27806 for ; Tue, 15 Aug 2000 14:05:24 +0300 Received: from caffeine.tml.hut.fi(130.233.45.27) by tml-gw.tml.hut.fi via smap (V2.0) id xma027800; Tue, 15 Aug 00 14:05:15 +0300 Received: from morphine.tml.hut.fi (morphine.tml.hut.fi [130.233.45.7]) by caffeine.tml.hut.fi (8.10.2/8.10.2) with ESMTP id e7FB5Qu19989 for ; Tue, 15 Aug 2000 14:05:26 +0300 (EET DST) Received: from localhost (lpetande@localhost) by morphine.tml.hut.fi (8.9.2/8.7.1) with ESMTP id OAA26632 for ; Tue, 15 Aug 2000 14:05:02 +0300 (EET DST) X-Authentication-Warning: morphine.tml.hut.fi: lpetande owned process doing -bs Date: Tue, 15 Aug 2000 14:05:02 +0300 (EET DST) From: Lars Henrik Petander To: NetDevel List Subject: IPv6 in IPv6 In-Reply-To: <20000814185756.43609@colin.muc.de> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Mon, 14 Aug 2000, Andi Kleen wrote: > On Mon, Aug 14, 2000 at 04:29:05PM +0200, Maciej 'Agaran' Pijanka wrote: > > Hello > > Could anyone tell me if exist that possibility for linux 2.2.x ? > > something like reversed sit.. > > i looked at sources..but all things bases on ipv4..hashing and so on.. > > ftp.suse.com:/pub/people/ak/tunnel/fourtun-2.tgz is a simple v4-in-v6 > tunnel for Linux 2.2. It requires static tunnel setup. Sounds interesting. Has anyone implemented v6-in-v6 tunneling with virtual devices for the 2.4 kernels? 
Henrik > > -Andi > From owner-netdev@oss.sgi.com Tue Aug 15 05:07:38 2000 Received: by oss.sgi.com id ; Tue, 15 Aug 2000 05:07:28 -0700 Received: from mumm.ibr.cs.tu-bs.de ([134.169.34.190]:47571 "EHLO mumm.ibr.cs.tu-bs.de") by oss.sgi.com with ESMTP id ; Tue, 15 Aug 2000 05:07:11 -0700 Received: from kelts.ibr.cs.tu-bs.de (IDENT:root@kelts [134.169.34.131]) by mumm.ibr.cs.tu-bs.de (8.9.3/8.9.3) with ESMTP id OAA24670; Tue, 15 Aug 2000 14:07:08 +0200 (MET DST) Received: (from dieder@localhost) by kelts.ibr.cs.tu-bs.de (8.9.3/8.9.3) id OAA06896; Tue, 15 Aug 2000 14:07:07 +0200 Date: Tue, 15 Aug 2000 14:07:07 +0200 Message-Id: <200008151207.OAA06896@kelts.ibr.cs.tu-bs.de> X-Authentication-Warning: kelts.ibr.cs.tu-bs.de: dieder set sender to dieder@kelts.ibr.cs.tu-bs.de using -f From: Joerg Diederich To: linux-diffserv@lrc.di.epfl.ch CC: netdev@oss.sgi.com, kuznet@ms2.inr.ac.ru, ak@muc.de In-reply-to: <20000815131723.A1117@fred.muc.de> (message from Andi Kleen on Tue, 15 Aug 2000 13:17:23 +0200) Subject: Bugfix u32 filters: Patch instead of whole file References: <200008150749.JAA20285@kelts.ibr.cs.tu-bs.de> <20000815131723.A1117@fred.muc.de> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="++----------20000815140240-83039500----------++" Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing This is a multi-part message in MIME format. --++----------20000815140240-83039500----------++ Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Hi! To better recognize the changes I have made to cls_u32.c, I have attached a patch against DS pre9 below. Regards, /J"org -------------------------------------- Hi, I have looked a little into the problems with deleting u32 Filters. I'm using DS8 with the pre9 patches posted on the 9th of July by Werner on a 2.2.14 System (however, 2.2.16 should have the same problems since no file has changed in linux/net/sched since then). 
I found two additional problems with deleting filters apart from the one fixed by ds pre9. 1. Refcnt of child-Filters (non root_ht) was not set correctly on creation. 2. When performing a 'tc qdisc del dev eth0 root' these child Filters were not deleted correctly (memory leak). With both fixes I can delete child filters with something like: 'tc filter del dev eth1 protocol ip prio 244 parent 1:0 handle 800::801 u32' Furthermore, I added a fix so that 'tc filter ls' did not show filter twice or even more often. This happened when adding several u32 filters to a queue with different priorities. A 'tc filter ls' showed a list of filters for all used priorities, but the child-Filter (non-root ht) were shown in each filter-priority-list regardlessly of the priority of the child filter. I did not have time to test these patches that much, so please be careful :-) Of course, I am interested in feedback. The cls_u32.c file including the ds9pre patch and my patches is attached below. Regards, /J"org Diederich --++----------20000815140240-83039500----------++ Content-Type: application/octet-stream; name="cls_u32.patch" Content-Transfer-Encoding: base64 LS0tIG5ldC9zY2hlZC9jbHNfdTMyLmMub3JpZwlUdWUgQXVnIDE1IDEzOjU4OjQ2IDIwMDAKKysr IG5ldC9zY2hlZC9jbHNfdTMyLmMJVHVlIEF1ZyAxNSAxNDowMToxNSAyMDAwCkBAIC03Niw2ICs3 Niw3IEBACiAJaW50CQkJcmVmY250OwogCXVuc2lnbmVkCQlkaXZpc29yOwogCXUzMgkJCWhnZW5l cmF0b3I7CisJdTMyCQkJcHJpbzsKIAlzdHJ1Y3QgdGNfdV9rbm9kZQkqaHRbMV07CiB9OwogCkBA IC04OCw3ICs4OSw3IEBACiAJdTMyCQkJaGdlbmVyYXRvcjsKIH07CiAKLXN0YXRpYyBzdHJ1Y3Qg dGNfdV9jb21tb24gKnUzMl9saXN0Oworc3RhdGljIHN0cnVjdCB0Y191X2NvbW1vbiAqdTMyX2xp c3QgPSBOVUxMOwogCiBzdGF0aWMgX19pbmxpbmVfXyB1bnNpZ25lZCB1MzJfaGFzaF9mb2xkKHUz MiBrZXksIHN0cnVjdCB0Y191MzJfc2VsICpzZWwpCiB7CkBAIC0yODAsNiArMjgxLDggQEAKIAly b290X2h0LT5yZWZjbnQrKzsKIAlyb290X2h0LT5oYW5kbGUgPSB0cF9jID8gZ2VuX25ld19odGlk KHRwX2MpIDogMHg4MDAwMDAwMDsKIAorCXJvb3RfaHQtPnByaW8gPSB0cC0+cHJpbzsKKwogCWlm 
ICh0cF9jID09IE5VTEwpIHsKIAkJdHBfYyA9IGttYWxsb2Moc2l6ZW9mKCp0cF9jKSwgR0ZQX0tF Uk5FTCk7CiAJCWlmICh0cF9jID09IE5VTEwpIHsKQEAgLTM3NCw2ICszNzcsMjUgQEAKIAlyZXR1 cm4gLUVOT0VOVDsKIH0KIAorc3RhdGljIGludCB1MzJfZGVsZXRlKHN0cnVjdCB0Y2ZfcHJvdG8g KnRwLCB1bnNpZ25lZCBsb25nIGFyZykKK3sKKwlzdHJ1Y3QgdGNfdV9obm9kZSAqaHQgPSAoc3Ry dWN0IHRjX3VfaG5vZGUqKWFyZzsKKworCWlmIChodCA9PSBOVUxMKQorCQlyZXR1cm4gMDsKKwor CWlmIChUQ19VMzJfS0VZKGh0LT5oYW5kbGUpKQorCQlyZXR1cm4gdTMyX2RlbGV0ZV9rZXkodHAs IChzdHJ1Y3QgdGNfdV9rbm9kZSopaHQpOworCisJaWYgKHRwLT5yb290ID09IGh0KQorCQlyZXR1 cm4gLUVJTlZBTDsKKworCWlmICgtLWh0LT5yZWZjbnQgPT0gMCkKKwkJdTMyX2Rlc3Ryb3lfaG5v ZGUodHAsIGh0KTsKKworCXJldHVybiAwOworfQorCiBzdGF0aWMgdm9pZCB1MzJfZGVzdHJveShz dHJ1Y3QgdGNmX3Byb3RvICp0cCkKIHsKIAlzdHJ1Y3QgdGNfdV9jb21tb24gKnRwX2MgPSB0cC0+ ZGF0YTsKQEAgLTM5Niw3ICs0MTgsMTAgQEAKIAkJfQogCiAJCWZvciAoaHQ9dHBfYy0+aGxpc3Q7 IGh0OyBodCA9IGh0LT5uZXh0KQotCQkJdTMyX2NsZWFyX2hub2RlKHRwLCBodCk7CisvKiBKRDog d2l0aG91dCB0aGlzIGNoYW5nZSB3ZSBkbyBub3QgZnJlZSB0aGUgbWVtb3J5IGZvciBvdGhlciB0 aGFuIHRoZSByb290X2h0CisgICBpZiB3ZSBkbyAndGMgcWRpc2MgZGVsIGRldiBldGgxIHJvb3Qn ICovCisJCSAgLyoJCQl1MzJfY2xlYXJfaG5vZGUodHAsIGh0KTsqLworCQkJdTMyX2RlbGV0ZSh0 cCwgKGxvbmcgaW50KSBodCk7CiAKIAkJd2hpbGUgKChodCA9IHRwX2MtPmhsaXN0KSAhPSBOVUxM KSB7CiAJCQl0cF9jLT5obGlzdCA9IGh0LT5uZXh0OwpAQCAtNDEzLDI1ICs0MzgsNiBAQAogCXRw LT5kYXRhID0gTlVMTDsKIH0KIAotc3RhdGljIGludCB1MzJfZGVsZXRlKHN0cnVjdCB0Y2ZfcHJv dG8gKnRwLCB1bnNpZ25lZCBsb25nIGFyZykKLXsKLQlzdHJ1Y3QgdGNfdV9obm9kZSAqaHQgPSAo c3RydWN0IHRjX3VfaG5vZGUqKWFyZzsKLQotCWlmIChodCA9PSBOVUxMKQotCQlyZXR1cm4gMDsK LQotCWlmIChUQ19VMzJfS0VZKGh0LT5oYW5kbGUpKQotCQlyZXR1cm4gdTMyX2RlbGV0ZV9rZXko dHAsIChzdHJ1Y3QgdGNfdV9rbm9kZSopaHQpOwotCi0JaWYgKHRwLT5yb290ID09IGh0KQotCQly ZXR1cm4gLUVJTlZBTDsKLQotCWlmICgtLWh0LT5yZWZjbnQgPT0gMCkKLQkJdTMyX2Rlc3Ryb3lf aG5vZGUodHAsIGh0KTsKLQotCXJldHVybiAwOwotfQotCiBzdGF0aWMgdTMyIGdlbl9uZXdfa2lk KHN0cnVjdCB0Y191X2hub2RlICpodCwgdTMyIGhhbmRsZSkKIHsKIAlzdHJ1Y3QgdGNfdV9rbm9k 
ZSAqbjsKQEAgLTUzNSwxMCArNTQxLDEzIEBACiAJCQlyZXR1cm4gLUVOT0JVRlM7CiAJCW1lbXNl dChodCwgMCwgc2l6ZW9mKCpodCkgKyBkaXZpc29yKnNpemVvZih2b2lkKikpOwogCQlodC0+dHBf YyA9IHRwX2M7Ci0JCWh0LT5yZWZjbnQgPSAwOworCQlodC0+cmVmY250ID0gMTsKIAkJaHQtPmRp dmlzb3IgPSBkaXZpc29yOwogCQlodC0+aGFuZGxlID0gaGFuZGxlOwogCQlodC0+bmV4dCA9IHRw X2MtPmhsaXN0OworCisJCWh0LT5wcmlvID0gdHAtPnByaW87CisJCQogCQl0cF9jLT5obGlzdCA9 IGh0OwogCQkqYXJnID0gKHVuc2lnbmVkIGxvbmcpaHQ7CiAJCXJldHVybiAwOwpAQCAtNjExLDEw ICs2MjAsMTIgQEAKIAogCWZvciAoaHQgPSB0cF9jLT5obGlzdDsgaHQ7IGh0ID0gaHQtPm5leHQp IHsKIAkJaWYgKGFyZy0+Y291bnQgPj0gYXJnLT5za2lwKSB7CisJCQlpZiAoaHQtPnByaW8gPT0g dHAtPnByaW8pIHsKIAkJCWlmIChhcmctPmZuKHRwLCAodW5zaWduZWQgbG9uZylodCwgYXJnKSA8 IDApIHsKIAkJCQlhcmctPnN0b3AgPSAxOwogCQkJCXJldHVybjsKIAkJCX0KKwkJCX0KIAkJfQog CQlhcmctPmNvdW50Kys7CiAJCWZvciAoaCA9IDA7IGggPD0gaHQtPmRpdmlzb3I7IGgrKykgewpA QCAtNjIzLDExICs2MzQsMTMgQEAKIAkJCQkJYXJnLT5jb3VudCsrOwogCQkJCQljb250aW51ZTsK IAkJCQl9Ci0JCQkJaWYgKGFyZy0+Zm4odHAsICh1bnNpZ25lZCBsb25nKW4sIGFyZykgPCAwKSB7 CisJCWlmIChodC0+cHJpbyA9PSB0cC0+cHJpbykgeworCQkJaWYgKGFyZy0+Zm4odHAsICh1bnNp Z25lZCBsb25nKW4sIGFyZykgPCAwKSB7CiAJCQkJCWFyZy0+c3RvcCA9IDE7CiAJCQkJCXJldHVy bjsKIAkJCQl9Ci0JCQkJYXJnLT5jb3VudCsrOworCQl9CisJCQlhcmctPmNvdW50Kys7CiAJCQl9 CiAJCX0KIAl9Cg== --++----------20000815140240-83039500----------++-- From owner-netdev@oss.sgi.com Tue Aug 15 07:52:19 2000 Received: by oss.sgi.com id ; Tue, 15 Aug 2000 07:52:00 -0700 Received: from ns1226.munich.netsurf.de ([195.180.235.226]:59653 "HELO fred.muc.de") by oss.sgi.com with SMTP id ; Tue, 15 Aug 2000 07:51:51 -0700 Received: by fred.muc.de (Postfix, from userid 500) id 66AD9E38E0; Tue, 15 Aug 2000 16:54:03 +0200 (CEST) Date: Tue, 15 Aug 2000 16:54:03 +0200 From: Andi Kleen To: Lars Henrik Petander Cc: NetDevel List Subject: Re: IPv6 in IPv6 Message-ID: <20000815165403.A3842@fred.muc.de> References: <20000814185756.43609@colin.muc.de> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0.1i In-Reply-To: ; from 
lpetande@tml.hut.fi on Tue, Aug 15, 2000 at 01:06:57PM +0200 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Tue, Aug 15, 2000 at 01:06:57PM +0200, Lars Henrik Petander wrote: > > > On Mon, 14 Aug 2000, Andi Kleen wrote: > > > On Mon, Aug 14, 2000 at 04:29:05PM +0200, Maciej 'Agaran' Pijanka wrote: > > > Hello > > > Could anyone tell me if exist that possibility for linux 2.2.x ? > > > something like reversed sit.. > > > i looked at sources..but all things bases on ipv4..hashing and so on.. > > > > ftp.suse.com:/pub/people/ak/tunnel/fourtun-2.tgz is a simple v4-in-v6 > > tunnel for Linux 2.2. It requires static tunnel setup. > > Sounds interesting. Has anyone implemented v6-in-v6 tunneling > with virtual devices for the 2.4 kernels? Fourtun should handle v6-in-SIT-in-v4-over-v6. It would also not be very hard to extend it to v6-in-v6. Fourtun is 2.2 only ATM and may miss some of the SMP locking required for 2.4. -Andi -- This is like TV. I don't like TV. 
From owner-netdev@oss.sgi.com Tue Aug 15 08:53:20 2000 Received: by oss.sgi.com id ; Tue, 15 Aug 2000 08:53:00 -0700 Received: from jil.informatik.uni-rostock.de ([139.30.5.243]:58302 "EHLO jil.informatik.uni-rostock.de") by oss.sgi.com with ESMTP id ; Tue, 15 Aug 2000 08:52:31 -0700 Received: from hokkaido.informatik.uni-rostock.de (echter@hokkaido [139.30.1.235]) by jil.informatik.uni-rostock.de (8.9.3/8.9.3/relay3.3) with ESMTP id RAA21844; Tue, 15 Aug 2000 17:52:26 +0200 (MET DST) Received: (from echter@localhost) by hokkaido.informatik.uni-rostock.de (8.8.5/8.8.5/fin2.0) id RAA27272; Tue, 15 Aug 2000 17:52:25 +0200 (MET DST) Date: Tue, 15 Aug 2000 17:52:25 +0200 From: Jan Echternach To: James Morris Cc: netfilter@samba.org, netdev@oss.sgi.com Subject: NLMSG_* macros (was: Re: ULOG comments) Message-ID: <20000815175225.B26543@hokkaido.informatik.uni-rostock.de> Reply-To: Jan Echternach Mail-Followup-To: James Morris , netfilter@samba.org, netdev@oss.sgi.com References: <20000811162634.A3814@hokkaido.informatik.uni-rostock.de> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0i In-Reply-To: ; from jmorris@intercode.com.au on Sat, Aug 12, 2000 at 01:22:53AM +1000 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing [Cc'ed to netdev] On Sat, Aug 12, 2000 at 01:22:53AM +1000, James Morris wrote: > The NLMSG_ macros must be used when modifing or accessing a netlink > bytestream. See netlink(3) and netlink(7). But why? IMHO, NLMSG_* just add an uneccessary wrapper for messages that can't ever have multiple parts. I don't see it as a clean interface in this case. And are netlink(3) and netlink(7) really accurate for NETLINK_FIREWALL? They seem to concentrate on NETLINK_ROUTE. For example, both man pages refer to libnetlink which only supports NETLINK_ROUTE. Should NLMSG_* be used by netfilter targets for all kinds of netlink messages over NETLINK_FIREWALL or NETLINK_NFLOG type sockets? 
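For reference, a minimal sketch of what the NLMSG_* macros do for a single-part message, assuming a Linux userspace build with <linux/netlink.h>. The payload struct and the demo_* names are hypothetical and are not taken from ULOG or netfilter. The sketch only shows that NLMSG_DATA() places the payload at an aligned offset after the header and that NLMSG_LENGTH() accounts for that header in nlmsg_len.

```c
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>

/* Hypothetical payload, for illustration only (not a real ULOG structure). */
struct demo_payload {
    unsigned char tag;
    unsigned int  value;
};

/* One single-part message; the union gives the buffer nlmsghdr alignment. */
union demo_buf {
    struct nlmsghdr hdr;
    unsigned char   raw[NLMSG_SPACE(sizeof(struct demo_payload))];
};

/* Fill buf with one message and return a pointer to its payload. */
struct demo_payload *demo_build(union demo_buf *buf, unsigned int value)
{
    struct nlmsghdr *nlh = &buf->hdr;
    struct demo_payload *p;

    memset(buf, 0, sizeof(*buf));
    nlh->nlmsg_len  = NLMSG_LENGTH(sizeof(*p)); /* aligned header + payload */
    nlh->nlmsg_type = NLMSG_MIN_TYPE;           /* placeholder message type */

    p = NLMSG_DATA(nlh);                        /* first byte after the aligned header */
    p->tag   = 1;
    p->value = value;
    return p;
}

/* Byte offset of the payload from the start of the message. */
size_t demo_payload_offset(union demo_buf *buf)
{
    return (size_t)((unsigned char *)NLMSG_DATA(&buf->hdr) - buf->raw);
}
```

With this layout the receiver can reach the payload through NLMSG_DATA() without computing the header size by hand. Whether that is worth the wrapper for messages that are never multipart is the question raised above.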
-- Jan From owner-netdev@oss.sgi.com Tue Aug 15 09:16:10 2000 Received: by oss.sgi.com id ; Tue, 15 Aug 2000 09:16:01 -0700 Received: from ns1226.munich.netsurf.de ([195.180.235.226]:1286 "HELO fred.muc.de") by oss.sgi.com with SMTP id ; Tue, 15 Aug 2000 09:15:45 -0700 Received: by fred.muc.de (Postfix, from userid 500) id 87D79E38E0; Tue, 15 Aug 2000 18:18:12 +0200 (CEST) Date: Tue, 15 Aug 2000 18:18:12 +0200 From: Andi Kleen To: James Morris , netfilter@samba.org, netdev@oss.sgi.com Cc: jan.echternach@informatik.uni-rostock.de Subject: Re: NLMSG_* macros (was: Re: ULOG comments) Message-ID: <20000815181812.A5358@fred.muc.de> References: <20000811162634.A3814@hokkaido.informatik.uni-rostock.de> <20000815175225.B26543@hokkaido.informatik.uni-rostock.de> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0.1i In-Reply-To: <20000815175225.B26543@hokkaido.informatik.uni-rostock.de>; from echter@informatik.uni-rostock.de on Tue, Aug 15, 2000 at 05:54:03PM +0200 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Tue, Aug 15, 2000 at 05:54:03PM +0200, Jan Echternach wrote: > [Cc'ed to netdev] > > On Sat, Aug 12, 2000 at 01:22:53AM +1000, James Morris wrote: > > The NLMSG_ macros must be used when modifing or accessing a netlink > > bytestream. See netlink(3) and netlink(7). > > But why? IMHO, NLMSG_* just add an uneccessary wrapper for messages > that can't ever have multiple parts. I don't see it as a clean > interface in this case. It is strongly recommended to use the NLMSG_* macros to avoid alignment problems on other architectures than i386. > > And are netlink(3) and netlink(7) really accurate for > NETLINK_FIREWALL? They seem to concentrate on NETLINK_ROUTE. For > example, both man pages refer to libnetlink which only supports > NETLINK_ROUTE. 
They are accurate as far as I know (but missing some stuff) > > Should NLMSG_* be used by netfilter targets for all kinds of netlink > messages over NETLINK_FIREWALL or NETLINK_NFLOG type sockets? Yes. -Andi -- This is like TV. I don't like TV. From owner-netdev@oss.sgi.com Tue Aug 15 09:25:40 2000 Received: by oss.sgi.com id ; Tue, 15 Aug 2000 09:25:20 -0700 Received: from jil.informatik.uni-rostock.de ([139.30.5.243]:57538 "EHLO jil.informatik.uni-rostock.de") by oss.sgi.com with ESMTP id ; Tue, 15 Aug 2000 09:25:18 -0700 Received: from hokkaido.informatik.uni-rostock.de (echter@hokkaido [139.30.1.235]) by jil.informatik.uni-rostock.de (8.9.3/8.9.3/relay3.3) with ESMTP id SAA22236; Tue, 15 Aug 2000 18:25:13 +0200 (MET DST) Received: (from echter@localhost) by hokkaido.informatik.uni-rostock.de (8.8.5/8.8.5/fin2.0) id SAA27763; Tue, 15 Aug 2000 18:25:12 +0200 (MET DST) Date: Tue, 15 Aug 2000 18:25:12 +0200 From: Jan Echternach To: Andi Kleen Cc: netfilter@samba.org, netdev@oss.sgi.com Subject: Re: NLMSG_* macros (was: Re: ULOG comments) Message-ID: <20000815182512.D26543@hokkaido.informatik.uni-rostock.de> Reply-To: Jan Echternach Mail-Followup-To: Andi Kleen , netfilter@samba.org, netdev@oss.sgi.com References: <20000811162634.A3814@hokkaido.informatik.uni-rostock.de> <20000815175225.B26543@hokkaido.informatik.uni-rostock.de> <20000815181812.A5358@fred.muc.de> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0i In-Reply-To: <20000815181812.A5358@fred.muc.de>; from ak@muc.de on Tue, Aug 15, 2000 at 06:18:12PM +0200 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Tue, Aug 15, 2000 at 06:18:12PM +0200, Andi Kleen wrote: > It is strongly recommended to use the NLMSG_* macros to avoid alignment problems > on other architectures than i386. But there are absolutely no alignement problems with single-part messages. 
Actually, there are even fewer alignment problems without NLMSG_* in this case because you don't need to use malloc() to allocate the buffer. You could also use a simple variable of structure type with automatic or static storage duration if the netlink datagram contains such a structure. BTW, are there any other reasons for using NLMSG_* apart from alignement issues? -- Jan From owner-netdev@oss.sgi.com Tue Aug 15 13:47:24 2000 Received: by oss.sgi.com id ; Tue, 15 Aug 2000 13:47:04 -0700 Received: from colin.muc.de ([193.149.48.1]:17937 "HELO colin.muc.de") by oss.sgi.com with SMTP id ; Tue, 15 Aug 2000 13:46:50 -0700 Received: by colin.muc.de id <140564-2>; Tue, 15 Aug 2000 22:46:35 +0200 Message-ID: <20000815224633.53823@colin.muc.de> From: Andi Kleen To: Jan Echternach Cc: Andi Kleen , netfilter@samba.org, netdev@oss.sgi.com Subject: Re: NLMSG_* macros (was: Re: ULOG comments) References: <20000811162634.A3814@hokkaido.informatik.uni-rostock.de> <20000815175225.B26543@hokkaido.informatik.uni-rostock.de> <20000815181812.A5358@fred.muc.de> <20000815182512.D26543@hokkaido.informatik.uni-rostock.de> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 0.88e In-Reply-To: <20000815182512.D26543@hokkaido.informatik.uni-rostock.de>; from Jan Echternach on Tue, Aug 15, 2000 at 06:25:29PM +0200 Date: Tue, 15 Aug 2000 22:46:34 +0200 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Tue, Aug 15, 2000 at 06:25:29PM +0200, Jan Echternach wrote: > On Tue, Aug 15, 2000 at 06:18:12PM +0200, Andi Kleen wrote: > > It is strongly recommended to use the NLMSG_* macros to avoid alignment problems > > on other architectures than i386. > > But there are absolutely no alignement problems with single-part > messages. Actually, there are even fewer alignment problems without There is between the header and the payload. > BTW, are there any other reasons for using NLMSG_* apart from > alignement issues? 
The main reason is alignment, and it is usually cleaner than doing pointer arithmetic by hand in case of multipart messages. -Andi From owner-netdev@oss.sgi.com Tue Aug 15 14:02:54 2000 Received: by oss.sgi.com id ; Tue, 15 Aug 2000 14:02:34 -0700 Received: from ldn52-50.Leiden.NL.net ([212.206.213.51]:17924 "EHLO ida.2y.net") by oss.sgi.com with ESMTP id ; Tue, 15 Aug 2000 14:02:16 -0700 Received: from freeler.nl (IDENT:jorg@localhost [127.0.0.1]) by ida.2y.net (8.9.3/8.9.3) with ESMTP id XAA01442 for ; Tue, 15 Aug 2000 23:02:07 +0200 Message-ID: <3999AFCF.7B051979@freeler.nl> Date: Tue, 15 Aug 2000 23:02:07 +0200 From: Jorg de Jong X-Mailer: Mozilla 4.7 [en] (X11; I; Linux 2.4.0-test6 i686) X-Accept-Language: en MIME-Version: 1.0 To: netdev@oss.sgi.com Subject: [PATCH] ipv6 sit.c smal bug fix in creating tunnel Content-Type: multipart/mixed; boundary="------------70B05C3A0FF8D14220FD86D5" Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing This is a multi-part message in MIME format. --------------70B05C3A0FF8D14220FD86D5 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Hi, the following triggers the error [root] /root > ifconfig sit0 tunnel ::206.123.31.102 SIOCSIFDSTADDR: No buffer space available and the attached patch against 2.4.0-test6 fixes it. Am I right to post this fix here or should it go to the lkml or somewhere else? 
-- Jorg de Jong Work : mailto:jorg.de.jong@ict.nl Play : mailto:j.e.s.de.jong@freeler.nl --------------70B05C3A0FF8D14220FD86D5 Content-Type: text/plain; charset=us-ascii; name="2.4.0.test6.diff" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="2.4.0.test6.diff"

diff -u --recursive --new-file linux-2.4.0-test6/net/ipv6/sit.c linux/net/ipv6/sit.c
--- linux-2.4.0-test6/net/ipv6/sit.c Tue Aug 15 22:28:27 2000
+++ linux/net/ipv6/sit.c Tue Aug 15 22:55:05 2000
@@ -188,7 +188,7 @@
 		}
 		if (i==100)
 			goto failed;
-		memcpy(nt->parms.name, dev->name, IFNAMSIZ);
+		memcpy(parms->name, dev->name, IFNAMSIZ);
 	}
 	if (register_netdevice(dev) < 0)
 		goto failed;

--------------70B05C3A0FF8D14220FD86D5-- From owner-netdev@oss.sgi.com Tue Aug 15 15:13:14 2000 Received: by oss.sgi.com id ; Tue, 15 Aug 2000 15:13:04 -0700 Received: from cpu2747.adsl.bellglobal.com ([207.236.55.216]:6639 "EHLO grendel.conscoop.ottawa.on.ca") by oss.sgi.com with ESMTP id ; Tue, 15 Aug 2000 15:12:51 -0700 Received: (from rgb@localhost) by grendel.conscoop.ottawa.on.ca (8.9.0/8.9.0) id OAA05097; Tue, 15 Aug 2000 14:35:40 -0400 Date: Tue, 15 Aug 2000 14:35:39 -0400 From: Richard Guy Briggs To: Linux Ipsec mailing list , NetFilter mailing list , Linux Network Development mailing list Cc: John Gilmore , Hugh Daniel , Henry Spencer , Hugh Redelmeier , Richard Guy Briggs Subject: FreeS/WAN redesign thoughts (KLIPS, IPSEC) Message-ID: <20000815143539.B4771@grendel.conscoop.ottawa.on.ca> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-md5; protocol="application/pgp-signature"; boundary="qcHopEYAB45HaUaB" Content-Disposition: inline User-Agent: Mutt/1.2i Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing --qcHopEYAB45HaUaB Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable

FreeS/WAN IPSEC -- KLIPS2 DESIGN THOUGHTS
=========================================

This document was written shortly after OLS2000, inspired from a meeting with Rusty and Marc in Montreal in November 1999 and two intense FreeS/WAN BoFs at OLS2000. Current kernel version reference is 2.4.0-test4.

The idea is to redesign KLIPS (kernel parts of FreeS/WAN) to avoid all the 'stoopid routing tricks' (TM) to which we have had to resort over the last 2+ years and to add a proper SPDB to do proper incoming IPSEC policy checks. We are hoping to use existing pattern-matching tools rather than invent our own. NetFilter appears to have all the pattern matching capabilities, but is limited in other ways.

This is an exploratory document. Please comment, particularly if I have missed or mis-understood something, to the linux-ipsec, netfilter or netdev lists.

The basic architecture of NetFilter is:

--->[1]--->(ROUTE)--->[3]--->[4]--->     where:
              |            ^             [1] NF_IP_PRE_ROUTING
              |            |             [2] NF_IP_LOCAL_IN
              |         (ROUTE)          [3] NF_IP_FORWARD
              v            |             [4] NF_IP_POST_ROUTING
             [2]          [5]            [5] NF_IP_LOCAL_OUT
              |            ^
              |            |
              v            |

The basic path through the kernel as it concerns IPSEC for the three types of packets is as follows:

IN:
    nic
    sanity check
    NF_IP_PRE_ROUTING
    route-in
    ip-options
    defragment
    NF_IP_LOCAL_IN
    layer3demux
    application

FORWARD:
    nic
    sanity check
    NF_IP_PRE_ROUTING
    routing-in
    ip-options digesting
    ttl decrement and check
    NF_IP_FORWARD
    fragment
    NF_IP_POST_ROUTING
    output()
    nic

OUT:
    application
    layer3mux
    NF_IP_LOCAL_OUT
    route-out
    NF_IP_POST_ROUTING
    output()
    nic

Keep in mind that Destination NAT (port forwarding) gets applied in NF_IP_PRE_ROUTING and Source NAT (masquerading) gets applied in NF_IP_POST_ROUTING.

-----------

There is more than one possible approach. The following is not exhaustive.

--- 1 ---

Treat incoming IPSEC encapsulation as a layer 3 protocol and decapsulate it at the Layer 3 demultiplexer.

An incoming packet starts off with a sanity check.
It then goes through all the NF_IP_PRE_ROUTING hooks starting with the SPDB checking. Since it is a fresh ESP or AH packet, it will not have any nfmarks and unless that outer IP header should have been processed by another SG in between, no policy will have been required, letting it through. The rest of the NF_IP_PRE_ROUTING hooks may cause it to be DNATed and defragmented. It then goes through routing which thinks it is a local packet, deals with any outer header IP options, then defragmentation and NF_IP_LOCAL_IN filter (allow ESP,AH) before getting to ipsec_rcv() where the outer bundle is authenticated and decrypted and nfmarked before being passed back to netif_rx(). The next IP header is now visible. The SADB would be managed via the PF_KEYv2 socket I/F. For local packets, it follows the same path, getting checked at NF_IP_PRE_ROUTING for policy using previously set nfmark. If this passes, routing looks at the now-visible next IP header and routes it locally where inner IP options and defragmentation are processed. NF_IP_LOCAL_IN then gets to check filtering policy for other L3 protocols. If it is the endpoint for multiple bundles, it iterates, having exposed the next IP header. For non-local packets, it goes through the incoming sanity check again, goes through NF_IP_PRE_ROUTING where it could get DNATed and defragmented, it routes, potentially through an existing virtual IPSEC device, one per connection, not per physical I/F. IP options and TTL are processed before being filtered at NF_IP_FORWARD, fragmented, then intercepted at NF_IP_POST_ROUTING after SNAT for encryption and authentication. Again, at NF_IP_POST_ROUTING, an IPSEC matching module would make a decision about the fate of the packet. It would have several possible targets: ACCEPT would allow the packet through with no processing. ENCRYPT would send it off to the equivalent of ipsec_tunnel_start_transmit() after setting nfmark if it knows that the SA exists. 
QUEUE would allow the packet to be sent to userspace to set up keying for a connection. The way that nfmark is used is rather vague. It is presently only 32 bits. Ideally, I would like to be able to indicate exactly which SAs were processed on the way in, which would most easily be represented by as many as 4 SAs (AH, ESP, IPCOMP, IPIP), each having an 8 bit protocol field (absolute minimum of a 2-bits), 32-bit destination address field (for IPv4, IPv6 would be 128) and a 32-bit SPI. This is a potential maximum of 672 bits. A way of mapping 672 bits on to the 32 bits available would be required to use this. A lookup table could be used to map nfmarks to SAIDs, not the SAs themselves, since the SAs could disappear at any time the tdb table is not locked. It should be able to represent a bundle of SAs where one SA could be used in more than one bundle. There could also be more than one right answer for the incoming SPDB. The SPDB would be managed via a combination of PF_KEYv2 socket I/F extensions and iptables. A separate NetFilter table called 'ipsec' (as opposed to 'filter' or 'nat') would have the first hook at NF_IP_PRE_ROUTING and the last hook at NF_IP_POST_ROUTING. iptables uses the AF_NETLINK socket family. I'm not certain exactly where a packet routed through an optional IPSEC virtual I/F gets injected into the system. ----------- --- 2 --- Treat incoming IPSEC encapsulation as an enhancement of the layer 2 protocol and decapsulate it at the NF_IP_PRE_ROUTING hook. This option is less favourable as it stands since it involves creating our own SPDB engine. An incoming packet starts off with a sanity check. It then goes through the NF_IP_PRE_ROUTING match hook for IPSEC, which would be the first in priority, matching every single packet to force it through a policy check. 
If it was an ESP or AH packet with a local destination address, it would then be sent to ipsec_rcv() and the first bundle would be processed, keeping state until that bundle is completely processed. At this point the incoming SPDB would be checked to ensure that the proper policy had been applied to it. If there is another bundle inside with an ESP or AH header, that bundle is processed, storing the new and old state. This SPDB check would not be iptables-based since we have already gone through the match and target hooks and would have too much state to store in nfmark. The result of the SPDB check would be ACCEPT or DROP (It could also be STOLEN or QUEUEd at this point for opportunistic encryption). The SADB and SPDB entries would be managed via the extended PF_KEYv2 socket I/F. The rest of the NF_IP_PRE_ROUTING hooks may cause it to be DNATed and defragmented. For local packets, routing looks at the now-visible next IP header and routes it locally where inner IP options and defragmentation are processed. NF_IP_LOCAL_IN then gets to check filtering policy for layer 3 protocols. For non-local packets, it routes, potentially through an existing virtual IPSEC device, one per connection, not per physical I/F. IP options and TTL are processed before being filtered at NF_IP_FORWARD then fragmented. Packets are then sent through all the hooks at NF_IP_POST_ROUTING potentially for SNAT, after which the last hook would force all packets to go through the IPSEC outgoing processing module. Here outgoing policy would be checked, again not necessarily by iptables, encryption and authentication would be applied as available, then the result would be ACCEPT or DROP (It could again be STOLEN or QUEUEd at this point for opportunistic encryption). ------------------ If there are any other directions we should be considering, please suggest... slainte mhath, RGB -- Richard Guy Briggs -- PGP key available Auto-Free Ottawa! Canada Prevent Internet Wiretapping! 
-- FreeS/WAN: Thanks for voting Green! -- Marillion: --qcHopEYAB45HaUaB Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: 2.6.3i iQCVAwUBOZmNed+sBuIhFagtAQFsQQP+JxycadmG1haPauVGqXCr7YrxbHsLjdin vbsjT9z92RQnHmL+6twCzLah8CNO6vKhoF9/9yJVThcEiWufDQzvGaKw9xVyaszs xRQABFbF+ROZ5DdnAWMu9yVFWGNhvbEAob5VdYKI0uCTafmcBPFPppriEfdYxuGk uo3B8GcYUAo= =UG9W -----END PGP SIGNATURE----- --qcHopEYAB45HaUaB-- From owner-netdev@oss.sgi.com Wed Aug 16 00:03:05 2000 Received: by oss.sgi.com id ; Wed, 16 Aug 2000 00:02:55 -0700 Received: from mailsrv.panservice.it ([212.66.96.7]:43275 "EHLO mailsrv.panservice.it") by oss.sgi.com with ESMTP id ; Wed, 16 Aug 2000 00:02:35 -0700 Received: from perit.panservice.it (per.noc.panservice.it [212.66.96.162]) by mailsrv.panservice.it (8.9.3/8.9.3) with ESMTP id IAA11943 for ; Wed, 16 Aug 2000 08:35:42 +0200 Message-Id: <4.3.0.20000816083419.0187b440@moon.panservice.it> X-Sender: perit@moon.panservice.it X-Mailer: QUALCOMM Windows Eudora Version 4.3 Date: Wed, 16 Aug 2000 08:37:22 +0200 To: netdev@oss.sgi.com From: Giuliano Peritore Subject: Re: IPv6 in IPv6 In-Reply-To: <20000815165403.A3842@fred.muc.de> References: <20000814185756.43609@colin.muc.de> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; format=flowed Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing >Fourtun should handle v6-in-SIT-in-v4-over-v6. It would also not be very hard >to extend it to v6-in-v6. Fourtun is 2.2 only ATM and may miss some of the SMP Some days ago I released an update of my sniffer and protocol analyzer COLD (http://www.panservice.it/cold) which now includes IPv6 and ICMP6 support. I've tested sniffing of IPv6 over Ethernet and IPv6 encapsulated in IPv4. 
I've two open problems: a) Find a workaround because libpcap is not able to sniff from SIT interfaces b) sniff IPv6 over IPv6, but I've no such tunnel :) I would be very glad if someone could send me a brief output of cold --ascii --hex >file which includes some IPv6 over IPv6 packet. Thanks. --------------------------------------------------- Dott. Giuliano Peritore - g.peritore@panservice.it Direzione - Panservice Servizi professionali per Internet ed il Networking Panservice e' associata AIIP -- RIPE Local Registry Phone: +39 0773 410020 Fax +39 0773 470219 http://www.panservice.it --------------------------------------------------- From owner-netdev@oss.sgi.com Wed Aug 16 03:23:00 2000 Received: by oss.sgi.com id ; Wed, 16 Aug 2000 03:22:50 -0700 Received: from jil.informatik.uni-rostock.de ([139.30.5.243]:44175 "EHLO jil.informatik.uni-rostock.de") by oss.sgi.com with ESMTP id ; Wed, 16 Aug 2000 03:22:38 -0700 Received: from hokkaido.informatik.uni-rostock.de (echter@hokkaido [139.30.1.235]) by jil.informatik.uni-rostock.de (8.9.3/8.9.3/relay3.3) with ESMTP id MAA29186; Wed, 16 Aug 2000 12:22:33 +0200 (MET DST) Received: (from echter@localhost) by hokkaido.informatik.uni-rostock.de (8.8.5/8.8.5/fin2.0) id MAA22401; Wed, 16 Aug 2000 12:22:32 +0200 (MET DST) Date: Wed, 16 Aug 2000 12:22:31 +0200 From: Jan Echternach To: Andi Kleen Cc: netfilter@samba.org, netdev@oss.sgi.com Subject: Re: NLMSG_* macros (was: Re: ULOG comments) Message-ID: <20000816122231.A21914@hokkaido.informatik.uni-rostock.de> Reply-To: Jan Echternach Mail-Followup-To: Andi Kleen , netfilter@samba.org, netdev@oss.sgi.com References: <20000811162634.A3814@hokkaido.informatik.uni-rostock.de> <20000815175225.B26543@hokkaido.informatik.uni-rostock.de> <20000815181812.A5358@fred.muc.de> <20000815182512.D26543@hokkaido.informatik.uni-rostock.de> <20000815224633.53823@colin.muc.de> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0i In-Reply-To: 
<20000815224633.53823@colin.muc.de>; from ak@muc.de on Tue, Aug 15, 2000 at 10:46:34PM +0200 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Tue, Aug 15, 2000 at 10:46:34PM +0200, Andi Kleen wrote: > On Tue, Aug 15, 2000 at 06:25:29PM +0200, Jan Echternach wrote: > > On Tue, Aug 15, 2000 at 06:18:12PM +0200, Andi Kleen wrote: > > But there are absolutely no alignement problems with single-part > > messages. Actually, there are even fewer alignment problems without > > There is between the header and the payload. I'm sorry that I didn't represent the issue clearly. I understand that user space code should also use NLMSG_* if the kernel uses it. Ipchains in Linux-2.2 uses a simple structure without padding on the NETLINK_FIREWALL socket. But netfilter in 2.4 puts a nlmsghdr before a similar structure, even though it doesn't use its multipart feature. The new ULOG target on the new NETLINK_NFLOG type socket also uses NLMSG_* without ever producing multipart messages. There seems to be a strong tendency among the netfilter developers to wrap everything in a nlmsghdr structure. It's not too late to change the datagram formats on NETLINK_NFLOG. I think NLMSG_* only makes the data larger and the code more complex and less readable. There is no inherent padding in the data here, there wouldn't be any alignment problems in the first place if NLMSG_* wasn't used. Should NLMSG_* be used on the new NETLINK_NFLOG socket? 
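For readers following the NLMSG_* question, here is a minimal sketch (Python, chosen only for brevity) of what wrapping a payload in a nlmsghdr costs in bytes. The 16-byte field layout and the 4-byte alignment mirror the definitions in linux/netlink.h; the helper names frame() and unframe() are made up for this illustration and are not part of any netlink library.

```python
import struct

# struct nlmsghdr: nlmsg_len (u32), nlmsg_type (u16), nlmsg_flags (u16),
# nlmsg_seq (u32), nlmsg_pid (u32). "=" means native byte order, no padding.
NLMSGHDR = struct.Struct("=IHHII")
NLMSG_ALIGNTO = 4


def nlmsg_align(n):
    """Round n up to the next multiple of NLMSG_ALIGNTO."""
    return (n + NLMSG_ALIGNTO - 1) & ~(NLMSG_ALIGNTO - 1)


NLMSG_HDRLEN = nlmsg_align(NLMSGHDR.size)


def nlmsg_length(payload_len):
    """Value stored in nlmsg_len: header plus unpadded payload."""
    return NLMSG_HDRLEN + payload_len


def nlmsg_space(payload_len):
    """Bytes the message occupies in a buffer, including trailing padding."""
    return nlmsg_align(nlmsg_length(payload_len))


def frame(msg_type, seq, payload, flags=0, pid=0):
    """Hypothetical helper: prepend a nlmsghdr and pad to the alignment."""
    header = NLMSGHDR.pack(nlmsg_length(len(payload)), msg_type, flags, seq, pid)
    message = header + payload
    return message + b"\0" * (nlmsg_space(len(payload)) - len(message))


def unframe(buf):
    """Hypothetical helper: return (type, seq, payload) of the first message."""
    length, msg_type, _flags, seq, _pid = NLMSGHDR.unpack_from(buf)
    return msg_type, seq, buf[NLMSG_HDRLEN:length]
```

With a 5-byte payload the framed message is 24 bytes: 16 for the header, 5 for the data and 3 of padding. That overhead is the "larger data" side of the argument; the sequence number and flags fields are what the header offers in exchange.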
-- Jan From owner-netdev@oss.sgi.com Wed Aug 16 03:27:20 2000 Received: by oss.sgi.com id ; Wed, 16 Aug 2000 03:27:10 -0700 Received: from colin.muc.de ([193.149.48.1]:40452 "HELO colin.muc.de") by oss.sgi.com with SMTP id ; Wed, 16 Aug 2000 03:27:03 -0700 Received: by colin.muc.de id <140559-1>; Wed, 16 Aug 2000 12:26:53 +0200 Message-ID: <20000816122650.48683@colin.muc.de> From: Andi Kleen To: Jan Echternach Cc: Andi Kleen , netfilter@samba.org, netdev@oss.sgi.com Subject: Re: NLMSG_* macros (was: Re: ULOG comments) References: <20000811162634.A3814@hokkaido.informatik.uni-rostock.de> <20000815175225.B26543@hokkaido.informatik.uni-rostock.de> <20000815181812.A5358@fred.muc.de> <20000815182512.D26543@hokkaido.informatik.uni-rostock.de> <20000815224633.53823@colin.muc.de> <20000816122231.A21914@hokkaido.informatik.uni-rostock.de> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 0.88e In-Reply-To: <20000816122231.A21914@hokkaido.informatik.uni-rostock.de>; from Jan Echternach on Wed, Aug 16, 2000 at 12:23:00PM +0200 Date: Wed, 16 Aug 2000 12:26:50 +0200 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Wed, Aug 16, 2000 at 12:23:00PM +0200, Jan Echternach wrote: > On Tue, Aug 15, 2000 at 10:46:34PM +0200, Andi Kleen wrote: > > On Tue, Aug 15, 2000 at 06:25:29PM +0200, Jan Echternach wrote: > > > On Tue, Aug 15, 2000 at 06:18:12PM +0200, Andi Kleen wrote: > > > But there are absolutely no alignement problems with single-part > > > messages. Actually, there are even fewer alignment problems without > > > > There is between the header and the payload. > > I'm sorry that I didn't represent the issue clearly. I understand that > user space code should also use NLMSG_* if the kernel uses it. > > Ipchains in Linux-2.2 uses a simple structure without padding on the > NETLINK_FIREWALL socket. 
But netfilter in 2.4 puts a nlmsghdr before a > similar structure, even though it doesn't use its multipart feature. > The new ULOG target on the new NETLINK_NFLOG type socket also uses > NLMSG_* without ever producing multipart messages. There seems to be a > strong tendency among the netfilter developers to wrap everything in a > nlmsghdr structure. nlmsghdr has other uses than just handling multipart messages. For example, it gives you a sequence number, so that you can try to detect lost packets, and a way to request acks. Netlink sockets require a nlmsghdr. > > It's not too late to change the datagram formats on NETLINK_NFLOG. I > think NLMSG_* only makes the data larger and the code more complex and > less readable. There is no inherent padding in the data here, there > wouldn't be any alignment problems in the first place if NLMSG_* wasn't > used. > > Should NLMSG_* be used on the new NETLINK_NFLOG socket? Definitely. -Andi From owner-netdev@oss.sgi.com Wed Aug 16 03:58:10 2000 Received: by oss.sgi.com id ; Wed, 16 Aug 2000 03:58:00 -0700 Received: from jil.informatik.uni-rostock.de ([139.30.5.243]:39826 "EHLO jil.informatik.uni-rostock.de") by oss.sgi.com with ESMTP id ; Wed, 16 Aug 2000 03:57:47 -0700 Received: from hokkaido.informatik.uni-rostock.de (echter@hokkaido [139.30.1.235]) by jil.informatik.uni-rostock.de (8.9.3/8.9.3/relay3.3) with ESMTP id MAA29460; Wed, 16 Aug 2000 12:57:39 +0200 (MET DST) Received: (from echter@localhost) by hokkaido.informatik.uni-rostock.de (8.8.5/8.8.5/fin2.0) id MAA22885; Wed, 16 Aug 2000 12:57:38 +0200 (MET DST) Date: Wed, 16 Aug 2000 12:57:38 +0200 From: Jan Echternach To: Andi Kleen Cc: netfilter@samba.org, netdev@oss.sgi.com Subject: Re: NLMSG_* macros (was: Re: ULOG comments) Message-ID: <20000816125738.B21914@hokkaido.informatik.uni-rostock.de> Reply-To: Jan Echternach Mail-Followup-To: Andi Kleen , netfilter@samba.org, netdev@oss.sgi.com References: <20000811162634.A3814@hokkaido.informatik.uni-rostock.de> 
<20000815175225.B26543@hokkaido.informatik.uni-rostock.de> <20000815181812.A5358@fred.muc.de> <20000815182512.D26543@hokkaido.informatik.uni-rostock.de> <20000815224633.53823@colin.muc.de> <20000816122231.A21914@hokkaido.informatik.uni-rostock.de> <20000816122650.48683@colin.muc.de> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0i In-Reply-To: <20000816122650.48683@colin.muc.de>; from ak@muc.de on Wed, Aug 16, 2000 at 12:26:50PM +0200 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Wed, Aug 16, 2000 at 12:26:50PM +0200, Andi Kleen wrote: > nlmsghdr has other uses than just handling multipart messages. For > examples it gives you a sequence number so that you can try to detect > lost packets and a way to request acks. > netlink sockets require a nlmsghdr NETLINK_NFLOG packets originate in the kernel. Packet loss is signaled by ENOBUFS, and resending of packets is impossible unless the kernel is willing to eat huge amounts of memory before finally stopping all networking whenever an application decides to stop reading from the socket. > > Should NLMSG_* be used on the new NETLINK_NFLOG socket? > > Definitely. Ok, I'll use NLMSG_* for my iptables targets, because I'm out of arguments against it, even though I still don't see how any feature of nlmsghdr could be used by NETLINK_NFLOG that isn't provided by a plain C structure in a simpler way. 
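To make the two positions concrete, here is a small sketch of what a reader could do with the sequence number Andi mentions. It assumes the sender fills nlmsg_seq with a counter that increases by one per message and wraps at 32 bits; nothing in this thread confirms that the ULOG target does so, and count_missing() is a made-up name.

```python
SEQ_MODULUS = 2 ** 32  # nlmsg_seq is an unsigned 32-bit field


def count_missing(seqs):
    """Count sequence numbers skipped in a stream of wrapping 32-bit counters.

    Assumes in-order delivery and a sender that increments by one per
    message (an assumption, not something the kernel guarantees here).
    """
    missing = 0
    previous = None
    for seq in seqs:
        if previous is not None:
            gap = (seq - previous) % SEQ_MODULUS  # distance, allowing for wrap
            if gap > 1:
                missing += gap - 1
        previous = seq
    return missing
```

The ENOBUFS error Jan describes tells the reader that something was dropped; a counter like this would additionally say how many messages were dropped, which is about all a kernel-originated log stream can use the field for, since retransmission is not on offer.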
-- Jan From owner-netdev@oss.sgi.com Wed Aug 16 08:34:40 2000 Received: by oss.sgi.com id ; Wed, 16 Aug 2000 08:34:31 -0700 Received: from jil.informatik.uni-rostock.de ([139.30.5.243]:17585 "EHLO jil.informatik.uni-rostock.de") by oss.sgi.com with ESMTP id ; Wed, 16 Aug 2000 08:34:22 -0700 Received: from hokkaido.informatik.uni-rostock.de (echter@hokkaido [139.30.1.235]) by jil.informatik.uni-rostock.de (8.9.3/8.9.3/relay3.3) with ESMTP id RAA02395 for ; Wed, 16 Aug 2000 17:34:17 +0200 (MET DST) Received: (from echter@localhost) by hokkaido.informatik.uni-rostock.de (8.8.5/8.8.5/fin2.0) id RAA27507 for netdev@oss.sgi.com; Wed, 16 Aug 2000 17:34:16 +0200 (MET DST) Date: Wed, 16 Aug 2000 17:34:16 +0200 From: Jan Echternach To: netdev@oss.sgi.com Subject: protinfo.af_netlink->state Message-ID: <20000816173416.A27392@hokkaido.informatik.uni-rostock.de> Reply-To: Jan Echternach Mail-Followup-To: netdev@oss.sgi.com Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0i Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hi, I'm not sure what the meaning of bit 0 in sk->protinfo.af_netlink->state is, but netlink_overrun() sets this bit, whereas netlink_set_error() does not. This might be a bug. 
-- Jan From owner-netdev@oss.sgi.com Wed Aug 16 17:01:45 2000 Received: by oss.sgi.com id ; Wed, 16 Aug 2000 17:01:25 -0700 Received: from dsl-64-34-34-9.telocity.com ([64.34.34.9]:4870 "HELO slinky.jounce.net") by oss.sgi.com with SMTP id ; Wed, 16 Aug 2000 17:01:00 -0700 Received: (qmail 10958 invoked by uid 1000); 16 Aug 2000 18:34:16 -0000 From: jackw@slinky.jounce.net Date: Wed, 16 Aug 2000 14:34:16 -0400 To: netdev@oss.sgi.com Subject: NE2K-PCI driver bug report Message-ID: <20000816143416.A10944@jounce.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Bug Report Summary: NE2K-PCI driver causes downloads to timeout on certain files at certain spots repeatedly. Description: When downloading certain files, seemingly independant of the protocol, client, or server, the transfer will timeout on the exact same byte each time. The point at which this happens varies from file to file, and there is no consistency in the files that I can manage to find. I thought this was a problem with hardware until two other users asked about the same problem in the #debian IRC channel on openprojects.net. This did not occur when using an ISA NE2K card. Keywords: Download, Timeout, NE2K-PCI No OOPS message applicable One file that this has been problematic on is ftp://download.stormix.com/storm/iso/split/sl20.l, as it freezes at 20735464 bytes of 48234496. Environment: ver_linux output: >-- Versions installed: (if some fields are empty or looks >-- unusual then possibly you have very old versions) >Linux slinky 2.4.0-test5 #3 Mon Jul 31 01:10:06 EDT 2000 i686 unknown >Kernel modules 2.3.14 >Gnu C 2.95.2 >Binutils 2.10.0.18 >Linux C Library .. >Dynamic Linker (ld.so) 1.9.11 >ls: /usr/lib/libg++.so: No such file or directory >Procps 2.0.6 >Mount 2.10n >Net-tools (1999-04-20) >Kbd 0.99 >Sh-utils 2.0i >Sh-utils Parker. 
>Sh-utils >Sh-utils Inc. >Sh-utils NO >Sh-utils PURPOSE. cat /proc/cpuinfo: >processor : 0 >vendor_id : CyrixInstead >cpu family : 6 >model : 1 >model name : 6x86MX 2.5x Core/Bus Clock >stepping : 4 >cpu MHz : 167.048741 >fdiv_bug : no >hlt_bug : no >sep_bug : no >f00f_bug : no >coma_bug : yes >fpu : yes >fpu_exception : yes >cpuid level : 1 >wp : yes >flags : fpu de tsc msr cx8 mtrr pge cmov mmx >bogomips : 333.41 cat /proc/modules >no modules used cat /proc/ioports >0000-001f : dma1 >0020-003f : pic1 >0040-005f : timer >0060-006f : keyboard >0080-008f : dma page reg >00a0-00bf : pic2 >00c0-00df : dma2 >00f0-00ff : fpu >02f8-02ff : serial(set) >03c0-03df : vga+ >03f8-03ff : serial(set) >0cf8-0cff : PCI conf1 >5800-583f : Intel Corporation 82371AB PIIX4 ACPI >5c00-5c1f : Intel Corporation 82371AB PIIX4 ACPI >6000-601f : Intel Corporation 82371AB PIIX4 USB >6400-641f : Winbond Electronics Corp W89C940 > 6400-641f : ne2k-pci >6800-68ff : Advanced System Products, Inc ABP940-U / ABP960-U > 6800-680f : advansys >6c00-6c1f : Realtek Semiconductor Co., Ltd. RTL-8029(AS) > 6c00-6c1f : NE2000 >f000-f00f : Intel Corporation 82371AB PIIX4 IDE cat /proc/iomem >00000000-0009fbff : System RAM >0009fc00-0009ffff : System RAM >000a0000-000bffff : Video RAM area >000c0000-000c7fff : Video ROM >000c8000-000cafff : Extension ROM >000f0000-000fffff : System ROM >00100000-07ffffff : System RAM > 00100000-00289a7f : Kernel code > 00289a80-002a6723 : Kernel data >e0000000-e01fffff : Trident Microsystems TGUI 9440 >e0200000-e020ffff : Trident Microsystems TGUI 9440 >e0210000-e02100ff : Advanced System Products, Inc ABP940-U / ABP960-U >ffff0000-ffffffff : reserved cat /proc/pci >PCI devices found: > Bus 0, device 0, function 0: > Host bridge: Intel Corporation 430TX - 82439TX MTXC (rev 1). > Master Capable. Latency=32. > Bus 0, device 1, function 0: > ISA bridge: Intel Corporation 82371AB PIIX4 ISA (rev 1). 
> Bus 0, device 1, function 1: > IDE interface: Intel Corporation 82371AB PIIX4 IDE (rev 1). > Master Capable. Latency=80. > I/O at 0xf000 [0xf00f]. > Bus 0, device 1, function 2: > USB Controller: Intel Corporation 82371AB PIIX4 USB (rev 1). > Master Capable. Latency=80. > I/O at 0x6000 [0x601f]. > Bus 0, device 1, function 3: > Bridge: Intel Corporation 82371AB PIIX4 ACPI (rev 1). > Bus 0, device 9, function 0: > Ethernet controller: Winbond Electronics Corp W89C940 (rev 0). > IRQ 9. > I/O at 0x6400 [0x641f]. > Bus 0, device 10, function 0: > SCSI storage controller: Advanced System Products, Inc ABP940-U / ABP960-U (rev 3). > IRQ 10. > Master Capable. Latency=80. Min Gnt=4.Max Lat=4. > I/O at 0x6800 [0x68ff]. > Non-prefetchable 32 bit memory at 0xe0210000 [0xe02100ff]. > Bus 0, device 11, function 0: > Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8029(AS) (rev 0). > IRQ 11. > I/O at 0x6c00 [0x6c1f]. > Bus 0, device 12, function 0: > VGA compatible controller: Trident Microsystems TGUI 9440 (rev 227). > Non-prefetchable 32 bit memory at 0xe0000000 [0xe01fffff]. > Non-prefetchable 32 bit memory at 0xe0200000 [0xe020ffff]. cat /proc/scsi/scsi >Attached devices: >Host: scsi0 Channel: 00 Id: 03 Lun: 00 > Vendor: COMPAQPC Model: WDE4360 Rev: 1.52 > Type: Direct-Access ANSI SCSI revision: 02 Other Notes: Files are not able to be resumed, although the server supports resumes, the download stalls immediately upon the connection opening. It occurrs through masqueraded systems running various other operating systems as well. This problem has existed since running every kernel since 2.2.12 or so. It started when I purchased a new net card when I got my DSL, and I never had these problems when using my old dedicated dialup. Thanks for taking a look at this, feedback is appreciated, -Whafro -- M. 
Jackson Wilkinson Phone: 877-832-9021 Cell: 215-919-1513 From owner-netdev@oss.sgi.com Wed Aug 16 21:04:16 2000 Received: by oss.sgi.com id ; Wed, 16 Aug 2000 21:04:06 -0700 Received: from p209.stsn.com ([63.161.206.209]:12416 "EHLO linux.kernel.dk") by oss.sgi.com with ESMTP id ; Wed, 16 Aug 2000 21:03:53 -0700 Received: (from tmathiasen@localhost) by linux.kernel.dk (8.9.3/8.9.3) id GAA00912; Thu, 17 Aug 2000 06:00:49 -0400 Date: Thu, 17 Aug 2000 06:00:49 -0400 From: Torben Mathiasen To: jackw@slinky.jounce.net Cc: netdev@oss.sgi.com Subject: Re: NE2K-PCI driver bug report Message-ID: <20000817060049.C780@w-tmathiasen> References: <20000816143416.A10944@jounce.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.2i In-Reply-To: <20000816143416.A10944@jounce.net>; from jackw@slinky.jounce.net on Wed, Aug 16, 2000 at 02:34:16PM -0400 X-OS: Linux 2.4.0-test5 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Wed, Aug 16 2000, jackw@slinky.jounce.net wrote: > Bug Report > > Summary: NE2K-PCI driver causes downloads to timeout on certain files > at certain spots repeatedly. > > Description: When downloading certain files, seemingly independant > of the protocol, client, or server, the transfer will timeout on > the exact same byte each time. The point at which this happens > varies from file to file, and there is no consistency in the files > that I can manage to find. I thought this was a problem with > hardware until two other users asked about the same problem in the > #debian IRC channel on openprojects.net. This did not occur when > using an ISA NE2K card. > > Keywords: Download, Timeout, NE2K-PCI > Does an interface down/up fix it? What happens if you copy one file to another name. Does it then still happen in the exact same byte? Or if you make two files with the exact same size but different content. Very odd problem indeed. 
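One way to set up the "same size, different content" test suggested above is sketched below. It is only an illustration: write_random() is a made-up helper, and the file names and seeds in the usage note are placeholders. If both files stall at the same offset, the position in the stream is what matters; if only one does, the content is.

```python
import random


def write_random(path, size, seed):
    """Write `size` pseudo-random bytes to `path`, reproducibly for a given seed."""
    rng = random.Random(seed)
    remaining = size
    with open(path, "wb") as out:
        while remaining > 0:
            n = min(remaining, 65536)
            # getrandbits(8 * n) always fits in n bytes
            out.write(rng.getrandbits(8 * n).to_bytes(n, "little"))
            remaining -= n
```

For example, write_random("a.bin", 48234496, seed=1) and write_random("b.bin", 48234496, seed=2) give two files of the size quoted in the report with unrelated contents, which can then be served from the same FTP server as the problem file.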
Regards, Torben Mathiasen From owner-netdev@oss.sgi.com Thu Aug 17 00:26:28 2000 Received: by oss.sgi.com id ; Thu, 17 Aug 2000 00:26:18 -0700 Received: from sirppi.helsinki.fi ([128.214.205.27]:27918 "EHLO sirppi.helsinki.fi") by oss.sgi.com with ESMTP id ; Thu, 17 Aug 2000 00:26:10 -0700 Received: from localhost (amlaukka@localhost) by sirppi.helsinki.fi (8.10.1/8.10.1) with ESMTP id e7H7PrM14224; Thu, 17 Aug 2000 10:25:54 +0300 (EET DST) X-Authentication-Warning: sirppi.helsinki.fi: amlaukka owned process doing -bs Date: Thu, 17 Aug 2000 10:25:53 +0300 (EET DST) From: Aki M Laukkanen To: jackw@slinky.jounce.net cc: netdev@oss.sgi.com Subject: Re: NE2K-PCI driver bug report In-Reply-To: <20000816143416.A10944@jounce.net> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Wed, 16 Aug 2000 jackw@slinky.jounce.net wrote: > One file that this has been problematic on is > ftp://download.stormix.com/storm/iso/split/sl20.l, as it freezes > at 20735464 bytes of 48234496. For the record, downloading this file didn't stall here with: 00:0d.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8029(AS) Anyway, if it's repeatable maybe you should get a tcpdump out of it. It sounds _very_ improbable that your network card could cause such behaviour. -- D. 
From owner-netdev@oss.sgi.com Thu Aug 17 01:12:38 2000 Received: by oss.sgi.com id ; Thu, 17 Aug 2000 01:12:28 -0700 Received: from luna.tlmat.unican.es ([193.144.186.2]:15631 "EHLO luna.tlmat.unican.es") by oss.sgi.com with ESMTP id ; Thu, 17 Aug 2000 01:12:24 -0700 Received: from centauro (lira.tlmat.unican.es [193.144.186.27]) by luna.tlmat.unican.es with SMTP (8.7.6/8.7.1) id KAA17944 for ; Thu, 17 Aug 2000 10:28:44 +0200 (METDST) Message-ID: <001401c00823$2e279f00$1bba90c1@tlmat.unican.es> From: =?iso-8859-1?B?UmFt824gQWf8ZXJv?= To: Subject: __kfree_skb Date: Thu, 17 Aug 2000 10:14:34 +0200 MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="----=_NextPart_000_0011_01C00833.F1195F20" X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 5.00.2314.1300 X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2314.1300 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing This is a multi-part message in MIME format. ------=_NextPart_000_0011_01C00833.F1195F20 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable Dear all, I'm writing a small module (kernel 2.2.14) and I need to use several sk_buff functions. All of them seem to work fine, but when I use kfree_skb and try to install the module, I get the message that prompted this question... Could anybody tell me how I can use this function? Thank you all in advance... Ramón P.S. Please CC me personally on any answers or comments posted to the list in response to this message. ------=_NextPart_000_0011_01C00833.F1195F20 Content-Type: text/html; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable
------=_NextPart_000_0011_01C00833.F1195F20-- From owner-netdev@oss.sgi.com Thu Aug 17 06:37:31 2000 Received: by oss.sgi.com id ; Thu, 17 Aug 2000 06:37:12 -0700 Received: from cr502987-a.rchrd1.on.wave.home.com ([24.42.206.69]:20999 "EHLO localhost.localdomain") by oss.sgi.com with ESMTP id ; Thu, 17 Aug 2000 06:36:46 -0700 Received: from localhost (bart@localhost) by localhost.localdomain (8.9.3/8.8.7) with ESMTP id JAA08959; Thu, 17 Aug 2000 09:35:10 -0400 Date: Thu, 17 Aug 2000 09:35:08 -0400 (EDT) From: Bart Trojanowski X-Sender: bart@localhost.localdomain To: Richard Guy Briggs cc: Linux Ipsec mailing list , NetFilter mailing list , Linux Network Development mailing list , John Gilmore , Hugh Daniel , Henry Spencer , Hugh Redelmeier , Richard Guy Briggs Subject: Re: linux-ipsec: FreeS/WAN redesign thoughts (KLIPS, IPSEC) In-Reply-To: <20000815143539.B4771@grendel.conscoop.ottawa.on.ca> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Tue, 15 Aug 2000, Richard Guy Briggs wrote: > > The way that nfmark is used is rather vague. It is presently only 32 > bits. Ideally, I would like to be able to indicate exactly which SAs > were processed on the way in, which would most easily be represented by > as many as 4 SAs (AH, ESP, IPCOMP, IPIP), each having an 8 bit protocol > field (absolute minimum of a 2-bits), 32-bit destination address field > (for IPv4, IPv6 would be 128) and a 32-bit SPI. This is a potential > maximum of 672 bits. A way of mapping 672 bits on to the 32 bits > available would be required to use this. A lookup table could be used > to map nfmarks to SAIDs, not the SAs themselves, since the SAs could > disappear at any time the tdb table is not locked. It should be able to > represent a bundle of SAs where one SA could be used in more than one > bundle. There could also be more than one right answer for the incoming > SPDB. 
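As a side check on the numbers in the quoted paragraph, and to show the kind of lookup table it proposes, here is a minimal sketch. The 672-bit figure corresponds to four SAIDs with IPv6 addresses; with IPv4 addresses the same bundle needs 288 bits. MarkTable and its methods are hypothetical names, a SAID is modelled as a (protocol, destination, SPI) tuple, and reserving mark 0 for "no IPSEC processing seen" is an assumption of this sketch, not something stated in the proposal.

```python
PROTO_BITS = 8
SPI_BITS = 32
ADDR_BITS = {"ipv4": 32, "ipv6": 128}
MAX_SAS_PER_BUNDLE = 4  # AH, ESP, IPCOMP, IPIP


def said_bits(family):
    """Bits needed to name one SA: protocol + destination address + SPI."""
    return PROTO_BITS + ADDR_BITS[family] + SPI_BITS


def bundle_bits(family):
    """Bits needed to name a full bundle of up to four SAs."""
    return MAX_SAS_PER_BUNDLE * said_bits(family)


class MarkTable:
    """Hypothetical table handing out 32-bit marks for bundles of SAIDs.

    It stores SAIDs rather than references to the SAs themselves, so a
    mark stays meaningful even if an SA is deleted in the meantime.
    """

    def __init__(self):
        self._mark_by_bundle = {}
        self._bundle_by_mark = {}
        self._next_mark = 1  # assumption: 0 means "no IPSEC processing seen"

    def mark_for(self, bundle):
        """Return the mark for an ordered bundle of SAIDs, allocating if new."""
        key = tuple(bundle)
        mark = self._mark_by_bundle.get(key)
        if mark is None:
            if self._next_mark > 0xFFFFFFFF:
                raise OverflowError("out of 32-bit marks")
            mark = self._next_mark
            self._next_mark += 1
            self._mark_by_bundle[key] = mark
            self._bundle_by_mark[mark] = key
        return mark

    def bundle_for(self, mark):
        """Return the bundle recorded for a mark, or () if the mark is unknown."""
        return self._bundle_by_mark.get(mark, ())
```

Because the key is the whole ordered bundle, one SAID can appear in several bundles and each combination still gets its own mark, which is the property the paragraph asks for. Expiring unused marks is left out of the sketch.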
I don't have a clear understanding of why a packet would need to know which SAs were used. Was this because we want to check if a packet is allowed to emerge from a certain tunnel? > The SPDB would be managed via a combination of PF_KEYv2 socket I/F > extensions and iptables. A separate NetFilter table called 'ipsec' > (as opposed to 'filter' or 'nat') would have the first hook at > NF_IP_PRE_ROUTING and the last hook at NF_IP_POST_ROUTING. iptables > uses the AF_NETLINK socket family. With respect to backwards compatibility between kernels and the ipchains -> iptables issue: you mentioned using iptables hooks for this. I think that backwards compatibility is important, so I would like to explore this a bit to see if it would be possible to have a source base that compiles on 2.2 and 2.4 without any hacks to the kernel source. Ipchains does not have provisions for a new table, so sharing the one table with other chains (input, output and forward) is needed. I assume that having a disjoint chain in ipchains is possible, so this is not really an issue... even though it's much cleaner to have your own table as in iptables. At OLS Andy, I think it was him anyway, commented on the possibility of backporting some of the new networking code from 2.4 to 2.2. Does anyone remember if this included iptables? > I'm not certain exactly where a packet routed through an optional IPSEC > virtual I/F gets injected into the system. I believe this is done using nf_reinject(); this function allows you to decide what to do with the packet upon reinjection. I know that you can tell it to DROP, ACCEPT, and REPEAT. It is probably the latter that we will want, in order to do further routing checks and so on. I don't know how this translates to ipchains. > Treat incoming IPSEC encapsulation as an enhancement of the layer 2 > protocol and decapsulate it at the NF_IP_PRE_ROUTING hook. This option > is less favourable as it stands since it involves creating our own SPDB > engine. 
I think that reusing the existing tools is a good idea. At first glance I like the first scenario more. Bart. -- WebSig: http://www.jukie.net/~bart/sig/ From owner-netdev@oss.sgi.com Thu Aug 17 10:02:42 2000 Received: by oss.sgi.com id ; Thu, 17 Aug 2000 10:02:32 -0700 Received: from cpu2747.adsl.bellglobal.com ([207.236.55.216]:49403 "EHLO grendel.conscoop.ottawa.on.ca") by oss.sgi.com with ESMTP id ; Thu, 17 Aug 2000 10:02:07 -0700 Received: (from rgb@localhost) by grendel.conscoop.ottawa.on.ca (8.9.0/8.9.0) id JAA21175; Thu, 17 Aug 2000 09:24:16 -0400 Date: Thu, 17 Aug 2000 09:24:16 -0400 From: Richard Guy Briggs To: Bart Trojanowski Cc: Richard Guy Briggs , Linux Ipsec mailing list , NetFilter mailing list , Linux Network Development mailing list , John Gilmore , Hugh Daniel , Henry Spencer , Hugh Redelmeier Subject: Re: linux-ipsec: FreeS/WAN redesign thoughts (KLIPS, IPSEC) Message-ID: <20000817092416.P15043@grendel.conscoop.ottawa.on.ca> References: <20000815143539.B4771@grendel.conscoop.ottawa.on.ca> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-md5; protocol="application/pgp-signature"; boundary="H7cT1SUwsqXggVRO" Content-Disposition: inline User-Agent: Mutt/1.2i In-Reply-To: ; from bart@jukie.net on Thu, Aug 17, 2000 at 09:35:08AM -0400 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing --H7cT1SUwsqXggVRO Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Thu, Aug 17, 2000 at 09:35:08AM -0400, Bart Trojanowski wrote: > On Tue, 15 Aug 2000, Richard Guy Briggs wrote: > >=20 > > The way that nfmark is used is rather vague. It is presently only 32 > > bits. 
Ideally, I would like to be able to indicate exactly which SAs > > were processed on the way in, which would most easily be represented by > > as many as 4 SAs (AH, ESP, IPCOMP, IPIP), each having an 8 bit protocol > > field (absolute minimum of a 2-bits), 32-bit destination address field > > (for IPv4, IPv6 would be 128) and a 32-bit SPI. This is a potential > > maximum of 672 bits. A way of mapping 672 bits on to the 32 bits > > available would be required to use this. A lookup table could be used > > to map nfmarks to SAIDs, not the SAs themselves, since the SAs could > > disappear at any time the tdb table is not locked. It should be able to > > represent a bundle of SAs where one SA could be used in more than one > > bundle. There could also be more than one right answer for the incoming > > SPDB. > > I don't have a clear understanding on why a packet would need to know > which SAs were used. Was this because we want to check if a packet is > allowed to emerge from a certain tunnel? We must be able to ensure that a certain policy was followed in sending the packet to this machine. If it was sent in cleartext from a spoofed machine, but policy dictates that it was expected to be 3DES MD5 processed from a certain SG, we *must not* trust that packet and we *must not* reply to it. > > The SPDB would be managed via a combination of PF_KEYv2 socket I/F > > extensions and iptables. A separate NetFilter table called 'ipsec' > > (as opposed to 'filter' or 'nat') would have the first hook at > > NF_IP_PRE_ROUTING and the last hook at NF_IP_POST_ROUTING. iptables > > uses the AF_NETLINK socket family. > > With respect to backwards compatibility in kernels and the ipchains -> > iptables issue. You mentioned using iptables hooks for this. I think > that backwards compatibility is important so I would like to explore this > a bit to see if it would be possible to have a source base that compiled > on 2.2 and 2.4 without any hacks to the kernel source. 
I believe NetFilter has been backported to 2.2? Can someone confirm this? > Ipchains does not have provisions for a new table - thus sharing the 1 > table with other chains (input, output and forward) is needed. I assume > that having a disjoint chain in ipchains is possible so this is not really > an issue... even though it's much cleaner to have your own table as in > iptables. Possibly. I wondered about this myself. I did contemplate using the 'nat' table to accomplish our goals, but decided it would be cleaner to use our own table that would have a different priority so that IPSEC would always be applied before DNAT on input and after SNAT on output. > At OLS Andy, I think it was him anyway, commented on the possibility of > backporting some of the new networking code from 2.4 to 2.2. Does anyone > remember if this included iptables? I don't remember, but he did opine in the last couple of days in an informal channel that all of the interesting network stuff had been backported from 2.4 to 2.2 and this would annoy the 2.4 spindoctors. > > I'm not certain exactly where a packet routed through an optional IPSEC > > virtual I/F gets injected into the system. > > I believe this is done using nf_reinject(); this function allows you to > decide what to do with the packet upon reinjection. I know that you can > tell it to DROP, ACCEPT, and REPEAT. It is probably the latter that we > will want to do further routing checks and so on. Right, possibly to NF_IP_LOCAL_OUT > I don't know how this translates to ipchains. It would have to be routed, so I would hazard a guess at the input chain. > > Treat incoming IPSEC encapsulation as an enhancement of the layer 2 > > protocol and decapsulate it at the NF_IP_PRE_ROUTING hook. This option > > is less favourable as it stands since it involves creating our own SPDB > > engine. > > I think that reusing the existing tools is a good idea. At first glance I > like the first scenario more.
I tend to agree, but don't have a solution yet to the NFMARK->SPDB mapping. > Bart. Thanks for your comments. slainte mhath, RGB -- Richard Guy Briggs -- PGP key available Auto-Free Ottawa! Canada Prevent Internet Wiretapping! -- FreeS/WAN: Thanks for voting Green! -- Marillion: --H7cT1SUwsqXggVRO Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: 2.6.3i iQCVAwUBOZvnft+sBuIhFagtAQGMEgQAocEaT44gxxRulrtLm9Yspzf5O0uBVB+6 j2i19fCWMT62Nd0whSMykM3lzSMdDHShhlvN9ASopIjFztfWaSfYjUY+ldEbFcjb G6dfYdSl/LY/O5+g/PkTJ/Yy1CeN1vgtSe+EWoUG21FsSWInToaLu3WEpIjaAf1J 8YWjncKVhWU= =D74E -----END PGP SIGNATURE----- --H7cT1SUwsqXggVRO-- From owner-netdev@oss.sgi.com Thu Aug 17 13:43:43 2000 Received: by oss.sgi.com id ; Thu, 17 Aug 2000 13:43:22 -0700 Received: from mail.inconnect.com ([209.140.64.7]:53227 "HELO mail.inconnect.com") by oss.sgi.com with SMTP id ; Thu, 17 Aug 2000 13:43:02 -0700 Received: (qmail 12086 invoked from network); 17 Aug 2000 20:43:01 -0000 Received: from ultra1.inconnect.com (209.140.64.2) by mail with SMTP; 17 Aug 2000 20:43:01 -0000 Date: Thu, 17 Aug 2000 14:43:01 -0600 (MDT) From: Keyshaun X-Sender: kruger@ultra1.inconnect.com To: Linux Netdev Subject: 2.4.0-test6 modules Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing I am compiling modules for my kernel and I keep getting the same warning. {standard input}: Assembler messages {standard input}:8: Warning: Ignoring changed section attributes for .modinfo I got the latest binutils and installed it yesterday. Has anyone else been getting this in their compile?
Shaun From owner-netdev@oss.sgi.com Thu Aug 17 15:24:34 2000 Received: by oss.sgi.com id ; Thu, 17 Aug 2000 15:24:24 -0700 Received: from lsi.lsil.com ([147.145.40.2]:60061 "EHLO lsi.lsil.com") by oss.sgi.com with ESMTP id ; Thu, 17 Aug 2000 15:24:04 -0700 Received: from mhbs.lsil.com ([147.145.31.100]) by lsi.lsil.com (8.9.3+Sun/8.9.1) with ESMTP id PAA21440 for ; Thu, 17 Aug 2000 15:24:03 -0700 (PDT) Received: from inca.co.lsil.com by mhbs.lsil.com with ESMTP for netdev@oss.sgi.com; Thu, 17 Aug 2000 15:23:55 -0700 Received: from exw-kansas.ks.lsil.com (exw-kansas.ks.lsil.com [153.79.8.7]) by inca.co.lsil.com (8.9.3/8.9.3) with ESMTP id QAA24111; Thu, 17 Aug 2000 16:23:52 -0600 (MDT) Received: from lsil.com (nromernt.ks.lsil.com [153.79.8.107]) by exw-kansas.ks.lsil.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21) id Q199F35X; Thu, 17 Aug 2000 17:22:49 -0500 Message-Id: <399C65B4.4388B0C6@lsil.com> Date: Thu, 17 Aug 2000 17:22:44 -0500 From: Noah Romer Reply-To: noah.romer@lsil.com Organization: LSI Logic X-Mailer: Mozilla 4.72 [en] (X11; I; Linux 2.2.12 i686) X-Accept-Language: en MIME-Version: 1.0 To: netdev@oss.sgi.com CC: "Romer, Noah" Subject: freeing an skb still on a list? Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing I'm working on a network driver for a fibre channel host adapter (the LSI Logic FC4909, or whatever the marketing department decided to call it). At various points in my testing, I've managed to kill the system by calling dev_kfree_skb_irq on a Tx skb. The scenario goes like this: I'm always at a point in my testing where I'm hammering out packets very quickly (i.e. 
`ping -f`, or telneting into the other system, running vi and holding down an arrow key for a few seconds), at some point in the test (when is rather unpredictable), I'll get a packet to transmit, send it off to the host adapter and, when the host adapter tells me it's done with it, I call dev_kfree_skb_irq with a pointer to the skb. I'm not doing anything different with it than I've done with several thousand, sometimes millions, of packets before it. The console output is: Warning: kfree_skb passed and skb still on a list (from c01c4392). kernel BUG at skbuff.c:276! Entering kdb (0xc0d00000) Panic: invalid operand due to panic @ 0xc01c225d A `bt` in the kernel debugger shows: __kfree_skb net_tx_action do_softirq do_IRQ ret_from_intr I'm currently running 2.4.0-test5 with the kdb patches from oss.sgi.com (ftp://oss.sgi.com/www/projects/kdb/download/ix86/kdb-v1.3-2.4.0-test5-pre5.gz), although I've seen this since at least 2.3.99-pre6 (I've just now gotten enough other bugs chased down to pay attention to this one). Any ideas as to why kfree_skb thinks that the skb is still on a list? The only thing I do with Tx packets, besides send them, is to add on a FC Optional header to the start of the packet. Thanks, Noah Romer P.S. The driver code has not yet been released, although a few people outside the company have seen it in various stages. If it would help, I can post the relevant sections of code. P.P.S. I'm not on this mailing list, so if whomever (if anyone) replies, I would greatly appreciate it if you could CC me. 
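For readers following this report: the console warning is printed by a sanity check that refuses to release a buffer whose list back-pointer is still set. The sketch below is a user-space toy model of that invariant, not the kernel code and not a diagnosis of this driver. `toy_skb`, `toy_queue` and all the helper functions are hypothetical names invented for illustration; the only point is that a buffer has to be unlinked from whatever queue holds it before it is released.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Toy stand-ins for a packet buffer and the queue that may hold it. */
struct toy_queue;

struct toy_skb {
    struct toy_skb *next;
    struct toy_queue *list;     /* non-NULL while the buffer is queued */
};

struct toy_queue {
    struct toy_skb *head;
    unsigned int qlen;
};

static void toy_queue_init(struct toy_queue *q)
{
    q->head = NULL;
    q->qlen = 0;
}

/* Put a buffer on a queue and record which queue owns it. */
static void toy_enqueue(struct toy_queue *q, struct toy_skb *skb)
{
    assert(skb->list == NULL);  /* a buffer lives on at most one queue */
    skb->next = q->head;
    q->head = skb;
    skb->list = q;
    q->qlen++;
}

/* Remove a buffer from its queue (if any) and clear the back-pointer. */
static void toy_unlink(struct toy_skb *skb)
{
    struct toy_queue *q = skb->list;
    struct toy_skb **pp;

    if (q == NULL)
        return;
    for (pp = &q->head; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == skb) {
            *pp = skb->next;
            q->qlen--;
            break;
        }
    }
    skb->next = NULL;
    skb->list = NULL;
}

/* Returns 0 if the buffer may be released, -1 if it is still queued. */
static int toy_free_check(const struct toy_skb *skb)
{
    if (skb->list != NULL) {
        fprintf(stderr, "toy: buffer %p is still on a list\n",
                (const void *)skb);
        return -1;
    }
    return 0;
}
```

In this model, calling `toy_free_check()` between `toy_enqueue()` and `toy_unlink()` reports the error, which is the situation the warning describes.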
From owner-netdev@oss.sgi.com Fri Aug 18 01:08:46 2000 Received: by oss.sgi.com id ; Fri, 18 Aug 2000 01:08:26 -0700 Received: from [202.102.223.33] ([202.102.223.33]:61293 "EHLO mx1.ustc.edu.cn") by oss.sgi.com with ESMTP id ; Fri, 18 Aug 2000 01:08:10 -0700 Received: from ustc.edu.cn (hpe25.nic.ustc.edu.cn [202.38.64.1]) by mx1.ustc.edu.cn (8.8.7/8.8.6) with SMTP id QAA23818 for ; Fri, 18 Aug 2000 16:41:59 -0800 Received: from mail.ustc.edu.cn by ustc.edu.cn with SMTP (8.6.10/16.2) id QAA00212; Fri, 18 Aug 2000 16:14:38 +0800 Received: (qmail 15094 invoked by uid 2746); 18 Aug 2000 08:02:50 -0000 Date: Fri, 18 Aug 2000 16:02:50 +0800 (CST) From: YaNan Guo To: Richard Guy Briggs cc: Linux Ipsec mailing list , NetFilter mailing list , Linux Network Development mailing list , John Gilmore , Hugh Daniel , Henry Spencer , Hugh Redelmeier , Richard Guy Briggs Subject: Re: FreeS/WAN redesign thoughts (KLIPS, IPSEC) In-Reply-To: <20000815143539.B4771@grendel.conscoop.ottawa.on.ca> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing I want to know when the FreeSwan really support IPv6? Thanks. From owner-netdev@oss.sgi.com Fri Aug 18 06:53:32 2000 Received: by oss.sgi.com id ; Fri, 18 Aug 2000 06:53:22 -0700 Received: from [168.188.44.2] ([168.188.44.2]:45462 "EHLO flower.comeng.chungnam.ac.kr") by oss.sgi.com with ESMTP id ; Fri, 18 Aug 2000 06:53:00 -0700 Received: from Beethoven (Beethoven.comeng.chungnam.ac.kr [168.188.46.199]) by flower.comeng.chungnam.ac.kr (8.9.1/8.9.1) with SMTP id WAA09691 for ; Fri, 18 Aug 2000 22:47:56 +0900 (KST) Message-ID: <001201c0091b$91a43a40$c72ebca8@ce.cnu.ac.kr> From: "Park, Hyun Seo" To: Subject: Hello! 
I have a Question ~ Date: Fri, 18 Aug 2000 22:52:36 +0900 MIME-Version: 1.0 Content-Type: text/plain; charset="ks_c_5601-1987" Content-Transfer-Encoding: base64 X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 5.00.2919.6600 X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2919.6600 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing

Hello. All .
Am I in right place ?
I heard this mail address is a mailing list ....
How can I join ?
Nice to meet you !~

And I have a question about IPIP tunnel and "new_tunnel.c" in
/usr/src/~~/drivers/net/ ...
I am using "new_tunnel" for Mobile IP ...
I want to know about ~

    When a packet is tunneled by tunnel interface, e.q tunl0 ,
    I want to process that packet ...
    For example, I want to count the number of packets tunneled by tunl0
    with less overhead ....
    I mean by "less overhead",
        "I do not want to always check tunl0 to do it or
        I do not want to check all packets received by my host and
        see through IP header of those packets for IP address of destination
...."

    Is there any possible way ?

Thanks in advance ...
Have a Great Day ~


------------------------------------
Park, Hyun Seo
Graduate Student
Department of Computer Engineering, Chungnam National University
220 Gung-dong,YuSeong-Gu, Taejon, 305764, Korea
E-mail: hspark@ce.cnu.ac.kr
ICQ # : 28055093
Phone : +82-42-823-6049

From owner-netdev@oss.sgi.com Sat Aug 19 07:28:26 2000 Received: by oss.sgi.com id ; Sat, 19 Aug 2000 07:28:16 -0700
Received: from smtprch2.nortelnetworks.com ([192.135.215.15]:55969 "EHLO smtprch2.nortel.com") by oss.sgi.com with ESMTP id ; Sat, 19 Aug 2000 07:27:57 -0700 Received: from zrchb213.us.nortel.com (actually zrchb213) by smtprch2.nortel.com; Sat, 19 Aug 2000 06:00:54 -0500 Received: from zctwb003.asiapac.nortel.com ([47.152.32.111]) by zrchb213.us.nortel.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2652.39) id QYZF72F4; Sat, 19 Aug 2000 06:04:25 -0500 Received: from uow.edu.au (47.181.194.64 [47.181.194.64]) by zctwb003.asiapac.nortel.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2652.39) id RBSBA8K7; Sat, 19 Aug 2000 21:04:25 +1000 Message-ID: <399E6A64.CB616120@uow.edu.au> Date: Sat, 19 Aug 2000 21:07:16 +1000 X-Sybari-Space: 00000000 00000000 00000000 From: Andrew Morton X-Mailer: Mozilla 4.7 [en] (X11; I; Linux 2.2.14-15mdk i586) X-Accept-Language: en MIME-Version: 1.0 To: "David S. Miller" , Alexey Kuznetosv CC: "netdev@oss.sgi.com" Subject: [patch] IP_FRAG_TIME versus unregister_netdevice Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-Orig: Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing There's a piece of code at the end of unregister_netdevice() which waits for up to ten seconds waiting for the device to become un-busy. If the timeout is expired it prints the "Wait for crash" message and returns. The problem is, orphan ipv4 fragments have a reference to the device and they have an arbitrarily long in-kernel lifetime. Default is 30 seconds. So if you send an orphan fragment to your linux box and then do an ifdown/rmmod, you'll get the "Wait for crash" message. Then, typically, the net_device is kfree'ed and the driver module is unloaded. Then, at some time in the future, the fragments start expiring. This causes atomic_decs of dev->refcnt (it has been kfree'ed) and, possibly, a call to dev->destructor (its text has been unmapped). 
The kernel will crash.

I see several ways to fix this:

1: Make unregister_netdevice hunt down all the skbuffs associated
   with this device and release them. Sounds hard.

2: Make the 10 second delay equal to the max of sysctl_ipfrag_time,
   sysctl_ip6frag_time, and who knows what else. This isn't
   attractive because it'll require us to be able to predict
   the future lifetime of all skbuffs for all protocols.

3: Make unregister_netdevice return -EAGAIN, so the caller
   (usually /sbin/rmmod) can make a policy decision what to do.
   Long term, this is the correct architecture.

   The problem with this is that the net drivers will then
   need to be taught that unregister_netdevice can fail.
   They will ALL need changing.

   It's not a big change, but frankly, if we're going to
   churn each and every driver then we may as well get
   it right. And at present the Linux netdevice lifecycle is
   far from right. The lack of userland access to
   [un]register_netdevice and the unhealthy linkage
   between sys_ins/del_module and [un]register_netdevice
   is the source of a lot of bugs, races and general
   bogosity. We need `ifconfig plumb' and `ifconfig
   unplumb', like Solaris. I'll write a rant about this
   sometime, but I suggest it's 2.5-food.

4: Make unregister_netdevice() wait until the device is
   free, as Alexey suggested in his comment.

That's what this patch does. It waits indefinitely for the
device to become free and prints a message every ten seconds
while waiting.

--- ../linux-2.4.0-test7-pre4/net/core/dev.c	Wed Aug 16 23:54:29 2000
+++ net/core/dev.c	Sat Aug 19 20:24:36 2000
@@ -58,6 +58,7 @@
  *				the backlog queue.
  *	Paul Rusty Russell	:	SIOCSIFNAME
  *	Pekka Riikonen		:	Netdev boot-time settings code
+ *	Andrew Morton		:	Make unregister_netdevice wait indefinitely on dev->refcnt
  */
 
 #include
@@ -2311,7 +2312,7 @@
 
 int unregister_netdevice(struct net_device *dev)
 {
-	unsigned long now;
+	unsigned long now, warning_time;
 	struct net_device *d, **dp;
 
 	/* If device is running, close it first. */
@@ -2379,31 +2380,30 @@
 	printk("unregister_netdevice: waiting %s refcnt=%d\n", dev->name, atomic_read(&dev->refcnt));
 #endif
 
-	/* EXPLANATION. If dev->refcnt is not 1 now (1 is our own reference)
-	   it means that someone in the kernel still has reference
+	/* EXPLANATION. If dev->refcnt is not now 1 (our own reference)
+	   it means that someone in the kernel still has a reference
 	   to this device and we cannot release it.
 
 	   "New style" devices have destructors, hence we can return from this
-	   function and destructor will do all the work later.
+	   function and destructor will do all the work later. As of kernel 2.4.0
+	   there are very few "New Style" devices.
 
-	   "Old style" devices expect that device is free of any references
-	   upon exit from this function. WE CANNOT MAKE such release
-	   without delay. Note that it is not new feature. Referencing devices
-	   after they are released occured in 2.0 and 2.2.
-	   Now we just can know about each fact of illegal usage.
-
-	   So, we linger for 10*HZ (it is an arbitrary number)
+	   "Old style" devices expect that the device is free of any references
+	   upon exit from this function.
+	   We cannot return from this function until all such references have
+	   fallen away. This is because the caller of this function will probably
+	   immediately kfree(*dev) and then be unloaded via sys_delete_module.
+
+	   So, we linger until all references fall away. The duration of the
+	   linger is basically unbounded! It is driven by, for example, the
+	   current setting of sysctl_ipfrag_time.
 
 	   After 1 second, we start to rebroadcast unregister notifications
 	   in hope that careless clients will release the device.
 
-	   If timeout expired, we have no choice how to cross fingers
-	   and return. Real alternative would be block here forever
-	   and we will make it eventually, when all peaceful citizens
-	   will be notified and repaired.
 	 */
 
-	now = jiffies;
+	now = warning_time = jiffies;
 	while (atomic_read(&dev->refcnt) != 1) {
 		if ((jiffies - now) > 1*HZ) {
 			/* Rebroadcast unregister notification */
@@ -2412,12 +2412,13 @@
 		current->state = TASK_INTERRUPTIBLE;
 		schedule_timeout(HZ/4);
 		current->state = TASK_RUNNING;
-		if ((jiffies - now) > 10*HZ)
-			break;
+		if ((jiffies - warning_time) > 10*HZ) {
+			printk(KERN_EMERG "unregister_netdevice: waiting for %s to "
+					"become free. Usage count = %d\n",
+					dev->name, atomic_read(&dev->refcnt));
+			warning_time = jiffies;
+		}
 	}
-
-	if (atomic_read(&dev->refcnt) != 1)
-		printk("unregister_netdevice: Old style device %s leaked(refcnt=%d). Wait for crash.\n", dev->name, atomic_read(&dev->refcnt)-1);
 	dev_put(dev);
 	return 0;
 }

From owner-netdev@oss.sgi.com Sat Aug 19 08:08:15 2000 Received: by oss.sgi.com id ; Sat, 19 Aug 2000 08:08:05 -0700 Received: from parcelfarce.linux.theplanet.co.uk ([195.92.249.252]:37902 "EHLO www.linux.org.uk") by oss.sgi.com with ESMTP id ; Sat, 19 Aug 2000 08:07:47 -0700 Received: from prumpf by www.linux.org.uk with local (Exim 3.13 #1) id 13QAC9-0003lm-00; Sat, 19 Aug 2000 16:06:01 +0100 Date: Sat, 19 Aug 2000 16:06:01 +0100 From: Philipp Rumpf To: Andrew Morton Cc: "David S. Miller" , Alexey Kuznetosv , "netdev@oss.sgi.com" Subject: Re: [patch] IP_FRAG_TIME versus unregister_netdevice Message-ID: <20000819160601.I23855@parcelfarce.linux.theplanet.co.uk> References: <399E6A64.CB616120@uow.edu.au> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2i In-Reply-To: <399E6A64.CB616120@uow.edu.au>; from andrewm@uow.edu.au on Sat, Aug 19, 2000 at 09:07:16PM +1000 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Sat, Aug 19, 2000 at 09:07:16PM +1000, Andrew Morton wrote: > 1: Make unregister_netdevice hunt down all the skbuffs associated > with this device and release them. Sounds hard.
> > 2: Make the 10 second delay equal to the max of sysctl_ipfrag_time, > sysctl_ip6frag_time, and who knows what else. This isn't > attractive because it'll require us to be able to predict > the future lifetime of all skbuffs for all protocols. > > 3: Make unregister_netdevice return -EAGAIN, so the caller > (usually /sbin/rmmod) can make a policy decision what to do. > Long term, this is the correct architecture. > > The problem with this is that the net drivers will then > need to be taught that unregister_netdevice can fail. > They will ALL need changing. > > It's not a big change, but frankly, if we're going to > churn each and every driver then we may as well get > it right. And at present the Linux netdevice lifecycle is > far from right. The lack or userland access to > [un]register_netdevice and the unhealthy linkage > between sys_ins/del_module and [un]register_netdevice > is the source of a lot of bugs, races and general > bogosity. We need `ifconfig plumb' and `ifconfig > unplumb', like Solaris. I'll write a rant about this > sometime, but I suggest it's 2.5-food. > > 4: Make unregister_netdevice() wait until the device is > free, as Alexey suggested in his comment. > > That's what this patch does. It waits indefinitely for the > device to become free and prints a message every ten seconds > while waiting. 5: Use the refcount of the module containing the network device driver rather than a separate refcount. This is what most other subsystems do (it's also horrendously ugly, but it works with the current module architecture). It'd probably require changes to all network drivers as well, but they should be rather simple - basically equivalent to the owner field in struct file_operations. This would mean you can't rmmod a network device that is even remotely busy, but if I understand the issues involved correctly, you can ifconfig it down and wait before trying to rmmod it. 
Philipp From owner-netdev@oss.sgi.com Sat Aug 19 08:41:46 2000 Received: by oss.sgi.com id ; Sat, 19 Aug 2000 08:41:36 -0700 Received: from pizda.ninka.net ([216.101.162.242]:44931 "EHLO pizda.ninka.net") by oss.sgi.com with ESMTP id ; Sat, 19 Aug 2000 08:41:09 -0700 Received: (from davem@localhost) by pizda.ninka.net (8.9.3/8.9.3) id IAA11792; Sat, 19 Aug 2000 08:29:44 -0700 Date: Sat, 19 Aug 2000 08:29:44 -0700 Message-Id: <200008191529.IAA11792@pizda.ninka.net> X-Authentication-Warning: pizda.ninka.net: davem set sender to davem@redhat.com using -f From: "David S. Miller" To: andrewm@uow.edu.au CC: kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com In-reply-to: <399E6A64.CB616120@uow.edu.au> (message from Andrew Morton on Sat, 19 Aug 2000 21:07:16 +1000) Subject: Re: [patch] IP_FRAG_TIME versus unregister_netdevice References: <399E6A64.CB616120@uow.edu.au> Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Date: Sat, 19 Aug 2000 21:07:16 +1000 From: Andrew Morton 4: Make unregister_netdevice() wait until the device is free, as Alexey suggested in his comment. That's what this patch does. It waits indefinitely for the device to become free and prints a message every ten seconds while waiting. Being so close to 2.4.0, this is the fix I prefer as well for now. Patch applied, thanks. We can do more elaborate things in the 2.5.x tree. Later, David S. 
Miller davem@redhat.com From owner-netdev@oss.sgi.com Sat Aug 19 10:22:45 2000 Received: by oss.sgi.com id ; Sat, 19 Aug 2000 10:22:36 -0700 Received: from minus.inr.ac.ru ([193.233.7.97]:12302 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Sat, 19 Aug 2000 10:22:20 -0700 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id VAA12612; Sat, 19 Aug 2000 21:20:47 +0400 From: kuznet@ms2.inr.ac.ru Message-Id: <200008191720.VAA12612@ms2.inr.ac.ru> Subject: Re: [patch] IP_FRAG_TIME versus unregister_netdevice To: andrewm@uow.edu.au (Andrew Morton) Date: Sat, 19 Aug 2000 21:20:47 +0400 (MSK DST) Cc: davem@redhat.com, netdev@oss.sgi.com In-Reply-To: <399E6A64.CB616120@uow.edu.au> from "Andrew Morton" at Aug 19, 0 09:07:16 pm X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Content-Length: 2307 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hello! > The problem is, orphan ipv4 fragments have a reference to > the device It must release this reference. I forgot to do this last step, of course, though all the necessary code to make this in self-consitent way was ready. Any code, queueing packet to process it at an unknown moment in future, should release reference to device or hook notifier. Protocols do this, defragmenter forgets, I apologise. > 3: Make unregister_netdevice return -EAGAIN, so the caller > (usually /sbin/rmmod) can make a policy decision what to do. ... > The problem with this is that the net drivers will then > need to be taught that unregister_netdevice can fail. No. unregister_netdevice() cannot fail. It is simply used in wrong context. All that it makes, is making device inaccessible for future references, this operation cannot fail exactly like unlink() or close() of a file cannot fail, because file is "busy". Any _reasonable_ object maintanance system has two closing routines: unlink() and put(). Otherwise it cannot be free of deadlocks, races etc. Seems, it is pretty evident. 
Waiting overloads and breaks this principle, combining close() with unlink() into some strange mix. It is due to evident reasons though. > They will ALL need changing. It is not true. Original ancient design of modules was _right_. It was broken in details, but the idea was right. It had no "destroy" routine, but the assumption that most modules do not need any special destructor is a fair assumption, working for modules which do not allocate dynamic resources required to be kept after close(). It was broken later for some unknown reasons and it is never too late to undo these quasi-improvements. BTW nothing in modules changed since that time, only some baroque details sort of MODULE_AUTHOR appeared. 8) > bogosity. We need `ifconfig plumb' and `ifconfig > unplumb', like Solaris. I'll write a rant about this > sometime, but I suggest it's 2.5-food. Yes. > 4: Make unregister_netdevice() wait until the device is > free, as Alexey suggested in his comment. > > That's what this patch does. It waits indefinitely for the > device to become free and prints a message every ten seconds > while waiting. Right. It should be done by the 2.4 release in any case.
Alexey From owner-netdev@oss.sgi.com Sun Aug 20 03:43:51 2000 Received: by oss.sgi.com id ; Sun, 20 Aug 2000 03:43:42 -0700 Received: from [203.126.247.144] ([203.126.247.144]:48594 "EHLO zsngs001") by oss.sgi.com with ESMTP id ; Sun, 20 Aug 2000 03:43:22 -0700 Received: from zsngd101.asiapac.nortel.com (actually znsgd101) by zsngs001; Sun, 20 Aug 2000 18:43:09 +0800 Received: from zctwb003.asiapac.nortel.com ([47.152.32.111]) by zsngd101.asiapac.nortel.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2652.39) id QY5447LD; Sun, 20 Aug 2000 18:43:10 +0800 Received: from uow.edu.au (47.181.194.161 [47.181.194.161]) by zctwb003.asiapac.nortel.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2652.39) id RBSBA8QS; Sun, 20 Aug 2000 20:43:12 +1000 Message-ID: <399FB6E7.EF866BC@uow.edu.au> Date: Sun, 20 Aug 2000 20:45:59 +1000 X-Sybari-Space: 00000000 00000000 00000000 From: Andrew Morton X-Mailer: Mozilla 4.7 [en] (X11; I; Linux 2.2.14-15mdk i586) X-Accept-Language: en MIME-Version: 1.0 To: "netdev@oss.sgi.com" Subject: request_region and cardbus Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-Orig: Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Could someone please explain how this works? I've been told that for cardbus devices I shouldn't check the return value from request_region because the I/O space has already been claimed for the Cardbus socket. As far as I can see I should: - check the request_region return value for non-Cardbus devices - ignore the request_region return value for Cardbus devices (but still do it because it gets registered in /proc/ioports) - only call release_region if the corresponding request_region call succeeded. Is this correct? 
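The three rules proposed in the question above can be written down as a small user-space model. This is only a sketch of the rule as stated in the message, which is posed as a question and is not confirmed anywhere in this thread. `struct toy_dev`, `toy_request_region`, `toy_release_region` and the single global region flag are hypothetical stand-ins, not the kernel's `request_region`/`release_region` interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy device: remembers whether its own region request succeeded. */
struct toy_dev {
    bool is_cardbus;
    bool region_held;
};

/* One simulated I/O range; true when somebody has already claimed it. */
static bool region_busy;

/* Stand-in for a region request: fails if the range is already claimed. */
static bool toy_request_region(void)
{
    if (region_busy)
        return false;
    region_busy = true;
    return true;
}

static void toy_release_region(void)
{
    region_busy = false;
}

/* Rules 1 and 2: a failed request is fatal unless the device is CardBus. */
static int toy_probe(struct toy_dev *dev)
{
    assert(dev != NULL);
    dev->region_held = toy_request_region();
    if (!dev->region_held && !dev->is_cardbus)
        return -1;
    return 0;
}

/* Rule 3: release only what our own request actually obtained. */
static void toy_remove(struct toy_dev *dev)
{
    assert(dev != NULL);
    if (dev->region_held) {
        toy_release_region();
        dev->region_held = false;
    }
}
```

The `region_held` flag is what keeps the removal path from releasing a range that was claimed by someone else, for example by the socket in the CardBus case.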
From owner-netdev@oss.sgi.com Sun Aug 20 03:50:41 2000 Received: by oss.sgi.com id ; Sun, 20 Aug 2000 03:50:32 -0700 Received: from smtprch2.nortelnetworks.com ([192.135.215.15]:41181 "EHLO smtprch2.nortel.com") by oss.sgi.com with ESMTP id ; Sun, 20 Aug 2000 03:50:25 -0700 Received: from zrchb213.us.nortel.com (actually zrchb213) by smtprch2.nortel.com; Sun, 20 Aug 2000 05:45:50 -0500 Received: from zctwb003.asiapac.nortel.com ([47.152.32.111]) by zrchb213.us.nortel.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2652.39) id QYZF7MRK; Sun, 20 Aug 2000 05:49:23 -0500 Received: from uow.edu.au (47.181.194.161 [47.181.194.161]) by zctwb003.asiapac.nortel.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2652.39) id RBSBA8Q4; Sun, 20 Aug 2000 20:49:24 +1000 Message-ID: <399FB860.635954F7@uow.edu.au> Date: Sun, 20 Aug 2000 20:52:16 +1000 X-Sybari-Space: 00000000 00000000 00000000 From: Andrew Morton X-Mailer: Mozilla 4.7 [en] (X11; I; Linux 2.2.14-15mdk i586) X-Accept-Language: en MIME-Version: 1.0 To: kuznet@ms2.inr.ac.ru CC: netdev@oss.sgi.com Subject: Re: [patch] IP_FRAG_TIME versus unregister_netdevice References: <399E6A64.CB616120@uow.edu.au> from "Andrew Morton" at Aug 19, 0 09:07:16 pm <200008191720.VAA12612@ms2.inr.ac.ru> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-Orig: Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing kuznet@ms2.inr.ac.ru wrote: > > Hello! > > > The problem is, orphan ipv4 fragments have a reference to > > the device > > It must release this reference. I forgot to do this last step, of course, > though all the necessary code to make this in self-consitent way was ready. > > Any code, queueing packet to process it at an unknown moment in future, > should release reference to device or hook notifier. > Protocols do this, defragmenter forgets, I apologise. 
So apart from the defragmenter, all the notifiers are currently in place to hunt down all the skbuffs and release them when a NETDEV_UNREGISTER is broadcast? That's pretty damn impressive. It would be very nice to be able to finish this work off and to get rid of the sleep altogether. Are you saying that it's too big/too late to do this for 2.4? From owner-netdev@oss.sgi.com Sun Aug 20 04:05:31 2000 Received: by oss.sgi.com id ; Sun, 20 Aug 2000 04:05:12 -0700 Received: from [203.126.247.144] ([203.126.247.144]:33491 "EHLO zsngs001") by oss.sgi.com with ESMTP id ; Sun, 20 Aug 2000 04:04:52 -0700 Received: from zsngd101.asiapac.nortel.com (actually znsgd101) by zsngs001; Sun, 20 Aug 2000 18:59:25 +0800 Received: from zctwb003.asiapac.nortel.com ([47.152.32.111]) by zsngd101.asiapac.nortel.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2652.39) id QY5447L3; Sun, 20 Aug 2000 18:59:26 +0800 Received: from uow.edu.au (47.181.194.161 [47.181.194.161]) by zctwb003.asiapac.nortel.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2652.39) id RBSBA8QW; Sun, 20 Aug 2000 20:59:28 +1000 Message-ID: <399FBABC.E42015BF@uow.edu.au> Date: Sun, 20 Aug 2000 21:02:20 +1000 X-Sybari-Space: 00000000 00000000 00000000 From: Andrew Morton X-Mailer: Mozilla 4.7 [en] (X11; I; Linux 2.2.14-15mdk i586) X-Accept-Language: en MIME-Version: 1.0 To: Philipp Rumpf CC: "netdev@oss.sgi.com" Subject: Re: [patch] IP_FRAG_TIME versus unregister_netdevice References: <399E6A64.CB616120@uow.edu.au>, <399E6A64.CB616120@uow.edu.au>; from andrewm@uow.edu.au on Sat, Aug 19, 2000 at 09:07:16PM +1000 <20000819160601.I23855@parcelfarce.linux.theplanet.co.uk> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-Orig: Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Philipp Rumpf wrote: > > 5: Use the refcount of the module containing the network device driver rather > than a separate refcount. 
This is what most other subsystems do (it's also > horrendously ugly, but it works with the current module architecture). > > It'd probably require changes to all network drivers as well, but they should > be rather simple - basically equivalent to the owner field in struct > file_operations. > > This would mean you can't rmmod a network device that is even remotely busy, > but if I understand the issues involved correctly, you can ifconfig it down > and wait before trying to rmmod it. [ Fixed my column wrap this time :) ] This is not a module issue. It's an unregister_netdevice() issue. It's quite legitimate to call unregister_netdevice in a kernel which doesn't use modules at all. The registration and unregistration of netdevices really shouldn't be linked to module constructors and destructors. If we had `ifconfig plumb' then the only place where module refcounts need be altered is in [un]register_netdevice. Of course, requiring a separate plumb/unplumb operation would cause howls of anguish from networking script developers - the operation would probably have to be grafted into ifconfig or insmod/rmmod somehow for back-compatibility. One effect of the current setup is that /sbin/rmmod can alter your routing tables! This means that you can `ifdown' a device, then ten minutes later some hosts become unreachable because cron came along and reaped the module. Weird.
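The waiting behaviour of the patch discussed in this thread (poll until only the caller's own reference remains, printing a warning every ten seconds meanwhile) can be modelled in user space. This is a minimal sketch, assuming a simulated tick counter in place of jiffies and a caller-supplied callback in place of the rest of the kernel dropping its references. `TOY_HZ`, `struct toy_netdev`, `toy_wait_for_free` and `toy_expire_at_30s` are hypothetical names, and the rebroadcast of unregister notifications is left out.

```c
#include <assert.h>
#include <stdio.h>

#define TOY_HZ 100UL    /* simulated ticks per second (assumption) */

struct toy_netdev {
    const char *name;
    int refcnt;         /* 1 means only the caller's reference is left */
};

/* Called once per poll interval; models other code dropping references. */
typedef void (*toy_tick_fn)(struct toy_netdev *dev, unsigned long now);

/*
 * Wait until refcnt falls to 1, warning every ten simulated seconds.
 * Returns the number of ticks waited and stores the warning count.
 */
static unsigned long toy_wait_for_free(struct toy_netdev *dev,
                                       toy_tick_fn tick, int *warnings)
{
    unsigned long jiffies = 0, warning_time = 0;

    assert(dev != NULL && tick != NULL && warnings != NULL);
    *warnings = 0;
    while (dev->refcnt != 1) {
        jiffies += TOY_HZ / 4;          /* stands in for a HZ/4 sleep */
        tick(dev, jiffies);
        if (jiffies - warning_time > 10 * TOY_HZ) {
            printf("toy: waiting for %s to become free. Usage count = %d\n",
                   dev->name, dev->refcnt);
            (*warnings)++;
            warning_time = jiffies;
        }
    }
    return jiffies;
}

/* Example holder: lets go of its reference after 30 simulated seconds. */
static void toy_expire_at_30s(struct toy_netdev *dev, unsigned long now)
{
    if (now >= 30 * TOY_HZ && dev->refcnt > 1)
        dev->refcnt--;
}
```

With a holder that lets go after 30 simulated seconds, matching the default fragment lifetime mentioned earlier in the thread, the loop prints two warnings and then returns.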
From owner-netdev@oss.sgi.com Sun Aug 20 09:11:34 2000
Received: by oss.sgi.com id ; Sun, 20 Aug 2000 09:11:13 -0700
Received: from minus.inr.ac.ru ([193.233.7.97]:27654 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Sun, 20 Aug 2000 09:10:42 -0700
Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id UAA13432; Sun, 20 Aug 2000 20:09:55 +0400
From: kuznet@ms2.inr.ac.ru
Message-Id: <200008201609.UAA13432@ms2.inr.ac.ru>
Subject: Re: [patch] IP_FRAG_TIME versus unregister_netdevice
To: andrewm@uow.edu.au (Andrew Morton)
Date: Sun, 20 Aug 2000 20:09:55 +0400 (MSK DST)
Cc: netdev@oss.sgi.com
In-Reply-To: <399FB860.635954F7@uow.edu.au> from "Andrew Morton" at Aug 20, 0 08:52:16 pm
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Content-Length: 766
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing

Hello!

> So apart from the defragmenter, all the notifiers are currently in place to hunt down all the skbuffs and release them when a NETDEV_UNREGISTER is broadcast?
>
> That's pretty damn impressive.

No, of course. And this case with the defragmenter asserts this. 8)

Why did it broadcast that message? Exactly to catch misbehaving users.

> It would be very nice to be able to finish this work off and to get rid of the sleep altogether. Are you saying that it's too big/too late to do this for 2.4?

Sleeping is unavoidable if the caller of netdev_unregister expects synchronous operation, even if all the clients make their best efforts. Clients may hold all the objects for a reasonably short time, they are allowed to sleep holding a refcounted object etc. etc.
Alexey From owner-netdev@oss.sgi.com Sun Aug 20 10:52:25 2000 Received: by oss.sgi.com id ; Sun, 20 Aug 2000 10:52:15 -0700 Received: from titan.bieringer.de ([195.226.187.62]:49677 "HELO convert rfc822-to-8bit titan.bieringer.de") by oss.sgi.com with SMTP id ; Sun, 20 Aug 2000 10:51:51 -0700 Received: (qmail 23437 invoked by uid 89); 20 Aug 2000 17:51:48 -0000 Received: from p3e9ecd4d.dip.t-dialin.net (62.158.205.77) by mail.bieringer.de with POP3 XMIT; 20 Aug 2000 17:51:47 -0000 Message-Id: <3.0.6.32.20000820195327.0084f910@mail.bieringer.de> X-URL: http://www.bieringer.de/pb/ X-Sender: peter@mail.bieringer.de X-Mailer: QUALCOMM Windows Eudora Light Version 3.0.6 (32) Date: Sun, 20 Aug 2000 19:53:27 +0200 To: YaNan Guo , Richard Guy Briggs From: Peter Bieringer Subject: Re: FreeS/WAN redesign thoughts (KLIPS, IPSEC) Cc: Linux Ipsec mailing list , NetFilter mailing list , Linux Network Development mailing list , John Gilmore , Hugh Daniel , Henry Spencer , Hugh Redelmeier , Richard Guy Briggs In-Reply-To: References: <20000815143539.B4771@grendel.conscoop.ottawa.on.ca> Mime-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 8BIT Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing At 16:02 18.08.00 +0800, YaNan Guo wrote: >I want to know when the FreeSwan really support IPv6? Thanks. AFAIK, Gerhard Geßler (IABG/Germany) gessler@iabg.de is working on this. 
Peter From owner-netdev@oss.sgi.com Tue Aug 22 06:55:19 2000 Received: by oss.sgi.com id ; Tue, 22 Aug 2000 06:55:09 -0700 Received: from poseidon.pspt.fi ([193.166.51.47]:275 "EHLO poseidon.pspt.fi") by oss.sgi.com with ESMTP id ; Tue, 22 Aug 2000 06:54:50 -0700 Received: from localhost (priikone@localhost) by poseidon.pspt.fi (8.8.7/8.8.7) with ESMTP id QAA31544; Tue, 22 Aug 2000 16:56:08 +0300 Date: Tue, 22 Aug 2000 16:56:07 +0300 (EEST) From: "Pekka Riikonen [Adm]" To: netdev@oss.sgi.com cc: "Pekka Riikonen [Adm]" Subject: IPv6 detection Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hi, Is it sufficient to use dev->ip6_ptr (in 2.4) to detect that a device uses IPv6 addresses? The pointer is missing from 2.2, is there other ways to detect it in 2.2? Best regards, Pekka ________________________________________________________________________ Pekka Riikonen | Email: priikone@poseidon.pspt.fi SSH Communications Security Corp. | http://poseidon.pspt.fi/~priikone Tel. 
+358 (0)40 580 6673 | Kasarmikatu 11 A4, SF-70110 Kuopio PGP KeyID A924ED4F: http://poseidon.pspt.fi/~priikone/pubkey.asc From owner-netdev@oss.sgi.com Tue Aug 22 23:21:23 2000 Received: by oss.sgi.com id ; Tue, 22 Aug 2000 23:21:03 -0700 Received: from dns.broadcom.com ([207.93.217.3]:33296 "EHLO viggen.broadcom.com") by oss.sgi.com with ESMTP id ; Tue, 22 Aug 2000 23:20:42 -0700 Received: from mail-irva-1.broadcom.com (mail-irva-1.broadcom.com [10.4.10.10]) by viggen.broadcom.com (8.9.3/8.9.3/FW03) with ESMTP id XAA00392 for ; Tue, 22 Aug 2000 23:20:41 -0700 (PDT) Received: from localhost.localdomain ([10.23.3.192]) by mail-irva-1.broadcom.com (Post.Office MTA v3.5.3 release 223 ID# 0-67550U3000L400S0V35) with ESMTP id com for ; Tue, 22 Aug 2000 23:20:40 -0700 Received: (from nn@localhost) by localhost.localdomain (8.9.3/8.9.1) id XAA05437 for netdev@oss.sgi.com; Tue, 22 Aug 2000 23:30:16 -0700 Date: Tue, 22 Aug 2000 23:30:16 -0700 From: nn@broadcom.com (Neal Nuckolls) Message-Id: <200008230630.XAA05437@localhost.localdomain> To: netdev@oss.sgi.com Subject: ip_build_xmit() skb_reserve() problem Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing I have a linux ethernet driver which must encapsulate standard ethernet frames adding/subtracting 12 bytes of media-specific protocol headers. It calls init_etherdev() and acts like a vanilla ethernet driver. TCP skb_alloc's and skb_reserves based on MAX_HEADER so TCP packets tend to always have at least 12 bytes of skb_headroom when the driver's start routine is called. But ip_build_xmit() allocs and reserves skb headroom based on dev->hard_header_len, rounding this up to the next multiple of 16. For ethernet (dev->hard_header_len == 14), this means non-tcp packets tend to arrive at the driver's xmit start routine with only 2 bytes of headroom forcing my driver to skb_realloc_headroom() which introduces a copy of the entire packet. 
If I increment dev->hard_header_len by 12 as a workaround, this forces me to write my own hard_header() and rebuild_header() routines since code in eth.c and dev.c break otherwise -- ok I can do that -- but also means I cannot use the hardware header cache (hard_header_cache/header_cache_update/hard_header_parse) since hh_data[] supports up to only 16bytes. This is a bummer because now I'm adding 12byte pads in front of each ether header constructed by my hard_header() routine, immediately pulling it off in my start_xmit routine since it's there just to coax ip_build_xmit() into allocating/reserving enough headroom, and I have to disable the header cache. Could the following line in ip_build_xmit: int hh_len = (rt->u.dst.dev->hard_header_len + 15)&~15; perhaps be changed to: int hh_len = (MAX_HEADER + 15) & ~15; in the 2.4.0-pre timeframe? I don't have the luxury of kernel mods for my customers.. Any replies, please send directly -- I'm not on the alias. thanks. neal nn@techie.com From owner-netdev@oss.sgi.com Wed Aug 23 10:35:45 2000 Received: by oss.sgi.com id ; Wed, 23 Aug 2000 10:35:21 -0700 Received: from minus.inr.ac.ru ([193.233.7.97]:16132 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Wed, 23 Aug 2000 10:34:39 -0700 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id VAA00604; Wed, 23 Aug 2000 21:31:04 +0400 From: kuznet@ms2.inr.ac.ru Message-Id: <200008231731.VAA00604@ms2.inr.ac.ru> Subject: Re: ip_build_xmit() skb_reserve() problem To: nn@broadcom.COM (Neal Nuckolls) Date: Wed, 23 Aug 2000 21:31:04 +0400 (MSK DST) Cc: netdev@oss.sgi.com In-Reply-To: <200008230630.XAA05437@localhost.localdomain> from "Neal Nuckolls" at Aug 23, 0 10:45:06 am X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Content-Length: 322 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hello! 
> I have a linux ethernet driver which must encapsulate standard ethernet frames
> adding/subtracting 12 bytes of media-specific protocol headers.

Hence, it is not an ethernet. Use different hard_header*. And, probably, a different arphrd, otherwise you will have trouble with the raw packet interface.

Alexey

From owner-netdev@oss.sgi.com Wed Aug 23 11:30:36 2000
Received: by oss.sgi.com id ; Wed, 23 Aug 2000 11:30:21 -0700
Received: from minus.inr.ac.ru ([193.233.7.97]:32004 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Wed, 23 Aug 2000 11:29:54 -0700
Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id WAA00968; Wed, 23 Aug 2000 22:27:06 +0400
From: kuznet@ms2.inr.ac.ru
Message-Id: <200008231827.WAA00968@ms2.inr.ac.ru>
Subject: Re: ip_build_xmit() skb_reserve() problem
To: nn@broadcom.com (Neal Nuckolls)
Date: Wed, 23 Aug 2000 22:27:06 +0400 (MSK DST)
Cc: netdev@oss.sgi.com
In-Reply-To: <200008231815.LAA09801@localhost.localdomain> from "Neal Nuckolls" at Aug 23, 0 11:15:25 am
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Content-Length: 1959
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing

Hello!

> Why should tcp_do_sendmsg() use MAX_HEADER but ip_build_xmit()
> use "(dev->hard_header_len + 15) & ~15)"? Shouldn't they have
> the same policy, whatever that policy is?

Policy is to reserve not _more_ than hard_header_len. If a driver prefers to hide its hard_header_len, but still wants to have a whole linear skb, it has to copy.

TCP has to reserve at least 128 bytes at the head of any skb in any case and it has to reserve the whole MTU at the tail, so that there are no reasons to try to save anything. It is not the case for datagram sockets, and we may respect the device hint for them.

MAX_HEADER_LL is not a hint, it is a static value, which can be used and can be ignored. TCP uses this mainly due to the specifics of TCP internals, i.e. by plain luck, and this can be changed in future.

> I wrote new hard_header()

OK.
> and rebuild_header() routines Not used really. > but how can I support hard_header_cache(), > header_cache_update(), and hard_header_parse() with > struct hh_cache only supporting 16bytes of hh_data[] ? hh_cache works only for 16byte headers. Sorry. > It seems to me that many of the upcoming networking technologies > (home phoneline, 802.11, bluetooth, other wireless) > transport ethernet frames as payloads, taking on a few extra > bytes/doing additional coding/etc . All want to present an > ethernet interface and are happy to hide the details under > their driver interface. This works great except for one line > in ip_build_xmit() where it pedantically reserves hard_header_len > instead of using MAX_HEADER_LL like tcp and being more forgiving. > This forces those drivers to do a lot more work than they want > or need to do, imho.. I repeat: if device wants more memory, it must give hint in hard_header_len. If it prefers conspiracy, let it to copy in secret. 8) MAX_HEADER_LL does not reflect real header length, it is not tunable static value, which has no connection to devices used by system. 
Alexey From owner-netdev@oss.sgi.com Wed Aug 23 23:36:58 2000 Received: by oss.sgi.com id ; Wed, 23 Aug 2000 23:36:49 -0700 Received: from apollo.nbase.co.il ([194.90.137.2]:39949 "EHLO apollo.nbase.co.il") by oss.sgi.com with ESMTP id ; Wed, 23 Aug 2000 23:36:14 -0700 Received: from nbase.co.il ([194.90.136.56]) by apollo.nbase.co.il (Post.Office MTA v3.1.2 release (PO205-101c) ID# 0-44418U200L2S100) with ESMTP id AAA508; Thu, 24 Aug 2000 09:30:07 +0200 Message-ID: <39A4C189.2FB2B10E@nbase.co.il> Date: Thu, 24 Aug 2000 06:32:41 +0000 From: Gleb Natapov Organization: NBase-Xyplex X-Mailer: Mozilla 4.7 [en] (X11; I; Linux 2.2.14vlan i686) X-Accept-Language: en MIME-Version: 1.0 To: kuznet@ms2.inr.ac.ru CC: Neal Nuckolls , netdev@oss.sgi.com Subject: Re: ip_build_xmit() skb_reserve() problem References: <200008231827.WAA00968@ms2.inr.ac.ru> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing kuznet@ms2.inr.ac.ru wrote: > [...] > > I repeat: if device wants more memory, it must give hint in hard_header_len. > If it prefers conspiracy, let it to copy in secret. 8) Unfortunately a 'dev' that you use when you reserve (dev->hard_header_len + 15) & ~15) bytes in the skb may be not the same device that actually transmits packet (netfilter may reroute packet to another device for instance). Thus dev->hard_header() can't really assume that there is enough space for hardware header in the skb. hard_header() should always check that there is sufficient space in headroom if it doesn't do this it's a bug. So in current situation the hard_header_len as a hint to upper layers is useless :( -- Gleb. 
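The arithmetic behind this thread can be checked outside the kernel. The following is a minimal user-space sketch, not kernel code: it only reproduces the rounding expression quoted from ip_build_xmit() and the kind of headroom test Gleb says hard_header() must perform. The function names reserved_headroom, spare_headroom and needs_realloc are hypothetical and exist only for this illustration.

```c
#include <stddef.h>

/* Round the link-layer header length up to the next multiple of 16,
 * mirroring the expression (hard_header_len + 15) & ~15 quoted above. */
static size_t reserved_headroom(size_t hard_header_len)
{
    return (hard_header_len + 15) & ~(size_t)15;
}

/* Bytes still free in front of the link-layer header after the device's
 * own header of hard_header_len bytes has been pushed. */
static size_t spare_headroom(size_t hard_header_len)
{
    return reserved_headroom(hard_header_len) - hard_header_len;
}

/* Hypothetical check: nonzero when the headroom actually present is too
 * small for `needed` more bytes, so the driver would have to reallocate
 * and copy the packet. */
static int needs_realloc(size_t headroom, size_t needed)
{
    return headroom < needed;
}
```

For plain Ethernet (hard_header_len == 14) the reservation is 16 bytes, which leaves 2 spare bytes, so needs_realloc(spare_headroom(14), 12) is true for a driver that prepends 12 more bytes. Advertising 26 bytes instead would reserve 32. Because the transmitting device may differ from the one used for the reservation, the check has to be made against the headroom actually present at transmit time.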
From owner-netdev@oss.sgi.com Thu Aug 24 01:53:49 2000
Received: by oss.sgi.com id ; Thu, 24 Aug 2000 01:53:30 -0700
Received: from linux.vmri.hu ([193.225.208.140]:49416 "EHLO linux.vmri.hu") by oss.sgi.com with ESMTP id ; Thu, 24 Aug 2000 01:53:09 -0700
Received: from ant.vmri.hu ([193.225.208.142] helo=oops.vmri.hu) by linux.vmri.hu with esmtp (Exim 3.12 #1 (Debian)) id 13RslH-0005F4-00 for ; Thu, 24 Aug 2000 10:53:24 +0200
Received: from localhost ([127.0.0.1] helo=sch.bme.hu ident=cell) by oops.vmri.hu with esmtp (Exim 3.12 #1 (Debian)) id 13Rs7y-00036Z-00 for ; Thu, 24 Aug 2000 10:12:46 +0200
Message-ID: <39A4D8FA.126B23F0@sch.bme.hu>
Date: Thu, 24 Aug 2000 10:12:42 +0200
From: Marcell Gal
X-Mailer: Mozilla 4.73 [en] (X11; I; Linux 2.4.0-test5 i586)
X-Accept-Language: en
MIME-Version: 1.0
To: netdev@oss.sgi.com
Subject: performance question: delay interrupts
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing

Hello Guys,

Is there any way (for some ethernet drivers at least) to delay interrupts (e.g. by x00 us), so that many packets are serviced in one go instead of many proc-time-eating interrupts?

Alan Kennington suggested (to me) that there was a way to do this without HW support: in case of higher traffic we disable normal interrupts and do the servicing when a timer expires (this might require raising HZ above 100). (If no frames are coming, or fewer than a threshold, we can re-enable interrupts.)

Alan says the Nicstar ATM card has HW support for [369]00 us INT delay. Any ethernet cards known to have similar smart features?

Is anything like this configurable for some ethernet drivers, has anyone done some successful or unsuccessful experiments with this?

thanx: Cell

--
"Things are more the way they are now than they have ever been before."
-Former U.S. President Dwight D.
Eisenhower From owner-netdev@oss.sgi.com Thu Aug 24 02:07:39 2000 Received: by oss.sgi.com id ; Thu, 24 Aug 2000 02:07:19 -0700 Received: from linuxcare.com.au ([203.29.91.49]:3335 "EHLO front.linuxcare.com.au") by oss.sgi.com with ESMTP id ; Thu, 24 Aug 2000 02:06:56 -0700 Received: from halfway.linuxcare.com.au (localhost [127.0.0.1]) by front.linuxcare.com.au (8.9.3/8.9.3/Debian 8.9.3-21) with ESMTP id TAA32544 for ; Thu, 24 Aug 2000 19:05:47 +1000 X-Authentication-Warning: front.linuxcare.com.au: Host localhost [127.0.0.1] claimed to be halfway.linuxcare.com.au Received: from linuxcare.com.au (localhost [127.0.0.1]) by halfway.linuxcare.com.au (Postfix) with ESMTP id 05CD9816F; Thu, 24 Aug 2000 19:06:19 +1000 (EST) From: Rusty Russell To: netdev@oss.sgi.com Cc: davem@redhat.com, kuznet@ms2.inr.ac.ru Subject: [PATCH] no-brainer patch Date: Thu, 24 Aug 2000 19:06:19 +1000 Message-Id: <20000824090620.05CD9816F@halfway.linuxcare.com.au> Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing rt needs to be reset if skb rerouted by route_me_harder. BTW, I'm now convinced that route_me_harder is a bad idea: doing as a part of the netfilter architecture is too hard, and should be left to the netfilter modules which actually hack the packets. But I want 2.4.0 in our lifetime (maybe 2.4.1). Rusty. diff -urN -X /tmp/filezRIWiT --minimal linux-2.4.0-test7-7/net/ipv4/ip_output.c working-2.4.0-test7-7/net/ipv4/ip_output.c --- linux-2.4.0-test7-7/net/ipv4/ip_output.c Tue Mar 28 04:35:56 2000 +++ working-2.4.0-test7-7/net/ipv4/ip_output.c Thu Aug 24 16:48:33 2000 @@ -327,6 +327,7 @@ kfree_skb(skb); return -EHOSTUNREACH; } + rt = (struct rtable *)skb->dst; } #endif -- Hacking time. 
From owner-netdev@oss.sgi.com Thu Aug 24 02:12:49 2000 Received: by oss.sgi.com id ; Thu, 24 Aug 2000 02:12:40 -0700 Received: from linuxcare.com.au ([203.29.91.49]:7687 "EHLO front.linuxcare.com.au") by oss.sgi.com with ESMTP id ; Thu, 24 Aug 2000 02:12:23 -0700 Received: from halfway.linuxcare.com.au (localhost [127.0.0.1]) by front.linuxcare.com.au (8.9.3/8.9.3/Debian 8.9.3-21) with ESMTP id TAA32625 for ; Thu, 24 Aug 2000 19:11:23 +1000 X-Authentication-Warning: front.linuxcare.com.au: Host localhost [127.0.0.1] claimed to be halfway.linuxcare.com.au Received: from linuxcare.com.au (localhost [127.0.0.1]) by halfway.linuxcare.com.au (Postfix) with ESMTP id 5B52F816F; Thu, 24 Aug 2000 19:11:58 +1000 (EST) From: Rusty Russell To: netdev@oss.sgi.com Cc: davem@redhat.com Subject: [PATCH] tunnel debugging cleanup patch Date: Thu, 24 Aug 2000 19:11:58 +1000 Message-Id: <20000824091158.5B52F816F@halfway.linuxcare.com.au> Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing This should suppress the bogus debug messages when CONFIG_NETFILTER_DEBUG is used with tunnels... Rusty. 
diff -urN -X /tmp/filezRIWiT --minimal linux-2.4.0-test7-7/net/ipv4/ip_gre.c working-2.4.0-test7-7/net/ipv4/ip_gre.c --- linux-2.4.0-test7-7/net/ipv4/ip_gre.c Wed Aug 23 18:12:54 2000 +++ working-2.4.0-test7-7/net/ipv4/ip_gre.c Thu Aug 24 16:48:30 2000 @@ -632,6 +632,9 @@ #ifdef CONFIG_NETFILTER nf_conntrack_put(skb->nfct); skb->nfct = NULL; +#ifdef CONFIG_NETFILTER_DEBUG + skb->nf_debug = 0; +#endif #endif ipgre_ecn_decapsulate(iph, skb); netif_rx(skb); @@ -858,6 +861,9 @@ #ifdef CONFIG_NETFILTER nf_conntrack_put(skb->nfct); skb->nfct = NULL; +#ifdef CONFIG_NETFILTER_DEBUG + skb->nf_debug = 0; +#endif #endif IPTUNNEL_XMIT(); diff -urN -X /tmp/filezRIWiT --minimal linux-2.4.0-test7-7/net/ipv4/ipip.c working-2.4.0-test7-7/net/ipv4/ipip.c --- linux-2.4.0-test7-7/net/ipv4/ipip.c Wed Aug 23 18:12:54 2000 +++ working-2.4.0-test7-7/net/ipv4/ipip.c Thu Aug 24 16:48:30 2000 @@ -496,6 +496,9 @@ #ifdef CONFIG_NETFILTER nf_conntrack_put(skb->nfct); skb->nfct = NULL; +#ifdef CONFIG_NETFILTER_DEBUG + skb->nf_debug = 0; +#endif #endif ipip_ecn_decapsulate(iph, skb); netif_rx(skb); @@ -639,6 +642,9 @@ #ifdef CONFIG_NETFILTER nf_conntrack_put(skb->nfct); skb->nfct = NULL; +#ifdef CONFIG_NETFILTER_DEBUG + skb->nf_debug = 0; +#endif #endif IPTUNNEL_XMIT(); diff -urN -X /tmp/filezRIWiT --minimal linux-2.4.0-test7-7/net/ipv6/sit.c working-2.4.0-test7-7/net/ipv6/sit.c --- linux-2.4.0-test7-7/net/ipv6/sit.c Wed Aug 23 18:12:54 2000 +++ working-2.4.0-test7-7/net/ipv6/sit.c Thu Aug 24 16:48:30 2000 @@ -401,6 +401,9 @@ #ifdef CONFIG_NETFILTER nf_conntrack_put(skb->nfct); skb->nfct = NULL; +#ifdef CONFIG_NETFILTER_DEBUG + skb->nf_debug = 0; +#endif #endif ipip6_ecn_decapsulate(iph, skb); netif_rx(skb); @@ -567,6 +570,9 @@ #ifdef CONFIG_NETFILTER nf_conntrack_put(skb->nfct); skb->nfct = NULL; +#ifdef CONFIG_NETFILTER_DEBUG + skb->nf_debug = 0; +#endif #endif IPTUNNEL_XMIT(); -- Hacking time. 
From owner-netdev@oss.sgi.com Thu Aug 24 02:27:30 2000 Received: by oss.sgi.com id ; Thu, 24 Aug 2000 02:27:19 -0700 Received: from linuxcare.com.au ([203.29.91.49]:17159 "EHLO front.linuxcare.com.au") by oss.sgi.com with ESMTP id ; Thu, 24 Aug 2000 02:26:49 -0700 Received: from halfway.linuxcare.com.au (localhost [127.0.0.1]) by front.linuxcare.com.au (8.9.3/8.9.3/Debian 8.9.3-21) with ESMTP id TAA00329 for ; Thu, 24 Aug 2000 19:25:40 +1000 X-Authentication-Warning: front.linuxcare.com.au: Host localhost [127.0.0.1] claimed to be halfway.linuxcare.com.au Received: from linuxcare.com.au (localhost [127.0.0.1]) by halfway.linuxcare.com.au (Postfix) with ESMTP id 19DA6816F; Thu, 24 Aug 2000 19:26:11 +1000 (EST) From: Rusty Russell To: torvalds@transmeta.com Cc: netfilter@lists.samba.org, netdev@oss.sgi.com Subject: [PATCH] more netfilter bugfixen Date: Thu, 24 Aug 2000 19:26:11 +1000 Message-Id: <20000824092611.19DA6816F@halfway.linuxcare.com.au> Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Linus, please apply (finished dredging old bug reports). This brings us completely in sync with all known bugs which will be resolved for 2.4, bar one ftp problem. Anything else can wait for 2.4.1/2.5, unless some showstoppers come up. Contains these multi-line fixes: o Extra implicit NAT (ssh fix) (Rusty) o Local NAT fix (Rusty) No-brainers: o Warning about NAT interference (Alexey) o depmod fix for compat modules (Rusty) o kfree_skb fixes (Arnaldo Carvalho de Melo) o compile warning fix for ipt_LOG.c (Rusty) o 16-bit xchg removed (for non-x86) (Rusty) Passes testsuite with flying colors on my SMP test box, so if it's broken after this, expect it to be broken in 2.4. Thanks, Rusty. 
--- linux-2.4.0-test7-7/Documentation/Configure.help Thu Aug 24 16:36:55 2000 +++ working-2.4.0-test7-7/Documentation/Configure.help Thu Aug 24 19:15:20 2000 @@ -1944,6 +1944,9 @@ Full NAT CONFIG_IP_NF_NAT + Do not select "Y", if you are not sure that you will use this. + It may affect performance and system robustness negatively. + The Full NAT option allows masquerading, port forwarding and other forms of full Network Address Port Translation. It is controlled by the `nat' table in iptables: see the man page for iptables(8). diff -urN -X /tmp/filezRIWiT --minimal linux-2.4.0-test7-7/net/ipv4/netfilter/ip_fw_compat.c working-2.4.0-test7-7/net/ipv4/netfilter/ip_fw_compat.c --- linux-2.4.0-test7-7/net/ipv4/netfilter/ip_fw_compat.c Wed Aug 23 18:12:54 2000 +++ working-2.4.0-test7-7/net/ipv4/netfilter/ip_fw_compat.c Wed Aug 23 19:00:33 2000 @@ -15,6 +15,10 @@ #include #include +/* Theoretically, we could one day use 2.4 helpers, but for now it + just confuses depmod --RR */ +EXPORT_NO_SYMBOLS; + static struct firewall_ops *fwops; /* From ip_fw_compat_redir.c */ diff -urN -X /tmp/filezRIWiT --minimal linux-2.4.0-test7-7/net/ipv4/netfilter/ip_nat_core.c working-2.4.0-test7-7/net/ipv4/netfilter/ip_nat_core.c --- linux-2.4.0-test7-7/net/ipv4/netfilter/ip_nat_core.c Wed Aug 23 18:14:12 2000 +++ working-2.4.0-test7-7/net/ipv4/netfilter/ip_nat_core.c Thu Aug 24 17:50:48 2000 @@ -438,8 +438,27 @@ conntrack)); ret = 1; goto clear_fulls; + } else if (HOOK2MANIP(hooknum) == IP_NAT_MANIP_DST) { + /* Try implicit source NAT; protocol + may be able to play with ports to + make it unique. */ + struct ip_nat_range r + = { IP_NAT_RANGE_MAP_IPS, + tuple->src.ip, tuple->src.ip, + { 0 }, { 0 } }; + DEBUGP("Trying implicit mapping\n"); + if (proto->unique_tuple(tuple, &r, + IP_NAT_MANIP_SRC, + conntrack)) { + /* Must be unique. 
*/ + IP_NF_ASSERT(!ip_nat_used_tuple + (tuple, conntrack)); + ret = 1; + goto clear_fulls; + } } - DEBUGP("Protocol can't get unique tuple.\n"); + DEBUGP("Protocol can't get unique tuple %u.\n", + hooknum); } /* Eliminate that from range, and try again. */ @@ -466,10 +485,11 @@ } /* Where to manip the reply packets (will be reverse manip). */ +/* FIXME: really should use LOCAL_IN hook for replies to LOCAL_OUT --RR */ static unsigned int opposite_hook[NF_IP_NUMHOOKS] = { [NF_IP_PRE_ROUTING] = NF_IP_POST_ROUTING, [NF_IP_POST_ROUTING] = NF_IP_PRE_ROUTING, - [NF_IP_LOCAL_OUT] = NF_IP_POST_ROUTING + [NF_IP_LOCAL_OUT] = NF_IP_PRE_ROUTING }; unsigned int @@ -704,6 +724,16 @@ struct ip_nat_helper *helper; enum ip_conntrack_dir dir = CTINFO2DIR(ctinfo); + /* Cosmetic: don't want to see mangled packets in SOCK_PACKET + sockets (not for TCP code: expects sk != NULL) */ + if (info->num_manips && !(*pskb)->sk && skb_cloned(*pskb)) { + struct sk_buff *new = skb_copy(*pskb, GFP_ATOMIC); + if (!new) + return NF_DROP; + kfree_skb(*pskb); + *pskb = new; + } + /* Need nat lock to protect against modification, but neither conntrack (referenced) and helper (deleted with synchronize_bh()) can vanish. */ @@ -736,6 +766,14 @@ } else return NF_ACCEPT; } +/* For local ICMPs (LOCAL_OUT), we need to do POST_ROUTING manips + here, otherwise they won't get done --RR */ +static unsigned int icmp_hook[NF_IP_NUMHOOKS] += { [NF_IP_PRE_ROUTING] = NF_IP_POST_ROUTING, + [NF_IP_POST_ROUTING] = NF_IP_PRE_ROUTING, + [NF_IP_LOCAL_OUT] = NF_IP_POST_ROUTING +}; + unsigned int icmp_reply_translation(struct sk_buff *skb, struct ip_conntrack *conntrack, @@ -793,7 +831,7 @@ packet, except it was never src/dst reversed, so where we would normally apply a dst manip, we apply a src, and vice versa. */ - if (info->manips[i].hooknum == opposite_hook[hooknum]) { + if (info->manips[i].hooknum == icmp_hook[hooknum]) { DEBUGP("icmp_reply: inner %s -> %u.%u.%u.%u %u\n", info->manips[i].maniptype == IP_NAT_MANIP_SRC ? 
"DST" : "SRC", diff -urN -X /tmp/filezRIWiT --minimal linux-2.4.0-test7-7/net/ipv4/netfilter/ip_queue.c working-2.4.0-test7-7/net/ipv4/netfilter/ip_queue.c --- linux-2.4.0-test7-7/net/ipv4/netfilter/ip_queue.c Wed Aug 23 18:14:12 2000 +++ working-2.4.0-test7-7/net/ipv4/netfilter/ip_queue.c Wed Aug 23 19:03:00 2000 @@ -414,7 +414,7 @@ return skb; nlmsg_failure: if (skb) - kfree(skb); + kfree_skb(skb); *errp = 0; printk(KERN_ERR "ip_queue: error creating netlink message\n"); return NULL; diff -urN -X /tmp/filezRIWiT --minimal linux-2.4.0-test7-7/net/ipv4/netfilter/ip_tables.c working-2.4.0-test7-7/net/ipv4/netfilter/ip_tables.c --- linux-2.4.0-test7-7/net/ipv4/netfilter/ip_tables.c Wed Aug 23 18:12:54 2000 +++ working-2.4.0-test7-7/net/ipv4/netfilter/ip_tables.c Wed Aug 23 19:03:39 2000 @@ -89,10 +89,8 @@ unsigned int hook_entry[NF_IP_NUMHOOKS]; unsigned int underflow[NF_IP_NUMHOOKS]; - char padding[SMP_ALIGN((NF_IP_NUMHOOKS*2+2)*sizeof(unsigned int))]; - /* ipt_entry tables: one per CPU */ - char entries[0]; + char entries[0] __attribute__((aligned(SMP_CACHE_BYTES))); }; static LIST_HEAD(ipt_target); @@ -1359,7 +1357,7 @@ int ret; struct ipt_table_info *newinfo; static struct ipt_table_info bootstrap - = { 0, 0, { 0 }, { 0 }, { }, { } }; + = { 0, 0, { 0 }, { 0 }, { } }; MOD_INC_USE_COUNT; newinfo = vmalloc(sizeof(struct ipt_table_info) diff -urN -X /tmp/filezRIWiT --minimal linux-2.4.0-test7-7/net/ipv4/netfilter/ipt_LOG.c working-2.4.0-test7-7/net/ipv4/netfilter/ipt_LOG.c --- linux-2.4.0-test7-7/net/ipv4/netfilter/ipt_LOG.c Wed Aug 23 18:14:12 2000 +++ working-2.4.0-test7-7/net/ipv4/netfilter/ipt_LOG.c Thu Aug 24 14:01:54 2000 @@ -288,7 +288,8 @@ if (in && !out) { /* MAC logging for input chain only. 
*/ printk("MAC="); - if ((*pskb)->dev && (*pskb)->dev->hard_header_len && (*pskb)->mac.raw != iph) { + if ((*pskb)->dev && (*pskb)->dev->hard_header_len + && (*pskb)->mac.raw != (unsigned char *)iph) { int i; unsigned char *p = (*pskb)->mac.raw; for (i = 0; i < (*pskb)->dev->hard_header_len; i++,p++) diff -urN -X /tmp/filezRIWiT --minimal linux-2.4.0-test7-7/net/ipv4/netfilter/ipt_MIRROR.c working-2.4.0-test7-7/net/ipv4/netfilter/ipt_MIRROR.c --- linux-2.4.0-test7-7/net/ipv4/netfilter/ipt_MIRROR.c Tue Jul 11 12:08:17 2000 +++ working-2.4.0-test7-7/net/ipv4/netfilter/ipt_MIRROR.c Wed Aug 23 19:03:00 2000 @@ -89,7 +89,7 @@ dst->neighbour->output(skb); else { printk(KERN_DEBUG "khm in MIRROR\n"); - kfree(skb); + kfree_skb(skb); } } diff -urN -X /tmp/filezRIWiT --minimal linux-2.4.0-test7-7/net/ipv4/netfilter/ipt_REJECT.c working-2.4.0-test7-7/net/ipv4/netfilter/ipt_REJECT.c --- linux-2.4.0-test7-7/net/ipv4/netfilter/ipt_REJECT.c Wed Aug 23 18:14:12 2000 +++ working-2.4.0-test7-7/net/ipv4/netfilter/ipt_REJECT.c Wed Aug 23 19:03:32 2000 @@ -27,6 +27,7 @@ struct tcphdr *otcph, *tcph; struct rtable *rt; unsigned int otcplen; + u_int16_t tmp; int needs_ack; /* IP header checks: fragment, too short. */ @@ -64,8 +65,11 @@ tcph = (struct tcphdr *)((u_int32_t*)nskb->nh.iph + nskb->nh.iph->ihl); + /* Swap source and dest */ nskb->nh.iph->daddr = xchg(&nskb->nh.iph->saddr, nskb->nh.iph->daddr); - tcph->source = xchg(&tcph->dest, tcph->source); + tmp = tcph->source; + tcph->source = tcph->dest; + tcph->dest = tmp; /* Truncate to length (no data) */ tcph->doff = sizeof(struct tcphdr)/4; -- Hacking time. 
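The ipt_REJECT.c hunk above drops the 16-bit xchg() on the TCP ports, which the patch summary describes as a fix for non-x86, in favour of a swap through a temporary. A minimal user-space sketch of that same pattern follows; struct ports and swap_ports are hypothetical stand-ins for the source and dest fields of struct tcphdr, not kernel definitions.

```c
#include <stdint.h>

/* Hypothetical stand-in for the two 16-bit port fields of a TCP header. */
struct ports {
    uint16_t source;
    uint16_t dest;
};

/* Swap source and destination with a plain temporary, as the patch does,
 * instead of relying on an atomic 16-bit exchange that not every
 * architecture provides. */
static void swap_ports(struct ports *p)
{
    uint16_t tmp = p->source;

    p->source = p->dest;
    p->dest = tmp;
}
```

No atomicity is needed here as long as the header being rewritten is private to the caller, which is why the plain swap is sufficient in this sketch.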
From owner-netdev@oss.sgi.com Thu Aug 24 06:41:40 2000 Received: by oss.sgi.com id ; Thu, 24 Aug 2000 06:41:20 -0700 Received: from www.mcab.se ([194.165.225.5]:63750 "EHLO brum.mcab.se") by oss.sgi.com with ESMTP id ; Thu, 24 Aug 2000 06:40:58 -0700 Received: from mcab.se (gateway.mcab.se [194.165.225.11]) by brum.mcab.se (8.9.3/8.8.7) with ESMTP id QAA27822 for ; Thu, 24 Aug 2000 16:46:31 +0200 Message-ID: <39A52773.BD115DDB@mcab.se> Date: Thu, 24 Aug 2000 15:47:31 +0200 From: Markus Westergren X-Mailer: Mozilla 4.7 [en] (X11; I; Linux 2.2.14 i686) X-Accept-Language: sv, en MIME-Version: 1.0 To: netdev@oss.sgi.com Subject: Netlink/ethertap and 100Mbit Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: 8bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hi I have some problem with ethertap and 100Mbit networks. I'm working on a VPN solution that uses the ethertap device. Everything is working in a 10Mbit network. When I try it in a 100Mbit network I get much lower transfer rates( < 100KB/s). The VPN solution splits large packets so that the users can run their own VLAN's over the VPN. This generates packets which are sent back-to-back. The VPN computer is a Celeron 533 64 MB and runs kernel 2.2.14 with LRP Materhorn. A lousy figure of the system: Client--VPN---VPN--Client I have the same problem when I use FTP to transfer data between two computers connected with TapTunnel (author Lennart Poettering). This test was done on two other computers running kernel 2.2.14 and Mandrake 7.0. 
I have done some tracing in the kernel and found the following:

10Mbit                                    100Mbit
rtl8139_rx: got 872 bytes                 rtl8139_rx: got 872 bytes
net_bh: eth0: 858 bytes                   rtl8139_rx: got 808 bytes
ip_rcv: 858 bytes                         net_bh: eth0: 858 bytes
inet_recvmsg: 806 bytes                   ip_rcv: 858 bytes
ethertap_rx_skb: tap0: 802 bytes          net_bh: eth0: 794 bytes
rtl8139_rx: got 808 bytes                 ip_rcv: 794 bytes
net_bh: eth0: 794 bytes                   inet_recvmsg: 1548 bytes
ip_rcv: 794 bytes                         ethertap_rx_skb: tap0: 802 bytes
net_bh: tap0: 800 bytes                   net_bh: tap0: 800 bytes
my_proto_rx: part one 800 bytes           my_proto_rx: part one 800 bytes
inet_recvmsg: 742 bytes
ethertap_rx_skb: tap0: 738 bytes
net_bh: tap0: 736 bytes
my_proto_rx: part two 736 bytes

The second fragment almost always disappears in the 100Mbit case. I can make this problem go away by lowering the MTU on the clients so that I don't have to split the packets. It seems like the problem appears when packets are sent back-to-back.

I have tried to follow the path of a packet in the kernel without much success. Does anyone have any idea what could be the problem here? I would be happy to fix this if only I could understand the connection between IP/netlink/ethertap.
Thanks /Markus ----------------------------------------------------------------------- Markus Westergren Home: maw@acc.umu.se Biologigränd 17 Work: markus@mcab.se S-907 32 Umeå Sweden From owner-netdev@oss.sgi.com Thu Aug 24 07:04:09 2000 Received: by oss.sgi.com id ; Thu, 24 Aug 2000 07:03:59 -0700 Received: from adsl-151-196-242-25.bellatlantic.net ([151.196.242.25]:40946 "EHLO vaio.greennet") by oss.sgi.com with ESMTP id ; Thu, 24 Aug 2000 07:03:46 -0700 Received: from localhost (becker@localhost) by vaio.greennet (8.9.3/8.8.7) with ESMTP id KAA01330; Thu, 24 Aug 2000 10:04:02 -0400 Date: Thu, 24 Aug 2000 10:04:02 -0400 (EDT) From: Donald Becker X-Sender: becker@vaio.greennet To: Marcell Gal cc: netdev@oss.sgi.com Subject: Re: performance question: delay interrupts In-Reply-To: <39A4D8FA.126B23F0@sch.bme.hu> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Thu, 24 Aug 2000, Marcell Gal wrote: > Is there any way (for some ethernet drivers at least) to delay > interrupts (eg. with x00 us), so that many packets are serviced in > one go instead of many proc-time-eating interrupts? Sure, and drivers already do this. > Alan Kennington suggested (to me) that there was a way to do this > without > HW support: in case of higher traffic we disable normal interrupts > and do service when timer-expires (this might require raising HZ above > 100). (if no frames are coming, or less than a treshold, we can > reenable interrupts). If you work through the problem, doing this purely in software is complicated and might not be worth the extra complexity for general purpose systems. If you have such a overwhelming load, just buy hardware with hardware interrupt mitigation. Many drivers do transmit interrupt mitigation in software: they don't raise an interrupt for each Tx-done event. 
This might result in out-of-sync Tx statistics, but in normal use you usually get a receive interrupt soon after transmitting a packet. (Note: this is not true when you need accurate statistics the most: when you are tracking down network problems.) Doing software receive interrupt mitigation results in very high latency. This is especially bad for protocols with short packet ping-pong communication, and biases network access to large packet, large window communication. The "hot ticket" in Ethernet designs used to be exactly the opposite approach: predictive Rx interrupts. The hardware raised an interrupt before the packet had completely arrived. You might take two interrupts per packet, but the resulting performance for certain protocols and environments won benchmarks. > Alan says the Nicstar ATM card has HW support for [369]00 us INT delay. > Any ethernet cards known to have similar smart features? The 21143 is a common, low-cost example. All gigabit Ethernet boards have some sort of mitigation. > Is anything like this configurable for some ethernet drivers, has > anyone done some successful or unsuccessful experiments with this? See the work by Josip Loncaric in the Tulip mailing list archive for a write-up on the 21143 mitigation. I've also done experiments, but the results have usually gone directly into the drivers. I've learned that writing up the results is usually wasted time: A few years ago I did profiles of the typical transmit queue length and packet dwell period. I wrote up the results, complete with charts and graphs and Xs and Os on the back, but no one on the driver mailing lists was interested. It's a pretty mundane topic, and wouldn't make a substantial conference paper. So now I just use the final numbers. Donald Becker becker@scyld.com Scyld Computing Corporation http://www.scyld.com 410 Severn Ave.
Suite 210 Beowulf-II Cluster Distribution Annapolis MD 21403 From owner-netdev@oss.sgi.com Thu Aug 24 08:19:21 2000 Received: by oss.sgi.com id ; Thu, 24 Aug 2000 08:19:01 -0700 Received: from wirespeed.solidum.com ([207.35.224.226]:2280 "EHLO solidum.com") by oss.sgi.com with ESMTP id ; Thu, 24 Aug 2000 08:18:26 -0700 Received: from phobos.solidum.com (phobos.solidum.com [192.168.1.13]) by solidum.com (8.8.7/8.8.7) with ESMTP id LAA18336 for ; Thu, 24 Aug 2000 11:17:55 -0400 Message-Id: <200008241517.LAA18336@solidum.com> To: netdev@oss.sgi.com Subject: Re: performance question: delay interrupts In-Reply-To: Your message of "Thu, 24 Aug 2000 10:12:42 +0200." <39A4D8FA.126B23F0@sch.bme.hu> Mime-Version: 1.0 (generated by tm-edit 1.5) Content-Type: text/plain; charset=US-ASCII Date: Thu, 24 Aug 2000 11:17:55 -0400 From: Michael Richardson Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing >>>>> "Marcell" == Marcell Gal writes: Marcell> Alan says the Nicstar ATM card has HW support for [369]00 us INT delay. Marcell> Any ethernet cards known to have similar smart features? Marcell> Is anything like this configurable for some ethernet drivers, has Marcell> anyone done some successful or unsuccessful experiments with this? The AMD PCnet32 will not post an interrupt unless you set a bit in the descriptor ring. You can also have it provide an interrupt based upon a timer. The Alteon and SysKonnect GBE have similar features. 
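A minimal sketch of the descriptor-bit approach mentioned above: the driver asks for a Tx-done interrupt only on every Nth descriptor it queues. Everything here is hypothetical (the struct layout, the flag values, the ring size and the every-4th policy); it is not the PCnet32 register layout or any real driver's code, only an illustration of the idea.

```c
#include <assert.h>
#include <stdint.h>

#define TX_RING_SIZE   16u          /* hypothetical ring size */
#define TX_INTR_EVERY  4u           /* hypothetical: one interrupt per 4 packets */
#define DESC_OWN       0x80000000u  /* hypothetical "owned by the NIC" flag */
#define DESC_INTR      0x40000000u  /* hypothetical "interrupt on completion" flag */

struct toy_tx_desc {
    uint32_t status;
    uint32_t length;
};

struct toy_tx_ring {
    struct toy_tx_desc desc[TX_RING_SIZE];
    unsigned int cur_tx;            /* total packets queued so far */
};

/* Queue one packet of len bytes.
 * Returns 1 if this descriptor requests a Tx-done interrupt, else 0. */
static int toy_queue_tx(struct toy_tx_ring *ring, uint32_t len)
{
    unsigned int entry = ring->cur_tx % TX_RING_SIZE;
    uint32_t flags = DESC_OWN;

    assert(len > 0);

    /* Only every TX_INTR_EVERY-th descriptor carries the interrupt bit. */
    if ((ring->cur_tx + 1) % TX_INTR_EVERY == 0)
        flags |= DESC_INTR;

    ring->desc[entry].length = len;
    ring->desc[entry].status = flags;
    ring->cur_tx++;
    return (flags & DESC_INTR) != 0;
}
```

With these assumed values, queuing eight packets sets the interrupt bit on two descriptors instead of eight. A real driver would also need to request an interrupt when the ring is close to full, so that completed buffers are reclaimed in time.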
:!mcr!: | Solidum Systems Corporation, http://www.solidum.com Michael Richardson |For a better connected world,where data flows faster Personal: http://www.sandelman.ottawa.on.ca/People/Michael_Richardson/Bio.html mailto:mcr@sandelman.ottawa.on.ca mailto:mcr@solidum.com From owner-netdev@oss.sgi.com Thu Aug 24 09:19:10 2000 Received: by oss.sgi.com id ; Thu, 24 Aug 2000 09:18:51 -0700 Received: from minus.inr.ac.ru ([193.233.7.97]:19980 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Thu, 24 Aug 2000 09:18:26 -0700 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id UAA17086; Thu, 24 Aug 2000 20:07:23 +0400 From: kuznet@ms2.inr.ac.ru Message-Id: <200008241607.UAA17086@ms2.inr.ac.ru> Subject: Re: ip_build_xmit() skb_reserve() problem To: gleb@nbase.co.il (Gleb Natapov) Date: Thu, 24 Aug 2000 20:07:23 +0400 (MSK DST) Cc: nn@broadcom.com, netdev@oss.sgi.com In-Reply-To: <39A4C189.2FB2B10E@nbase.co.il> from "Gleb Natapov" at Aug 24, 0 06:32:41 am X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Content-Length: 667 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hello! > (dev->hard_header_len + 15) & ~15) bytes in the skb may be not the same > device that actually transmits packet (netfilter may reroute packet to > another device for instance). This is problem of netfilter. If it wants to do such crap, it should make it correctly at least. > Thus dev->hard_header() can't really > assume that there is enough space for hardware header in the skb. > hard_header() should always check that there is sufficient space in > headroom if it doesn't do this it's a bug. Nope. Caller must reserve enough of space. If netfilter does not check hard_header_len at packet head rewriting packets, it is plain fatal bug. 
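The headroom argument above can be sketched in userspace C. This is only an illustration under stated assumptions: struct toy_skb and the helper names are hypothetical stand-ins, not the kernel's sk_buff API. The one piece taken from the thread is the rounding expression (dev->hard_header_len + 15) & ~15.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the two sk_buff pointers that matter here. */
struct toy_skb {
    unsigned char *head;   /* start of the allocated buffer */
    unsigned char *data;   /* start of the current payload */
};

/* Round the link-layer header length up to a multiple of 16,
 * as in the (dev->hard_header_len + 15) & ~15 expression quoted above. */
static size_t reserve_len(size_t hard_header_len)
{
    return (hard_header_len + 15) & ~(size_t)15;
}

/* Bytes available in front of the payload. */
static size_t toy_headroom(const struct toy_skb *skb)
{
    assert(skb->data >= skb->head);
    return (size_t)(skb->data - skb->head);
}

/* Returns 1 if a header of out_hdr_len bytes fits in the headroom,
 * 0 if whoever rerouted the packet must reallocate before building it. */
static int header_fits(const struct toy_skb *skb, size_t out_hdr_len)
{
    return toy_headroom(skb) >= out_hdr_len;
}
```

For a 14-byte Ethernet header the reservation is 16 bytes, so a packet rerouted to a device whose header is longer than 16 bytes no longer fits. In the kernel, the corresponding check and fix would be done with skb_headroom() and skb_realloc_headroom(), by whichever code changed the output device.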
Alexey From owner-netdev@oss.sgi.com Thu Aug 24 09:53:50 2000 Received: by oss.sgi.com id ; Thu, 24 Aug 2000 09:53:41 -0700 Received: from battlejitney.wdhq.scyld.com ([216.254.93.178]:51700 "EHLO vaio.greennet") by oss.sgi.com with ESMTP id ; Thu, 24 Aug 2000 09:53:23 -0700 Received: from localhost (becker@localhost) by vaio.greennet (8.9.3/8.8.7) with ESMTP id MAA02586; Thu, 24 Aug 2000 12:53:38 -0400 Date: Thu, 24 Aug 2000 12:53:38 -0400 (EDT) From: Donald Becker X-Sender: becker@vaio.greennet To: Michael Richardson cc: netdev@oss.sgi.com Subject: Re: performance question: delay interrupts In-Reply-To: <200008241517.LAA18336@solidum.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Thu, 24 Aug 2000, Michael Richardson wrote: > >>>>> "Marcell" == Marcell Gal writes: > Marcell> Alan says the Nicstar ATM card has HW support for [369]00 us INT delay. > Marcell> Any ethernet cards known to have similar smart features? > Marcell> Is anything like this configurable for some ethernet drivers, has > Marcell> anyone done some successful or unsuccessful experiments with this? > > The AMD PCnet32 will not post an interrupt unless you set a bit in the > descriptor ring. This is the common software Tx interrupt mitigation. Most descriptor based drivers use this. > You can also have it provide an interrupt based upon a > timer. Timers are common, but not ubiquitous. A usual issue with Ethernet adapter hardware timers is that the timers are not consistent. They are usually referenced to variable time bases such as the PCI bus clock or the transceiver clock. The transceiver clock is the worst -- it changes in a non-obvious way (not 10x) between 10Mbps and 100Mbps. > The Alteon and SysKonnect GBE have similar features.
Hardware interrupt mitigation is a different beast: it can usually do things such as "I just raised an interrupt a few 10s of microseconds ago. I'll defer this one for a bit" or "I'll wait to see that this next packet is not for me before raising the Rx interrupt". These cannot be efficiently emulated with software. Donald Becker becker@scyld.com Scyld Computing Corporation http://www.scyld.com 410 Severn Ave. Suite 210 Beowulf-II Cluster Distribution Annapolis MD 21403 From owner-netdev@oss.sgi.com Thu Aug 24 10:39:11 2000 Received: by oss.sgi.com id ; Thu, 24 Aug 2000 10:39:01 -0700 Received: from shoe.tuxtops.com ([208.184.141.200]:50957 "EHLO shoe.tuxtops.com") by oss.sgi.com with ESMTP id ; Thu, 24 Aug 2000 10:38:23 -0700 Received: from localhost (brad@localhost) by shoe.tuxtops.com (8.9.3/8.9.3) with ESMTP id KAA29982 for ; Thu, 24 Aug 2000 10:20:36 -0700 Date: Thu, 24 Aug 2000 10:20:36 -0700 (PDT) From: To: netdev@oss.sgi.com Subject: Net Connect Detection Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hello all, I'm working on a small project that requires that I be able to detect a network card's link status (link light) or some equivalent. The project involves extending/rewriting Debian's divine project. I've looked through the devel kernel tree and I was unable to find any "hooks" for such a feature. I was under the impression that network cards would generate a signal when the interface was down, but perhaps I am wrong and each vendor deals with this in their own way if it can be reported at all. If anyone has any suggestions, I would greatly appreciate them. Otherwise, I fear I will have to fall back to something much less graceful.
Thanks, Brad Douglas brad@tuxtops.com From owner-netdev@oss.sgi.com Thu Aug 24 10:41:41 2000 Received: by oss.sgi.com id ; Thu, 24 Aug 2000 10:41:31 -0700 Received: from fox.doc.ic.ac.uk ([146.169.1.1]:47371 "EHLO fox.doc.ic.ac.uk") by oss.sgi.com with ESMTP id ; Thu, 24 Aug 2000 10:41:17 -0700 Received: from flibble.doc.ic.ac.uk ([146.169.5.42] helo=doc.ic.ac.uk ident=ncet) by fox.doc.ic.ac.uk with esmtp (Exim 2.05 #1) id 13S0yp-0000nC-00 for netdev@oss.sgi.com; Thu, 24 Aug 2000 18:39:55 +0100 Message-ID: <39A55DEB.94242945@doc.ic.ac.uk> Date: Thu, 24 Aug 2000 18:39:55 +0100 From: Nick Towers Organization: Dept. of Computing, Imperial College, UK X-Mailer: Mozilla 4.72 [en] (X11; U; Linux 2.2.17pre14 i686) X-Accept-Language: en MIME-Version: 1.0 To: netdev@oss.sgi.com Subject: Implementing 802.3ad-2000 link aggregation under Linux Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hello, I am looking at starting a project to implement 802.3ad ethernet link aggregation under Linux. For those of you that have never heard of it I've given a description and a few pointers at the end. Before I launch into it I was wondering if there is anyone else out there who 1) Has started work on an implementation 2) Would like to help or be involved 3) Knows somewhere/one else I should be talking about this Also, if anyone knows why this is an obvious non-starter, or thinks there's a certain way to go about this I'd love to hear from you - I've got time to try to have a serious go at this, and plenty of machines/net cards/802.3ad compliant switches, but I don't have a lot of Linux kernel hacking experience.... Some background on why an implementation would be a good thing: 802.3ad is now an approved (currently being published) IEEE extension to the ethernet standard. It allows for aggregation of multiple links for resilience and for increased bandwidth. Ie.
To the outside world it appears as a single link. Any links can be aggregated into a logical link as long as they are full duplex links of the same speed. The standard must be purchased but the minutes of the meetings and other bits and pieces are available: http://www.manta.ieee.org/groups/802/3/ad/ In order to be involved at this stage, it really requires a copy of the spec (and the spec refers back to the 802.3-1998 spec in many places) In many ways this can be seen as a (considerable) extension to the bonding driver - supporting the standard as opposed to the proprietary bonding schemes. 802.3ad is however considerably more complex if the full specification is to be implemented due to the support of automatic use of aggregation if possible and support for removing and adding parts of the logical link at any time. The spec does allow for partial implementation to not include all of the autonegotiation, an obvious first step to aim for. Obviously this has excellent potential for high availability servers as well as for provision of a progression path between 100Mbit and Gigabit ethernet without the admin hassles of multiple interfaces. Please let me know if you're interested... Nick -- Nick Towers : Systems developer, Dept.
of Computing, Imperial College n.towers@doc.ic.ac.uk or mailto:ncet@doc.ic.ac.uk for point and click If you feel lucky visit my web site - http://www.doc.ic.ac.uk/~ncet/ From owner-netdev@oss.sgi.com Thu Aug 24 13:37:54 2000 Received: by oss.sgi.com id ; Thu, 24 Aug 2000 13:37:44 -0700 Received: from battlejitney.wdhq.scyld.com ([216.254.93.178]:26103 "EHLO vaio.greennet") by oss.sgi.com with ESMTP id ; Thu, 24 Aug 2000 13:37:25 -0700 Received: from localhost (becker@localhost) by vaio.greennet (8.9.3/8.8.7) with ESMTP id QAA03394; Thu, 24 Aug 2000 16:37:30 -0400 Date: Thu, 24 Aug 2000 16:37:30 -0400 (EDT) From: Donald Becker X-Sender: becker@vaio.greennet To: brad@tuxtops.com cc: netdev@oss.sgi.com Subject: Re: Net Connect Detection In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Thu, 24 Aug 2000 brad@tuxtops.com wrote: > I'm working on a small project that requires that I be able to detect a > network card's link status (link light) or some equivalent. The project > involves extending/rewriting Debian's divine project. You want 'mii-diag' http://www.scyld.com/diag/index.html > I've looked through the devel kernel tree and I was unable to find any > "hooks" for such a feature. I was under the impression that network cards > would generate a signal when the interface was down, but perhaps I am > wrong and each vendor deals with this in their own way if it can be > reported at all. For the fully general case, it's not possible to detect link status. Consider coax -- you can detect a correctly terminated link only by either receiving or transmitting successfully, but you can't be certain a link is broken. For the very common case of twisted pair media, 10baseT and 100baseTx, there is usually a link beat indication. But not always e.g. NE2000 clones, which use the 10baseT link beat signal only to switch from the AUI port. 
The most useful case, common and standardized, is MII MDI (Management Data Interface) registers. They report the link beat along with extensive other information about the link status. The downside is that the usual interface to the MII registers is a serial bit stream. Reading a register can take 50usec of locked CPU time to generate. So the drivers minimize the implicit MII register reads, instead require explicit MII reads. I stress this because there have been repeated proposals to show the MII registers in /proc/net/* or /proc/bus/mii/eth0. There are multiple problems with this, but 2*32*50usec. of CPU time is the clearest argument against it. Donald Becker becker@scyld.com Scyld Computing Corporation http://www.scyld.com 410 Severn Ave. Suite 210 Beowulf-II Cluster Distribution Annapolis MD 21403 From owner-netdev@oss.sgi.com Thu Aug 24 17:37:05 2000 Received: by oss.sgi.com id ; Thu, 24 Aug 2000 17:36:55 -0700 Received: from ns1144.munich.netsurf.de ([195.180.235.144]:40452 "HELO fred.muc.de") by oss.sgi.com with SMTP id ; Thu, 24 Aug 2000 17:36:31 -0700 Received: by fred.muc.de (Postfix, from userid 500) id F1140E3449; Thu, 24 Aug 2000 14:25:39 +0200 (CEST) Date: Thu, 24 Aug 2000 14:25:39 +0200 From: Andi Kleen To: Gleb Natapov Cc: kuznet@ms2.inr.ac.ru, Neal Nuckolls , netdev@oss.sgi.com Subject: Re: ip_build_xmit() skb_reserve() problem Message-ID: <20000824142539.A1688@fred.muc.de> References: <200008231827.WAA00968@ms2.inr.ac.ru> <39A4C189.2FB2B10E@nbase.co.il> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0.1i In-Reply-To: <39A4C189.2FB2B10E@nbase.co.il>; from gleb@nbase.co.il on Thu, Aug 24, 2000 at 08:40:55AM +0200 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Thu, Aug 24, 2000 at 08:40:55AM +0200, Gleb Natapov wrote: > kuznet@ms2.inr.ac.ru wrote: > > > [...] > > > > I repeat: if device wants more memory, it must give hint in hard_header_len. 
> > If it prefers conspiracy, let it to copy in secret. 8) > > Unfortunately a 'dev' that you use when you reserve > (dev->hard_header_len + 15) & ~15) bytes in the skb may be not the same > device that actually transmits packet (netfilter may reroute packet to > another device for instance). Thus dev->hard_header() can't really > assume that there is enough space for hardware header in the skb. > hard_header() should always check that there is sufficient space in > headroom if it doesn't do this it's a bug. When you're doing NAT you just have to eat the skb_realloc_headroom(). NAT does not come for free. When you care about performance do not use it. -Andi -- This is like TV. I don't like TV. From owner-netdev@oss.sgi.com Thu Aug 24 17:37:05 2000 Received: by oss.sgi.com id ; Thu, 24 Aug 2000 17:36:45 -0700 Received: from ns1144.munich.netsurf.de ([195.180.235.144]:40708 "HELO fred.muc.de") by oss.sgi.com with SMTP id ; Thu, 24 Aug 2000 17:36:25 -0700 Received: by fred.muc.de (Postfix, from userid 500) id 35B5FE38DF; Thu, 24 Aug 2000 20:22:03 +0200 (CEST) Date: Thu, 24 Aug 2000 20:22:03 +0200 From: Andi Kleen To: Nick Towers Cc: netdev@oss.sgi.com Subject: Re: Implementing 802.3ad-2000 link aggregation under Linux Message-ID: <20000824202203.A4710@fred.muc.de> References: <39A55DEB.94242945@doc.ic.ac.uk> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0.1i In-Reply-To: <39A55DEB.94242945@doc.ic.ac.uk>; from ncet@doc.ic.ac.uk on Thu, Aug 24, 2000 at 07:43:29PM +0200 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Thu, Aug 24, 2000 at 07:43:29PM +0200, Nick Towers wrote: > Hello, > > I am looking at starting a project to implment 802.3ad ethernet link > aggregation under Linux. For those of you that have never heard of it > I've given a description and a few pointers at the end. 
Before I > launch into it I was wondering if there is anyone else out there who > > 1) Has started work on an implementation I started work some time ago. It is basically an user space problem. -Andi From owner-netdev@oss.sgi.com Thu Aug 24 19:03:06 2000 Received: by oss.sgi.com id ; Thu, 24 Aug 2000 19:02:46 -0700 Received: from pizda.ninka.net ([216.101.162.242]:44425 "EHLO pizda.ninka.net") by oss.sgi.com with ESMTP id ; Thu, 24 Aug 2000 19:02:21 -0700 Received: (from davem@localhost) by pizda.ninka.net (8.9.3/8.9.3) id SAA02979; Thu, 24 Aug 2000 18:50:15 -0700 Date: Thu, 24 Aug 2000 18:50:15 -0700 Message-Id: <200008250150.SAA02979@pizda.ninka.net> From: "David S. Miller" To: rusty@linuxcare.com.au CC: netdev@oss.sgi.com In-reply-to: <20000824091158.5B52F816F@halfway.linuxcare.com.au> (message from Rusty Russell on Thu, 24 Aug 2000 19:11:58 +1000) Subject: Re: [PATCH] tunnel debugging cleanup patch References: <20000824091158.5B52F816F@halfway.linuxcare.com.au> Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Patch applied. Later, David S. Miller davem@redhat.com From owner-netdev@oss.sgi.com Thu Aug 24 19:03:36 2000 Received: by oss.sgi.com id ; Thu, 24 Aug 2000 19:03:26 -0700 Received: from pizda.ninka.net ([216.101.162.242]:45449 "EHLO pizda.ninka.net") by oss.sgi.com with ESMTP id ; Thu, 24 Aug 2000 19:03:12 -0700 Received: (from davem@localhost) by pizda.ninka.net (8.9.3/8.9.3) id SAA02976; Thu, 24 Aug 2000 18:50:08 -0700 Date: Thu, 24 Aug 2000 18:50:08 -0700 Message-Id: <200008250150.SAA02976@pizda.ninka.net> From: "David S. 
Miller" To: rusty@linuxcare.com.au CC: netdev@oss.sgi.com, kuznet@ms2.inr.ac.ru In-reply-to: <20000824090620.05CD9816F@halfway.linuxcare.com.au> (message from Rusty Russell on Thu, 24 Aug 2000 19:06:19 +1000) Subject: Re: [PATCH] no-brainer patch References: <20000824090620.05CD9816F@halfway.linuxcare.com.au> Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Patch applied. Later, David S. Miller davem@redhat.com From owner-netdev@oss.sgi.com Thu Aug 24 19:13:16 2000 Received: by oss.sgi.com id ; Thu, 24 Aug 2000 19:12:56 -0700 Received: from cx97923-a.phnx3.az.home.com ([24.9.112.194]:18446 "EHLO grok.yi.org") by oss.sgi.com with ESMTP id ; Thu, 24 Aug 2000 19:12:42 -0700 Received: from candelatech.com (IDENT:greear@localhost [127.0.0.1]) by grok.yi.org (8.9.3/8.9.3) with ESMTP id TAA27931; Thu, 24 Aug 2000 19:58:25 -0700 Message-ID: <39A5E0D1.A1994489@candelatech.com> Date: Thu, 24 Aug 2000 19:58:25 -0700 From: Ben Greear Organization: Candela Technologies X-Mailer: Mozilla 4.72 [en] (X11; U; Linux 2.2.16 i586) X-Accept-Language: en MIME-Version: 1.0 To: Andi Kleen CC: Nick Towers , netdev@oss.sgi.com Subject: Re: Implementing 802.3ad-2000 link aggregation under Linux References: <39A55DEB.94242945@doc.ic.ac.uk> <20000824202203.A4710@fred.muc.de> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Andi Kleen wrote: > > On Thu, Aug 24, 2000 at 07:43:29PM +0200, Nick Towers wrote: > > Hello, > > > > I am looking at starting a project to implment 802.3ad ethernet link > > aggregation under Linux. For those of you that have never heard of it > > I've given a description and a few pointers at the end. Before I > > launch into it I was wondering if there is anyone else out there who > > > > 1) Has started work on an implementation > > I started work some time ago. It is basically an user space problem. 
How would it be user-space? Doesn't it aggregate several physical layers together into one interface? That seems like a kernel level thing to me.... Ben > > -Andi -- Ben Greear (greearb@candelatech.com) http://www.candelatech.com Author of ScryMUD: scry.wanfear.com 4444 (Released under GPL) http://scry.wanfear.com http://scry.wanfear.com/~greear From owner-netdev@oss.sgi.com Thu Aug 24 21:16:56 2000 Received: by oss.sgi.com id ; Thu, 24 Aug 2000 21:16:35 -0700 Received: from adsl-151-196-242-3.bellatlantic.net ([151.196.242.3]:14580 "EHLO vaio.greennet") by oss.sgi.com with ESMTP id ; Thu, 24 Aug 2000 21:16:10 -0700 Received: from localhost (becker@localhost) by vaio.greennet (8.9.3/8.8.7) with ESMTP id AAA05193; Fri, 25 Aug 2000 00:16:31 -0400 Date: Fri, 25 Aug 2000 00:16:31 -0400 (EDT) From: Donald Becker X-Sender: becker@vaio.greennet To: Ben Greear cc: Andi Kleen , Nick Towers , netdev@oss.sgi.com Subject: Re: Implementing 802.3ad-2000 link aggregation under Linux In-Reply-To: <39A5E0D1.A1994489@candelatech.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Thu, 24 Aug 2000, Ben Greear wrote: > Subject: Re: Implementing 802.3ad-2000 link aggregation under Linux > Andi Kleen wrote: > > On Thu, Aug 24, 2000 at 07:43:29PM +0200, Nick Towers wrote: > > > I am looking at starting a project to implment 802.3ad ethernet link > > > aggregation under Linux. For those of you that have never heard of it > > > I've given a description and a few pointers at the end. Before I > > > launch into it I was wondering if there is anyone else out there who > > I started work some time ago. It is basically an user space problem. > > How would it be user-space? Doesn't it aggregate several physical > layers together into one interface? That seems like a kernel > level thing to me.... 
802.3ad is not just channel bonding (the kernel-level mechanism), it also includes sending packets down each link to detect topology and verify that the connection continues to work. Donald Becker becker@scyld.com Scyld Computing Corporation http://www.scyld.com 410 Severn Ave. Suite 210 Beowulf-II Cluster Distribution Annapolis MD 21403 From owner-netdev@oss.sgi.com Fri Aug 25 00:03:45 2000 Received: by oss.sgi.com id ; Fri, 25 Aug 2000 00:03:35 -0700 Received: from esebh02nok.ntc.nokia.com ([131.228.118.151]:11020 "EHLO esebh02nok.ntc.nokia.com") by oss.sgi.com with ESMTP id ; Fri, 25 Aug 2000 00:03:06 -0700 Received: by esebh02nok with Internet Mail Service (5.5.2650.10) id ; Fri, 25 Aug 2000 10:01:20 +0300 Message-ID: <2F6E594290A4D211B6580008C7894C860266C1A2@oueis04nok> From: jussi.ohenoja@nokia.com To: netdev@oss.sgi.com Subject: query for IPv6 implemented features list Date: Fri, 25 Aug 2000 10:01:17 +0300 MIME-Version: 1.0 X-Mailer: Internet Mail Service (5.5.2650.10) Content-Type: text/plain; charset="iso-8859-1" Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing I am currently doing a cross-evaluation of IPv6 stack implementations on several different platforms. What is the current state of the Linux IPv6 implementation? What is the status of IPv4 -> IPv6 migration and dual stack features? What kind of work is there still to be done? Is there a bug track list? Has anyone compiled a feature list describing the status of the stack against the requirements of RFCs? (Required/Recommended/Optional) I would appreciate any pointers, links and help on the subject. - Juice - jussi.ohenoja@nokia.com -- vaxi j{{ pois k{yt|st{ -- juice@swagman.org -- Did you know there is a vorlon inside you?
-- From owner-netdev@oss.sgi.com Fri Aug 25 01:35:25 2000 Received: by oss.sgi.com id ; Fri, 25 Aug 2000 01:35:05 -0700 Received: from linuxcare.com.au ([203.29.91.49]:29192 "EHLO front.linuxcare.com.au") by oss.sgi.com with ESMTP id ; Fri, 25 Aug 2000 01:34:35 -0700 Received: from halfway.linuxcare.com.au (localhost [127.0.0.1]) by front.linuxcare.com.au (8.9.3/8.9.3/Debian 8.9.3-21) with ESMTP id SAA29473 for ; Fri, 25 Aug 2000 18:33:16 +1000 X-Authentication-Warning: front.linuxcare.com.au: Host localhost [127.0.0.1] claimed to be halfway.linuxcare.com.au Received: from linuxcare.com.au (localhost [127.0.0.1]) by halfway.linuxcare.com.au (Postfix) with ESMTP id 0CD20816F; Fri, 25 Aug 2000 18:33:53 +1000 (EST) From: Rusty Russell To: kuznet@ms2.inr.ac.ru Cc: netdev@oss.sgi.com Subject: Re: ip_build_xmit() skb_reserve() problem In-reply-to: Your message of "Thu, 24 Aug 2000 20:07:23 +0400." <200008241607.UAA17086@ms2.inr.ac.ru> Date: Fri, 25 Aug 2000 18:33:53 +1000 Message-Id: <20000825083354.0CD20816F@halfway.linuxcare.com.au> Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing In message <200008241607.UAA17086@ms2.inr.ac.ru> you write: > Hello! > > > (dev->hard_header_len + 15) & ~15) bytes in the skb may be not the same > > device that actually transmits packet (netfilter may reroute packet to > > another device for instance). > > This is problem of netfilter. If it wants to do such crap, > it should make it correctly at least. Yes, that's a `don't do that' for 2.4.0. It's pretty obscure, unless someone wants to fix it. Rusty. -- Hacking time. 
From owner-netdev@oss.sgi.com Fri Aug 25 03:42:07 2000 Received: by oss.sgi.com id ; Fri, 25 Aug 2000 03:41:47 -0700 Received: from fox.doc.ic.ac.uk ([146.169.1.1]:29451 "EHLO fox.doc.ic.ac.uk") by oss.sgi.com with ESMTP id ; Fri, 25 Aug 2000 03:41:14 -0700 Received: from flibble.doc.ic.ac.uk ([146.169.5.42] helo=doc.ic.ac.uk ident=ncet) by fox.doc.ic.ac.uk with esmtp (Exim 2.05 #1) id 13SGtr-0007Vf-00; Fri, 25 Aug 2000 11:39:51 +0100 Message-ID: <39A64CF7.43F7EC3@doc.ic.ac.uk> Date: Fri, 25 Aug 2000 11:39:51 +0100 From: Nick Towers Organization: Dept. of Computing, Imperial College, UK X-Mailer: Mozilla 4.72 [en] (X11; U; Linux 2.2.17pre14 i686) X-Accept-Language: en MIME-Version: 1.0 To: Donald Becker , Ben Greear CC: Andi Kleen , netdev@oss.sgi.com Subject: Re: Implementing 802.3ad-2000 link aggregation under Linux References: Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Donald Becker wrote: > > On Thu, 24 Aug 2000, Ben Greear wrote: > > > Subject: Re: Implementing 802.3ad-2000 link aggregation under Linux > > Andi Kleen wrote: > > > On Thu, Aug 24, 2000 at 07:43:29PM +0200, Nick Towers wrote: > > > > I am looking at starting a project to implment 802.3ad ethernet link > > > > aggregation under Linux. For those of you that have never heard of it > > > I started work some time ago. It is basically an user space problem. > > > > How would it be user-space? Doesn't it aggregate several physical > > layers together into one interface? That seems like a kernel > > level thing to me.... > > 802.3ad is not just channel bonding (the kernel-level mechanism), it also > including sending packets down each link to detect topology and verify that > the connection continues to work. My original thought is that the actual aggregating and frame collection / distribution is certainly a kernel issue, as with the current bonding driver. 
As for the link detection and automatic aggregation of links, this is possibly a user space daemon - the number of packets is limited to an ethernet "slow protocol" of max 5 packets per second so as to give minimal effect to links which do not provide aggregation. However, the protocol assumes almost immediate response from links which do implement aggregation. Ie. Handling of the frame types is certainly a kernel based issue. The whole automatic topology side of the protocol is quite complicated and not supported in all switches - I would say it is certainly a "version 2" feature. Nick -- Nick Towers : Systems developer, Dept. of Computing, Imperial College n.towers@doc.ic.ac.uk or mailto:ncet@doc.ic.ac.uk for point and click If you feel lucky visit my web site - http://www.doc.ic.ac.uk/~ncet/ From owner-netdev@oss.sgi.com Fri Aug 25 03:46:36 2000 Received: by oss.sgi.com id ; Fri, 25 Aug 2000 03:46:17 -0700 Received: from luna.tlmat.unican.es ([193.144.186.2]:35078 "EHLO luna.tlmat.unican.es") by oss.sgi.com with ESMTP id ; Fri, 25 Aug 2000 03:45:55 -0700 Received: from centauro (lira.tlmat.unican.es [193.144.186.27]) by luna.tlmat.unican.es with SMTP (8.7.6/8.7.1) id NAA01636 for ; Fri, 25 Aug 2000 13:01:32 +0200 (METDST) Message-ID: <006501c00e81$e4fb0500$1bba90c1@tlmat.unican.es> From: =?iso-8859-1?B?UmFt824gQWf8ZXJv?= To: Subject: jiffies undeclared Date: Fri, 25 Aug 2000 12:47:37 +0200 MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="----=_NextPart_000_0062_01C00E92.A5DCD0E0" X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 5.00.2314.1300 X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2314.1300 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing This is a multi-part message in MIME format. 
------=_NextPart_000_0062_01C00E92.A5DCD0E0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable Hi all I'm writing a little module for the Linux kernel, and when I tried to cope with timers, I had a serious problem with jiffies. Everything works fine, but when I try to get the value of jiffies, I get a compile error 'jiffies_Rsmp_0da02d67' undeclared (first use in this function) I am including the sched.h file, so I can't see what I am doing wrong. Could anybody tell me where the problem is... Thanks in advance... Ramón ------=_NextPart_000_0062_01C00E92.A5DCD0E0 Content-Type: text/html; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable
------=_NextPart_000_0062_01C00E92.A5DCD0E0-- From owner-netdev@oss.sgi.com Fri Aug 25 04:35:56 2000 Received: by oss.sgi.com id ; Fri, 25 Aug 2000 04:35:36 -0700 Received: from tml.hut.fi ([130.233.44.1]:64773 "EHLO tml-gw.tml.hut.fi") by oss.sgi.com with ESMTP id ; Fri, 25 Aug 2000 04:35:22 -0700 Received: (from smap@localhost) by tml-gw.tml.hut.fi (8.8.7/8.8.7) id OAA03334 for ; Fri, 25 Aug 2000 14:34:50 +0300 Received: from caffeine.tml.hut.fi(130.233.45.27) by tml-gw.tml.hut.fi via smap (V2.0) id xma003332; Fri, 25 Aug 00 14:34:29 +0300 Received: from morphine.tml.hut.fi (morphine.tml.hut.fi [130.233.45.7]) by caffeine.tml.hut.fi (8.10.2/8.10.2) with ESMTP id e7PBYhu07670 for ; Fri, 25 Aug 2000 14:34:43 +0300 (EET DST) Received: from localhost (lpetande@localhost) by morphine.tml.hut.fi (8.9.2/8.7.1) with ESMTP id OAA14791 for ; Fri, 25 Aug 2000 14:34:10 +0300 (EET DST) X-Authentication-Warning: morphine.tml.hut.fi: lpetande owned process doing -bs Date: Fri, 25 Aug 2000 14:34:10 +0300 (EET DST) From: Lars Henrik Petander To: "netdev@oss.sgi.com" Subject: manipulating the IPv6 routing table Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hi! I am working on Mobile IPv6 for kernels 2.4.x and as a part of that I'm trying to make the changing of the default route "smoother". Currently in our system the routes related to the previous subnet (the default route and network and host routes) are deleted (with fib6_clean_tree) before the new default route is installed. This works fine except for a short break when there is no default route. I'm trying to change the system to delete the old routes only after installing the new ones. I've tried adding a hook at the end of the ndisc_router_discovery function to delete the old routes. This does not work even though the routing table looks correct.
Does anyone have an idea of what I'm doing wrong or any suggestions for an alternative? TIA, Henrik Petander From owner-netdev@oss.sgi.com Fri Aug 25 04:36:06 2000 Received: by oss.sgi.com id ; Fri, 25 Aug 2000 04:35:56 -0700 Received: from mail.cyberus.ca ([209.195.95.1]:2778 "EHLO cyberus.ca") by oss.sgi.com with ESMTP id ; Fri, 25 Aug 2000 04:35:16 -0700 Received: from shell.cyberus.ca (shell [209.195.95.7]) by cyberus.ca (8.9.3/8.9.3/Cyberus Online Inc.) with ESMTP id HAA01701; Fri, 25 Aug 2000 07:34:45 -0400 (EDT) Received: from localhost (hadi@localhost) by shell.cyberus.ca (8.9.1b+Sun/8.9.3) with ESMTP id HAA06521; Fri, 25 Aug 2000 07:34:44 -0400 (EDT) Date: Fri, 25 Aug 2000 07:34:44 -0400 (EDT) From: jamal To: Donald Becker cc: Marcell Gal , netdev@oss.sgi.com Subject: Re: performance question: delay interrupts In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hi Donald, On Thu, 24 Aug 2000, Donald Becker wrote: > On Thu, 24 Aug 2000, Marcell Gal wrote: > > > Alan says the Nicstar ATM card has HW support for [369]00 us INT delay. > > Any ethernet cards known to have similar smart features? > > The 21143 is a common, low-cost example. > All gigabit Ethernet boards have some sort of mitigation. > > > Is anything like this configurable for some ethernet drivers, has > > anyone done some successful or unsuccessful experiments with this? > > See the work by Josip Loncaric in the Tulip mailing list archive for a > write-up on the 21143 mitigation. > > I've also done experiments, but the results have usually gone directly into > the drivers. I've learned that writing up the results is usually wasted > time: A few years ago I did profiles of the typical transmit queue length > and packet dwell period.
I wrote up the results, complete with charts and > graphs and Xs and Os on the back, but no one on the driver mailing lists was > interested. It's a pretty mundane topic, and wouldn't make a substantial > conference paper. So now I just use the final numbers. > I would be interested, and i think it would make a good Linux conference paper at least. BTW, i couldn't find Josip Loncaric's thread on this ... do you have a specific URL? (in any case that piece of code hasn't been propagated to the current 2.4 tulip driver; small patch below) In my experimentation, i found that hard-coding a specific mitigation value was not a very good idea because the ideal value really depended on the system/network load. So, i devised a dynamically adjusting mitigation value which is selected based on how overloaded the system is (stolen from TCP and based on observing tulip_max_interrupt_work history). I didn't spend a lot of time really tuning this so there could be better ways of doing this ... Of course this is the stuff i presented on at the OLS (amongst other things); i'll post a 2.4 patch on the least controversial pieces when i get the time (maybe in the next week or so). I am interested in Josip's reasoning though. cheers, jamal

The tulip patch
--------
--- interrupt.c 2000/08/19 16:01:20 1.1
+++ interrupt.c 2000/08/25 11:19:32
@@ -314,12 +314,11 @@
     oi++;
   }
   if (csr5 & TimerInt) {
-#if 0
+
     if (tulip_debug > 2)
       printk(KERN_ERR "%s: Re-enabling interrupts, %8.8x.\n",
              dev->name, csr5);
     outl(tulip_tbl[tp->chip_id].valid_intrs, ioaddr + CSR7);
-#endif
     tp->ttimer = 0;
     oi++;
   }
@@ -327,14 +326,19 @@
     if (tulip_debug > 1)
       printk(KERN_WARNING "%s: Too much work during an interrupt, "
              "csr5=0x%8.8x. (%lu) (%d,%d,%d)\n", dev->name, csr5, tp->nir, tx, rx, oi);
-    /* Acknowledge all interrupt sources. */
-#if 0
-    /* Clear all interrupting sources, set timer to re-enable. */
-    outl(((~csr5) & 0x0001ebef) | NormalIntr | AbnormalIntr | TimerInt,
-         ioaddr + CSR7);
-    outl(12, ioaddr + CSR11);
-    tp->ttimer = 1;
-#endif
+
+    /* Acknowledge all interrupt sources. */
+    outl(0x8001ffff, ioaddr + CSR5);
+    if (tp->flags & HAS_INTR_MITIGATION) {
+      /* Josip Loncaric at ICASE did extensive experimentation
+         to develop a good interrupt mitigation setting.*/
+      outl(0x8b240000, ioaddr + CSR11);
+    } else {
+      /* Mask all interrupting sources, set timer to
+         re-enable. */
+      outl(((~csr5) & 0x0001ebef) | AbnormalIntr | TimerInt, ioaddr + CSR7);
+      outl(0x0012, ioaddr + CSR11);
+    }
     break;
   }
 } while (work_count-- > 0);
@@ -366,4 +370,3 @@
     dev->name, inl(ioaddr + CSR5));
 }
-

From owner-netdev@oss.sgi.com Fri Aug 25 06:11:46 2000 Received: by oss.sgi.com id ; Fri, 25 Aug 2000 06:11:36 -0700 Received: from mail.zmailer.org ([194.252.70.162]:22281 "EHLO zmailer.org") by oss.sgi.com with ESMTP id ; Fri, 25 Aug 2000 06:11:10 -0700 Received: (from localhost user: 'mea' uid#500 fake: STDIN (mea@zmailer.org)) by mail.zmailer.org id ; Fri, 25 Aug 2000 16:10:24 +0300 Date: Fri, 25 Aug 2000 16:10:24 +0300 From: Matti Aarnio To: Ben Greear Cc: netdev@oss.sgi.com Subject: Re: Implementing 802.3ad-2000 link aggregation under Linux Message-ID: <20000825161024.N22907@mea-ext.zmailer.org> References: <39A55DEB.94242945@doc.ic.ac.uk> <20000824202203.A4710@fred.muc.de> <39A5E0D1.A1994489@candelatech.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <39A5E0D1.A1994489@candelatech.com>; from greearb@candelatech.com on Thu, Aug 24, 2000 at 07:58:25PM -0700 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Thu, Aug 24, 2000 at 07:58:25PM -0700, Ben Greear wrote: > Andi Kleen wrote: > > On Thu, Aug 24, 2000 at 07:43:29PM +0200, Nick Towers wrote: > > > Hello, > > > > > > I am looking at starting a project to implement 802.3ad ethernet link > > > aggregation under Linux.
For those of you that have never heard of it > > > I've given a description and a few pointers at the end. Before I > > > launch into it I was wondering if there is anyone else out there who > > > > > > 1) Has started work on an implementation > > > > I started work some time ago. It is basically an user space problem. > > How would it be user-space? Doesn't it aggregate several physical > layers together into one interface? That seems like a kernel > level thing to me.... Like most of such things, there is the fast-path (of packet forwarding), and there is the management protocol. (Comparing to 802.1Q, a lot more than half of the specification is about the management protocol!) It would be nice if we could rid the kernel from routing protocols/ managing processes, and move all those into appropriate userspace daemons. (I mean here 802 spanning-tree bridging, VLAN management, IP multicast, etc.) After all, it is the purpose of such daemons to fill appropriate forwarding tables which kernel then uses to do the grunt work, while the daemons act as the last choice backup. > Ben > > -Andi > -- > Ben Greear (greearb@candelatech.com) http://www.candelatech.com /Matti Aarnio From owner-netdev@oss.sgi.com Fri Aug 25 07:04:47 2000 Received: by oss.sgi.com id ; Fri, 25 Aug 2000 07:04:37 -0700 Received: from fox.doc.ic.ac.uk ([146.169.1.1]:46351 "EHLO fox.doc.ic.ac.uk") by oss.sgi.com with ESMTP id ; Fri, 25 Aug 2000 07:04:12 -0700 Received: from flibble.doc.ic.ac.uk ([146.169.5.42] helo=doc.ic.ac.uk ident=ncet) by fox.doc.ic.ac.uk with esmtp (Exim 2.05 #1) id 13SK4U-0001Lv-00; Fri, 25 Aug 2000 15:03:02 +0100 Message-ID: <39A67C95.ABECEFC9@doc.ic.ac.uk> Date: Fri, 25 Aug 2000 15:03:01 +0100 From: Nick Towers Organization: Dept. 
of Computing, Imperial College, UK X-Mailer: Mozilla 4.72 [en] (X11; U; Linux 2.2.17pre14 i686) X-Accept-Language: en MIME-Version: 1.0 To: Matti Aarnio CC: netdev@oss.sgi.com Subject: Re: Implementing 802.3ad-2000 link aggregation under Linux References: <39A55DEB.94242945@doc.ic.ac.uk> <20000824202203.A4710@fred.muc.de> <39A5E0D1.A1994489@candelatech.com> <20000825161024.N22907@mea-ext.zmailer.org> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Matti Aarnio wrote: > > > I started work some time ago. It is basically an user space problem. > > > > How would it be user-space? Doesn't it aggregate several physical > > layers together into one interface? That seems like a kernel > > level thing to me.... > > Like most of such things, there is the fast-path (of packet > forwarding), and there is the management protocol. > (Comparing to 802.1Q, a lot more than half of the specification > is about the management protocol!) Likewise with 802.3ad the specification for LACPDU packets is the major part of the spec, once all of the IEEE requirements stuff is jumped over. > It would be nice if we could rid the kernel from routing protocols/ > managing processes, and move all those into appropriate userspace > daemons. (I mean here 802 spanning-tree bridging, VLAN management, > IP multicast, etc.) In that case a logical split for development would be to split off the link aggregation control to a userland daemon - for those with a copy of the spec have a look at the diagram on page 97. As an aside, have the IEEE ever put forward a statement on how much discussion of their specs is allowed in open forums (I assume the ethernet and VLAN developments must have come up against this)? Nick -- Nick Towers : Systems developer, Dept. 
of Computing, Imperial College n.towers@doc.ic.ac.uk or mailto:ncet@doc.ic.ac.uk for point and click If you feel lucky visit my web site - http://www.doc.ic.ac.uk/~ncet/ From owner-netdev@oss.sgi.com Fri Aug 25 22:18:35 2000 Received: by oss.sgi.com id ; Fri, 25 Aug 2000 22:18:25 -0700 Received: from gondor.apana.org.au ([203.14.152.114]:54280 "EHLO gondor.apana.org.au") by oss.sgi.com with ESMTP id ; Fri, 25 Aug 2000 22:17:52 -0700 Received: (from herbert@localhost) by gondor.apana.org.au (8.11.0.Beta1/8.11.0.Beta1/Debian 8.11.0-1) id e7Q5H5f05579; Sat, 26 Aug 2000 15:17:05 +1000 From: Herbert Xu Date: Sat, 26 Aug 2000 15:17:04 +1000 To: netdev@oss.sgi.com, linux-kernel@vger.kernel.org Subject: [PATCH] socket(2) should return EAFNOSUPPORT if no family found Message-ID: <20000826151704.A5552@gondor.apana.org.au> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="UlVJffcvxoiEqYs2" Content-Disposition: inline User-Agent: Mutt/1.2i Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing --UlVJffcvxoiEqYs2 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Here is a trivial patch to make socket(2) return -EAFNOSUPPORT instead of -EINVAL if the family can't be found. SuS specifies EAFNOSUPPORT for socket(2) in this situation. -- Debian GNU/Linux 2.2 is out!
( http://www.debian.org/ ) Email: Herbert Xu ~{PmV>HI~} Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt --UlVJffcvxoiEqYs2 Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename=p

--- linux/net/socket.c.orig Sat Aug 26 15:08:29 2000
+++ linux/net/socket.c Sat Aug 26 15:09:36 2000
@@ -855,7 +855,7 @@
 net_family_read_lock();
 if (net_families[family] == NULL) {
-  i = -EINVAL;
+  i = -EAFNOSUPPORT;
   goto out;
 }

--UlVJffcvxoiEqYs2--
From owner-netdev@oss.sgi.com Sat Aug 26 16:48:27 2000 Received: by oss.sgi.com id ; Sat, 26 Aug 2000 16:48:02 -0700 Received: from ns1046.munich.netsurf.de ([195.180.235.46]:49156 "HELO fred.muc.de") by oss.sgi.com with SMTP id ; Sat, 26 Aug 2000 16:47:44 -0700 Received: by fred.muc.de (Postfix, from userid 500) id 3AA24E38E0; Fri, 25 Aug 2000 13:34:45 +0200 (CEST) Date: Fri, 25 Aug 2000 13:34:45 +0200 From: Andi Kleen To: Ben Greear Cc: Andi Kleen , Nick Towers , netdev@oss.sgi.com Subject: Re: Implementing 802.3ad-2000 link aggregation under Linux Message-ID: <20000825133445.A1630@fred.muc.de> References: <39A55DEB.94242945@doc.ic.ac.uk> <20000824202203.A4710@fred.muc.de> <39A5E0D1.A1994489@candelatech.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0.1i In-Reply-To: <39A5E0D1.A1994489@candelatech.com>; from greearb@candelatech.com on Fri, Aug 25, 2000 at 04:11:53AM +0200 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Fri, Aug 25, 2000 at 04:11:53AM +0200, Ben Greear wrote: > Andi Kleen wrote: > > > > On Thu, Aug 24, 2000 at 07:43:29PM +0200, Nick Towers wrote: > > > Hello, > > > > > > I am looking at starting a project to implement 802.3ad ethernet link > > > aggregation under Linux. For those of you that have never heard of it > > > I've given a description and a few pointers at the end.
Before I > > > launch into it I was wondering if there is anyone else out there who > > > > > > 1) Has started work on an implementation > > > > I started work some time ago. It is basically a user space problem. > > How would it be user-space? Doesn't it aggregate several physical > layers together into one interface? That seems like a kernel > level thing to me.... The required kernel level mechanisms already exist: bonding, teql, equal cost multipath routing. Only multipath routing strictly conforms to the spec's requirement of keeping a "conversation" on the same line. Anyways, you just need a routing daemon in user space that listens to the multicast group and sets up the required bundles via one of these mechanisms. -Andi -- This is like TV. I don't like TV. From owner-netdev@oss.sgi.com Sat Aug 26 16:48:41 2000 Received: by oss.sgi.com id ; Sat, 26 Aug 2000 16:48:27 -0700 Received: from ns1046.munich.netsurf.de ([195.180.235.46]:49412 "HELO fred.muc.de") by oss.sgi.com with SMTP id ; Sat, 26 Aug 2000 16:47:45 -0700 Received: by fred.muc.de (Postfix, from userid 500) id E7EBAE3911; Fri, 25 Aug 2000 13:36:57 +0200 (CEST) Date: Fri, 25 Aug 2000 13:36:57 +0200 From: Andi Kleen To: Donald Becker Cc: Ben Greear , Andi Kleen , Nick Towers , netdev@oss.sgi.com Subject: Re: Implementing 802.3ad-2000 link aggregation under Linux Message-ID: <20000825133657.B1630@fred.muc.de> References: <39A5E0D1.A1994489@candelatech.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0.1i In-Reply-To: ; from becker@scyld.com on Fri, Aug 25, 2000 at 06:15:37AM +0200 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Fri, Aug 25, 2000 at 06:15:37AM +0200, Donald Becker wrote: > On Thu, 24 Aug 2000, Ben Greear wrote: > > > Subject: Re: Implementing 802.3ad-2000 link aggregation under Linux > > Andi Kleen wrote: > > > On Thu, Aug 24, 2000 at 07:43:29PM +0200, Nick Towers wrote: > > > > I am looking at starting a
project to implement 802.3ad ethernet link > > > > aggregation under Linux. For those of you that have never heard of it > > > > I've given a description and a few pointers at the end. Before I > > > > launch into it I was wondering if there is anyone else out there who > > > I started work some time ago. It is basically a user space problem. > > > > How would it be user-space? Doesn't it aggregate several physical > > layers together into one interface? That seems like a kernel > > level thing to me.... > > 802.3ad is not just channel bonding (the kernel-level mechanism), it also > includes sending packets down each link to detect topology and verify that > the connection continues to work. The kernel already has all necessary mechanisms for that in place: queryable neighbour states; after going into the slow path, the protocol does its own checking anyway. I believe you'll be able to do a fine 802.3ad implementation without any kernel changes. -Andi -- This is like TV. I don't like TV. From owner-netdev@oss.sgi.com Sat Aug 26 18:05:52 2000 Received: by oss.sgi.com id ; Sat, 26 Aug 2000 18:05:33 -0700 Received: from nero.doit.wisc.edu ([128.104.17.130]:16901 "EHLO nero.doit.wisc.edu") by oss.sgi.com with ESMTP id ; Sat, 26 Aug 2000 18:05:06 -0700 Received: (from jleu@localhost) by nero.doit.wisc.edu (8.8.7/8.8.7) id UAA32196 for netdev@oss.sgi.com; Sat, 26 Aug 2000 20:04:34 -0500 Message-ID: <20000826200433.B32166@doit.wisc.edu> Date: Sat, 26 Aug 2000 20:04:33 -0500 From: "James R. Leu" To: netdev@oss.sgi.com Subject: skb->dst->output and ip_send() Reply-To: jleu@mindspring.com Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 0.93.2 Organization: none Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing While figuring out how to get IP fragmentation to take into account the label stack on an outgoing LSP, I came across the code for ip_send() (in net/ip.h).
I was disappointed to see that ip_send() doesn't use skb->dst->output. Right now ip_send() either sends the skb to be fragmented (with each fragment being delivered to ip_finish_output()) or directly to ip_finish_output(). I would have expected it to use skb->dst->output() (as is used in almost every other place where a packet needs to go to the output portion of the IP stack). I understand that ip_output() sends the skb through some NAT code before reaching ip_finish_output() and the path through the kernel that ends up at ip_send() would already have gone through that code (or an equivalent), but can't this be "cleaned up"? The input and output function pointers in the skb exist for a reason and if a particular protocol doesn't respect them, the whole scheme falls apart. For example right now my MPLS code has to put a very ugly check in ip_finish_output() that wouldn't have to be there if IPv4 respected the output function pointer. Of course it could also be the case that I just don't understand all of the reasons why ip_send() refers to ip_finish_output(). I have a patch to "fix" this, but I haven't taken into account the NAT code. Jim -- James R. Leu From owner-netdev@oss.sgi.com Sat Aug 26 18:14:52 2000 Received: by oss.sgi.com id ; Sat, 26 Aug 2000 18:14:32 -0700 Received: from nero.doit.wisc.edu ([128.104.17.130]:17413 "EHLO nero.doit.wisc.edu") by oss.sgi.com with ESMTP id ; Sat, 26 Aug 2000 18:14:14 -0700 Received: (from jleu@localhost) by nero.doit.wisc.edu (8.8.7/8.8.7) id UAA32208 for netdev@oss.sgi.com; Sat, 26 Aug 2000 20:13:43 -0500 Message-ID: <20000826201342.C32166@doit.wisc.edu> Date: Sat, 26 Aug 2000 20:13:42 -0500 From: "James R.
Leu" To: netdev@oss.sgi.com Subject: Protocol specific data in the FIB Reply-To: jleu@mindspring.com Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 0.93.2 Organization: none Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing For protocols that live somewhere between layer 2 and layer 3 they have some unique needs from the layer 3 protocols. To be specific MPLS needs to store labelling information for each entry in the FIB. If the FIB had a protocol specific field (or set of fields) and a set of access functions for these fields then MPLS wouldn't have to be so intrusive in the FIB code. I have code that already meets the needs of MPLS but would like to get feedback from the list: -Are there other protocols that could take advantage of protocol specific fields in the FIB? -Is this even a good idea? Jim -- James R. Leu From owner-netdev@oss.sgi.com Sat Aug 26 19:53:52 2000 Received: by oss.sgi.com id ; Sat, 26 Aug 2000 19:53:32 -0700 Received: from mail.cyberus.ca ([209.195.95.1]:54705 "EHLO cyberus.ca") by oss.sgi.com with ESMTP id ; Sat, 26 Aug 2000 19:53:01 -0700 Received: from shell.cyberus.ca (shell [209.195.95.7]) by cyberus.ca (8.9.3/8.9.3/Cyberus Online Inc.) with ESMTP id WAA03085; Sat, 26 Aug 2000 22:52:31 -0400 (EDT) Received: from localhost (hadi@localhost) by shell.cyberus.ca (8.9.1b+Sun/8.9.3) with ESMTP id WAA10170; Sat, 26 Aug 2000 22:52:30 -0400 (EDT) Date: Sat, 26 Aug 2000 22:52:30 -0400 (EDT) From: jamal To: "James R. Leu" cc: netdev@oss.sgi.com Subject: Re: skb->dst->output and ip_send() In-Reply-To: <20000826200433.B32166@doit.wisc.edu> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hi Jim, On Sat, 26 Aug 2000, James R. 
Leu wrote: > While figuring out how to get IP fragmentation to take into account the label > stack on an outgoing LSP, I came across the code for ip_send() (in net/ip.h). > I was disappointed to see that ip_send() doesn't use skb->dst->output. > ip_send() is actually in the data path of dst->input() which would be part of say an LERs path. All outgoing packets use dst->output() i.e no deviation. What are you trying to do? I think your code works fine (although its been a while since i'd seen it) cheers, jamal From owner-netdev@oss.sgi.com Sat Aug 26 19:58:02 2000 Received: by oss.sgi.com id ; Sat, 26 Aug 2000 19:57:43 -0700 Received: from mail.cyberus.ca ([209.195.95.1]:24242 "EHLO cyberus.ca") by oss.sgi.com with ESMTP id ; Sat, 26 Aug 2000 19:57:29 -0700 Received: from shell.cyberus.ca (shell [209.195.95.7]) by cyberus.ca (8.9.3/8.9.3/Cyberus Online Inc.) with ESMTP id WAA03597; Sat, 26 Aug 2000 22:56:58 -0400 (EDT) Received: from localhost (hadi@localhost) by shell.cyberus.ca (8.9.1b+Sun/8.9.3) with ESMTP id WAA10184; Sat, 26 Aug 2000 22:56:57 -0400 (EDT) Date: Sat, 26 Aug 2000 22:56:57 -0400 (EDT) From: jamal To: "James R. Leu" cc: netdev@oss.sgi.com Subject: Re: Protocol specific data in the FIB In-Reply-To: <20000826201342.C32166@doit.wisc.edu> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Sat, 26 Aug 2000, James R. Leu wrote: > For protocols that live somewhere between layer 2 and layer 3 they have some > unique needs from the layer 3 protocols. > > To be specific MPLS needs to store labelling information for each entry in the > FIB. If the FIB had a protocol specific field (or set of fields) and a set > of access functions for these fields then MPLS wouldn't have to be so intrusive > in the FIB code. 
> > I have code that already meets the needs of MPLS but would like to get > feedback from the list: > > -Are there other protocols that could take advantage of protocol specific > fields in the FIB? > -Is this even a good idea? > If i remember correctly you (could you post the URL?) only mod-ed fib_nh and fib_result to contain the mpls specific data? fib_nh makes sense because a label is a nh specific detail and fib_result because you need the result for further processing. post the URL cheers, jamal From owner-netdev@oss.sgi.com Sat Aug 26 20:17:43 2000 Received: by oss.sgi.com id ; Sat, 26 Aug 2000 20:17:33 -0700 Received: from pizda.ninka.net ([216.101.162.242]:34724 "EHLO pizda.ninka.net") by oss.sgi.com with ESMTP id ; Sat, 26 Aug 2000 20:17:07 -0700 Received: (from davem@localhost) by pizda.ninka.net (8.9.3/8.9.3) id UAA05203; Sat, 26 Aug 2000 20:04:20 -0700 Date: Sat, 26 Aug 2000 20:04:20 -0700 Message-Id: <200008270304.UAA05203@pizda.ninka.net> From: "David S. Miller" To: herbert@gondor.apana.org.au CC: netdev@oss.sgi.com, linux-kernel@vger.kernel.org In-reply-to: <20000826151704.A5552@gondor.apana.org.au> (message from Herbert Xu on Sat, 26 Aug 2000 15:17:04 +1000) Subject: Re: [PATCH] socket(2) should return EAFNOSUPPORT if no family found References: <20000826151704.A5552@gondor.apana.org.au> Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Patch applied, thanks. Later, David S. 
Miller davem@redhat.com From owner-netdev@oss.sgi.com Sun Aug 27 05:03:56 2000 Received: by oss.sgi.com id ; Sun, 27 Aug 2000 05:03:36 -0700 Received: from colin.muc.de ([193.149.48.1]:46345 "HELO colin.muc.de") by oss.sgi.com with SMTP id ; Sun, 27 Aug 2000 05:03:02 -0700 Received: by colin.muc.de id <140576-1>; Sun, 27 Aug 2000 14:02:02 +0200 Message-ID: <20000827140156.44836@colin.muc.de> From: Andi Kleen To: jleu@mindspring.com Cc: netdev@oss.sgi.com Subject: Re: Protocol specific data in the FIB References: <20000826201342.C32166@doit.wisc.edu> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 0.88e In-Reply-To: <20000826201342.C32166@doit.wisc.edu>; from James R. Leu on Sun, Aug 27, 2000 at 03:18:51AM +0200 Date: Sun, 27 Aug 2000 14:01:57 +0200 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Sun, Aug 27, 2000 at 03:18:51AM +0200, James R. Leu wrote: > For protocols that live somewhere between layer 2 and layer 3 they have some > unique needs from the layer 3 protocols. > > To be specific MPLS needs to store labelling information for each entry in the > FIB. If the FIB had a protocol specific field (or set of fields) and a set > of access functions for these fields then MPLS wouldn't have to be so intrusive > in the FIB code. > > I have code that already meets the needs of MPLS but would like to get > feedback from the list: > > -Are there other protocols that could take advantage of protocol specific > fields in the FIB? IPSec. > -Is this even a good idea? I think it is, but you usually do not want to access the FIB directly, but only do it via the destination cache (and hide the setup in rtnetlink) It wouldn't be that hard to extend the current fib/dst_entry attribute mechanism to store more information. For speed you want preallocated positions though, nothing dynamic. 
-Andi From owner-netdev@oss.sgi.com Sun Aug 27 11:28:08 2000 Received: by oss.sgi.com id ; Sun, 27 Aug 2000 11:27:58 -0700 Received: from nero.doit.wisc.edu ([128.104.17.130]:24325 "EHLO nero.doit.wisc.edu") by oss.sgi.com with ESMTP id ; Sun, 27 Aug 2000 11:27:33 -0700 Received: (from jleu@localhost) by nero.doit.wisc.edu (8.8.7/8.8.7) id NAA32650; Sun, 27 Aug 2000 13:27:00 -0500 Message-ID: <20000827132700.B32642@doit.wisc.edu> Date: Sun, 27 Aug 2000 13:27:00 -0500 From: "James R. Leu" To: jamal Cc: netdev@oss.sgi.com Subject: Re: Protocol specific data in the FIB Reply-To: jleu@mindspring.com References: <20000826201342.C32166@doit.wisc.edu> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 0.93.2 In-Reply-To: ; from jamal on Sat, Aug 26, 2000 at 10:56:57PM -0400 Organization: none Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing The URL for MPLS for Linux is: http://nero.doit.wisc.edu/mpls-linux/ Follow the link to the FTP site and grab linux-mpls-ldp-0.202.tar.gz. If you look at linux-mpls-ldp/kernel/mpls.diff you'll see the modifications to fib_hash.c and fib_rules.c. Jim On Sat, Aug 26, 2000 at 10:56:57PM -0400, jamal wrote: > > > On Sat, 26 Aug 2000, James R. Leu wrote: > > > For protocols that live somewhere between layer 2 and layer 3 they have some > > unique needs from the layer 3 protocols. > > > > To be specific MPLS needs to store labelling information for each entry in the > > FIB. If the FIB had a protocol specific field (or set of fields) and a set > > of access functions for these fields then MPLS wouldn't have to be so intrusive > > in the FIB code. > > > > I have code that already meets the needs of MPLS but would like to get > > feedback from the list: > > > > -Are there other protocols that could take advantage of protocol specific > > fields in the FIB? > > -Is this even a good idea? > > > > If i remember correctly you (could you post the URL?) 
only mod-ed fib_nh > and fib_result to contain the mpls specific data? > fib_nh makes sense because a label is a nh specific detail and fib_result > because you need the result for further processing. > post the URL > > cheers, > jamal -- James R. Leu From owner-netdev@oss.sgi.com Sun Aug 27 11:40:49 2000 Received: by oss.sgi.com id ; Sun, 27 Aug 2000 11:40:29 -0700 Received: from nero.doit.wisc.edu ([128.104.17.130]:25605 "EHLO nero.doit.wisc.edu") by oss.sgi.com with ESMTP id ; Sun, 27 Aug 2000 11:39:54 -0700 Received: (from jleu@localhost) by nero.doit.wisc.edu (8.8.7/8.8.7) id NAA32667; Sun, 27 Aug 2000 13:39:22 -0500 Message-ID: <20000827133921.C32642@doit.wisc.edu> Date: Sun, 27 Aug 2000 13:39:21 -0500 From: "James R. Leu" To: jamal Cc: netdev@oss.sgi.com Subject: Re: skb->dst->output and ip_send() Reply-To: jleu@mindspring.com References: <20000826200433.B32166@doit.wisc.edu> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 0.93.2 In-Reply-To: ; from jamal on Sat, Aug 26, 2000 at 10:52:30PM -0400 Organization: none Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Sat, Aug 26, 2000 at 10:52:30PM -0400, jamal wrote: > > Hi Jim, > > On Sat, 26 Aug 2000, James R. Leu wrote: > > > While figuring out how to get IP fragmentation to take into account the label > > stack on an outgoing LSP, I came across the code for ip_send() (in net/ip.h). > > I was disappointed to see that ip_send() doesn't use skb->dst->output. > > > > ip_send() is actually in the data path of dst->input() which would be part > of say an LERs path. Correct, and the IP fragmentation calculation is done in ip_send(). But instead of referring to dst->output() (which in LER mode is set to mpls_output()), it refers to ip_finish_output(). So I had to modify ip_finish_output() to recognize skbs that need to be redirected to mpls_output(). > All outgoing packets use dst->output() i.e no deviation. > What are you trying to do?
I think your code works fine (although it's been > a while since i'd seen it) The "fix" I have made is to set dst->output() to ip_finish_output() in ip_route_input_slow(). I then changed ip_send() to refer to dst->output(). So far it works well for normal IP forwarding and MPLS LER forwarding. Unfortunately I don't know how this will affect the other features of the IP stack. With the above fix the MPLS forwarding path only needs to modify the IPv4 stack in 3 files (fib_hash.c fib_result.c are modified to store label binding in the FIB and route.c is modified to set up route cache entries that refer to the MPLS stack.) I can post a patch for the change I'm suggesting; I would like to get feedback from others as to what this will break. Jim -- James R. Leu From owner-netdev@oss.sgi.com Sun Aug 27 14:54:59 2000 Received: by oss.sgi.com id ; Sun, 27 Aug 2000 14:54:49 -0700 Received: from sitemail.everyone.net ([216.200.145.35]:1802 "HELO omta01.mta.everyone.net") by oss.sgi.com with SMTP id ; Sun, 27 Aug 2000 14:54:22 -0700 Received: from sitemail.everyone.net (reports [216.200.145.62]) by omta01.mta.everyone.net (Postfix) with ESMTP id 76178217A for ; Sun, 27 Aug 2000 14:53:51 -0700 (PDT) Received: by sitemail.everyone.net (Postfix, from userid 60001) id 09791E0B5; Sun, 27 Aug 2000 14:53:50 -0700 (PDT) Content-Type: text/plain Content-Disposition: inline Mime-Version: 1.0 X-Mailer: MIME-tools 4.104 (Entity 4.117) Date: Sun, 27 Aug 2000 14:53:50 -0700 (PDT) From: Adam Slattery To: netdev@oss.sgi.com Subject: Specifying source IP in Linux 2.4.0-test Reply-To: aslattery@staticedge.com X-Originating-Ip: [63.26.215.171] Message-Id: <20000827215350.09791E0B5@sitemail.everyone.net> Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hi. Hopefully this is the correct mailing list... Anyway, recently I've been programming with raw IP sockets. I've been working with Linux 2.4.0-test{1,6}, and have come across a problem.
I have verified that my code is in fact correct because it works on 2.2.13 and 2.0.38 (I happened to have the source for those kernels). The problem is that the kernel seems to be ignoring my setsockopt() call that is supposed to tell the kernel to use my IP header rather than creating one on its own. When I try to specify a bogus source IP address in my own header, the kernel always changes it to whatever IP is on the interface the packet will go out on (I haven't tried this on an interface with aliases). I have played with a few DoS exploits and whatever other code I could get my hands on to see if IP spoofing worked or not, and I have found that it does not. This definitely seems like a bug (feature?) in 2.4.0 to me. I've looked for the place where the problem might lie for the past week with no luck. I'm starting school again and no longer have much time to do any coding, so I figured I would ask some people on a mailing list about this. If you would like some code to prove my statements above, just ask and I'll include it in another posting (or mail it to you directly). BTW, this seems like a pretty major bug, so I'm extremely surprised that I haven't found anybody else with this problem yet. Thanks, Adam Slattery _____________________________________________________________ Get free email for life!
---> StaticEdge.com From owner-netdev@oss.sgi.com Mon Aug 28 03:52:01 2000 Received: by oss.sgi.com id ; Mon, 28 Aug 2000 03:51:41 -0700 Received: from luna.tlmat.unican.es ([193.144.186.2]:774 "EHLO luna.tlmat.unican.es") by oss.sgi.com with ESMTP id ; Mon, 28 Aug 2000 03:51:17 -0700 Received: from centauro (lira.tlmat.unican.es [193.144.186.27]) by luna.tlmat.unican.es with SMTP (8.7.6/8.7.1) id NAA03934 for ; Mon, 28 Aug 2000 13:06:58 +0200 (METDST) Message-ID: <003401c010de$28d7f680$1bba90c1@tlmat.unican.es> From: =?iso-8859-1?B?UmFt824gQWf8ZXJv?= To: Subject: Delayed use of ip_rcv Date: Mon, 28 Aug 2000 12:53:11 +0200 MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="----=_NextPart_000_0031_01C010EE.EBE3A740" X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 5.00.2314.1300 X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2314.1300 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing This is a multi-part message in MIME format. ------=_NextPart_000_0031_01C010EE.EBE3A740 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable

Hi all,

I'm working on the development of a small module within Linux kernel 2.2.14 that deals with networking issues. One of the things I have to be able to do is retransmit packets after a certain delay. To do it I have done the following:

1) When I want to retransmit a certain packet, I copy the sk_buff (the new sk_buff is stored in fwd).
2) The normal packet goes correctly to its destination.
3) Then I initialize a timer_list structure to manage the delayed transmission:
   a) struct timer_list *timer; /* I reserve space for this structure */
   b) init_timer(timer);
   c) timer->expires=jiffies+5*HZ;
   d) timer->function=rxmit;
   e) timer->data=(unsigned long)fwd;
   f) add_timer(timer);
4) Within the rxmit function what I do is the following:
   a) struct sk_buff *fwd=(struct sk_buff *)data; /* data is the parameter (unsigned long) that I pass to this function */
   b) fwd->h.raw=fwd->data;
   c) fwd->nh.raw=fwd->data;
   d) ip_rcv(fwd,fwd->dev,NULL);

The timer stuff works correctly: if I do not include the ip_rcv line, everything goes all right (I can see my debug messages). On the other hand, if I call ip_rcv just after the copying, without the delay, it also works fine, so I can't see what the problem is. Both things (the timer delay and the ip_rcv function) work fine on their own, but when I combine them, the kernel crashes (however, the IP datagram is received by the destination).

I would be grateful if someone could give me some help with this issue.

Best regards,
Ramón

P.S. Please CC me personally on any answers/comments posted to the list in response to my posting.

------=_NextPart_000_0031_01C010EE.EBE3A740 Content-Type: text/html; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable
------=_NextPart_000_0031_01C010EE.EBE3A740-- From owner-netdev@oss.sgi.com Mon Aug 28 05:38:02 2000 Received: by oss.sgi.com id ; Mon, 28 Aug 2000 05:37:52 -0700 Received: from mail.cyberus.ca ([209.195.95.1]:35821 "EHLO cyberus.ca") by oss.sgi.com with ESMTP id ; Mon, 28 Aug 2000 05:37:26 -0700 Received: from shell.cyberus.ca (shell [209.195.95.7]) by cyberus.ca (8.9.3/8.9.3/Cyberus Online Inc.) with ESMTP id IAA23635; Mon, 28 Aug 2000 08:36:55 -0400 (EDT) Received: from localhost (hadi@localhost) by shell.cyberus.ca (8.9.1b+Sun/8.9.3) with ESMTP id IAA11510; Mon, 28 Aug 2000 08:36:55 -0400 (EDT) Date: Mon, 28 Aug 2000 08:36:55 -0400 (EDT) From: jamal To: "James R. Leu" cc: netdev@oss.sgi.com Subject: Re: skb->dst->output and ip_send() In-Reply-To: <20000827133921.C32642@doit.wisc.edu> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Sun, 27 Aug 2000, James R. Leu wrote: > Correct, and the IP fragmentation calculation is done in ip_send(). But > instead of it refering to dst->output() (which in LER mode is set to > mpls_output()) it refers to ip_finish_output(). So I had to modify > ip_finish_output() to recognize skb's that need to be redirected to > mpls_output(). > > > All outgoing packets use dst->output() i.e no deviation. > > What are you trying to do? I think your code works fine (although its been > > a while since i'd seen it) > > The "fix" I have made is to set dst->output() to ip_finish_output() > in ip_route_input_slow(). Then changed ip_send() to refer to dst->output(). > So far it works well for normal IP forwarding and MPLS LER forwarding. > Unfortunaly I don't know how this will affect the other features of the IP > stack. > Why dont you post the patch/fix? Hopefully, other people can make comments then. 
From looking at the above, it seems to me that you in fact totally avoided fragmentation by changing ip_send() ;-> The really clean way for you to do it, conforming to the current "tradition", is to have a dst->input() as well as an output(), i.e. not having both of them pointing to mpls_output(); however, that is a lot of code replication, which is why the little dirty hacks might be acceptable. cheers, jamal From owner-netdev@oss.sgi.com Mon Aug 28 10:07:52 2000 Received: by oss.sgi.com id ; Mon, 28 Aug 2000 10:07:12 -0700 Received: from minus.inr.ac.ru ([193.233.7.97]:17159 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Mon, 28 Aug 2000 10:06:38 -0700 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id VAA04443; Mon, 28 Aug 2000 21:04:44 +0400 From: kuznet@ms2.inr.ac.ru Message-Id: <200008281704.VAA04443@ms2.inr.ac.ru> Subject: Re: skb->dst->output and ip_send() To: jleu@mindspring.COM Date: Mon, 28 Aug 2000 21:04:44 +0400 (MSK DST) Cc: netdev@oss.sgi.com In-Reply-To: <20000826200433.B32166@doit.wisc.edu> from "James R. Leu" at Aug 27, 0 05:15:01 am X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Content-Length: 989 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hello! > Ofcourse it could also be that case that I just don't understand all of the > reasons why ip_send() refers to ip_finish_output(). There are no special reasons. It is simply silly to use an indirect call when we know where it leads. Moreover, dst->output is never used for packets not generated locally. The rule is simple: packets that arrive from the network use dst->input(); locally generated packets use dst->output(). The only case when both methods are used is when a packet is looped back. Actually, dst->output for forwarding routes could be set to ip_rt_bug(). As for your problem, I simply do not understand it. If you want to do something with a packet, use netfilter. 
Paul designed this amazing engine exactly to allow you to do everything. Even if you prefer to hack something in core by some reasons, ip_finish_output is the only place, where to do this. All the packets go through it. I even do not understand, what you planned to do with dst->output. Alexey From owner-netdev@oss.sgi.com Mon Aug 28 10:10:21 2000 Received: by oss.sgi.com id ; Mon, 28 Aug 2000 10:10:12 -0700 Received: from minus.inr.ac.ru ([193.233.7.97]:18695 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Mon, 28 Aug 2000 10:09:58 -0700 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id VAA04476; Mon, 28 Aug 2000 21:07:53 +0400 From: kuznet@ms2.inr.ac.ru Message-Id: <200008281707.VAA04476@ms2.inr.ac.ru> Subject: Re: Specifying source IP in Linux 2.4.0-test To: aslattery@staticedge.COM Date: Mon, 28 Aug 2000 21:07:53 +0400 (MSK DST) Cc: netdev@oss.sgi.com In-Reply-To: <20000827215350.09791E0B5@sitemail.everyone.net> from "Adam Slattery" at Aug 28, 0 02:15:00 am X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Content-Length: 257 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hello! > The problem is that the kernel seems to be ignoring my setsockopt() > call that is supposed to tell the kernel to use my IP header rather than > creating one on it's own. When I try to specify a bogus source IP address Disable netfilter. Alexey From owner-netdev@oss.sgi.com Mon Aug 28 22:09:48 2000 Received: by oss.sgi.com id ; Mon, 28 Aug 2000 22:09:38 -0700 Received: from cerberus.nemoto.ecei.tohoku.ac.jp ([130.34.199.67]:7172 "EHLO cerberus.nemoto.ecei.tohoku.ac.jp") by oss.sgi.com with ESMTP id ; Mon, 28 Aug 2000 22:09:10 -0700 Received: from localhost (yoshfuji@localhost [127.0.0.1]) by cerberus.nemoto.ecei.tohoku.ac.jp (8.9.3+3.2W/8.9.3/Debian 8.9.3-21) with ESMTP id OAA04825 for ; Tue, 29 Aug 2000 14:07:32 +0900 To: netdev@oss.sgi.com Subject: How to get addresses in kernel code? 
From: Hideaki YOSHIFUJI X-Mailer: Mew version 1.94 on XEmacs 21.1 (Capitol Reef) X-URL: http://www.ecei.tohoku.ac.jp/%7Eyoshfuji/ X-Fingerprint: F7 31 65 99 5E B2 BB A7 15 15 13 23 18 06 A9 6F 57 00 6B 25 X-Pgp5-Key-Url: http://cerberus.nemoto.ecei.tohoku.ac.jp/%7Eyoshfuji/yoshfuji@ecei.tohoku.ac.jp.asc Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-Id: <20000829140731L.yoshfuji@cerberus.nemoto.ecei.tohoku.ac.jp> Date: Tue, 29 Aug 2000 14:07:31 +0900 X-Dispatcher: imput version 990905(IM130) Lines: 18 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hi, I'd like to dump addresses (ipv6, ipv4 etc.) in kernel. It seems that I can use rtnetlink_dump_all() defined in net/core/rtnetlink.c for this. Could anyone tell me how to use this function? What is the meaning of netlink_callback *cb, and what value shuld I set to it? Thanks in advance. -- Hideaki YOSHIFUJI @ USAGI Project Web Page: http://www.ecei.tohoku.ac.jp/%7Eyoshfuji/ PGP5i FP: F731 6599 5EB2 BBA7 1515 1323 1806 A96F 5700 6B25 
From owner-netdev@oss.sgi.com Tue Aug 29 15:53:24 2000 Received: by oss.sgi.com id ; Tue, 29 Aug 2000 15:53:04 -0700 Received: from [212.84.236.131] ([212.84.236.131]:26116 "HELO convergence.de") by oss.sgi.com with SMTP id ; Tue, 29 Aug 2000 15:52:43 -0700 Received: (qmail 2287 invoked by uid 100); 29 Aug 2000 22:52:59 -0000 Date: Wed, 30 Aug 2000 00:52:59 +0200 From: Felix von Leitner To: linux-kernel@vger.kernel.org Cc: netdev@oss.sgi.com Subject: [2.4.0test6] IPv6 link-local TCP still slightly broken Message-ID: <20000830005259.A2246@convergence.de> Mail-Followup-To: Felix von Leitner , linux-kernel@vger.kernel.org, netdev@oss.sgi.com Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing The application that triggers this problem is
npoll (to be found at http://www.fefe.de/ncp/). Basically, it consists of two parts, npush and npoll. npush will send IPv6 multicast packets to the network and npoll will use them to see where it has to connect(). The multicast packets are sent out on eth0, and (I had a bug report about this earlier) the IPv6 scope_id now signals the interface as eth0, too. npoll then tries to connect to my own link-local address on eth0. This fails horribly. Here is a tcpdump:

00:14:38.506140 ::1.32796 > fe80::260:67ff:fe33:d15b.8002: S 3266263975:3266263975(0) win 32144
00:14:38.506180 fe80::260:67ff:fe33:d15b.8002 > ::1.32796: S 3269115708:3269115708(0) ack 3266263976 win 32144
00:14:38.506191 ::1.32796 > fe80::260:67ff:fe33:d15b.8002: R 3266263976:3266263976(0) win 0
00:14:41.502287 ::1.32796 > fe80::260:67ff:fe33:d15b.8002: S 3266263975:3266263975(0) win 32144
00:14:41.502316 fe80::260:67ff:fe33:d15b.8002 > ::1.32796: R 1025851588:1025851588(0) ack 1 win 0

Due to another bug that I haven't isolated yet, the packets are actually received twice by tcpdump. I edited the duplicates away.

Please note that this traffic was captured off lo, not eth0, so although I specified eth0 as interface, the SYN packet is sent over lo. And, the source IP is ::1, the loopback IP. Apparently, through some major kludge somewhere, the kernel is allowed to send a packet from an eth0 link-local address to ::1 on lo, but the receiver code understands that this can't be right and tries to reset the connection. Please note that the reset is apparently never seen by the other side, though, so the connect() system call eventually times out. This is just a dump of the first seconds.

I don't think that "::1" is the proper source address. This should be changed to fe80::260:67ff:fe33:d15b, my link-local address. Then, the lo code should be changed to allow packets to and from any of my link-local addresses.

Any comments? Does this make any sense? 
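For readers unfamiliar with the scope_id mentioned above, here is a minimal user-space sketch of how an application might fill in the destination for a connect() to a link-local address. It is not taken from npoll; the function name make_ll_sockaddr and its return codes are invented for illustration, and the interface index is assumed to have been obtained elsewhere (for example via if_nametoindex("eth0")).

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: build an IPv6 destination for connect().
 * A link-local address (fe80::/10) is only meaningful together with
 * an interface, which is carried in sin6_scope_id.
 * Returns 0 on success, -1 if addr does not parse, -2 if addr is
 * link-local but no interface index was supplied.
 */
int make_ll_sockaddr(struct sockaddr_in6 *sa, const char *addr,
                     unsigned short port, unsigned int ifindex)
{
    assert(sa != NULL && addr != NULL);

    memset(sa, 0, sizeof(*sa));
    sa->sin6_family = AF_INET6;
    sa->sin6_port = htons(port);

    if (inet_pton(AF_INET6, addr, &sa->sin6_addr) != 1)
        return -1;

    if (IN6_IS_ADDR_LINKLOCAL(&sa->sin6_addr)) {
        if (ifindex == 0)
            return -2;              /* ambiguous without an interface */
        sa->sin6_scope_id = ifindex;
    }
    return 0;
}
```

The resulting structure would then be passed to connect() on an AF_INET6 socket; which source address and output device the kernel picks from there is exactly what the report above is about.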
I am not sure if my resubscription on linux-kernel worked (haven't received a confirmation yet), and I am not subscribed to netdev yet, so please carbon-copy comments to me. Thanks. Felix From owner-netdev@oss.sgi.com Tue Aug 29 16:01:34 2000 Received: by oss.sgi.com id ; Tue, 29 Aug 2000 16:01:24 -0700 Received: from pizda.ninka.net ([216.101.162.242]:32128 "EHLO pizda.ninka.net") by oss.sgi.com with ESMTP id ; Tue, 29 Aug 2000 16:01:06 -0700 Received: (from davem@localhost) by pizda.ninka.net (8.9.3/8.9.3) id PAA01028; Tue, 29 Aug 2000 15:48:50 -0700 Date: Tue, 29 Aug 2000 15:48:50 -0700 Message-Id: <200008292248.PAA01028@pizda.ninka.net> From: "David S. Miller" To: felix@convergence.de CC: linux-kernel@vger.kernel.org, netdev@oss.sgi.com In-reply-to: <20000830005259.A2246@convergence.de> (message from Felix von Leitner on Wed, 30 Aug 2000 00:52:59 +0200) Subject: Re: [2.4.0test6] IPv6 link-local TCP still slightly broken References: <20000830005259.A2246@convergence.de> Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Date: Wed, 30 Aug 2000 00:52:59 +0200 From: Felix von Leitner Due to another bug that I haven't isolated yet, the packets are actually received twice by tcpdump. This is not a bug at all. Over loopback, you will see the packet twice because it is captured at two points 1) where it is transmitted over loopback 2) where it is received over loopback, just as it would be for ethernet transmit/receive. This behavior is consistent with how every other interface captures packets and it isn't going to change or be "fixed". Later, David S. 
Miller davem@redhat.com From owner-netdev@oss.sgi.com Tue Aug 29 19:04:15 2000 Received: by oss.sgi.com id ; Tue, 29 Aug 2000 19:04:05 -0700 Received: from linuxcare.com.au ([203.29.91.49]:9227 "EHLO front.linuxcare.com.au") by oss.sgi.com with ESMTP id ; Tue, 29 Aug 2000 19:03:32 -0700 Received: from halfway.linuxcare.com.au (localhost [127.0.0.1]) by front.linuxcare.com.au (8.9.3/8.9.3/Debian 8.9.3-21) with ESMTP id NAA15037 for ; Wed, 30 Aug 2000 13:02:02 +1100 X-Authentication-Warning: front.linuxcare.com.au: Host localhost [127.0.0.1] claimed to be halfway.linuxcare.com.au Received: from linuxcare.com.au (localhost [127.0.0.1]) by halfway.linuxcare.com.au (Postfix) with ESMTP id 6699E8172; Wed, 30 Aug 2000 13:02:57 +1100 (EST) From: Rusty Russell To: kuznet@ms2.inr.ac.ru Cc: netdev@oss.sgi.com Subject: Re: Specifying source IP in Linux 2.4.0-test In-reply-to: Your message of "Mon, 28 Aug 2000 21:07:53 +0400." <200008281707.VAA04476@ms2.inr.ac.ru> Date: Wed, 30 Aug 2000 13:02:57 +1100 Message-Id: <20000830020257.6699E8172@halfway.linuxcare.com.au> Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing In message <200008281707.VAA04476@ms2.inr.ac.ru> you write: > Hello! > > > The problem is that the kernel seems to be ignoring my setsockopt() > > call that is supposed to tell the kernel to use my IP header rather than > > creating one on it's own. When I try to specify a bogus source IP address > > Disable netfilter. Well, disable the NAT module, at least. Rusty. -- Hacking time. 
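As background to this thread: the original poster did not name the socket option, but the setsockopt() call being discussed is presumably IP_HDRINCL on a raw socket. The sketch below is an assumption-laden illustration, not the poster's code; the names inet_checksum, fill_ip_header and open_hdrincl_socket are made up here. It builds an IPv4 header with a caller-chosen source address and shows the option being set.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <netinet/ip.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* RFC 1071 Internet checksum; result is in host byte order. */
uint16_t inet_checksum(const void *buf, size_t len)
{
    const uint8_t *p = buf;
    uint32_t sum = 0;

    while (len > 1) {
        sum += ((uint32_t)p[0] << 8) | p[1];
        p += 2;
        len -= 2;
    }
    if (len)
        sum += (uint32_t)p[0] << 8;
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Fill a minimal 20-byte IPv4 header. Returns 0, or -1 on a bad address. */
int fill_ip_header(struct iphdr *iph, const char *src, const char *dst,
                   uint8_t proto, uint16_t payload_len)
{
    struct in_addr s, d;

    assert(iph != NULL);
    if (inet_pton(AF_INET, src, &s) != 1 || inet_pton(AF_INET, dst, &d) != 1)
        return -1;

    memset(iph, 0, sizeof(*iph));
    iph->version = 4;
    iph->ihl = sizeof(*iph) / 4;
    iph->tot_len = htons((uint16_t)(sizeof(*iph) + payload_len));
    iph->ttl = 64;
    iph->protocol = proto;
    iph->saddr = s.s_addr;          /* caller-chosen source address */
    iph->daddr = d.s_addr;
    iph->check = htons(inet_checksum(iph, sizeof(*iph)));
    return 0;
}

/* Needs root (CAP_NET_RAW). Returns a socket fd, or -1 on failure. */
int open_hdrincl_socket(void)
{
    int on = 1;
    int fd = socket(AF_INET, SOCK_RAW, IPPROTO_RAW);

    if (fd < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_IP, IP_HDRINCL, &on, sizeof(on)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

With IP_HDRINCL set, Linux still fills in the checksum and total length itself, and supplies the source address and packet ID only when they are zero, so a non-zero source in the header is expected to be left alone by the IP layer. That is consistent with the replies above pointing at netfilter's NAT module rather than the raw socket code.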
From owner-netdev@oss.sgi.com Thu Aug 31 04:29:30 2000 Received: by oss.sgi.com id ; Thu, 31 Aug 2000 04:29:19 -0700 Received: from linuxcare.com.au ([203.29.91.49]:31239 "EHLO front.linuxcare.com.au") by oss.sgi.com with ESMTP id ; Thu, 31 Aug 2000 04:29:02 -0700 Received: from halfway.linuxcare.com.au (localhost [127.0.0.1]) by front.linuxcare.com.au (8.9.3/8.9.3/Debian 8.9.3-21) with ESMTP id WAA21550 for ; Thu, 31 Aug 2000 22:28:11 +1100 X-Authentication-Warning: front.linuxcare.com.au: Host localhost [127.0.0.1] claimed to be halfway.linuxcare.com.au Received: from linuxcare.com.au (localhost [127.0.0.1]) by halfway.linuxcare.com.au (Postfix) with ESMTP id D927B816F; Thu, 31 Aug 2000 22:29:11 +1100 (EST) From: Rusty Russell To: kuznet@ms2.inr.ac.ru Cc: netdev@oss.sgi.com, ges@liscon.com, netfilter@us4.samba.org Subject: Re: nfmark routing in ip_route_output() In-reply-to: Your message of "Sun, 13 Aug 2000 20:30:21 +0400." <200008131630.UAA04346@ms2.inr.ac.ru> Date: Thu, 31 Aug 2000 22:29:11 +1100 Message-Id: <20000831112911.D927B816F@halfway.linuxcare.com.au> Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing In message <200008131630.UAA04346@ms2.inr.ac.ru> you write: > If you do not want to depend on skb, add new function using > rt_key as argument. You may even replace ip_route_output() > with this new function everywhere, it will be a bit slower, > but it is worth to do, because has lots of useful applications > not bound to nfmark. OK. This is minimal source level change, so I don't break routing code this close to 2.4.0. Responsibility for rerouting is now handled by netfilter module which alters the packet: this fixes my major design mistake, and removes route_me_harder from IP stack code. It works (netfilter testsuite/00netfilter/10localmangle.sh): you can now change mark for LOCAL_OUT packets and they get rerouted like users want. Rusty. 
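The rerouting pattern described in this message (save the fields that can affect the route, run the hook, reroute only if one of them changed) can be modelled in user space roughly as follows. This is only a sketch of the idea: struct packet, enum verdict, local_out and the sample hooks are hypothetical stand-ins, not kernel or netfilter APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the packet fields that influence routing. */
struct route_inputs {
    uint32_t saddr;
    uint32_t daddr;
    uint8_t tos;
    unsigned long nfmark;
};

struct packet {
    struct route_inputs ri;
    int reroutes;               /* counts route lookups, for illustration */
};

enum verdict { V_DROP, V_ACCEPT, V_STOLEN };

typedef enum verdict (*hook_fn)(struct packet *);
typedef int (*reroute_fn)(struct packet *);

bool route_inputs_changed(const struct route_inputs *a,
                          const struct route_inputs *b)
{
    return a->saddr != b->saddr || a->daddr != b->daddr ||
           a->tos != b->tos || a->nfmark != b->nfmark;
}

/* Run the hook; redo the route lookup only if a routing input changed. */
enum verdict local_out(struct packet *pkt, hook_fn hook, reroute_fn reroute)
{
    struct route_inputs saved;
    enum verdict v;

    assert(pkt != NULL && hook != NULL && reroute != NULL);
    saved = pkt->ri;
    v = hook(pkt);

    /* Check the verdict first: a dropped or stolen packet is not ours. */
    if (v != V_DROP && v != V_STOLEN &&
        route_inputs_changed(&saved, &pkt->ri))
        return reroute(pkt) == 0 ? v : V_DROP;
    return v;
}

/* Sample hooks and reroute callbacks, for demonstration only. */
enum verdict noop_hook(struct packet *p) { (void)p; return V_ACCEPT; }
enum verdict mark_hook(struct packet *p) { p->ri.nfmark += 1; return V_ACCEPT; }
int count_reroute(struct packet *p) { p->reroutes++; return 0; }
int failing_reroute(struct packet *p) { (void)p; return -1; }
```

The point of the design is that the module which alters the packet owns the decision to reroute, so the core output path no longer needs to test an "altered" flag on every packet.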
diff -urN -X /tmp/file5G2Cgt --minimal linux-2.4.0-test8-1/include/net/route.h working-2.4.0-test8-1/include/net/route.h --- linux-2.4.0-test8-1/include/net/route.h Wed Aug 30 19:40:05 2000 +++ working-2.4.0-test8-1/include/net/route.h Wed Aug 30 23:51:06 2000 @@ -94,12 +94,13 @@ extern struct ip_rt_acct *ip_rt_acct; +struct in_device; extern void ip_rt_init(void); extern void ip_rt_redirect(u32 old_gw, u32 dst, u32 new_gw, u32 src, u8 tos, struct net_device *dev); extern void ip_rt_advice(struct rtable **rp, int advice); extern void rt_cache_flush(int how); -extern int ip_route_output(struct rtable **, u32 dst, u32 src, u32 tos, int oif); +extern int ip_route_output_key(struct rtable **, const struct rt_key *key); extern int ip_route_input(struct sk_buff*, u32 dst, u32 src, u8 tos, struct net_device *devin); extern unsigned short ip_rt_frag_needed(struct iphdr *iph, unsigned short new_mtu); extern void ip_rt_update_pmtu(struct dst_entry *dst, unsigned mtu); @@ -110,6 +111,15 @@ extern int ip_rt_ioctl(unsigned int cmd, void *arg); extern void ip_rt_get_source(u8 *src, struct rtable *rt); extern int ip_rt_dump(struct sk_buff *skb, struct netlink_callback *cb); + +/* Deprecated: use ip_route_output_key directly */ +extern __inline__ int ip_route_output(struct rtable **rp, + u32 daddr, u32 saddr, u32 tos, int oif) +{ + struct rt_key key = { dst:daddr, src:saddr, oif:oif, tos:tos }; + + return ip_route_output_key(rp, &key); +} extern __inline__ void ip_rt_put(struct rtable * rt) diff -urN -X /tmp/file5G2Cgt --minimal linux-2.4.0-test8-1/net/ipv4/route.c working-2.4.0-test8-1/net/ipv4/route.c --- linux-2.4.0-test8-1/net/ipv4/route.c Sun Aug 27 15:11:01 2000 +++ working-2.4.0-test8-1/net/ipv4/route.c Wed Aug 30 23:14:18 2000 @@ -1610,7 +1610,7 @@ * Major route resolver routine. 
*/ -int ip_route_output_slow(struct rtable **rp, u32 daddr, u32 saddr, u32 tos, int oif) +int ip_route_output_slow(struct rtable **rp, const struct rt_key *oldkey) { struct rt_key key; struct fib_result res; @@ -1620,25 +1620,31 @@ unsigned hash; int free_res = 0; int err; + u32 tos; - tos &= IPTOS_RT_MASK|RTO_ONLINK; - key.dst = daddr; - key.src = saddr; + tos = oldkey->tos & (IPTOS_RT_MASK|RTO_ONLINK); + key.dst = oldkey->dst; + key.src = oldkey->src; key.tos = tos&IPTOS_RT_MASK; key.iif = loopback_dev.ifindex; - key.oif = oif; + key.oif = oldkey->oif; +#ifdef CONFIG_IP_ROUTE_FWMARK + key.fwmark = oldkey->fwmark; +#endif key.scope = (tos&RTO_ONLINK) ? RT_SCOPE_LINK : RT_SCOPE_UNIVERSE; res.fi = NULL; #ifdef CONFIG_IP_MULTIPLE_TABLES res.r = NULL; #endif - if (saddr) { - if (MULTICAST(saddr) || BADCLASS(saddr) || ZERONET(saddr)) + if (oldkey->src) { + if (MULTICAST(oldkey->src) + || BADCLASS(oldkey->src) + || ZERONET(oldkey->src)) return -EINVAL; /* It is equivalent to inet_addr_type(saddr) == RTN_LOCAL */ - dev_out = ip_dev_find(saddr); + dev_out = ip_dev_find(oldkey->src); if (dev_out == NULL) return -EINVAL; @@ -1650,8 +1656,8 @@ of another iface. --ANK */ - if (oif == 0 && - (MULTICAST(daddr) || daddr == 0xFFFFFFFF)) { + if (oldkey->oif == 0 + && (MULTICAST(oldkey->dst) || oldkey->dst == 0xFFFFFFFF)) { /* Special hack: user can direct multicasts and limited broadcast via necessary interface without fiddling with IP_MULTICAST_IF or IP_PKTINFO. 
@@ -1674,8 +1680,8 @@ dev_put(dev_out); dev_out = NULL; } - if (oif) { - dev_out = dev_get_by_index(oif); + if (oldkey->oif) { + dev_out = dev_get_by_index(oldkey->oif); if (dev_out == NULL) return -ENODEV; if (__in_dev_get(dev_out) == NULL) { @@ -1683,15 +1689,15 @@ return -ENODEV; /* Wrong error code */ } - if (LOCAL_MCAST(daddr) || daddr == 0xFFFFFFFF) { + if (LOCAL_MCAST(oldkey->dst) || oldkey->dst == 0xFFFFFFFF) { if (!key.src) key.src = inet_select_addr(dev_out, 0, RT_SCOPE_LINK); goto make_route; } if (!key.src) { - if (MULTICAST(daddr)) + if (MULTICAST(oldkey->dst)) key.src = inet_select_addr(dev_out, 0, key.scope); - else if (!daddr) + else if (!oldkey->dst) key.src = inet_select_addr(dev_out, 0, RT_SCOPE_HOST); } } @@ -1712,7 +1718,7 @@ if (fib_lookup(&key, &res)) { res.fi = NULL; - if (oif) { + if (oldkey->oif) { /* Apparently, routing tables are wrong. Assume, that the destination is on link. @@ -1800,7 +1806,7 @@ } else if (res.type == RTN_MULTICAST) { flags |= RTCF_MULTICAST|RTCF_LOCAL; read_lock(&inetdev_lock); - if (!__in_dev_get(dev_out) || !ip_check_mc(__in_dev_get(dev_out), daddr)) + if (!__in_dev_get(dev_out) || !ip_check_mc(__in_dev_get(dev_out), oldkey->dst)) flags &= ~RTCF_LOCAL; read_unlock(&inetdev_lock); /* If multicast route do not exist use @@ -1819,18 +1825,21 @@ atomic_set(&rth->u.dst.__refcnt, 1); rth->u.dst.flags= DST_HOST; - rth->key.dst = daddr; + rth->key.dst = oldkey->dst; rth->key.tos = tos; - rth->key.src = saddr; + rth->key.src = oldkey->src; rth->key.iif = 0; - rth->key.oif = oif; + rth->key.oif = oldkey->oif; +#ifdef CONFIG_IP_ROUTE_FWMARK + rth->key.fwmark = oldkey->fwmark; +#endif rth->rt_dst = key.dst; rth->rt_src = key.src; #ifdef CONFIG_IP_ROUTE_NAT rth->rt_dst_map = key.dst; rth->rt_src_map = key.src; #endif - rth->rt_iif = oif ? : dev_out->ifindex; + rth->rt_iif = oldkey->oif ? 
: dev_out->ifindex; rth->u.dst.dev = dev_out; dev_hold(dev_out); rth->rt_gateway = key.dst; @@ -1850,7 +1859,7 @@ if (res.type == RTN_MULTICAST) { struct in_device *in_dev = in_dev_get(dev_out); if (in_dev) { - if (IN_DEV_MFORWARD(in_dev) && !LOCAL_MCAST(daddr)) { + if (IN_DEV_MFORWARD(in_dev) && !LOCAL_MCAST(oldkey->dst)) { rth->u.dst.input = ip_mr_input; rth->u.dst.output = ip_mc_output; } @@ -1864,7 +1873,7 @@ rth->rt_flags = flags; - hash = rt_hash_code(daddr, saddr^(oif<<5), tos); + hash = rt_hash_code(oldkey->dst, oldkey->src^(oldkey->oif<<5), tos); err = rt_intern_hash(hash, rth, rp); done: if (free_res) @@ -1881,21 +1890,24 @@ goto done; } -int ip_route_output(struct rtable **rp, u32 daddr, u32 saddr, u32 tos, int oif) +int ip_route_output_key(struct rtable **rp, const struct rt_key *key) { unsigned hash; struct rtable *rth; - hash = rt_hash_code(daddr, saddr^(oif<<5), tos); + hash = rt_hash_code(key->dst, key->src^(key->oif<<5), key->tos); read_lock_bh(&rt_hash_table[hash].lock); for (rth=rt_hash_table[hash].chain; rth; rth=rth->u.rt_next) { - if (rth->key.dst == daddr && - rth->key.src == saddr && + if (rth->key.dst == key->dst && + rth->key.src == key->src && rth->key.iif == 0 && - rth->key.oif == oif && - !((rth->key.tos^tos)&(IPTOS_RT_MASK|RTO_ONLINK)) && - ((tos&RTO_TPROXY) || !(rth->rt_flags&RTCF_TPROXY)) + rth->key.oif == key->oif && +#ifdef CONFIG_IP_ROUTE_FWMARK + rth->key.fwmark == key->fwmark && +#endif + !((rth->key.tos^key->tos)&(IPTOS_RT_MASK|RTO_ONLINK)) && + ((key->tos&RTO_TPROXY) || !(rth->rt_flags&RTCF_TPROXY)) ) { rth->u.dst.lastuse = jiffies; dst_hold(&rth->u.dst); @@ -1907,8 +1919,8 @@ } read_unlock_bh(&rt_hash_table[hash].lock); - return ip_route_output_slow(rp, daddr, saddr, tos, oif); -} + return ip_route_output_slow(rp, key); +} #ifdef CONFIG_RTNETLINK diff -urN -X /tmp/file5G2Cgt --minimal linux-2.4.0-test8-1/net/netsyms.c working-2.4.0-test8-1/net/netsyms.c --- linux-2.4.0-test8-1/net/netsyms.c Sun Aug 27 15:11:01 2000 +++ 
working-2.4.0-test8-1/net/netsyms.c	Wed Aug 30 23:22:35 2000
@@ -212,7 +212,7 @@
 EXPORT_SYMBOL(inetdev_lock);
 EXPORT_SYMBOL(inet_add_protocol);
 EXPORT_SYMBOL(inet_del_protocol);
-EXPORT_SYMBOL(ip_route_output);
+EXPORT_SYMBOL(ip_route_output_key);
 EXPORT_SYMBOL(ip_route_input);
 EXPORT_SYMBOL(icmp_send);
 EXPORT_SYMBOL(icmp_reply);
diff -urN -X /tmp/file5G2Cgt --minimal linux-2.4.0-test8-1/net/ipv4/igmp.c working-2.4.0-test8-1/net/ipv4/igmp.c
--- linux-2.4.0-test8-1/net/ipv4/igmp.c	Sat Aug 12 00:23:39 2000
+++ working-2.4.0-test8-1/net/ipv4/igmp.c	Wed Aug 30 23:18:04 2000
@@ -184,7 +184,10 @@
 
 #define IGMP_SIZE (sizeof(struct igmphdr)+sizeof(struct iphdr)+4)
 
-static inline int igmp_send_report2(struct sk_buff *skb)
+/* Don't just hand NF_HOOK skb->dst->output, in case netfilter hook
+   changes route */
+static inline int
+output_maybe_reroute(struct sk_buff *skb)
 {
 	return skb->dst->output(skb);
 }
@@ -247,7 +250,7 @@
 	ih->csum=ip_compute_csum((void *)ih, sizeof(struct igmphdr));
 
 	return NF_HOOK(PF_INET, NF_IP_LOCAL_OUT, skb, NULL, rt->u.dst.dev,
-		       igmp_send_report2);
+		       output_maybe_reroute);
 }
 
 
diff -urN -X /tmp/file5G2Cgt --minimal linux-2.4.0-test8-1/net/ipv4/ip_output.c working-2.4.0-test8-1/net/ipv4/ip_output.c
--- linux-2.4.0-test8-1/net/ipv4/ip_output.c	Tue Aug 29 14:39:28 2000
+++ working-2.4.0-test8-1/net/ipv4/ip_output.c	Wed Aug 30 23:18:15 2000
@@ -107,42 +107,11 @@
 	return 0;
 }
 
-#ifdef CONFIG_NETFILTER
-/* To preserve the cute illusion that a locally-generated packet can
-   be mangled before routing, we actually reroute if a hook altered
-   the packet. -RR */
-static int route_me_harder(struct sk_buff *skb)
-{
-	struct iphdr *iph = skb->nh.iph;
-	struct rtable *rt;
-
-	if (ip_route_output(&rt, iph->daddr, iph->saddr,
-			    RT_TOS(iph->tos) | RTO_CONN,
-			    skb->sk ? skb->sk->bound_dev_if : 0)) {
-		printk("route_me_harder: No more route.\n");
-		return -EINVAL;
-	}
-
-	/* Drop old route. */
-	dst_release(skb->dst);
-
-	skb->dst = &rt->u.dst;
-	return 0;
-}
-#endif
-
-/* Do route recalc if netfilter changes skb. */
+/* Don't just hand NF_HOOK skb->dst->output, in case netfilter hook
+   changes route */
 static inline int
 output_maybe_reroute(struct sk_buff *skb)
 {
-#ifdef CONFIG_NETFILTER
-	if (skb->nfcache & NFC_ALTERED) {
-		if (route_me_harder(skb) != 0) {
-			kfree_skb(skb);
-			return -EINVAL;
-		}
-	}
-#endif
 	return skb->dst->output(skb);
 }
 
@@ -311,25 +280,6 @@
 	struct rtable *rt = (struct rtable *)skb->dst;
 	struct net_device *dev;
 	struct iphdr *iph = skb->nh.iph;
-
-#ifdef CONFIG_NETFILTER
-	/* BLUE-PEN-FOR-ALEXEY. I don't understand; you mean I can't
-	   hold the route as I pass the packet to userspace? -- RR
-
-	   You may hold it, if you really hold it. F.e. if netfilter
-	   does not destroy handed skb with skb->dst attached, it
-	   will be held. When it was stored in info->arg, then
-	   it was not held apparently. Now (without second arg) it is evident,
-	   that it is clean. --ANK
-	 */
-	if (rt==NULL || (skb->nfcache & NFC_ALTERED)) {
-		if (route_me_harder(skb) != 0) {
-			kfree_skb(skb);
-			return -EHOSTUNREACH;
-		}
-		rt = (struct rtable *)skb->dst;
-	}
-#endif
 
 	dev = rt->u.dst.dev;
 
diff -urN -X /tmp/file5G2Cgt --minimal linux-2.4.0-test8-1/net/ipv4/netfilter/ip_nat_standalone.c working-2.4.0-test8-1/net/ipv4/netfilter/ip_nat_standalone.c
--- linux-2.4.0-test8-1/net/ipv4/netfilter/ip_nat_standalone.c	Fri Jul 28 21:36:46 2000
+++ working-2.4.0-test8-1/net/ipv4/netfilter/ip_nat_standalone.c	Wed Aug 30 21:31:11 2000
@@ -161,6 +161,31 @@
 	return ip_nat_fn(hooknum, pskb, in, out, okfn);
 }
 
+/* FIXME: change in oif may mean change in hh_len. Check and realloc
+   --RR */
+static int
+route_me_harder(struct sk_buff *skb)
+{
+	struct iphdr *iph = skb->nh.iph;
+	struct rtable *rt;
+	struct rt_key key = { dst:iph->daddr,
+			      src:iph->saddr,
+			      oif:skb->sk ? skb->sk->bound_dev_if : 0,
+			      tos:RT_TOS(iph->tos)|RTO_CONN,
+			      fwmark:skb->nfmark };
+
+	if (ip_route_output_key(&rt, &key) != 0) {
+		printk("route_me_harder: No more route.\n");
+		return -EINVAL;
+	}
+
+	/* Drop old route. */
+	dst_release(skb->dst);
+
+	skb->dst = &rt->u.dst;
+	return 0;
+}
+
 static unsigned int
 ip_nat_local_fn(unsigned int hooknum,
 		struct sk_buff **pskb,
@@ -168,12 +193,23 @@
 		const struct net_device *out,
 		int (*okfn)(struct sk_buff *))
 {
+	u_int32_t saddr, daddr;
+	unsigned int ret;
+
 	/* root is playing with raw sockets. */
 	if ((*pskb)->len < sizeof(struct iphdr)
 	    || (*pskb)->nh.iph->ihl * 4 < sizeof(struct iphdr))
 		return NF_ACCEPT;
 
-	return ip_nat_fn(hooknum, pskb, in, out, okfn);
+	saddr = (*pskb)->nh.iph->saddr;
+	daddr = (*pskb)->nh.iph->daddr;
+
+	ret = ip_nat_fn(hooknum, pskb, in, out, okfn);
+	if (ret != NF_DROP && ret != NF_STOLEN
+	    && ((*pskb)->nh.iph->saddr != saddr
+		|| (*pskb)->nh.iph->daddr != daddr))
+		return route_me_harder(*pskb) == 0 ? ret : NF_DROP;
+	return ret;
 }
 
 /* We must be after connection tracking and before packet filtering. */
diff -urN -X /tmp/file5G2Cgt --minimal linux-2.4.0-test8-1/net/ipv4/netfilter/iptable_mangle.c working-2.4.0-test8-1/net/ipv4/netfilter/iptable_mangle.c
--- linux-2.4.0-test8-1/net/ipv4/netfilter/iptable_mangle.c	Tue May 23 02:32:57 2000
+++ working-2.4.0-test8-1/net/ipv4/netfilter/iptable_mangle.c	Wed Aug 30 23:51:40 2000
@@ -5,6 +5,11 @@
  */
 #include
 #include
+#include
+#include
+#include
+#include
+#include
 
 #define MANGLE_VALID_HOOKS ((1 << NF_IP_PRE_ROUTING) | (1 << NF_IP_LOCAL_OUT))
 
@@ -86,6 +91,31 @@
 	return ipt_do_table(pskb, hook, in, out, &packet_mangler, NULL);
 }
 
+/* FIXME: change in oif may mean change in hh_len. Check and realloc
+   --RR */
+static int
+route_me_harder(struct sk_buff *skb)
+{
+	struct iphdr *iph = skb->nh.iph;
+	struct rtable *rt;
+	struct rt_key key = { dst:iph->daddr,
+			      src:iph->saddr,
+			      oif:skb->sk ? skb->sk->bound_dev_if : 0,
+			      tos:RT_TOS(iph->tos)|RTO_CONN,
+			      fwmark:skb->nfmark };
+
+	if (ip_route_output_key(&rt, &key) != 0) {
+		printk("route_me_harder: No more route.\n");
+		return -EINVAL;
+	}
+
+	/* Drop old route. */
+	dst_release(skb->dst);
+
+	skb->dst = &rt->u.dst;
+	return 0;
+}
+
 static unsigned int
 ipt_local_out_hook(unsigned int hook,
 		   struct sk_buff **pskb,
@@ -93,6 +123,11 @@
 		   const struct net_device *out,
 		   int (*okfn)(struct sk_buff *))
 {
+	unsigned int ret;
+	u_int8_t tos;
+	u_int32_t saddr, daddr;
+	unsigned long nfmark;
+
 	/* root is playing with raw sockets. */
 	if ((*pskb)->len < sizeof(struct iphdr)
 	    || (*pskb)->nh.iph->ihl * 4 < sizeof(struct iphdr)) {
@@ -101,7 +136,22 @@
 		return NF_ACCEPT;
 	}
 
-	return ipt_do_table(pskb, hook, in, out, &packet_mangler, NULL);
+	/* Save things which could affect route */
+	nfmark = (*pskb)->nfmark;
+	saddr = (*pskb)->nh.iph->saddr;
+	daddr = (*pskb)->nh.iph->daddr;
+	tos = (*pskb)->nh.iph->tos;
+
+	ret = ipt_do_table(pskb, hook, in, out, &packet_mangler, NULL);
+	/* Reroute for ANY change. */
+	if (ret != NF_DROP && ret != NF_STOLEN
+	    && ((*pskb)->nh.iph->saddr != saddr
+		|| (*pskb)->nh.iph->daddr != daddr
+		|| (*pskb)->nfmark != nfmark
+		|| (*pskb)->nh.iph->tos != tos))
+		return route_me_harder(*pskb) == 0 ? ret : NF_DROP;
+
+	return ret;
 }
 
 static struct nf_hook_ops ipt_ops[]

--
Hacking time.

From owner-netdev@oss.sgi.com Thu Aug 31 16:32:13 2000
Received: by oss.sgi.com id ; Thu, 31 Aug 2000 16:32:03 -0700
Received: from pizda.ninka.net ([216.101.162.242]:11908 "EHLO pizda.ninka.net") by oss.sgi.com with ESMTP id ; Thu, 31 Aug 2000 16:31:41 -0700
Received: (from davem@localhost) by pizda.ninka.net (8.9.3/8.9.3) id QAA04248; Thu, 31 Aug 2000 16:16:03 -0700
From: "David S. Miller"
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Message-ID: <14766.59187.494713.745149@pizda.ninka.net>
Date: Thu, 31 Aug 2000 16:16:03 -0700 (PDT)
To: Rusty Russell
CC: netdev@oss.sgi.com, ges@liscon.com, netfilter@us4.samba.org, kuznet@ms2.inr.ac.ru
Subject: Re: nfmark routing in ip_route_output()
In-Reply-To: <20000831112911.D927B816F@halfway.linuxcare.com.au>
References: <200008131630.UAA04346@ms2.inr.ac.ru> <20000831112911.D927B816F@halfway.linuxcare.com.au>
X-Mailer: VM 6.75 under Emacs 20.7.1
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing

Rusty Russell writes:
 > In message <200008131630.UAA04346@ms2.inr.ac.ru> you write:
 > > If you do not want to depend on skb, add new function using
 > > rt_key as argument. You may even replace ip_route_output()
 > > with this new function everywhere, it will be a bit slower,
 > > but it is worth to do, because has lots of useful applications
 > > not bound to nfmark.
 >
 > OK. This is minimal source level change, so I don't break routing
 > code this close to 2.4.0.
 >

This patch looks fine, I've applied it.

Alexey can complain next week when he comes back online. :-)

Later,
David S. Miller
davem@redhat.com
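
For readers following the thread, the save/compare/reroute pattern used by the patched hooks can be sketched in plain user-space C. This is only an illustration under stated assumptions: `struct packet`, `struct route_key`, `lookup_route()`, `reroute()`, `run_hook_and_reroute()` and the `V_*` verdicts are hypothetical stand-ins invented here, not the kernel's `sk_buff`, `rt_key`, `ip_route_output_key()` or `NF_*` definitions, and the toy lookup policy is made up.

```c
#include <stdint.h>

/* Hypothetical user-space stand-ins; none of these are kernel types. */
enum verdict { V_DROP, V_ACCEPT, V_STOLEN };

struct route_key {		/* plays the role of a key-based lookup argument */
	uint32_t dst, src;
	uint32_t fwmark;
	uint8_t tos;
};

struct packet {
	uint32_t saddr, daddr;
	uint32_t mark;
	uint8_t tos;
	int route;		/* toy "route": an output interface number */
};

/* Toy policy: marked packets use interface 2, all others interface 1.
   Returns 0 on success, -1 when there is no route (dst == 0). */
static int lookup_route(const struct route_key *key, int *route)
{
	if (key->dst == 0)
		return -1;
	*route = key->fwmark ? 2 : 1;
	return 0;
}

/* Build a key from the packet's current fields and redo the lookup. */
static int reroute(struct packet *p)
{
	struct route_key key = {
		.dst = p->daddr, .src = p->saddr,
		.fwmark = p->mark, .tos = p->tos,
	};
	return lookup_route(&key, &p->route);
}

/* Snapshot the routing-relevant fields, run the hook, and reroute only
   if the hook kept the packet and changed one of those fields. */
static enum verdict run_hook_and_reroute(struct packet *p,
					 enum verdict (*hook)(struct packet *))
{
	uint32_t saddr = p->saddr, daddr = p->daddr, mark = p->mark;
	uint8_t tos = p->tos;
	enum verdict ret = hook(p);

	if (ret != V_DROP && ret != V_STOLEN
	    && (p->saddr != saddr || p->daddr != daddr
		|| p->mark != mark || p->tos != tos))
		return reroute(p) == 0 ? ret : V_DROP;
	return ret;
}

/* Example hooks: one sets a mark, one does nothing, one breaks the route. */
static enum verdict hook_set_mark(struct packet *p) { p->mark = 7; return V_ACCEPT; }
static enum verdict hook_noop(struct packet *p) { (void)p; return V_ACCEPT; }
static enum verdict hook_blackhole(struct packet *p) { p->daddr = 0; return V_ACCEPT; }
```

Passing the lookup a key structure rather than positional arguments is what lets a new criterion such as the mark be added without touching every caller, which seems to be the point of the rt_key suggestion quoted above.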