From owner-netdev@oss.sgi.com Tue Jan 1 06:34:27 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g01EYRE21178 for netdev-outgoing; Tue, 1 Jan 2002 06:34:27 -0800 Received: from gw.osaru.yi.org (fw134121.kitanet.ne.jp [210.237.134.121]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g01EYJg21174 for ; Tue, 1 Jan 2002 06:34:19 -0800 Received: from [::1] (helo=dom.osaru.yi.org.osaru.yi.org) by gw.osaru.yi.org with esmtp (Exim 3.12 #2) id 16LP3X-00038E-00; Tue, 01 Jan 2002 22:34:16 +0900 Date: Tue, 01 Jan 2002 22:34:15 +0900 Message-ID: From: KANDA Mitsuru / =?ISO-2022-JP?B?GyRCP0BFRBsoQiAbJEI9PBsoQg==?= To: netdev@oss.sgi.com Cc: usagi-core@linux-ipv6.org Subject: USAGI stable release User-Agent: Wanderlust/2.8.1 (Something) SEMI/1.14.3 (Ushinoya) FLIM/1.14.3 (=?ISO-8859-4?Q?Unebigory=F2mae?=) APEL/10.3 Emacs/21.1 (i386-debian-linux-gnu) MULE/5.0 (SAKAKI) X-GnuPG-fingerprint: 9A35 D378 F084 9EA4 EFBA 925B 1C93 B376 F0EF BE59 X-URL: http://www.osaru.yi.org/~mk/ X-My-AutoMobile: M2-1001 chassis#030 X-Using-IP-Version: IP version 6 MIME-Version: 1.0 (generated by SEMI 1.14.3 - "Ushinoya") Content-Type: text/plain; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk A Happy New Year! We are glad that we can announce the 3rd STABLE RELEASE of USAGI (UniverSAl playGround for Ipv6)[1] product on January 1st, 2002. On this release, we provide ipv6 enhanced kernel (based on linux-2.2.20 and/or linux-2.4.13) and basic IPv6 libraries and applications. The improved features are listed below. 
- ICMPv6 Node Information Queries
- Privacy Extensions (RFC 3041) (kernel-2.4 only)
- IPv6 khttpd
- Better source address selection
- Per-device statistics for SNMP
- IPv4/IPv6 socket binding on the same port
- Dropping IPv6 packets with malicious address(es)
- Enabling default route when IPv6 forwarding is enabled
- Improving SO_REUSEADDR behavior
- Fixing bugs in NDP (Neighbor Discovery Protocol)
- Fixing bugs in Stateless Address Auto-configuration
- Catching up and implementing RFC2553 / RFC2553bis APIs including IPV6_V6ONLY socket option
- Catching up and implementing RFC2292 / RFC2292bis APIs
- Making many basic applications IPv6 ready.

You can get our source code from the following URL. We also provide our code split into a patch against the main-line kernel and the tools.

We plan to provide binary packages for some distributions. They will appear under within several weeks.

We announce the latest information via the web. Please check our web site .

We also manage a mailing list for USAGI users. If you have questions, please join the mailing list. Comments and advice are also welcome on that mailing list. Please visit for further information.

Thanks.

About USAGI Project:
The USAGI Project is managed by volunteers and aims to provide a better IPv6 environment on Linux freely. We are tightly collaborating with WIDE Project[2], KAME Project[3] and TAHI Project[4], and trying to improve the Linux kernel, IPv6 related libraries and IPv6 applications. Our products are released every two weeks, with a stable release several times a year. Please check our web site for the latest detailed information.
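The feature list above mentions the IPV6_V6ONLY socket option from the RFC2553bis API. As a rough illustration only (this is not code from the USAGI release, and the helper name `v6only_roundtrip` is made up for the example), an application could set the option on an AF_INET6 socket and read it back like this, assuming a system whose headers define IPV6_V6ONLY:

```c
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: set IPV6_V6ONLY on a fresh AF_INET6 socket and
 * read the option back.
 * Returns the value reported by getsockopt(), or -1 if the host has no
 * IPv6 support or one of the calls fails.
 */
int v6only_roundtrip(void)
{
    int on = 1;
    int got = 0;
    socklen_t len = sizeof(got);
    int fd = socket(AF_INET6, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    if (setsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY, &on, sizeof(on)) < 0 ||
        getsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY, &got, &len) < 0) {
        close(fd);
        return -1;
    }

    close(fd);
    return got;
}
```

With the option enabled, the socket is restricted to IPv6 traffic, which is what allows a separate AF_INET socket to be bound to the same port number.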
References: [1] USAGI Project [2] WIDE Project [3] KAME Project [4] TAHI Project -- USAGI Project From owner-netdev@oss.sgi.com Wed Jan 2 00:00:04 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g02804I03724 for netdev-outgoing; Wed, 2 Jan 2002 00:00:04 -0800 Received: from netbank.com.br (IDENT:postfix@garrincha.netbank.com.br [200.203.199.88]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g027xxg03688 for ; Tue, 1 Jan 2002 23:59:59 -0800 Received: from brinquedo.distro.conectiva (1-121.ctame701-2.telepar.net.br [200.181.138.121]) by netbank.com.br (Postfix) with ESMTP id CF69A46819; Wed, 2 Jan 2002 04:57:34 -0200 (BRDT) Received: by brinquedo.distro.conectiva (Postfix, from userid 501) id 533C0C487; Wed, 2 Jan 2002 05:00:02 -0200 (BRST) Date: Wed, 2 Jan 2002 05:00:02 -0200 From: Arnaldo Carvalho de Melo To: "David S. Miller" , SteveW@ACM.org, jschlst@samba.org, ncorbic@sangoma.com, eis@baty.hanse.de, dag@brattli.net, torvalds@transmeta.com, marcelo@conectiva.com.br, netdev@oss.sgi.com, linux-kernel@vger.kernel.org Subject: [PATCH][RFC 5] cleaning up struct sock Message-ID: <20020102050001.A19285@conectiva.com.br> Mail-Followup-To: Arnaldo Carvalho de Melo , "David S. Miller" , SteveW@ACM.org, jschlst@samba.org, ncorbic@sangoma.com, eis@baty.hanse.de, dag@brattli.net, torvalds@transmeta.com, marcelo@conectiva.com.br, netdev@oss.sgi.com, linux-kernel@vger.kernel.org Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.3.23i X-Url: http://advogato.org/person/acme Sender: owner-netdev@oss.sgi.com Precedence: bulk Hi, This one turns IP_SK and IP6_PINFO usage the style TCP_PINFO and to some extent IP6_PINFO (and its previous equivalents, sk->tp_pinfo.af_tcp and sk->net_pinfo.ipv6), i.e., using a local variable to hold the result of IP_SK/IP6_PINFO/TCP_PINFO and use this variable instead of the ugly MACRO()->struct_member style. 
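The style change described above can be sketched in plain user-space C. This is only an illustration: `demo_sock`, `demo_tcp_opt` and `DEMO_TCP_PINFO` are hypothetical stand-ins invented for the example, not the kernel's real structures or macros, and the code is not taken from the patch.

```c
/* Hypothetical stand-ins for the kernel structures; not the real layout. */
struct demo_tcp_opt {
    unsigned int snd_cwnd;
    unsigned int snd_ssthresh;
};

struct demo_sock {
    struct demo_tcp_opt tcp;    /* protocol-private area */
};

/* Accessor macro in the spirit of TCP_PINFO(sk). */
#define DEMO_TCP_PINFO(sk) (&(sk)->tcp)

/* Old style: the macro is expanded at every member access. */
unsigned int window_macro_style(struct demo_sock *sk)
{
    if (DEMO_TCP_PINFO(sk)->snd_cwnd < DEMO_TCP_PINFO(sk)->snd_ssthresh)
        return DEMO_TCP_PINFO(sk)->snd_cwnd;
    return DEMO_TCP_PINFO(sk)->snd_ssthresh;
}

/* New style: evaluate the accessor once and keep the result in a local. */
unsigned int window_local_style(struct demo_sock *sk)
{
    struct demo_tcp_opt *tp = DEMO_TCP_PINFO(sk);

    if (tp->snd_cwnd < tp->snd_ssthresh)
        return tp->snd_cwnd;
    return tp->snd_ssthresh;
}
```

Both functions return the same value; the second form is shorter to read and keeps working unchanged if the accessor later becomes more than a simple member lookup.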
It also fixed a simple error in IP6_PINFO that was causing oopses on IPv6 connections (it was using the tcp area). The fs unbork patch by Daniel Phillips also uses the same approach wrt local variables. It still doesn't make the IPv6 family protocols use each a private slabcache, i.e., there's still only one slabcache for all IPv6 protocols, I'll work on this RSN. Patch available at: http://www.kernel.org/pub/linux/kernel/people/acme/v2.5/2.5.2-pre6 sock.cleanup-2.5.2-pre6.bz2 Comments and test results are welcome. - Arnaldo From owner-netdev@oss.sgi.com Fri Jan 4 01:00:48 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0490mI15640 for netdev-outgoing; Fri, 4 Jan 2002 01:00:48 -0800 Received: from netcore.fi (netcore.fi [193.94.160.1]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0490gg15637 for ; Fri, 4 Jan 2002 01:00:42 -0800 Received: from localhost (pekkas@localhost) by netcore.fi (8.11.6/8.11.6) with ESMTP id g0480XZ21810; Fri, 4 Jan 2002 10:00:33 +0200 Date: Fri, 4 Jan 2002 10:00:32 +0200 (EET) From: Pekka Savola To: cc: Chris Rankin Subject: [PATCH]iver (a new ISA PnP ID) (fwd) Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Netdev people might be interested (it's a better place for patches than linux-net). -- Pekka Savola "Tell me of difficulties surmounted, Netcore Oy not those you stumble over and fall" Systems. Networks. Security. -- Robert Jordan: A Crown of Swords ---------- Forwarded message ---------- Date: Thu, 3 Jan 2002 23:21:22 +0000 (GMT) From: Chris Rankin To: p_gortmaker@yahoo.com Cc: linux-net@vger.kernel.org Subject: [PATCH]iver (a new ISA PnP ID) Hi, Did you know that NetGear are still manufacturing ISA PnP network cards? The EA201 is worryingly jumperless but works fine :-). Here is a patch for the ne.o module so that it is correctly identified. BTW, I suspect that the 'EDI0216' entry has an incorrect ISAPNP_CARD_ID() line. 
However, I cannot prove this since I don't have one of those cards. Cheers, Chris --- linux-2.4.17/drivers/net/ne.c.orig Thu Jan 3 13:40:16 2002 +++ linux-2.4.17/drivers/net/ne.c Thu Jan 3 16:55:28 2002 @@ -75,7 +75,20 @@ }; #endif +/* + * Example from /proc/isapnp + * + * Card 1 'AXE2011:NETGEAR EA201 Ethernet Card' PnP version 1.0 + * Logical device 0 'AXE2011:Unknown' + * + * The first line gives the ISAPNP_CARD_ID of AXE2011, the second line + * gives the ISAPNP_DEVICE_ID (i.e. VENDOR and FUNCTION), also AXE2011 + * in this case. + */ static struct isapnp_device_id isapnp_clone_list[] __initdata = { + { ISAPNP_CARD_ID('A','X','E',0x2011), + ISAPNP_VENDOR('A','X','E'), ISAPNP_FUNCTION(0x2011), + (long) "NetGear EA201" }, { ISAPNP_ANY_ID, ISAPNP_ANY_ID, ISAPNP_VENDOR('E','D','I'), ISAPNP_FUNCTION(0x0216), (long) "NN NE2000" }, - To unsubscribe from this list: send the line "unsubscribe linux-net" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html From owner-netdev@oss.sgi.com Fri Jan 4 05:35:52 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g04DZqq20584 for netdev-outgoing; Fri, 4 Jan 2002 05:35:52 -0800 Received: from web13105.mail.yahoo.com (web13105.mail.yahoo.com [216.136.174.150]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g04DZgg20581 for ; Fri, 4 Jan 2002 05:35:42 -0800 Message-ID: <20020104123541.17259.qmail@web13105.mail.yahoo.com> Received: from [62.188.139.27] by web13105.mail.yahoo.com via HTTP; Fri, 04 Jan 2002 04:35:41 PST Date: Fri, 4 Jan 2002 04:35:41 -0800 (PST) From: Chris Rankin Subject: Fwd: [PATCH]iver (a new ISA PnP ID) (fwd) To: netdev@oss.sgi.com MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Sender: owner-netdev@oss.sgi.com Precedence: bulk Just covering my bases ;-). Hopefully the right person has already picked this patch up. 
Chris --- Pekka Savola wrote: > Date: Fri, 4 Jan 2002 10:00:32 +0200 (EET) > From: Pekka Savola > To: > CC: Chris Rankin > Subject: [PATCH]iver (a new ISA PnP ID) (fwd) > > Netdev people might be interested (it's a better > place for patches than > linux-net). > > -- > Pekka Savola "Tell me of > difficulties surmounted, > Netcore Oy not those you stumble > over and fall" > Systems. Networks. Security. -- Robert Jordan: A > Crown of Swords > > ---------- Forwarded message ---------- > Date: Thu, 3 Jan 2002 23:21:22 +0000 (GMT) > From: Chris Rankin > To: p_gortmaker@yahoo.com > Cc: linux-net@vger.kernel.org > Subject: [PATCH]iver (a new ISA PnP ID) > > Hi, > > Did you know that NetGear are still manufacturing > ISA PnP network > cards? The EA201 is worryingly jumperless but works > fine :-). Here is > a patch for the ne.o module so that it is correctly > identified. BTW, > I suspect that the 'EDI0216' entry has an incorrect > ISAPNP_CARD_ID() > line. However, I cannot prove this since I don't > have one of those > cards. > > Cheers, > Chris > > --- linux-2.4.17/drivers/net/ne.c.orig Thu Jan 3 > 13:40:16 2002 > +++ linux-2.4.17/drivers/net/ne.c Thu Jan 3 > 16:55:28 2002 > @@ -75,7 +75,20 @@ > }; > #endif > > +/* > + * Example from /proc/isapnp > + * > + * Card 1 'AXE2011:NETGEAR EA201 Ethernet Card' PnP > version 1.0 > + * Logical device 0 'AXE2011:Unknown' > + * > + * The first line gives the ISAPNP_CARD_ID of > AXE2011, the second line > + * gives the ISAPNP_DEVICE_ID (i.e. VENDOR and > FUNCTION), also AXE2011 > + * in this case. 
> + */ > static struct isapnp_device_id isapnp_clone_list[] > __initdata = { > + { ISAPNP_CARD_ID('A','X','E',0x2011), > + ISAPNP_VENDOR('A','X','E'), > ISAPNP_FUNCTION(0x2011), > + (long) "NetGear EA201" }, > { ISAPNP_ANY_ID, ISAPNP_ANY_ID, > ISAPNP_VENDOR('E','D','I'), > ISAPNP_FUNCTION(0x0216), > (long) "NN NE2000" }, > - > To unsubscribe from this list: send the line > "unsubscribe linux-net" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at > http://vger.kernel.org/majordomo-info.html > __________________________________________________ Do You Yahoo!? Send your FREE holiday greetings online! http://greetings.yahoo.com From owner-netdev@oss.sgi.com Fri Jan 4 12:49:13 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g04KnDm21670 for netdev-outgoing; Fri, 4 Jan 2002 12:49:13 -0800 Received: from docomolabs-usa.com (fridge.docomo-usa.com [216.98.102.228]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g04KnBg21666 for ; Fri, 4 Jan 2002 12:49:11 -0800 Received: from VAIOHE (dhcp5.docomo-usa.com [172.21.96.5]) by docomolabs-usa.com (8.11.3/8.11.3) with ESMTP id g04Jn4S11944 for ; Fri, 4 Jan 2002 11:49:04 -0800 (PST) Reply-To: From: "Xiaoning He" To: Subject: Netmeeting Date: Fri, 4 Jan 2002 11:47:58 -0800 Organization: NTT-Docomo USA Labs Message-ID: <003701c19558$b6164ce0$056015ac@VAIOHE> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook, Build 10.0.2627 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2600.0000 Importance: Normal Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by oss.sgi.com id g04KnBg21667 Sender: owner-netdev@oss.sgi.com Precedence: bulk Hi I am looking for a IPv6 supported Linux based interactive application which can function like a netmeeting. Is there any such application available?   
Thank you Xiaoning He From owner-netdev@oss.sgi.com Fri Jan 4 12:55:23 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g04KtNx21931 for netdev-outgoing; Fri, 4 Jan 2002 12:55:23 -0800 Received: from dibbler.ne.mediaone.net (IDENT:root@dibbler.ne.mediaone.net [24.218.57.139]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g04KtKg21928 for ; Fri, 4 Jan 2002 12:55:20 -0800 Received: (from rodrigc@localhost) by dibbler.ne.mediaone.net (8.11.0/8.11.0) id g04JtNV12461; Fri, 4 Jan 2002 14:55:23 -0500 Date: Fri, 4 Jan 2002 14:55:23 -0500 From: Craig Rodrigues To: Xiaoning He Cc: netdev@oss.sgi.com Subject: Re: Netmeeting Message-ID: <20020104145523.A12458@mediaone.net> References: <003701c19558$b6164ce0$056015ac@VAIOHE> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <003701c19558$b6164ce0$056015ac@VAIOHE>; from xiaoning@docomolabs-usa.com on Fri, Jan 04, 2002 at 11:47:58AM -0800 X-MIME-Autoconverted: from 8bit to quoted-printable by dibbler.ne.mediaone.net id g04JtNV12461 Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by oss.sgi.com id g04KtLg21929 Sender: owner-netdev@oss.sgi.com Precedence: bulk On Fri, Jan 04, 2002 at 11:47:58AM -0800, Xiaoning He wrote: > Hi > > I am looking for a IPv6 supported Linux based interactive application > which can function like a netmeeting. Is there any such application > available?   There is GNOME meeting: http://www.gnomemeeting.org, but I don't know if it supports IPv6. 
-- Craig Rodrigues http://www.gis.net/~craigr rodrigc@mediaone.net From owner-netdev@oss.sgi.com Sun Jan 6 14:38:31 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g06McVr19530 for netdev-outgoing; Sun, 6 Jan 2002 14:38:31 -0800 Received: from ALPHA8.CC.MONASH.EDU.AU (alpha8.cc.monash.edu.au [130.194.1.8]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g06McRg19527 for ; Sun, 6 Jan 2002 14:38:28 -0800 Received: from splat.its.monash.edu.au ([130.194.1.73]) by vaxh.cc.monash.edu.au (PMDF V5.2-31 #39306) with ESMTP id <01KCS7R8378Y8WXWXB@vaxh.cc.monash.edu.au> for netdev@oss.sgi.com; Mon, 7 Jan 2002 08:38:19 +1100 Received: from localhost (localhost [127.0.0.1]) by splat.its.monash.edu.au (Postfix) with ESMTP id 9CEEA12C006; Mon, 07 Jan 2002 08:38:17 +1100 (EST) Received: from eng.monash.edu.au (knuth.eng.monash.edu.au [130.194.137.189]) by splat.its.monash.edu.au (Postfix) with ESMTP id 0D6F912C003; Mon, 07 Jan 2002 08:38:17 +1100 (EST) Date: Mon, 07 Jan 2002 08:41:15 +1100 From: Greg Daley Subject: Re: Netmeeting To: Craig Rodrigues Cc: Xiaoning He , netdev@oss.sgi.com Reply-to: greg.daley@eng.monash.edu.au Message-id: <3C38C47B.193760A7@eng.monash.edu.au> Organization: Monash University MIME-version: 1.0 X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.4.10mobile i686) Content-type: text/plain; charset=us-ascii Content-transfer-encoding: 7BIT X-Accept-Language: en References: <003701c19558$b6164ce0$056015ac@VAIOHE> <20020104145523.A12458@mediaone.net> Sender: owner-netdev@oss.sgi.com Precedence: bulk Craig Rodrigues wrote: > > On Fri, Jan 04, 2002 at 11:47:58AM -0800, Xiaoning He wrote: > > Hi > > > > I am looking for a IPv6 supported Linux based interactive application > > which can function like a netmeeting. Is there any such application > > available? > > There is GNOME meeting: http://www.gnomemeeting.org, but I don't know if > it supports IPv6. 
I think that it doesn't support IPv6, but we have a student here (at Monash Uni) looking at the OpenH323/PWlib components now, for protocol dependencies.

Greg Daley

From owner-netdev@oss.sgi.com Sun Jan 6 20:38:39 2002
Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g074cdK26376 for netdev-outgoing; Sun, 6 Jan 2002 20:38:39 -0800
Received: from docomolabs-usa.com (fridge.docomo-usa.com [216.98.102.228]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g074cYg26373 for ; Sun, 6 Jan 2002 20:38:34 -0800
Received: from VAIOHE (dhcp53.docomo-usa.com [172.21.96.53]) by docomolabs-usa.com (8.11.3/8.11.3) with ESMTP id g073cGS18343; Sun, 6 Jan 2002 19:38:16 -0800 (PST)
Reply-To:
From: "Xiaoning He"
To: , "'Craig Rodrigues'"
Cc:
Subject: RE: Netmeeting
Date: Sun, 6 Jan 2002 19:37:09 -0800
Organization: NTT-Docomo USA Labs
Message-ID: <000101c1972c$96a6bef0$356015ac@VAIOHE>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
X-Priority: 3 (Normal)
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook, Build 10.0.2627
Importance: Normal
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2600.0000
In-Reply-To: <3C38C47B.193760A7@eng.monash.edu.au>
Sender: owner-netdev@oss.sgi.com
Precedence: bulk

Thank you for the valuable information.

There is another program called vic (http://www-mice.cs.ucl.ac.uk/multimedia/software/) which is claimed to have been tested over IPv6. However, it seems that it was tested on a very old kernel version.

I would be very grateful to be informed about the results of the OpenH323 evaluation.
Thank you and best regards Xiaoning He > -----Original Message----- > From: owner-netdev@oss.sgi.com [mailto:owner-netdev@oss.sgi.com] On Behalf > Of Greg Daley > Sent: Sunday, January 06, 2002 1:41 PM > To: Craig Rodrigues > Cc: Xiaoning He; netdev@oss.sgi.com > Subject: Re: Netmeeting > > Craig Rodrigues wrote: > > > > On Fri, Jan 04, 2002 at 11:47:58AM -0800, Xiaoning He wrote: > > > Hi > > > > > > I am looking for a IPv6 supported Linux based interactive application > > > which can function like a netmeeting. Is there any such application > > > available? > > > > There is GNOME meeting: http://www.gnomemeeting.org, but I don't know if > > it supports IPv6. > > I think that it doesn't support IPv6, > but we have a student here (at Monash Uni) > looking at the OpenH323/PWlib components now, > for protocol dependencies. > > Greg Daley From owner-netdev@oss.sgi.com Tue Jan 8 02:48:32 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g08AmWg27356 for netdev-outgoing; Tue, 8 Jan 2002 02:48:32 -0800 Received: from melanieb.vtt.fi (melanieb.vtt.fi [130.188.1.12]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g08AmOg27353 for ; Tue, 8 Jan 2002 02:48:24 -0800 Received: from mailgw.vtt.fi (localhost [127.0.0.1]) by melanieb.vtt.fi (8.9.3/8.9.3) with ESMTP id LAA26288 for ; Tue, 8 Jan 2002 11:48:21 +0200 (EET) Received: from vttmail.vtt.fi (vttmail.vtt.fi [130.188.1.4]) by mailgw.vtt.fi (8.9.3/8.9.3) with ESMTP id LAA23172 for ; Tue, 8 Jan 2002 11:48:20 +0200 (EET) Received: from there (tte3168.tte.vtt.fi [130.188.71.92]) by vttmail.vtt.fi (8.9.3/8.9.3) with SMTP id LAA15338 for ; Tue, 8 Jan 2002 11:48:19 +0200 (EET) Message-Id: <200201080948.LAA15338@vttmail.vtt.fi> Content-Type: text/plain; charset="iso-8859-1" From: Sami Ponkanen Organization: VTT Information Technology To: netdev@oss.sgi.com Subject: [BUG] Kernel oops with slip+dnat Date: Tue, 8 Jan 2002 11:40:54 +0200 X-Mailer: KMail [version 1.3.2] MIME-Version: 1.0 X-MIME-Autoconverted: from 
8bit to quoted-printable by melanieb.vtt.fi id LAA26288
Content-Transfer-Encoding: 8bit
X-MIME-Autoconverted: from quoted-printable to 8bit by oss.sgi.com id g08AmPg27354
Sender: owner-netdev@oss.sgi.com
Precedence: bulk

I posted this yesterday to the netfilter-devel and linux networking lists, but I was told that this list might be a better fit.

Sami

I wrote on netfilter-devel:

Hello,

As I wrote earlier on the netfilter list and more recently on the linux networking list, there is a bug that results in a kernel oops when using a DNAT or REDIRECT rule in the OUTPUT chain on a host with SLIP interfaces. The bug is reproducible on at least 2.4.7, 2.4.16 and 2.4.17. Here's how to do it:

1. modprobe slip
2. slattach -p slip -s 1200 /dev/ttyS0
3. ifconfig sl0 192.168.1.2 pointopoint 192.168.1.1
4. iptables -t nat -A OUTPUT -d 192.168.1.1 -j REDIRECT
   or
   iptables -t nat -A OUTPUT -d 192.168.1.1 -j DNAT --to-destination 192.168.1.2
5. ping 192.168.1.1 or send a UDP packet to 192.168.1.1
6. Oops!

I've traced the problem and it seems to be the following:

A buffer for the packet is reserved in ip_build_xmit() (net/ipv4/ip_output.c:627) and the correct size for the buffer is calculated on line 667:

    int hh_len = (rt->u.dst.dev->hard_header_len + 15)&~15;

Now here (I think, correct me if I'm wrong) the hard_header_len is just 1 byte, the SLIP header byte.

Later on, control goes through nf_hook_slow() (net/core/netfilter.c:445), where the packet is put on another output device (skb->dev changes). The new device has a different hard_header_len, but the skb only has space for the 1-byte SLIP header! Am I on the right track here?

OK, a few steps further and control reaches neigh_resolve_output() (net/core/neighbour.c:950). Here the function dev->hard_header() is called and consequently eth_header() (net/ethernet/eth.c:75) is called (why?).
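The rounding in the quoted hh_len line can be checked with a small user-space model. This is a sketch for illustration, not kernel code: the function names are invented here, and the value a SLIP device actually reports in hard_header_len is not verified.

```c
#include <stdbool.h>

/*
 * Round a link-layer header length up to the next multiple of 16,
 * mirroring the expression (hard_header_len + 15) & ~15.
 */
unsigned int reserved_headroom(unsigned int hard_header_len)
{
    return (hard_header_len + 15) & ~15u;
}

/*
 * A header of push_len bytes can only be prepended later if at least
 * that many bytes of headroom were reserved when the buffer was built.
 */
bool push_fits(unsigned int headroom, unsigned int push_len)
{
    return push_len <= headroom;
}
```

Under this model a reported length of 0 reserves nothing, while any length from 1 to 16 reserves 16 bytes. The headroom is fixed by the device chosen when the buffer is built, so whether a later 14-byte Ethernet header fits depends on what the original device reported.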
Right at the beginning of the function the call to skb_push(skb, ETH_HLEN) results in skb_under_panic() and BUG(), and consequently the system crashes.

A quick fix is to reserve a few extra bytes in ip_build_xmit(). I tried changing line 676 in ip_output.c from this:

    int hh_len = (rt->u.dst.dev->hard_header_len + 15)&~15;

into this:

    int hh_len = (rt->u.dst.dev->hard_header_len + 31)&~15;

and voila, no more oopses. Well, this is definitely not the correct way to fix the problem, but it works for now.

Now, a few questions came to my mind while debugging the problem. Firstly, why do you put an ethernet header on a packet that is sent via the loopback device? Secondly, why call skb_under_panic() in skb_push()? Shouldn't the packet rather be just silently discarded? Anyway, I think we all agree that it should not crash the whole kernel, right?

Regards,
Sami Pönkänen

From owner-netdev@oss.sgi.com Tue Jan 8 07:35:03 2002
Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g08FZ3L08176 for netdev-outgoing; Tue, 8 Jan 2002 07:35:03 -0800
Received: from dea.linux-mips.net (localhost [127.0.0.1]) by oss.sgi.com (8.11.2/8.11.3) with ESMTP id g08FYwg08173 for ; Tue, 8 Jan 2002 07:34:58 -0800
Received: (from ralf@localhost) by dea.linux-mips.net (8.11.1/8.11.1) id g08EYst18693 for netdev@oss.sgi.com; Tue, 8 Jan 2002 12:34:54 -0200
Received: from smtp014.mail.yahoo.com (smtp014.mail.yahoo.com [216.136.173.58]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g08EGTg06276 for ; Tue, 8 Jan 2002 06:16:29 -0800
Received: from ptil-10-145-ban.primus-india.net (HELO iwave014) (203.196.145.10) by smtp.mail.vip.sc5.yahoo.com with SMTP; 8 Jan 2002 13:16:25 -0000
Message-ID: <012a01c19846$f8d33b10$9502a8c0@iwave014>
From: "Abdul Khaliq"
To:
Subject: How to recieve an Ipv6 packet
Date: Tue, 8 Jan 2002 18:42:50 +0530
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="----=_NextPart_000_0125_01C19874.465EF0E0"
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft
Outlook Express 5.00.2919.6700
X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2919.6700
Sender: owner-netdev@oss.sgi.com
Precedence: bulk

This is a multi-part message in MIME format.

------=_NextPart_000_0125_01C19874.465EF0E0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

Dear Sir/Madam,

I have some queries regarding IPv6. I am still in the learning stage, so my questions may look very simple to you, but an answer would help me a lot. Please answer the following questions.

1. The route table entry consists of "Destination, Next Hop, Flags, Metric, Ref, Use, Iface".

If I receive an IPv6 packet the function ip6_rcv

I would like to know which functions the function pointers input and output in the struct dst_entry structure (Destination) point to.

The function ip6_input processes the IPv6 packet and calls the upper layer functions.

The function ip6_input is used in two places:
1) when a multicast packet is received, to deliver the packet to the host, and
2) rt->u.dst.input = ip6_input;
   it is assigned while adding the address [int ip6_rt_addr_add(struct in6_addr *addr, struct device *dev)]

The address add routine [ip6_rt_addr_add] above is called when a new address is to be added with the flag RTM_NEWADDR, and is called by the function
static void sit_add_v4_addrs(struct inet6_dev *idev).

But when an IPv6 address is added to a host, the function ip6_input is not mapped to dst->input. In that case, can the host not receive packets sent to that IPv6 address?

Thanks and Regards
Abdul Khaliq
------=_NextPart_000_0125_01C19874.465EF0E0--

From owner-netdev@oss.sgi.com Tue Jan 8 08:16:36 2002
Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g08GGaC09597 for netdev-outgoing; Tue, 8 Jan 2002 08:16:36 -0800
Received: from nero.doit.wisc.edu (nero.doit.wisc.edu [128.104.17.130]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g08GGTg09594 for ; Tue, 8 Jan 2002 08:16:29 -0800
Received: (from jleu@localhost) by nero.doit.wisc.edu (8.9.3/8.9.3) id LAA10183; Tue, 8 Jan 2002 11:11:43 -0600
Date: Tue, 8 Jan 2002 11:11:43 -0600
From: "James R. Leu"
To: Abdul Khaliq
Cc: netdev@oss.sgi.com
Subject: Re: How to recieve an Ipv6 packet
Message-ID: <20020108111143.A10174@nero.doit.wisc.edu>
Reply-To: jleu@mindspring.com
References: <012a01c19846$f8d33b10$9502a8c0@iwave014>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 1.0.1i
In-Reply-To: <012a01c19846$f8d33b10$9502a8c0@iwave014>; from instkhaliq@yahoo.com on Tue, Jan 08, 2002 at 06:42:50PM +0530
Organization: none
Sender: owner-netdev@oss.sgi.com
Precedence: bulk

If I understand you correctly, you want to know how an IPv6 packet makes it into the IPv6 stack (ip6_rcv).

If the packet arrives via PPP or Ethernet, the protocol ID from the L2 frame is used by the lower-layer network stack (netif_rx) to figure out which L3 protocol handler should be called. Each L3 protocol handler registers with the kernel via dev_add_pack().

Look at linux/include/linux/if_ether.h to see what the L2 header on an Ethernet frame looks like.

Jim

On Tue, Jan 08, 2002 at 06:42:50PM +0530, Abdul Khaliq wrote:
> Dear sir /madam
> I have some queries regarding IPv6. I am still in the learning stage. So, my question may look very simple for you, but if you answer it, that will help me a lot. Please answer the following questions.
>
>
> 1.
the route table entry consists of "Destination , Next Hop, > Flags, Metric, Ref, Use, Iface". > If i recieve an IPv6 packet the function ip6_rcv > > I would like to know to what are the functions the function pointers input and output in the struct dst_entry structure(Destination)?. > > The function ip6_input processes the Ipv6 packet and calls the upper layer functions. > > The fucntion ip6_input is called twice > 1) when a multi cast packet is recieved to deliver a packet to host, and > > 2) rt->u.dst.input = ip6_input; > it is mapped while adding the address [int ip6_rt_addr_add(struct in6_addr *addr, struct device *dev)] > > The address add routine[ip6_rt_addr_add] above is called when a new address is to be added when the flag is RTM_NEWADDR , and called by the function > static void sit_add_v4_addrs(struct inet6_dev *idev). > > But when an Ipv6 address is added to a host, the function ip6_input is not mapped to dst->input, in which case the host cannot recive the packet with that IPv6 address? > > Thanks and Regards > Abdul Khaliq -- James R. 
Leu

From owner-netdev@oss.sgi.com Wed Jan 9 13:05:36 2002
Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g09L5aU15765 for netdev-outgoing; Wed, 9 Jan 2002 13:05:36 -0800
Received: from docomolabs-usa.com (fridge.docomo-usa.com [216.98.102.228]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g09L5Vg15761 for ; Wed, 9 Jan 2002 13:05:32 -0800
Received: from VAIOHE (dhcp5.docomo-usa.com [172.21.96.5]) by docomolabs-usa.com (8.11.3/8.11.3) with ESMTP id g09K5OS19794 for ; Wed, 9 Jan 2002 12:05:24 -0800 (PST)
Reply-To:
From: "Xiaoning He"
To:
Subject: Another stupid question regarding the RA/RS in IPv6
Date: Wed, 9 Jan 2002 12:04:15 -0800
Organization: NTT-Docomo USA Labs
Message-ID: <000701c19948$d0b76720$056015ac@VAIOHE>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
X-Priority: 1 (Highest)
X-MSMail-Priority: High
X-Mailer: Microsoft Outlook, Build 10.0.2627
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2600.0000
Importance: High
Sender: owner-netdev@oss.sgi.com
Precedence: bulk

Hi

I have some stupid questions regarding the Router Solicitation and Router Advertisement implementation for Red Hat 7.1. Since I am new to Linux programming, it would be very helpful if you could answer my questions. Thank you in advance.

1. In Red Hat 7.1, where is the code that sends out router solicitations and router advertisements? Where is it located?

2. Are there any daemons in Red Hat 7.1 to handle RA and RS? I downloaded radvd from the web site, but it seems it is not an integrated part of Red Hat 7.1.

The task is to modify the router solicitation and router advertisement messages in Red Hat 7.1. Could you please let me know where the code that constructs the router solicitation and router advertisement messages is?

Thank you.
Xiaoning He From owner-netdev@oss.sgi.com Wed Jan 9 13:12:05 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g09LC5C16211 for netdev-outgoing; Wed, 9 Jan 2002 13:12:05 -0800 Received: from netcore.fi (netcore.fi [193.94.160.1]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g09LC0g16207 for ; Wed, 9 Jan 2002 13:12:01 -0800 Received: from localhost (pekkas@localhost) by netcore.fi (8.11.6/8.11.6) with ESMTP id g09KBni18388; Wed, 9 Jan 2002 22:11:49 +0200 Date: Wed, 9 Jan 2002 22:11:48 +0200 (EET) From: Pekka Savola To: Xiaoning He cc: Subject: Re: Another stupid question regarding the RA/RS in IPv6 In-Reply-To: <000701c19948$d0b76720$056015ac@VAIOHE> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk On Wed, 9 Jan 2002, Xiaoning He wrote: > I have some stupid questions regarding the Router Solicitation and > Router Advertisement implementation for RedHat 7.1. Since I am new to > Linux programming, it will be very helpful if you can answer my > questions. Thank you in advance. > > 1. In the Red Hat 7.1, where is the code of sending out router > solicitation and router advertisement? Where is their location? Sending RS is mainly done in kernel. Reacting to them and sending RA's is done at a user-space routing advertising daemon, e.g. radvd. > 2. Is there any deamons in Red Hat 7.1 to handle the RA and RS? I > downloaded radvd from the web site but it seems it is not a integrated > part of Red Hat 7.1. There are RPM's on the site. Radvd is also integrated with RHL72. -- Pekka Savola "Tell me of difficulties surmounted, Netcore Oy not those you stumble over and fall" Systems. Networks. Security. 
-- Robert Jordan: A Crown of Swords From owner-netdev@oss.sgi.com Wed Jan 9 14:09:51 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g09M9pu17415 for netdev-outgoing; Wed, 9 Jan 2002 14:09:51 -0800 Received: from docomolabs-usa.com (fridge.docomo-usa.com [216.98.102.228]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g09M9jg17412 for ; Wed, 9 Jan 2002 14:09:45 -0800 Received: from VAIOHE (dhcp5.docomo-usa.com [172.21.96.5]) by docomolabs-usa.com (8.11.3/8.11.3) with ESMTP id g09L9bS22425; Wed, 9 Jan 2002 13:09:37 -0800 (PST) Reply-To: From: "Xiaoning He" To: "'Pekka Savola'" Cc: Subject: RE: Another stupid question regarding the RA/RS in IPv6 Date: Wed, 9 Jan 2002 13:08:29 -0800 Organization: NTT-Docomo USA Labs Message-ID: <000001c19951$c95c88d0$056015ac@VAIOHE> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook, Build 10.0.2627 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2600.0000 Importance: Normal In-Reply-To: Sender: owner-netdev@oss.sgi.com Precedence: bulk Thank you. I installed a stand alone radvd from the web site. And it works. However, I can not find the definition of following structures: struct nd_router_advert struct nd_router_advert If you are familiar with the Linux, could you please let me know the dir which contains such structure. Also, when you said radvd is integrated with RHL72, are you saying I can get the code from RHL72's CD? Thank you very much. Xiaoning > -----Original Message----- > From: Pekka Savola [mailto:pekkas@netcore.fi] > Sent: Wednesday, January 09, 2002 12:12 PM > To: Xiaoning He > Cc: netdev@oss.sgi.com > Subject: Re: Another stupid question regarding the RA/RS in IPv6 > > On Wed, 9 Jan 2002, Xiaoning He wrote: > > I have some stupid questions regarding the Router Solicitation and > > Router Advertisement implementation for RedHat 7.1. 
Since I am new to > > Linux programming, it will be very helpful if you can answer my > > questions. Thank you in advance. > > > > 1. In the Red Hat 7.1, where is the code of sending out router > > solicitation and router advertisement? Where is their location? > > Sending RS is mainly done in kernel. Reacting to them and sending RA's is > done at a user-space routing advertising daemon, e.g. radvd. > > > 2. Is there any deamons in Red Hat 7.1 to handle the RA and RS? I > > downloaded radvd from the web site but it seems it is not a integrated > > part of Red Hat 7.1. > > There are RPM's on the site. Radvd is also integrated with RHL72. > > -- > Pekka Savola "Tell me of difficulties surmounted, > Netcore Oy not those you stumble over and fall" > Systems. Networks. Security. -- Robert Jordan: A Crown of Swords From owner-netdev@oss.sgi.com Wed Jan 9 14:18:35 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g09MIZJ17643 for netdev-outgoing; Wed, 9 Jan 2002 14:18:35 -0800 Received: from netcore.fi (netcore.fi [193.94.160.1]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g09MIVg17640 for ; Wed, 9 Jan 2002 14:18:32 -0800 Received: from localhost (pekkas@localhost) by netcore.fi (8.11.6/8.11.6) with ESMTP id g09LIMN18927; Wed, 9 Jan 2002 23:18:22 +0200 Date: Wed, 9 Jan 2002 23:18:21 +0200 (EET) From: Pekka Savola To: Xiaoning He cc: Subject: RE: Another stupid question regarding the RA/RS in IPv6 In-Reply-To: <000001c19951$c95c88d0$056015ac@VAIOHE> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk On Wed, 9 Jan 2002, Xiaoning He wrote: > I installed a stand alone radvd from the web site. And it works. > However, I can not find the definition of following structures: > > struct nd_router_advert > struct nd_router_advert > > If you are familiar with the Linux, could you please let me know the dir > which contains such structure. 
/usr/include/netinet/icmp6.h > Also, when you said radvd is integrated with RHL72, are you saying I can > get the code from RHL72's CD? Yes. -- Pekka Savola "Tell me of difficulties surmounted, Netcore Oy not those you stumble over and fall" Systems. Networks. Security. -- Robert Jordan: A Crown of Swords From owner-netdev@oss.sgi.com Wed Jan 9 18:49:36 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0A2naS25118 for netdev-outgoing; Wed, 9 Jan 2002 18:49:36 -0800 Received: from mail.telcom.net ([157.238.95.51]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0A2nTg25114 for ; Wed, 9 Jan 2002 18:49:29 -0800 Received: from gont.gont.com.ar (DU159-145.fibertel.com.ar [200.49.145.159]) by mail.telcom.net (8.11.6/8.11.6) with ESMTP id g0A1n9t12526 for ; Wed, 9 Jan 2002 20:49:10 -0500 (EST) Message-Id: <4.3.2.7.2.20020109225440.00de0350@mail.sitanium.com> X-Sender: ingroupcomar-fgont@mail.sitanium.com X-Mailer: QUALCOMM Windows Eudora Version 4.3.2 Date: Wed, 09 Jan 2002 22:56:18 -0300 To: netdev@oss.sgi.com From: Fernando Gont Subject: socket() returns "Invalid argument". Why? (UNPv1) Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; format=flowed Sender: owner-netdev@oss.sgi.com Precedence: bulk Hi! I compiled this simple TCP daytime client, expecting to get a EPFNOSUPPORT error. 
#include "unp.h"

int
main(int argc, char **argv)
{
    int sockfd, n;
    char recvline[MAXLINE + 1];
    struct sockaddr_in servaddr;

    if (argc != 2)
        err_quit("usage: a.out ");

    if ( (sockfd = socket(9999, SOCK_STREAM, 0)) < 0) {
        printf("errno: %d\n", errno);
        err_sys("socket error");
    }

    bzero(&servaddr, sizeof(servaddr));
    servaddr.sin_family = AF_INET;
    servaddr.sin_port = htons(13); /* daytime server */
    if (inet_pton(AF_INET, argv[1], &servaddr.sin_addr) <= 0)
        err_quit("inet_pton error for %s", argv[1]);

    if (connect(sockfd, (SA *) &servaddr, sizeof(servaddr)) < 0)
        err_sys("connect error");

    while ( (n = read(sockfd, recvline, MAXLINE)) > 0) {
        recvline[n] = 0; /* null terminate */
        if (fputs(recvline, stdout) == EOF)
            err_sys("fputs error");
    }
    if (n < 0)
        err_sys("read error");

    exit(0);
}

But after compiling and running this program, I see that errno gets the value 22, and having a look at the header files, I see:

#define EINVAL 22 /* Invalid argument */
#define EPFNOSUPPORT 96 /* Protocol family not supported */
#define EAFNOSUPPORT 97 /* Address family not supported by protocol */

My question is: if I don't get an EPFNOSUPPORT error for this example, when would I get it?

Greetings,
Fernando (fernando@gont.com.ar)

"I believe it is a Human impossibility to obtain complete peace of mind in this dimension. There's too much suffering and pain -- particularly for children."
-Dolores O'Riordan From owner-netdev@oss.sgi.com Wed Jan 9 21:58:17 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0A5wHH29703 for netdev-outgoing; Wed, 9 Jan 2002 21:58:17 -0800 Received: from smtp011.mail.yahoo.com (smtp011.mail.yahoo.com [216.136.173.31]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0A5w8g29699 for ; Wed, 9 Jan 2002 21:58:08 -0800 Received: from ptil-10-145-ban.primus-india.net (HELO iwave014) (203.196.145.10) by smtp.mail.vip.sc5.yahoo.com with SMTP; 10 Jan 2002 04:58:03 -0000 Message-ID: <004d01c19993$aeec0470$9502a8c0@iwave014> From: "Abdul Khaliq" To: References: <012a01c19846$f8d33b10$9502a8c0@iwave014> <20020108111143.A10174@nero.doit.wisc.edu> Subject: How to forward an Ipv6 packet to upper layer Date: Thu, 10 Jan 2002 10:30:04 +0530 MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 5.00.2919.6700 X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2919.6700 Sender: owner-netdev@oss.sgi.com Precedence: bulk Dear Jim Thanks for the reply. I have understood that the packet makes into the Ipv6 stack from the function ipv6_rcv. I have not understood the functon ip6_route_input clearly. How the route table is formed and stored when a new route is added? How the route table is formed and stored when a new address is added? please correct the statement and justify it. Depending on the destination (struct dst_entry ) the function ip6_route_input returns, the packet is processed or it is forwarded. Thanks & Regards Abdul Khaliq ----- Original Message ----- From: "James R. Leu" To: "Abdul Khaliq" Cc: Sent: Tuesday, January 08, 2002 10:41 PM Subject: Re: How to recieve an Ipv6 packet > If I understand you correcly you want to know how an IPv6 packet makes it > into the IPv6 stack (ip6_rcv). 
If the packet is coming via PPP or Ethernet > the protocol id from the L2 frame is used by the lower layer network stack > (net_if_rx) to figure out what L3 protocol handler should be called. Each > L3 protocol handler registers with the kernel via dev_add_pack(). > > Look at linux/include/linux/if_ether.h to see what the L2 header on an > ethernet frame looks like. > > Jim > > On Tue, Jan 08, 2002 at 06:42:50PM +0530, Abdul Khaliq wrote: > > Dear sir /madam > > I have some queries regarding IPv6. I am still in the learning stage. So, my question may look very simple for you, but if you answer it, that will help me a lot. Please answer the following questions. > > > > > > 1. the route table entry consists of "Destination , Next Hop, > > Flags, Metric, Ref, Use, Iface". > > If i recieve an IPv6 packet the function ip6_rcv > > > > I would like to know to what are the functions the function pointers input and output in the struct dst_entry structure(Destination)?. > > > > The function ip6_input processes the Ipv6 packet and calls the upper layer functions. > > > > The fucntion ip6_input is called twice > > 1) when a multi cast packet is recieved to deliver a packet to host, and > > > > 2) rt->u.dst.input = ip6_input; > > it is mapped while adding the address [int ip6_rt_addr_add(struct in6_addr *addr, struct device *dev)] > > > > The address add routine[ip6_rt_addr_add] above is called when a new address is to be added when the flag is RTM_NEWADDR , and called by the function > > static void sit_add_v4_addrs(struct inet6_dev *idev). > > > > But when an Ipv6 address is added to a host, the function ip6_input is not mapped to dst->input, in which case the host cannot recive the packet with that IPv6 address? > > > > Thanks and Regards > > Abdul Khaliq > > -- > James R. Leu _________________________________________________________ Do You Yahoo!? 
Get your free @yahoo.com address at http://mail.yahoo.com From owner-netdev@oss.sgi.com Wed Jan 9 22:50:27 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0A6oRo30518 for netdev-outgoing; Wed, 9 Jan 2002 22:50:27 -0800 Received: from nero.doit.wisc.edu (nero.doit.wisc.edu [128.104.17.130]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0A6oIg30515 for ; Wed, 9 Jan 2002 22:50:19 -0800 Received: (from jleu@localhost) by nero.doit.wisc.edu (8.9.3/8.9.3) id BAA11607 for netdev@oss.sgi.com; Thu, 10 Jan 2002 01:45:25 -0600 Date: Thu, 10 Jan 2002 01:45:24 -0600 From: "James R. Leu" To: netdev@oss.sgi.com Subject: [PATCH] minor changes to ipv4 Message-ID: <20020110014524.A11601@nero.doit.wisc.edu> Reply-To: jleu@mindspring.com Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0.1i Organization: none Sender: owner-netdev@oss.sgi.com Precedence: bulk I have been working on a project to add Multi Protocol Label Switching (MPLS) to the Linux kernel for the last 2 years. Since MPLS is useless without some IPv4 interaction, I was forced to modify the IPv4 stack to "play well with others". Through all of this work I had to make changes that seem minor, but that make a big difference when considering their interaction with other protocols (think layer 2.5). Please consider these changes for integration into the 2.5 kernel. Thanks, Jim -- James R. 
Leu

----------------------------- snip ------------------------
diff -uNr --exclude=CVS mainstream-2.4/include/net/ip.h mpls-linux-1.1/include/net/ip.h
--- mainstream-2.4/include/net/ip.h Tue May 29 12:40:52 2001
+++ mpls-linux-1.1/include/net/ip.h Sun Jan 6 20:33:34 2002
@@ -162,9 +162,9 @@
 static inline int ip_send(struct sk_buff *skb)
 {
     if (skb->len > skb->dst->pmtu)
-        return ip_fragment(skb, ip_finish_output);
+        return ip_fragment(skb,skb->dst->output);
     else
-        return ip_finish_output(skb);
+        return skb->dst->output(skb);
 }
 /* The function in 2.2 was invalid, producing wrong result for
diff -uNr --exclude=CVS mainstream-2.4/net/core/neighbour.c mpls-linux-1.1/net/core/neighbour.c
--- mainstream-2.4/net/core/neighbour.c Wed Jan 9 18:01:10 2002
+++ mpls-linux-1.1/net/core/neighbour.c Wed Jan 9 18:38:20 2002
@@ -963,7 +963,7 @@
     if (dev->hard_header_cache && dst->hh == NULL) {
         write_lock_bh(&neigh->lock);
         if (dst->hh == NULL)
-            neigh_hh_init(neigh, dst, dst->ops->protocol);
+            neigh_hh_init(neigh, dst, skb->protocol);
         err = dev->hard_header(skb, dev, ntohs(skb->protocol), neigh->ha, NULL, skb->len);
         write_unlock_bh(&neigh->lock);
     } else {
diff -uNr --exclude=CVS mainstream-2.4/net/ipv4/ip_output.c mpls-linux-1.1/net/ipv4/ip_output.c
--- mainstream-2.4/net/ipv4/ip_output.c Tue Oct 30 20:52:01 2001
+++ mpls-linux-1.1/net/ipv4/ip_output.c Tue Oct 30 21:30:48 2001
@@ -113,6 +113,7 @@
 static inline int output_maybe_reroute(struct sk_buff *skb)
 {
+    skb->protocol = __constant_htons(ETH_P_IP);
     return skb->dst->output(skb);
 }
diff -uNr --exclude=CVS mainstream-2.4/net/ipv4/route.c mpls-linux-1.1/net/ipv4/route.c
--- mainstream-2.4/net/ipv4/route.c Wed Jan 9 18:01:10 2002
+++ mpls-linux-1.1/net/ipv4/route.c Wed Jan 9 19:05:50 2002
@@ -1480,7 +1480,7 @@
     rth->rt_spec_dst= spec_dst;
     rth->u.dst.input = ip_forward;
-    rth->u.dst.output = ip_output;
+    rth->u.dst.output = ip_finish_output;
     rt_set_nexthop(rth, &res, itag);

From owner-netdev@oss.sgi.com Thu Jan 10 11:09:40 2002 
Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0AJ9eM19373 for netdev-outgoing; Thu, 10 Jan 2002 11:09:40 -0800 Received: from vieo.com (vieo.com [216.30.79.131]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0AJ9Xg19370 for ; Thu, 10 Jan 2002 11:09:33 -0800 Received: (from root@localhost) by vieo.com (8.11.2/8.11.2) id g0AI9QG84817 for netdev@oss.sgi.com; Thu, 10 Jan 2002 12:09:26 -0600 (CST) (envelope-from golio@vieo.com) Received: from vieo.com (root@ponty.vieo.com [10.1.0.70]) by vieo.com (8.11.2/8.11.2) with ESMTP id g0AI9Pw84777 for ; Thu, 10 Jan 2002 12:09:25 -0600 (CST) (envelope-from golio@vieo.com) Message-ID: <3C3DD8D5.E10FD556@vieo.com> Date: Thu, 10 Jan 2002 12:09:25 -0600 From: Joe Golio X-Mailer: Mozilla 4.77 [en] (X11; U; Linux 2.4.4-4GB-SMP i686) X-Accept-Language: en MIME-Version: 1.0 To: netdev@oss.sgi.com Subject: [Fwd: Question about /usr/include/netdevice.h] Content-Type: multipart/mixed; boundary="------------0853A5BBBCFE70A08B3E3950" X-scanner: scanned by Inflex 0.1.5c+ on vieo.com Sender: owner-netdev@oss.sgi.com Precedence: bulk This is a multi-part message in MIME format. 
--------------0853A5BBBCFE70A08B3E3950 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit --------------0853A5BBBCFE70A08B3E3950 Content-Type: message/rfc822 Content-Transfer-Encoding: 7bit Content-Disposition: inline Return-Path: Received: (from root@localhost) by vieo.com (8.11.2/8.11.2) id g0AHAMh53138 for golio@vieo.com; Thu, 10 Jan 2002 11:10:22 -0600 (CST) (envelope-from golio@vieo.com) Received: from vieo.com (root@ponty.vieo.com [10.1.0.70]) by vieo.com (8.11.2/8.11.2) with ESMTP id g0AHALw53055; Thu, 10 Jan 2002 11:10:21 -0600 (CST) (envelope-from golio@vieo.com) Sender: root@vieo.com Message-ID: <3C3DCAFD.D82DCB05@vieo.com> Date: Thu, 10 Jan 2002 11:10:21 -0600 From: Joe Golio X-Mailer: Mozilla 4.77 [en] (X11; U; Linux 2.4.4-4GB-SMP i686) X-Accept-Language: en MIME-Version: 1.0 To: linux-kernel@vger.kernel.org, golio@vieo.com Subject: Question about /usr/include/netdevice.h Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-scanner: scanned by Inflex 0.1.5c+ on vieo.com Hello all, I am new at this so bare with me... Inside /usr/include/linux/netdevice.h, there is a #define for MAX_ADDR_LEN which is currently set to a value of "7" in 2.4. I am working on an implementation where the hardware address of the media needs be larger than 7 bytes. What is the process by which I would have to go through to get this value changed to something larger than 7 bytes, if I so desired ? 
Thanks, Joe Golio --------------0853A5BBBCFE70A08B3E3950-- From owner-netdev@oss.sgi.com Thu Jan 10 19:20:49 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0B3Knn06346 for netdev-outgoing; Thu, 10 Jan 2002 19:20:49 -0800 Received: from mail.telcom.net ([157.238.95.51]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0B3Kcg06325 for ; Thu, 10 Jan 2002 19:20:39 -0800 Received: from gont.gont.com.ar (200-41-35-127-tnttasa1.impsat.net.ar [200.41.35.127]) by mail.telcom.net (8.11.6/8.11.6) with ESMTP id g0B2KEC81991 for ; Thu, 10 Jan 2002 21:20:16 -0500 (EST) Message-Id: <4.3.2.7.2.20020110000502.00cf6a30@mail.sitanium.com> X-Sender: ingroupcomar-fgont@mail.sitanium.com X-Mailer: QUALCOMM Windows Eudora Version 4.3.2 Date: Thu, 10 Jan 2002 00:18:27 -0300 To: netdev@oss.sgi.com From: Fernando Gont Subject: socket() returns "Invalid argument". Why? (part 2 :) ) Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; format=flowed Sender: owner-netdev@oss.sgi.com Precedence: bulk Hi! While I was making some tests with the socket functions, I intentionally called socket() this way: socket(9999, SOCK_STREAM, 0) (with a protocol family not supported). I was expecting to get a EPFNOSUPPORT, but I got a EINVAL error code, instead. After posting a message to comp.protocol.tcp-ip, someone noted that the socket.h header of his Solaris system had a PF_MAX constant, and that it'd be possible that a EINVAL was returned for any protofol family number greater than PF_MAX. I checked my Linux header files, and found in bits/socket.h the following constants definitions: #define PF_UNSPEC 0 /* Unspecified. */ #define PF_LOCAL 1 /* Local to host (pipes and file-domain). */ #define PF_UNIX PF_LOCAL /* Old BSD name for PF_LOCAL. */ [....] #define PF_SNA 22 /* Linux SNA Project */ #define PF_IRDA 23 /* IRDA sockets. */ #define PF_MAX 32 /* For now.. 
*/

and I was surprised that the PF_MAX constant was *not* defined to be 23 (i.e., the highest defined protocol number). Why is PF_MAX defined like this?

I changed my socket() call to:

    socket(25, SOCK_STREAM, 0)

(with a protocol family lower than PF_MAX), suspecting that perhaps now I'd get an EPFNOSUPPORT. But I still got an EINVAL error code. So I had a look at the socket.c source code, and found the following:

int sock_create(int family, int type, int protocol, struct socket **res)
{
    int i;
    struct socket *sock;

    /*
     * Check protocol is in range
     */
    if(family<0||family>=NPROTO)
        return -EINVAL;

#if defined(CONFIG_KMOD) && defined(CONFIG_NET)
    /* Attempt to load a protocol module if the find failed.
     *
     * 12/09/1996 Marcin: But! this makes REALLY only sense, if the user
     * requested real, full-featured networking support upon configuration.
     * Otherwise module support will break!
     */
    if (net_families[family]==NULL)
    {
        char module_name[30];
        sprintf(module_name,"net-pf-%d",family);
        request_module(module_name);
    }
#endif

    if (net_families[family]==NULL)
        return -EINVAL;

I don't understand why the code says:

    if(family<0||family>=NPROTO)
        return -EINVAL;

instead of:

    if(family<0||family>=PF_MAX)
        return -EPFNOSUPPORT;

(Note that I check "family" against PF_MAX (instead of NPROTO), and that the return value is EPFNOSUPPORT (instead of EINVAL).)

I mean: Why does the code use NPROTO instead of PF_MAX? What's the point of having PF_MAX (or NPROTO) defined to be a value greater than the highest defined protocol number? If the TCP/IP stack does not return an EPFNOSUPPORT error code in this case (the call to socket I pointed out at the beginning of this message), when would it do it?
TIA, Fernando Gont e-mail: fernando@gont.com.ar From owner-netdev@oss.sgi.com Fri Jan 11 17:36:00 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0C1a0W06930 for netdev-outgoing; Fri, 11 Jan 2002 17:36:00 -0800 Received: from dea.linux-mips.net (localhost [127.0.0.1]) by oss.sgi.com (8.11.2/8.11.3) with ESMTP id g0C1Zxg06927 for ; Fri, 11 Jan 2002 17:35:59 -0800 Received: (from ralf@localhost) by dea.linux-mips.net (8.11.1/8.11.1) id g0C0ZvF03031 for netdev@oss.sgi.com; Fri, 11 Jan 2002 16:35:57 -0800 Received: from gw-nl5.philips.com (gw-nl5.philips.com [212.153.235.99]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0BAEhg14711 for ; Fri, 11 Jan 2002 02:14:43 -0800 Received: from smtpscan-nl3.philips.com (localhost.philips.com [127.0.0.1]) by gw-nl5.philips.com with ESMTP id KAA14017; Fri, 11 Jan 2002 10:14:37 +0100 (MET) (envelope-from fabrizio.gennari@philips.com) From: fabrizio.gennari@philips.com Received: from smtpscan-nl3.philips.com(130.139.36.23) by gw-nl5.philips.com via mwrap (4.0a) id xma014015; Fri, 11 Jan 02 10:14:37 +0100 Received: from smtprelay-nl1.philips.com (localhost [127.0.0.1]) by smtpscan-nl3.philips.com (8.9.3/8.8.5-1.2.2m-19990317) with ESMTP id KAA00273; Fri, 11 Jan 2002 10:14:33 +0100 (MET) Received: from hbg001soh.diamond.philips.com (e1soh01.diamond.philips.com [130.143.165.212]) by smtprelay-nl1.philips.com (8.9.3/8.8.5-1.2.2m-19990317) with ESMTP id KAA27704; Fri, 11 Jan 2002 10:14:32 +0100 (MET) To: linux-kernel@vger.kernel.org, linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: PPP over socket? 
X-Mailer: Lotus Notes Release 5.0.5 September 22, 2000
Message-ID:
Date: Fri, 11 Jan 2002 10:13:57 +0100
X-MIMETrack: Serialize by Router on hbg001soh/H/SERVER/PHILIPS(Release 5.0.5 |September 22, 2000) at 11/01/2002 10:32:25, Serialize complete at 11/01/2002 10:32:25
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="=_alternative 00320E48C1256B3E_="
Sender: owner-netdev@oss.sgi.com
Precedence: bulk

This is a multipart message in MIME format.
--=_alternative 00320E48C1256B3E_=
Content-Type: text/plain; charset="us-ascii"

I was wondering whether the socket architecture could be modified in order to support PPP connections over a generic socket (of type SOCK_DGRAM or SOCK_SEQPACKET), by mapping each PPP packet to a socket packet. This idea is not completely new: somebody raised it in the past, see for example http://oss.sgi.com/projects/netdev/mail/netdev/msg00180.html or http://oss.sgi.com/projects/netdev/mail/netdev/msg01127.html .

The PPPoE sockets are an example of sockets which can be turned into PPP channels (in fact, they were meant to be used as PPP channels!), but they work even without PPP. Probably some of their features can be applied to the generic socket architecture.

1) add in the struct sock a flag, called for example bound_to_ppp
2) support in sock_ioctl the PPPIOCGCHAN ioctl: this would register a PPP channel in the PPP driver, set bound_to_ppp and return the channel index (in fact, in PPPoX/PPPoE only the two latter actions are done in PPPIOCGCHAN)
3) support an analogous ioctl for unbinding
4) add among the family-specific functions a PPP xmit function for the PPP channel, which passes the skb coming from PPP to the family-specific sendmsg
5) modify sock_queue_rcv_skb so that, if bound_to_ppp is set, the packets are sent to ppp_input instead of being put in the receive queue
6) when the socket is disconnected, and bound_to_ppp is set, the channel should be unregistered and the relevant PPP interface brought down.
Although it requires changes to socket architecture, this is probably feasible, and would simplify development of PPP support over different physical layers. Fabrizio Gennari Philips Research Monza via G.Casati 23, 20052 Monza (MI), Italy tel. +39 039 2037816, fax +39 039 2037800 --=_alternative 00320E48C1256B3E_= Content-Type: text/html; charset="us-ascii"
--=_alternative 00320E48C1256B3E_=-- From owner-netdev@oss.sgi.com Fri Jan 11 18:12:14 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0C2CED07496 for netdev-outgoing; Fri, 11 Jan 2002 18:12:14 -0800 Received: from www.linux.org.uk (IDENT:exim@parcelfarce.linux.theplanet.co.uk [195.92.249.252]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0C2CAg07492 for ; Fri, 11 Jan 2002 18:12:11 -0800 Received: from pakrat by www.linux.org.uk with local (Exim 3.33 #5) id 16PCiN-0000cV-00; Sat, 12 Jan 2002 01:12:07 +0000 Date: Sat, 12 Jan 2002 01:12:07 +0000 From: Chris Dukes To: fabrizio.gennari@philips.com Cc: linux-kernel@vger.kernel.org, linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: PPP over socket? Message-ID: <20020112011207.F7199@parcelfarce.linux.theplanet.co.uk> References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: ; from fabrizio.gennari@philips.com on Fri, Jan 11, 2002 at 10:13:57AM +0100 Sender: owner-netdev@oss.sgi.com Precedence: bulk On Fri, Jan 11, 2002 at 10:13:57AM +0100, fabrizio.gennari@philips.com wrote: > I was wondering whether the socket architecture could be modified in order > to support PPP connections over a generic socket (of type SOCK_DGRAM or > SOCK_SEQPACKET), by mapping each PPP packet to a socket packet. This idea > is not completely new: somebody raised is in the past, see for example > http://oss.sgi.com/projects/netdev/mail/netdev/msg00180.html or > http://oss.sgi.com/projects/netdev/mail/netdev/msg01127.html . vtun already provides this capability in user space. (See http://vtun.sourceforge.net/) ppp(8) on *BSD also provides this capability in user space as well. As memory serves PPPoE on Linux is partially implemented in userspace as is, so a partial user space solution for PPPoUDP shouldn't be that wretched. -- Chris Dukes "Bert is apparently EEEEVIL, whereas Oscar is just a sysadmin^Wgrouch." 
-- gorski From owner-netdev@oss.sgi.com Fri Jan 11 18:45:04 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0C2j4q07930 for netdev-outgoing; Fri, 11 Jan 2002 18:45:04 -0800 Received: from freeside.toyota.com (freeside.toyota.com [63.87.74.7]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0C2ixg07917 for ; Fri, 11 Jan 2002 18:44:59 -0800 Received: from uranium.tms.toyota.com (uranium.tms.toyota.com [10.49.36.228]) by freeside.toyota.com (8.11.2/8.11.2) with ESMTP id g0C1ipO21657; Fri, 11 Jan 2002 17:44:51 -0800 Received: from lexus.com (IDENT:jjs@localhost.localdomain [127.0.0.1]) by uranium.tms.toyota.com (8.11.6/8.11.2) with ESMTP id g0C1im208899; Fri, 11 Jan 2002 17:44:48 -0800 Message-ID: <3C3F950F.8010700@lexus.com> Date: Fri, 11 Jan 2002 17:44:47 -0800 From: J Sloan User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.7) Gecko/20011221 X-Accept-Language: en-us MIME-Version: 1.0 To: Chris Dukes CC: fabrizio.gennari@philips.com, linux-kernel@vger.kernel.org, linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: PPP over socket? References: <20020112011207.F7199@parcelfarce.linux.theplanet.co.uk> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Just my $.02 - vtund rocks, I learned about it when I took on a side job doing linux/vpn admin for a medium size company. vtund connects their branch offices to their main office - it encrypts and compresses traffic between the vpn boxes at each end, which in our case are iptables firewall boxes. I am impressed with it - as mentioned it's user space and works with linux, bsd or solaris.... 
cu jjs Chris Dukes wrote: >On Fri, Jan 11, 2002 at 10:13:57AM +0100, fabrizio.gennari@philips.com wrote: > >>I was wondering whether the socket architecture could be modified in order >>to support PPP connections over a generic socket (of type SOCK_DGRAM or >>SOCK_SEQPACKET), by mapping each PPP packet to a socket packet. This idea >>is not completely new: somebody raised is in the past, see for example >>http://oss.sgi.com/projects/netdev/mail/netdev/msg00180.html or >>http://oss.sgi.com/projects/netdev/mail/netdev/msg01127.html . >> > >vtun already provides this capability in user space. >(See http://vtun.sourceforge.net/) >ppp(8) on *BSD also provides this capability in user space as well. > >As memory serves PPPoE on Linux is partially implemented in userspace >as is, so a partial user space solution for PPPoUDP shouldn't be that >wretched. > From owner-netdev@oss.sgi.com Sun Jan 13 16:22:51 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0E0Mp117770 for netdev-outgoing; Sun, 13 Jan 2002 16:22:51 -0800 Received: from coruscant.gnumonks.org (mail@coruscant.franken.de [193.174.159.226]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0E0Mfg17763 for ; Sun, 13 Jan 2002 16:22:41 -0800 Received: from uucp by coruscant.gnumonks.org with local-bsmtp (Exim 3.33 #1) id 16PtxU-0005PL-00 for netdev@oss.sgi.com; Mon, 14 Jan 2002 00:22:36 +0100 Received: from laforge by sunbeam.gnumonks.org with local (Exim 3.34 #1) id 16PUgc-0002S5-00; Sat, 12 Jan 2002 21:23:30 +0100 Date: Sat, 12 Jan 2002 21:23:30 +0100 From: Harald Welte To: Sami Ponkanen Cc: netdev@oss.sgi.com, Netfilter Development Mailinglist Subject: Re: [BUG] Kernel oops with slip+dnat Message-ID: <20020112212330.I7435@sunbeam.de.gnumonks.org> Mail-Followup-To: Harald Welte , Sami Ponkanen , netdev@oss.sgi.com, Netfilter Development Mailinglist References: <200201080948.LAA15338@vttmail.vtt.fi> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline 
Content-Transfer-Encoding: 8bit User-Agent: Mutt/1.3.17i In-Reply-To: <200201080948.LAA15338@vttmail.vtt.fi>; from sami.ponkanen@vtt.fi on Tue, Jan 08, 2002 at 11:40:54AM +0200 X-Operating-System: Linux sunbeam.de.gnumonks.org 2.4.17 X-Date: Today is Setting Orange, the 10th day of Chaos in the YOLD 3168 Sender: owner-netdev@oss.sgi.com Precedence: bulk On Tue, Jan 08, 2002 at 11:40:54AM +0200, Sami Ponkanen wrote: > I posted this yesterday to netfilter-devel and linux networking lists, but I > was instructed that this list might suit better. Well, from my perspective it's not sure. > As I wrote earlier on netfilter list and more recently on linux networking > list, there is a bug that results in a kernel oops when using DNAT or > REDIRECT rule in the OUTPUT chain on a host with SLIP interfaces. [...] > I've traced the problem and it seems that the problem is following: > > A buffer for the packet is reserved in ip_build_xmit() > (net/ipv4/ip_output.c:627) and the correct size for the buffer is calculated > on line 667: > > int hh_len = (rt->u.dst.dev->hard_header_len + 15)&~15; > > Now here (I think, correct me if I'm wrong) the hard_header_len is just 1 > byte, the SLIP header byte. > > Later on the control goes through nf_hook_slow() (net/core/netfilter.c:445) > where the packet is put on another output device (skb->dev changes). The > new device has a different hard_header_len, but skb has only space for the > 1-byte SLIP header! Am I on the right tracks here? > > Ok, again few steps forward and the control reaches neigh_resolve_output() > (net/core/neighbour.c:950). Here the function dev->hard_header() is called > and consequently ether_header() (net/ethernet/eth.c:75) is called (why?). > Right in the beginning of the function the call to skb_push(skb, ETH_HLEN) > results in skb_under_panic() and BUG() and consequently the system crashes. mh. I'm not sure why we append an ethernet header, but in any case I think netfilter is expected to do some more work. 
So if we have a NAT rule in the OUTPUT chain, and we call route_me_harder() from ip_nat_local_fn() we need to check if the hh_len of the output device has changed. If it has, we need to check if skb has enough headroom and potentially re-allocate the skb headroom. Question to the networking gurus: Is it true that the core networking code expects the skb to have enough headroom for the hardware header at the time we return from the netfilter NF_IP_LOCAL_OUT hook? > A quick fix is to reserve few extra bytes in ip_build_xmit(). I tried > changing line 676 in ip_output.c from this: > int hh_len = (rt->u.dst.dev->hard_header_len + 15)&~15; > > into this: > int hh_len = (rt->u.dst.dev->hard_header_len + 31)&~15; > > and voila, no more oopses. Well, this is definitely not the correct way to > fix the problem, but it works for now. sure. As stated above, I think we need to re-allocate headroom inside the netfilter hook. > Now, a few questions came to my mind while debugging the problem. Firstly, > why do you put an ethernet header on a packet that is sent via the loopback > device? no idea. But as loopback is a physical device, it should have at least some information about which l3 protocol the packet is... and using ethernet seems convenient. > Regards, > Sami Pönkänen -- Live long and prosper - Harald Welte / laforge@gnumonks.org http://www.gnumonks.org/ ============================================================================ GCS/E/IT d- s-: a-- C+++ UL++++$ P+++ L++++$ E--- W- N++ o? K- w--- O- M- V-- PS+ PE-- Y+ PGP++ t++ 5-- !X !R tv-- b+++ DI? 
!D G+ e* h+ r% y+(*) From owner-netdev@oss.sgi.com Sun Jan 13 16:22:53 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0E0Mr617777 for netdev-outgoing; Sun, 13 Jan 2002 16:22:53 -0800 Received: from coruscant.gnumonks.org (mail@coruscant.franken.de [193.174.159.226]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0E0Mfg17762 for ; Sun, 13 Jan 2002 16:22:41 -0800 Received: from uucp by coruscant.gnumonks.org with local-bsmtp (Exim 3.33 #1) id 16PtxV-0005PR-00 for netdev@oss.sgi.com; Mon, 14 Jan 2002 00:22:37 +0100 Received: from laforge by sunbeam.gnumonks.org with local (Exim 3.34 #1) id 16PVGm-0002TA-00; Sat, 12 Jan 2002 22:00:52 +0100 Date: Sat, 12 Jan 2002 22:00:52 +0100 From: Harald Welte To: Sami Ponkanen Cc: netdev@oss.sgi.com, Netfilter Development Mailinglist Subject: Re: [BUG] Kernel oops with slip+dnat Message-ID: <20020112220052.J7435@sunbeam.de.gnumonks.org> Mail-Followup-To: Harald Welte , Sami Ponkanen , netdev@oss.sgi.com, Netfilter Development Mailinglist References: <200201080948.LAA15338@vttmail.vtt.fi> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.3.17i In-Reply-To: <200201080948.LAA15338@vttmail.vtt.fi>; from sami.ponkanen@vtt.fi on Tue, Jan 08, 2002 at 11:40:54AM +0200 X-Operating-System: Linux sunbeam.de.gnumonks.org 2.4.17 X-Date: Today is Setting Orange, the 10th day of Chaos in the YOLD 3168 Sender: owner-netdev@oss.sgi.com Precedence: bulk On Tue, Jan 08, 2002 at 11:40:54AM +0200, Sami Ponkanen wrote: > I posted this yesterday to netfilter-devel and linux networking lists, but I > was instructed that this list might suit better. Hi, following up my previous response, here's an untested patch implementing what I was talking about. Could you try this and report if it works? Thanks. 
--- linux-plain/net/ipv4/netfilter/ip_nat_standalone.c Sun Dec 2 21:14:38 2001
+++ linux-nfpom/net/ipv4/netfilter/ip_nat_standalone.c Sat Jan 12 22:01:25 2002
@@ -215,8 +215,26 @@
 	ret = ip_nat_fn(hooknum, pskb, in, out, okfn);
 	if (ret != NF_DROP && ret != NF_STOLEN
 	    && ((*pskb)->nh.iph->saddr != saddr
-		|| (*pskb)->nh.iph->daddr != daddr))
-		return route_me_harder(*pskb) == 0 ? ret : NF_DROP;
+		|| (*pskb)->nh.iph->daddr != daddr)) {
+		struct net_device *olddev;
+
+		olddev = (*pskb)->dst->dev;
+
+		if (route_me_harder(*pskb))
+			return NF_DROP;
+
+		if ((*pskb)->dst->dev != olddev) {
+			int hh_len = (*pskb)->dst->dev->hard_header_len;
+
+			/* need to enlarge headroom if not enough for new
+			 * hardware header */
+			if (skb_headroom(*pskb) < hh_len
+			    && skb_cow(*pskb, skb_headroom(*pskb)+hh_len))
+				/* unable to allocate more headroom,
+				 * drop packet */
+				return NF_DROP;
+		}
+	}
 	return ret;
 }
 
> Sami
--
Live long and prosper
- Harald Welte / laforge@gnumonks.org http://www.gnumonks.org/
============================================================================
GCS/E/IT d- s-: a-- C+++ UL++++$ P+++ L++++$ E--- W- N++ o? K- w--- O- M- V-- PS+ PE-- Y+ PGP++ t++ 5-- !X !R tv-- b+++ DI?
!D G+ e* h+ r% y+(*) From owner-netdev@oss.sgi.com Mon Jan 14 15:39:53 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0ENdrw26072 for netdev-outgoing; Mon, 14 Jan 2002 15:39:53 -0800 Received: from titan.bieringer.de (mail.bieringer.de [195.226.187.51]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0ENdlg26068 for ; Mon, 14 Jan 2002 15:39:48 -0800 Received: (qmail 10573 invoked from network); 14 Jan 2002 22:39:38 -0000 Received: from pd9e4e7be.dip.t-dialin.net (HELO worker.muc.bieringer.de) (217.228.231.190) by mail.bieringer.de with SMTP; 14 Jan 2002 22:39:38 -0000 Date: Mon, 14 Jan 2002 23:41:31 +0100 From: Peter Bieringer To: Maillist netdev Subject: A new LDP compatible Linux+IPv6-HOWTO is born Message-ID: <59600000.1011048091@localhost> X-Mailer: Mulberry/2.1.2 (Linux/x86) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Content-Disposition: inline Sender: owner-netdev@oss.sgi.com Precedence: bulk Hi all, six weeks ago I was prompted to write an LDP-compatible IPv6 HOWTO for Linux. Today version 0.14 goes public. It is still incomplete and will be filled in over time. It is available on LDP as multi-part HTML: http://linuxdoc.org/HOWTO/Linux+IPv6-HOWTO/ It is also available as single-part HTML, PDF, PS and SGML; for URLs see http://linuxdoc.org/docs.html#howto or http://www.bieringer.de/linux/IPv6/ Fixes, corrections and suggestions are very welcome. BTW: the old "IPv6 & Linux - HowTo" will still be maintained, but its size will decrease in the future.
Hope this helps, Peter From owner-netdev@oss.sgi.com Mon Jan 14 16:47:33 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0F0lXn27935 for netdev-outgoing; Mon, 14 Jan 2002 16:47:33 -0800 Received: from bulldog.sacerdoti.org (cx421112-a.dt1.sdca.home.com [24.38.4.100]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0F0l6g27924 for ; Mon, 14 Jan 2002 16:47:06 -0800 Received: from there (unknown [192.168.1.51]) by bulldog.sacerdoti.org (Postfix) with SMTP id 4C84B77BB; Mon, 14 Jan 2002 15:45:25 -0800 (PST) Content-Type: text/plain; charset="iso-8859-1" From: Federico David Sacerdoti Organization: UCSD To: netdev@oss.sgi.com, davem@redhat.com, ak@muc.de, kuznet@ms2.inr.ac.ru Subject: New network monitoring proc file. Date: Mon, 14 Jan 2002 15:48:26 -0800 X-Mailer: KMail [version 1.3.1] Cc: linux-kernel@vger.kernel.org MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Message-Id: <20020114234525.4C84B77BB@bulldog.sacerdoti.org> Sender: owner-netdev@oss.sgi.com Precedence: bulk Hello, I would like to submit a patch that adds a /proc file to the kernel which monitors the health of active TCP connections. It does this by counting the number of duplicate ACKs sent out, among other things. I have a website detailing the exact metrics used and why I choose them: http://heron.ucsd.edu/tcphealth/ I have tested this patch on 5 i686 computers, and have had many downloads of it by users interested in my tcphealth gkrellm monitoring module. Since I am forced to update my patch often due to demand, I would like to formally submit it to you for inclusion in the new 2.5 development kernel. Sincerely, Federico Sacerdoti Here is the patch for the 2.5.1 kernel. It also works for kernel 2.5.2pre11. 
----------- Start Patch ------------ diff -Naur pristine-linux-2.5.1/include/net/sock.h linux-2.5.1/include/net/sock.h --- pristine-linux-2.5.1/include/net/sock.h Mon Jan 14 13:43:41 2002 +++ linux-2.5.1/include/net/sock.h Mon Jan 14 13:48:49 2002 @@ -24,6 +24,7 @@ * Alan Cox : Eliminate low level recv/recvfrom * David S. Miller : New socket lookup architecture. * Steve Whitehouse: Default routines for sock_ops + * Federico D. Sacerdoti : Added TCP health counters. * * This program is free software; you can redistribute it and/or * modify it under the terms of the GNU General Public License @@ -278,6 +279,7 @@ __u32 lrcvtime; /* timestamp of last received data packet*/ __u16 last_seg_size; /* Size of last incoming segment */ __u16 rcv_mss; /* MSS used for delayed ACK decisions */ + __u32 last_ack_sent; /* Sequence number of the last ack we sent. */ } ack; /* Data for direct copy to user */ @@ -418,6 +420,14 @@ int linger2; unsigned long last_synq_overflow; + + /* + * TCP health monitoring counters. + */ + __u32 dup_acks_sent; + __u32 dup_pkts_recv; + __u32 acks_sent; + __u32 pkts_recv; }; diff -Naur pristine-linux-2.5.1/net/ipv4/af_inet.c linux-2.5.1/net/ipv4/af_inet.c --- pristine-linux-2.5.1/net/ipv4/af_inet.c Mon Jan 14 13:43:45 2002 +++ linux-2.5.1/net/ipv4/af_inet.c Mon Jan 14 13:53:14 2002 @@ -56,6 +56,7 @@ * Some other random speedups. * Cyrus Durgin : Cleaned up file for kmod hacks. * Andi Kleen : Fix inet_stream_connect TCP race. + * Federico D. Sacerdoti : Added tcphealth /proc/net file. 
* * This program is free software; you can redistribute it and/or * modify it under the terms of the GNU General Public License @@ -129,6 +130,7 @@ extern int afinet_get_info(char *, char **, off_t, int); extern int tcp_get_info(char *, char **, off_t, int); extern int udp_get_info(char *, char **, off_t, int); +extern int tcp_health_get_info(char *, char **, off_t, int); extern void ip_mc_drop_socket(struct sock *sk); #ifdef CONFIG_DLCI @@ -1196,6 +1198,7 @@ proc_net_create ("sockstat", 0, afinet_get_info); proc_net_create ("tcp", 0, tcp_get_info); proc_net_create ("udp", 0, udp_get_info); + proc_net_create ("tcphealth", 0, tcp_health_get_info); #endif /* CONFIG_PROC_FS */ return 0; } diff -Naur pristine-linux-2.5.1/net/ipv4/proc.c linux-2.5.1/net/ipv4/proc.c --- pristine-linux-2.5.1/net/ipv4/proc.c Mon Jan 14 13:43:45 2002 +++ linux-2.5.1/net/ipv4/proc.c Mon Jan 14 14:02:57 2002 @@ -26,6 +26,7 @@ * Andi Kleen : Add support for open_requests and * split functions for more readibility. * Andi Kleen : Add support for /proc/net/netstat + * Federico D. 
Sacerdoti : Added support for /proc/net/tcphealth * * This program is free software; you can redistribute it and/or * modify it under the terms of the GNU General Public License @@ -212,3 +213,97 @@ len = 0; return len; } + +/* + * Output /proc/net/tcphealth + */ +#define LINESZ 128 + +int tcp_health_get_info(char *buffer, char **start, off_t offset, int length) +{ + int len=0, i=0, num=0; + off_t pos=0, begin=0; + char tmpbuf[LINESZ+1], srcIP[32], destIP[32]; + + unsigned long dest, src, SmoothedRttEstimate, + AcksSent, DupAcksSent, PktsRecv, DupPktsRecv; + unsigned short destp, srcp; + + len = sprintf(buffer, + "TCP Health Monitoring (established connections only)\n" + " -Duplicate ACKs indicate lost or reordered packets on the connection.\n" + " -Duplicate Packets Received signal a slow and badly inefficient connection.\n" + " -RttEst estimates how long future packets will take on a round trip over the connection.\n" + "id Local Address Remote Address RttEst(ms) AcksSent " + "DupAcksSent PktsRecv DupPktsRecv\n"); + pos=len; + + /* Loop through established TCP connections */ + local_bh_disable(); + for (i=0; i < tcp_ehash_size; i++) { + struct tcp_ehash_bucket *head = &tcp_ehash[i]; + struct sock *sk; + struct tcp_opt *tp; + + read_lock(&head->lock); + for (sk=head->chain; sk; sk=sk->next) { + if (!TCP_INET_FAMILY(sk->family)) + continue; + pos+=LINESZ; + if (pos <= offset) + continue; + + dest = ntohl(sk->daddr); + src = ntohl(sk->rcv_saddr); + destp = ntohs(sk->dport); + srcp = ntohs(sk->sport); + + tp = &(sk->tp_pinfo.af_tcp); + SmoothedRttEstimate = (tp->srtt >> 3); + AcksSent = tp->acks_sent; + DupAcksSent = tp->dup_acks_sent; + PktsRecv = tp->pkts_recv; + DupPktsRecv = tp->dup_pkts_recv; + + sprintf(srcIP, "%lu.%lu.%lu.%lu:%u", + ((src >> 24) & 0xFF), ((src >> 16) & 0xFF), ((src >> 8) & 0xFF), (src & 0xFF), + srcp); + sprintf(destIP, "%lu.%lu.%lu.%lu:%u", + ((dest >> 24) & 0xFF), ((dest >> 16) & 0xFF), ((dest >> 8) & 0xFF), (dest & 0xFF), + destp); + + 
sprintf(tmpbuf, "%d: %-21s %-21s " + "%8lu %8lu %8lu %8lu %8lu", + num, + srcIP, + destIP, + SmoothedRttEstimate, + AcksSent, + DupAcksSent, + PktsRecv, + DupPktsRecv + ); + + len += sprintf(buffer+len, "%-*s\n", LINESZ-1, tmpbuf); + if(pos >= offset+length) { + read_unlock(&head->lock); + goto out; + } + num++; + } + read_unlock(&head->lock); + } + +out: + local_bh_enable(); + + begin = len - (pos - offset); + *start = buffer + begin; + len -= begin; + if(len>length) + len = length; + if (len<0) + len = 0; + return len; +} + diff -Naur pristine-linux-2.5.1/net/ipv4/tcp_input.c linux-2.5.1/net/ipv4/tcp_input.c --- pristine-linux-2.5.1/net/ipv4/tcp_input.c Mon Jan 14 13:43:45 2002 +++ linux-2.5.1/net/ipv4/tcp_input.c Mon Jan 14 14:12:57 2002 @@ -60,6 +60,7 @@ * Pasi Sarolahti, * Panu Kuhlberg: Experimental audit of TCP (re)transmission * engine. Lots of bugs are found. + * Federico D. Sacerdoti: Added TCP health monitoring. */ #include @@ -2496,6 +2497,8 @@ } if (!after(TCP_SKB_CB(skb)->end_seq, tp->rcv_nxt)) { + /* Course retransmit inefficiency- this packet has been received twice. */ + tp->dup_pkts_recv++; SOCK_DEBUG(sk, "ofo packet was already received \n"); __skb_unlink(skb, skb->list); __kfree_skb(skb); @@ -2608,6 +2611,10 @@ return; } + /* A packet is a "duplicate" if it contains bytes we have already received. */ + if (before(TCP_SKB_CB(skb)->seq, tp->rcv_nxt)) + tp->dup_pkts_recv++; + if (!after(TCP_SKB_CB(skb)->end_seq, tp->rcv_nxt)) { /* A retransmit, 2nd most common case. Force an immediate ack. */ NET_INC_STATS_BH(DelayedACKLost); @@ -3241,6 +3248,14 @@ */ tp->saw_tstamp = 0; + + /* + * Tcp health monitoring is interested in + * total per-connection packet arrivals. + * This is in the fast path, but is quick. 
+ */ + + tp->pkts_recv++; /* pred_flags is 0xS?10 << 16 + snd_wnd * if header_predition is to be made diff -Naur pristine-linux-2.5.1/net/ipv4/tcp_output.c linux-2.5.1/net/ipv4/tcp_output.c --- pristine-linux-2.5.1/net/ipv4/tcp_output.c Mon Jan 14 13:43:45 2002 +++ linux-2.5.1/net/ipv4/tcp_output.c Mon Jan 14 14:16:49 2002 @@ -33,6 +33,7 @@ * Andrea Arcangeli: SYNACK carry ts_recent in tsecr. * Cacophonix Gaul : draft-minshall-nagle-01 * J Hadi Salim : ECN support + * Federico D. Sacerdoti : Added TCP health monitoring. * */ @@ -1321,9 +1322,16 @@ TCP_SKB_CB(buff)->flags = TCPCB_FLAG_ACK; TCP_SKB_CB(buff)->sacked = 0; + /* If the rcv_nxt has not advanced since sending our last ACK, this is a duplicate. */ + if (tp->rcv_nxt == tp->ack.last_ack_sent) + tp->dup_acks_sent++; + /* Record the total number of acks sent on this connection. */ + tp->acks_sent++; + /* Send it off, this clears delayed acks for us. */ TCP_SKB_CB(buff)->seq = TCP_SKB_CB(buff)->end_seq = tcp_acceptable_seq(sk, tp); TCP_SKB_CB(buff)->when = tcp_time_stamp; + tp->ack.last_ack_sent = tp->rcv_nxt; tcp_transmit_skb(sk, buff); } } -------------- End Patch ------------------- From owner-netdev@oss.sgi.com Mon Jan 14 18:53:31 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0F2rVu30270 for netdev-outgoing; Mon, 14 Jan 2002 18:53:31 -0800 Received: from amdext.amd.com (amdext.amd.com [139.95.251.1]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0F2rTg30267 for ; Mon, 14 Jan 2002 18:53:29 -0800 Received: from ssvlgs01.amd.com (ssvlgs01.amd.com [139.95.250.16]) by amdext.amd.com (8.9.3/8.9.3/AMD) with SMTP id RAA25500 for ; Mon, 14 Jan 2002 17:53:22 -0800 (PST) Received: from 139.95.250.1 by ssvlgs01.amd.com with ESMTP (Tumbleweed MMS SMTP Relay (MMS v4.7)); Mon, 14 Jan 2002 17:53:21 -0800 X-Server-Uuid: 02753650-11b0-11d5-bbc5-00508bf987eb Received: from cmdmail.amd.com (cmdmail.amd.com [172.28.14.226]) by amdint.amd.com (8.9.3/8.9.3/AMD) with ESMTP id RAA21832 for ; Mon, 14 Jan 
2002 17:53:21 -0800 (PST) Received: from cmdmail.amd.com (IDENT:amitg@vegi33 [172.28.20.33]) by cmdmail.amd.com (8.9.1a-LCCHA/8.9.0/lccha 1.5) with ESMTP id RAA29437 for ; Mon, 14 Jan 2002 17:53:20 -0800 (PST) Message-ID: <3C438B90.F632A9DC@cmdmail.amd.com> Date: Mon, 14 Jan 2002 17:53:20 -0800 From: "Amit Gupta" X-Mailer: Mozilla 4.77 [en] (X11; U; Linux 2.2.16-3 i686) X-Accept-Language: en MIME-Version: 1.0 To: netdev@oss.sgi.com Subject: arpd not working in 2.4.17 or 2.5.1 X-WSS-ID: 105D541A190546-01-01 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Hi All, I am running a 2.5.1 kernel on a dual-processor AMD system and have enabled routing messages, netlink and arpd support in the kernel as described in the arpd docs. Then, after creating the 36 character devices, when I run arpd, it starts up but always stays silent (according to strace), and the kernel also does not keep its 256 arp address limit. Please help fix it; I need Linux to be able to talk to more than 1024 clients. Thanks in advance. Amit amit.gupta@amd.com From owner-netdev@oss.sgi.com Mon Jan 14 21:48:34 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0F5mYj02446 for netdev-outgoing; Mon, 14 Jan 2002 21:48:34 -0800 Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0F5mVP02441 for ; Mon, 14 Jan 2002 21:48:31 -0800 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id UAA18115; Mon, 14 Jan 2002 20:47:04 -0800 Date: Mon, 14 Jan 2002 20:47:04 -0800 (PST) Message-Id: <20020114.204704.21652738.davem@redhat.com> To: fds@cs.ucsd.edu Cc: netdev@oss.sgi.com, ak@muc.de, kuznet@ms2.inr.ac.ru, linux-kernel@vger.kernel.org Subject: Re: New network monitoring proc file. From: "David S.
Miller" In-Reply-To: <20020114234525.4C84B77BB@bulldog.sacerdoti.org> References: <20020114234525.4C84B77BB@bulldog.sacerdoti.org> X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk From: Federico David Sacerdoti Date: Mon, 14 Jan 2002 15:48:26 -0800 I would like to submit a patch that adds a /proc file to the kernel which monitors the health of active TCP connections. It does this by counting the number of duplicate ACKs sent out, among other things. I have a website detailing the exact metrics used and why I choose them: http://heron.ucsd.edu/tcphealth/ I would rather that you add this to the tcp_diag facility in 2.4.x instead of creating yet another proc file. tcp_diag is designed perfectly for fetching the kind of information your TCP health monitor is providing. This is regardless of whether your selection of health metrics is sound; I have not looked into that part at all. But it will have to be discussed before we think about adding the changes.
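For readers following this thread, the counting rules in the tcphealth patch can be restated in user space roughly as follows. This is only a sketch under stated assumptions, not the kernel code: `struct tcp_health`, `seq_before`, `health_ack_sent` and `health_segment_received` are hypothetical names, the two receive-side counters are updated in one function here although the patch touches them in different places, and the `have_last_ack` flag is an addition so that the very first ACK is never counted as a duplicate.

```c
#include <stdint.h>

/* Per-connection counters, mirroring the fields the patch adds. */
struct tcp_health {
	uint32_t last_ack_sent;   /* rcv_nxt carried by the previous ACK */
	int      have_last_ack;   /* assumption: not present in the patch */
	uint32_t acks_sent;
	uint32_t dup_acks_sent;
	uint32_t pkts_recv;
	uint32_t dup_pkts_recv;
};

/* Wraparound-safe "a is earlier than b" for 32-bit sequence numbers. */
static int seq_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

/* Called whenever an ACK is sent: it is a duplicate if rcv_nxt has
 * not advanced since the previous ACK. */
static void health_ack_sent(struct tcp_health *h, uint32_t rcv_nxt)
{
	if (h->have_last_ack && rcv_nxt == h->last_ack_sent)
		h->dup_acks_sent++;
	h->acks_sent++;
	h->last_ack_sent = rcv_nxt;
	h->have_last_ack = 1;
}

/* Called for each arriving segment: it counts as a duplicate if it
 * starts before rcv_nxt, i.e. carries bytes already received. */
static void health_segment_received(struct tcp_health *h,
				    uint32_t seq, uint32_t rcv_nxt)
{
	h->pkts_recv++;
	if (seq_before(seq, rcv_nxt))
		h->dup_pkts_recv++;
}
```

Whether these counters are a sound measure of connection health is a separate question, as noted above; the sketch only shows what is being counted.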
From owner-netdev@oss.sgi.com Tue Jan 15 14:15:31 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0FMFVe05935 for netdev-outgoing; Tue, 15 Jan 2002 14:15:31 -0800 Received: from zcars0m9.ca.nortel.com (zcars0m9.nortelnetworks.com [47.129.242.157]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0FMDXP05887 for ; Tue, 15 Jan 2002 14:13:33 -0800 Received: from zcars04f.ca.nortel.com (zcars04f.ca.nortel.com [47.129.242.57]) by zcars0m9.ca.nortel.com (Switch-2.2.0/Switch-2.2.0) with ESMTP id g0FLDIn13371; Tue, 15 Jan 2002 16:13:18 -0500 (EST) Received: from zcard00m.ca.nortel.com (zcard00m.ca.nortel.com [47.129.26.62]) by zcars04f.ca.nortel.com (Switch-2.2.0/Switch-2.2.0) with ESMTP id g0FLDFD13258; Tue, 15 Jan 2002 16:13:16 -0500 (EST) Received: from zcard0k6.ca.nortel.com ([47.129.242.158]) by zcard00m.ca.nortel.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13) id C8F7JCP1; Tue, 15 Jan 2002 16:13:13 -0500 Received: from pcard0ks.ca.nortel.com ([47.129.117.131]) by zcard0k6.ca.nortel.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13) id Y7V6CKJT; Tue, 15 Jan 2002 16:13:15 -0500 Received: from nortelnetworks.com (localhost.localdomain [127.0.0.1]) by pcard0ks.ca.nortel.com (Postfix) with ESMTP id D670A4CD1; Tue, 15 Jan 2002 16:19:31 -0500 (EST) Message-ID: <3C449CE3.FBA52C68@nortelnetworks.com> Date: Tue, 15 Jan 2002 16:19:31 -0500 X-Sybari-Space: 00000000 00000000 00000000 From: Chris Friesen X-Mailer: Mozilla 4.77 [en] (X11; U; Linux 2.4.16 i686) X-Accept-Language: en MIME-Version: 1.0 To: netdev@oss.sgi.com, linux-kernel@vger.kernel.org Subject: how to do DIVERT socket equivalent with netfilter? Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk We've got a legacy app with its own message stacks, one for tcp/icmp and one for udp/sctp. 
Currently in 2.2 we're using multiple divert sockets with appropriate ipchains rules to direct the right traffic to each socket. The app asks for messages for one of the two stacks, we check if there is anything on that socket, and if there is anything we pass it up the stack. Now we're looking to make the thing work on 2.4. Unfortunately, it doesn't look like DIVERT sockets are supported in 2.4, so I started looking at netfilter's QUEUE target. This looks fine, except that there is only a single queue and I'd like at least two. Does anyone know of anything that 1) gives me multiple queues/sockets based on protocol (like DIVERT sockets) 2) ensures that the kernel itself doesn't try and handle the packet, resulting in destination unreachable error packets (like DIVERT and netfilter) 3) works on 2.4 I can always filter the incoming messages by protocol and store them in a pair of message queues (one for each stack) in the lower level of the app itself, but this seems kind of kludgy and I'm sure there's gotta be a better way. If there is something already available I'd love to hear about it. Any ideas? 
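The fallback described above (sorting packets by protocol into one queue per stack inside the application) might look roughly like this. It is a sketch only: `classify_ipv4`, `queue_push`, `dispatch`, `struct pkt_queue` and the queue size are hypothetical, and it assumes the application already receives complete IPv4 packets from somewhere. The protocol numbers are the IANA-assigned values.

```c
#include <stddef.h>
#include <stdint.h>

/* One queue per legacy stack. */
enum stack_id { STACK_NONE = -1, STACK_TCP_ICMP = 0, STACK_UDP_SCTP = 1 };

#define QUEUE_SLOTS 64	/* arbitrary illustrative size */

struct pkt_queue {
	const uint8_t *pkt[QUEUE_SLOTS];
	size_t len[QUEUE_SLOTS];
	unsigned int head, tail;	/* tail - head = packets queued */
};

/* Pick a stack from the protocol field (byte 9) of an IPv4 header. */
static enum stack_id classify_ipv4(const uint8_t *pkt, size_t len)
{
	if (len < 20 || (pkt[0] >> 4) != 4)
		return STACK_NONE;
	switch (pkt[9]) {
	case 1:		/* ICMP */
	case 6:		/* TCP */
		return STACK_TCP_ICMP;
	case 17:	/* UDP */
	case 132:	/* SCTP */
		return STACK_UDP_SCTP;
	default:
		return STACK_NONE;
	}
}

/* Append a packet reference; returns -1 if the queue is full. */
static int queue_push(struct pkt_queue *q, const uint8_t *pkt, size_t len)
{
	if (q->tail - q->head == QUEUE_SLOTS)
		return -1;
	q->pkt[q->tail % QUEUE_SLOTS] = pkt;
	q->len[q->tail % QUEUE_SLOTS] = len;
	q->tail++;
	return 0;
}

/* Route one packet to the queue of the stack that handles it. */
static int dispatch(struct pkt_queue queues[2], const uint8_t *pkt, size_t len)
{
	enum stack_id id = classify_ipv4(pkt, len);

	if (id == STACK_NONE)
		return -1;
	return queue_push(&queues[id], pkt, len);
}
```

Each stack would then drain only its own queue when the application asks for messages.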
Chris -- Chris Friesen | MailStop: 043/33/F10 Nortel Networks | work: (613) 765-0557 3500 Carling Avenue | fax: (613) 765-2986 Nepean, ON K2H 8E9 Canada | email: cfriesen@nortelnetworks.com From owner-netdev@oss.sgi.com Tue Jan 15 14:24:22 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0FMOM606178 for netdev-outgoing; Tue, 15 Jan 2002 14:24:22 -0800 Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0FMOKP06175 for ; Tue, 15 Jan 2002 14:24:20 -0800 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id NAA20082; Tue, 15 Jan 2002 13:21:33 -0800 Date: Tue, 15 Jan 2002 13:21:32 -0800 (PST) Message-Id: <20020115.132132.62388900.davem@redhat.com> To: cfriesen@nortelnetworks.com Cc: netdev@oss.sgi.com, linux-kernel@vger.kernel.org Subject: Re: how to do DIVERT socket equivalent with netfilter? From: "David S. Miller" In-Reply-To: <3C449CE3.FBA52C68@nortelnetworks.com> References: <3C449CE3.FBA52C68@nortelnetworks.com> X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk From: Chris Friesen Date: Tue, 15 Jan 2002 16:19:31 -0500 Now we're looking to make the thing work on 2.4. Unfortunately, it doesn't look like DIVERT sockets are supported in 2.4 Umm... linux/net/core/dv.c implement the divert stuff just like the 2.2.x copy does? Franks a lot, David S. 
Miller davem@redhat.com From owner-netdev@oss.sgi.com Tue Jan 15 21:27:50 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0G5RoU15245 for netdev-outgoing; Tue, 15 Jan 2002 21:27:50 -0800 Received: from exalane.intransa.com ([66.89.142.11]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0G5RlP15242 for ; Tue, 15 Jan 2002 21:27:47 -0800 X-MimeOLE: Produced By Microsoft Exchange V6.0.4712.0 content-class: urn:content-classes:message MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Subject: questions on IP bonding... Date: Tue, 15 Jan 2002 20:27:45 -0800 Message-ID: X-MS-Has-Attach: X-MS-TNEF-Correlator: Thread-Topic: questions on IP bonding... Thread-Index: AcGeRiUvW11q1dPaTuKFPSZfYLboTw== From: "Linda Wang" To: Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by oss.sgi.com id g0G5RmP15243 Sender: owner-netdev@oss.sgi.com Precedence: bulk Hi, I would like to get some clarification on IP bonding and how it is implemented. It looks like IP bonding supports aggregation of ports between two end nodes (hosts), possibly through multiple hops; is this correct? (This seems to be a superset of the IEEE 802.3ad link aggregation implementation.) Also, IP bonding seems to be applicable only to multiple Ethernet ports on the same IP subnet, though not necessarily on the same switch. Is that correct? Last but not least, can someone let me know if there is any documentation on the implementation?
many thanks -linda From owner-netdev@oss.sgi.com Wed Jan 16 15:59:40 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0GNxef27200 for netdev-outgoing; Wed, 16 Jan 2002 15:59:40 -0800 Received: from dea.linux-mips.net (localhost [127.0.0.1]) by oss.sgi.com (8.11.2/8.11.3) with ESMTP id g0GNxeP27197 for ; Wed, 16 Jan 2002 15:59:40 -0800 Received: (from ralf@localhost) by dea.linux-mips.net (8.11.1/8.11.1) id g0GMxcI03314 for netdev@oss.sgi.com; Wed, 16 Jan 2002 14:59:38 -0800 Received: from VL-MS-MR001.sc1.videotron.ca (relais.videotron.ca [24.201.245.36]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0GItqP04297 for ; Wed, 16 Jan 2002 10:55:53 -0800 Received: from 8d.com ([66.130.115.91]) by VL-MS-MR001.sc1.videotron.ca (Netscape Messaging Server 4.15) with ESMTP id GQ1LT105.25H for ; Wed, 16 Jan 2002 12:55:49 -0500 Message-ID: <3C45BEA4.A633749C@8d.com> Date: Wed, 16 Jan 2002 12:55:48 -0500 From: Dominic Duval X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.4.2-2 i586) X-Accept-Language: en MIME-Version: 1.0 To: netdev@oss.sgi.com Subject: Testing tools Content-Type: multipart/alternative; boundary="------------FF537E312A39BC19F2E198A0" Sender: owner-netdev@oss.sgi.com Precedence: bulk --------------FF537E312A39BC19F2E198A0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Hi everyone, I'm currently looking for some good, open-source, performance testing tools such as Netpipe in order to do some stress-testing on various network-related parts of the Kernel. What do you people use when you're ready to test modifications to the stack or new network drivers? In fact, I was curious to find out if there are any de facto tool used to test the network, evaluate performances (bandwidth, latency, etc.) when it comes to Kernel-related development. Thanks a lot, -- Dominic Duval 8D Technologies inc. 
dd@8D.com http://www.8D.com/
  --------------FF537E312A39BC19F2E198A0-- From owner-netdev@oss.sgi.com Wed Jan 16 20:09:51 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0H49p332637 for netdev-outgoing; Wed, 16 Jan 2002 20:09:51 -0800 Received: from x86unx3.comp.nus.edu.sg (x86unx3.comp.nus.edu.sg [137.132.90.3]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0H49lP32634 for ; Wed, 16 Jan 2002 20:09:48 -0800 Received: from sf0.comp.nus.edu.sg (mksarav@sf0.comp.nus.edu.sg [137.132.90.52]) by x86unx3.comp.nus.edu.sg (8.9.1/8.9.1) with ESMTP id LAA14232; Thu, 17 Jan 2002 11:09:42 +0800 (GMT-8) Received: from localhost (mksarav@localhost) by sf0.comp.nus.edu.sg (8.8.5/8.8.5) with ESMTP id LAA03756; Thu, 17 Jan 2002 11:09:41 +0800 (GMT-8) Date: Thu, 17 Jan 2002 11:09:40 +0800 (GMT-8) From: M K Saravanan To: Dominic Duval cc: netdev@oss.sgi.com Subject: Re: Testing tools In-Reply-To: <3C45BEA4.A633749C@8d.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk On Wed, 16 Jan 2002, Dominic Duval forced the electrons thusly: > In fact, I was curious to find out if there are any de facto tool used > to test the network, evaluate performances (bandwidth, latency, etc.) > when it comes to Kernel-related development. Try this: http://dast.nlanr.net/Tools.html Lot of tools are listed there. 
-- mks -- From owner-netdev@oss.sgi.com Thu Jan 17 15:28:10 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0HNSAC02913 for netdev-outgoing; Thu, 17 Jan 2002 15:28:10 -0800 Received: from mail.somanetworks.com ([63.204.6.12]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0HNRlP02907 for ; Thu, 17 Jan 2002 15:27:47 -0800 Received: from somanetworks.com ([10.11.10.14]) by mail.somanetworks.com (Netscape Messaging Server 4.15) with ESMTP id GQ3T1E00.GIW; Thu, 17 Jan 2002 14:27:14 -0800 Received: (from mjfrazer@localhost) by somanetworks.com (8.11.2/8.11.2) id g0HMRDW01983; Thu, 17 Jan 2002 17:27:13 -0500 Date: Thu, 17 Jan 2002 17:27:13 -0500 From: "Mark Frazer" To: davem@redhat.com, ak@muc.de, kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com Cc: Linux Kernel Subject: [RFC][PATCH] new sysctl net/ipv4/ip_default_bind Message-ID: <20020117172713.A1893@somanetworks.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i X-Message-Flag: Lookout! Organization: Detectable, well, not really Sender: owner-netdev@oss.sgi.com Precedence: bulk The following patch applies against 2.4.17 and creates a new sysctl node /proc/sys/net/ipv4/ip_default_bind. The purpose of the control is to allow the sysadmin to select a default IP address for outgoing connections, that is, for sockets which do not bind(2) the local end of the socket before calling connect(2). For high availability, we have several numbered interfaces on the same subnet. There is a virtual interface which is expected to be highly available. In order for connections to survive the disconnection of one of the physical interfaces, all connections should use the IP address of the virtual interface. We cannot use bonding as we have some cheesy tulip chip without an input for the link state signal provided by the PHY. This patch causes legacy applications such as telnet to behave the way we like them to with no apparent adverse effects.
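For comparison, this is what an application has to do itself today to get the same effect, namely bind(2) the local end to a chosen address before calling connect(2). It is a minimal user-space sketch: `connect_from` is a hypothetical helper name, the addresses are supplied by the caller, and port 0 in the bind asks the kernel to pick an ephemeral port.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a socket of the given type (SOCK_STREAM or SOCK_DGRAM),
 * pin its local address to local_ip, then connect to the remote end.
 * Returns the connected descriptor, or -1 on any error. */
static int connect_from(int type, const char *local_ip,
			const char *remote_ip, unsigned short remote_port)
{
	struct sockaddr_in local, remote;
	int fd;

	memset(&local, 0, sizeof local);
	local.sin_family = AF_INET;
	local.sin_port = htons(0);	/* let the kernel choose the port */
	if (inet_pton(AF_INET, local_ip, &local.sin_addr) != 1)
		return -1;

	memset(&remote, 0, sizeof remote);
	remote.sin_family = AF_INET;
	remote.sin_port = htons(remote_port);
	if (inet_pton(AF_INET, remote_ip, &remote.sin_addr) != 1)
		return -1;

	fd = socket(AF_INET, type, 0);
	if (fd < 0)
		return -1;

	/* The explicit bind is the step legacy applications skip. */
	if (bind(fd, (struct sockaddr *)&local, sizeof local) < 0 ||
	    connect(fd, (struct sockaddr *)&remote, sizeof remote) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

Applications that cannot be changed to do this are the case the sysctl is meant to cover.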
Does anyone find anything particularly offensive about this? cheers -mark diff -Nur linux/include/linux/sysctl.h linux.mjf/include/linux/sysctl.h --- linux/include/linux/sysctl.h Mon Nov 26 09:55:36 2001 +++ linux.mjf/include/linux/sysctl.h Wed Jan 16 22:47:06 2002 @@ -289,7 +289,8 @@ NET_TCP_ADV_WIN_SCALE=87, NET_IPV4_NONLOCAL_BIND=88, NET_IPV4_ICMP_RATELIMIT=89, - NET_IPV4_ICMP_RATEMASK=90 + NET_IPV4_ICMP_RATEMASK=90, + NET_IPV4_DEFAULT_BIND=91 }; enum { @@ -641,6 +642,8 @@ void *buffer, size_t *lenp); extern int proc_dostring(ctl_table *, int, struct file *, + void *, size_t *); +extern int proc_doinaddr(ctl_table *, int, struct file *, void *, size_t *); extern int proc_dointvec(ctl_table *, int, struct file *, void *, size_t *); diff -Nur linux/kernel/sysctl.c linux.mjf/kernel/sysctl.c --- linux/kernel/sysctl.c Wed Jan 16 22:34:43 2002 +++ linux.mjf/kernel/sysctl.c Wed Jan 16 22:34:14 2002 @@ -806,6 +806,104 @@ return r; } +/* parse an ipv4 addr, don't take no crap */ +#include +static int proc_inet_aton (char const *c, int blen, struct in_addr *addr) +{ + unsigned int _n[4] = {0}; + unsigned int *n = _n; + + while (blen && isspace (*c)) { + ++c; + --blen; + } + while (blen) { + if (!isdigit (*c)) + return 1; + while (blen && isdigit (*c)) { + *n = *n * 10 + *c++ - '0'; + --blen; + if (*n > 255) /* error: stop */ + return 1; + } + if (blen && '.' 
== *c) { + ++c; + --blen; + if (!blen) /* error: need more digits */ + return 1; + if (n == &_n[3]) /* error: don't inc n */ + return 1; + ++n; + continue; + } else { /* should have been last char */ + if (blen && !isspace (*c)) + return 1; + else + break; + } + } + if (n != &_n[3]) + return 1; + + addr->s_addr = htonl (_n[0]<<24 | _n[1]<<16 | _n[2]<<8 | _n[3]); + return 0; +} + + +/** + * proc_doinaddr - read an ipv4 dotted-decimal network address + * @table: the sysctl table + * @write: %TRUE if this is a write to the sysctl file + * @filp: the file structure + * @buffer: the user buffer + * @lenp: the size of the user buffer + * + * Reads/writes a single ip4v network address in dotted-decimal notation. + * The user buffer is an ASCII string. + * + * Returns: -EFAULT on kernel->user I/O error, 0 otherwise. + */ +int proc_doinaddr (ctl_table *table, int write, struct file *filp, + void *buffer, size_t *lenp) +{ + #define TMPBUFLEN 20 + char buf[TMPBUFLEN]; + size_t len; + + if (!table->data || table->maxlen != sizeof (struct in_addr) || !*lenp + || (filp->f_pos && !write)) { + *lenp = 0; + return 0; + } + + if (write) { + struct in_addr addr; + if (*lenp > TMPBUFLEN - 2) + return 0; + len = *lenp; + if (copy_from_user (buf, buffer, len)) + return -EFAULT; + buf[len] = 0; + if (! 
proc_inet_aton (buf, len, &addr)) + ((struct in_addr*)table->data)->s_addr = addr.s_addr; + filp->f_pos += len; + } else { + uint32_t addr = ntohl (((struct in_addr*)table->data)->s_addr); + len = snprintf (buf, TMPBUFLEN - 2, "%d.%d.%d.%d\n", + (addr >> 24) & 0xff, (addr >> 16) & 0xff, + (addr >> 8) & 0xff, (addr) & 0xff); + buf[len] = 0; /* kernel snprintf never returns -1 */ + if (len > *lenp) + len = *lenp; + if (copy_to_user (buffer, buf, len)) + return -EFAULT; + *lenp = len; + filp->f_pos += len; + } + + return 0; +} + #define OP_SET 0 #define OP_AND 1 #define OP_OR 2 diff -Nur linux/net/ipv4/af_inet.c linux.mjf/net/ipv4/af_inet.c --- linux/net/ipv4/af_inet.c Wed Jan 16 22:34:43 2002 +++ linux.mjf/net/ipv4/af_inet.c Thu Jan 17 16:16:03 2002 @@ -469,6 +469,8 @@ /* It is off by default, see below. */ int sysctl_ip_nonlocal_bind; +/* Default local address to use. */ +struct in_addr sysctl_ip_default_bind; static int inet_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len) { @@ -484,6 +486,11 @@ if (addr_len < sizeof(struct sockaddr_in)) return -EINVAL; + + /* this will catch UDP sockets not bound before connect() */ + if (addr->sin_addr.s_addr == INADDR_ANY) { + addr->sin_addr.s_addr = sysctl_ip_default_bind.s_addr; + } chk_addr_ret = inet_addr_type(addr->sin_addr.s_addr); diff -Nur linux/net/ipv4/sysctl_net_ipv4.c linux.mjf/net/ipv4/sysctl_net_ipv4.c --- linux/net/ipv4/sysctl_net_ipv4.c Fri Nov 23 09:26:31 2001 +++ linux.mjf/net/ipv4/sysctl_net_ipv4.c Wed Jan 16 20:24:01 2002 @@ -17,6 +17,7 @@ /* From af_inet.c */ extern int sysctl_ip_nonlocal_bind; +extern struct in_addr sysctl_ip_default_bind; /* From icmp.c */ extern int sysctl_icmp_echo_ignore_all; @@ -115,6 +116,9 @@ {NET_IPV4_NONLOCAL_BIND, "ip_nonlocal_bind", &sysctl_ip_nonlocal_bind, sizeof(int), 0644, NULL, &proc_dointvec}, + {NET_IPV4_DEFAULT_BIND, "ip_default_bind", + &sysctl_ip_default_bind, sizeof(struct in_addr), 0644, NULL, + &proc_doinaddr}, {NET_IPV4_TCP_SYN_RETRIES, 
"tcp_syn_retries", &sysctl_tcp_syn_retries, sizeof(int), 0644, NULL, &proc_dointvec}, {NET_TCP_SYNACK_RETRIES, "tcp_synack_retries", diff -Nur linux/net/ipv4/tcp_ipv4.c linux.mjf/net/ipv4/tcp_ipv4.c --- linux/net/ipv4/tcp_ipv4.c Wed Jan 16 22:34:43 2002 +++ linux.mjf/net/ipv4/tcp_ipv4.c Thu Jan 17 16:15:58 2002 @@ -643,6 +643,7 @@ } /* This will initiate an outgoing connection. */ +extern struct in_addr sysctl_ip_default_bind; int tcp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len) { struct tcp_opt *tp = &(sk->tp_pinfo.af_tcp); @@ -665,6 +666,11 @@ return -EINVAL; nexthop = sk->protinfo.af_inet.opt->faddr; } + + /* This will catch TCP sockets not bound before connect */ + if (sk->saddr == INADDR_ANY) { + sk->saddr = sysctl_ip_default_bind.s_addr; + } tmp = ip_route_connect(&rt, nexthop, sk->saddr, RT_CONN_FLAGS(sk), sk->bound_dev_if); From owner-netdev@oss.sgi.com Thu Jan 17 15:35:03 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0HNZ3p03114 for netdev-outgoing; Thu, 17 Jan 2002 15:35:03 -0800 Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0HNZ2P03111 for ; Thu, 17 Jan 2002 15:35:02 -0800 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id OAA12047; Thu, 17 Jan 2002 14:33:15 -0800 Date: Thu, 17 Jan 2002 14:33:15 -0800 (PST) Message-Id: <20020117.143315.85394543.davem@redhat.com> To: mark@somanetworks.com Cc: ak@muc.de, kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com, linux-kernel@vger.kernel.org Subject: Re: [RFC][PATCH] new sysctl net/ipv4/ip_default_bind From: "David S. 
Miller" In-Reply-To: <20020117172713.A1893@somanetworks.com> References: <20020117172713.A1893@somanetworks.com> X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk If you setup your routes properly, one will be marked "primary" and will be used for outgoing address selection. From owner-netdev@oss.sgi.com Thu Jan 17 15:41:26 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0HNfQC03262 for netdev-outgoing; Thu, 17 Jan 2002 15:41:26 -0800 Received: from mailout04.sul.t-online.com (mailout04.sul.t-online.com [194.25.134.18]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0HNfMP03259 for ; Thu, 17 Jan 2002 15:41:23 -0800 Received: from fwd10.sul.t-online.de by mailout04.sul.t-online.com with smtp id 16RLDf-0000gE-03; Thu, 17 Jan 2002 23:41:15 +0100 Received: from averell.firstfloor.org (520003261363-0001@[80.130.9.68]) by fmrl10.sul.t-online.com with esmtp id 16RLDW-1EbUjAC; Thu, 17 Jan 2002 23:41:06 +0100 Received: by averell.firstfloor.org (Postfix on SuSE Linux 7.2 (i386), from userid 500) id E6D2069C5C; Thu, 17 Jan 2002 23:41:03 +0100 (CET) Date: Thu, 17 Jan 2002 23:41:03 +0100 From: Andi Kleen To: Mark Frazer Cc: davem@redhat.com, ak@muc.de, kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com, Linux Kernel Subject: Re: [RFC][PATCH] new sysctl net/ipv4/ip_default_bind Message-ID: <20020117234103.A2797@averell> References: <20020117172713.A1893@somanetworks.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.3.16i In-Reply-To: <20020117172713.A1893@somanetworks.com>; from mark@somanetworks.com on Thu, Jan 17, 2002 at 11:27:13PM +0100 X-Sender: 520003261363-0001@t-dialin.net Sender: owner-netdev@oss.sgi.com Precedence: bulk On Thu, Jan 17, 2002 at 11:27:13PM +0100, Mark Frazer wrote: > The following patch applies against 2.4.17 and creates a 
new sysctl > node /proc/sys/net/ipv4/ip_default_bind. The purpose of the control > is to allow a default IP address to be selected by the sysadmin for > outgoing connections. That is, for sockets which do not bind(2) the > local end of the socket before calling connect(2). You can already do that using the 'from' attribute in iproute2 aka prefered source address per route. Just set it for your default route. -Andi From owner-netdev@oss.sgi.com Thu Jan 17 15:55:11 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0HNtBY03552 for netdev-outgoing; Thu, 17 Jan 2002 15:55:11 -0800 Received: from mail.somanetworks.com ([63.204.6.12]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0HNt8P03549 for ; Thu, 17 Jan 2002 15:55:08 -0800 Received: from somanetworks.com ([10.11.10.14]) by mail.somanetworks.com (Netscape Messaging Server 4.15) with ESMTP id GQ3UAF00.DIC; Thu, 17 Jan 2002 14:54:15 -0800 Received: (from mjfrazer@localhost) by somanetworks.com (8.11.2/8.11.2) id g0HMsEa02264; Thu, 17 Jan 2002 17:54:14 -0500 Date: Thu, 17 Jan 2002 17:54:14 -0500 From: "Mark Frazer" To: Andi Kleen Cc: davem@redhat.com, kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com, Linux Kernel Subject: Re: [RFC][PATCH] new sysctl net/ipv4/ip_default_bind Message-ID: <20020117175414.A2187@somanetworks.com> References: <20020117172713.A1893@somanetworks.com> <20020117234103.A2797@averell> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <20020117234103.A2797@averell>; from ak@muc.de on Thu, Jan 17, 2002 at 11:41:03PM +0100 X-Message-Flag: Lookout! Organization: Detectable, well, not really Sender: owner-netdev@oss.sgi.com Precedence: bulk Doh. I was using the old SIOCADDRT to add routes and such. Off to learn rtnetlink... thanks -mark Andi Kleen [02/01/17 17:42]: > You can already do that using the 'from' attribute in iproute2 > aka prefered source address per route. Just set it for your default > route. 
> > -Andi -- "we are like unbaked soma vessels" From owner-netdev@oss.sgi.com Thu Jan 17 21:11:21 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0I5BLv10708 for netdev-outgoing; Thu, 17 Jan 2002 21:11:21 -0800 Received: from noxmail.sandelman.ottawa.on.ca (cyphermail.sandelman.ottawa.on.ca [192.139.46.78]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0I5BJP10704 for ; Thu, 17 Jan 2002 21:11:19 -0800 Received: from marajade.sandelman.ottawa.on.ca ([2002:c08b:2e21:2:204:76ff:fe2d:8c]) by noxmail.sandelman.ottawa.on.ca (8.11.6/8.11.6) with ESMTP id g0I4BBn04439 (using TLSv1/SSLv3 with cipher EDH-RSA-DES-CBC3-SHA (168 bits) verified OK); Thu, 17 Jan 2002 23:11:15 -0500 (EST) Received: from marajade.sandelman.ottawa.on.ca (localhost [[UNIX: localhost]]) by marajade.sandelman.ottawa.on.ca (8.11.6/8.11.0) with ESMTP id g0I452001668; Thu, 17 Jan 2002 23:05:05 -0500 (EST) Message-Id: <200201180405.g0I452001668@marajade.sandelman.ottawa.on.ca> To: "Mark Frazer" cc: netdev@oss.sgi.com Subject: Re: [RFC][PATCH] new sysctl net/ipv4/ip_default_bind In-reply-to: Your message of "Thu, 17 Jan 2002 17:27:13 EST." <20020117172713.A1893@somanetworks.com> Mime-Version: 1.0 (generated by tm-edit 7.108) Content-Type: text/plain; charset=US-ASCII Date: Thu, 17 Jan 2002 23:05:01 -0500 From: Michael Richardson Sender: owner-netdev@oss.sgi.com Precedence: bulk I think that you can do the same thing with advanced routing, using the "src" option to "ip route". Set this on your default route. ] ON HUMILITY: to err is human. To moo, bovine. 
| firewalls [ ] Michael Richardson, Sandelman Software Works, Ottawa, ON |net architect[ ] mcr@sandelman.ottawa.on.ca http://www.sandelman.ottawa.on.ca/ |device driver[ ] panic("Just another NetBSD/notebook using, kernel hacking, security guy"); [ From owner-netdev@oss.sgi.com Fri Jan 18 06:00:13 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0IE0DQ24162 for netdev-outgoing; Fri, 18 Jan 2002 06:00:13 -0800 Received: from melanieb.vtt.fi (melanieb.vtt.fi [130.188.1.12]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0IE0BP24159 for ; Fri, 18 Jan 2002 06:00:11 -0800 Received: from mailgw.vtt.fi (localhost [127.0.0.1]) by melanieb.vtt.fi (8.9.3/8.9.3) with ESMTP id PAA03531; Fri, 18 Jan 2002 15:00:05 +0200 (EET) Received: from vttmail.vtt.fi (vttmail.vtt.fi [130.188.1.4]) by mailgw.vtt.fi (8.9.3/8.9.3) with ESMTP id PAA07492; Fri, 18 Jan 2002 15:00:04 +0200 (EET) Received: from there (tte3168.tte.vtt.fi [130.188.71.92]) by vttmail.vtt.fi (8.9.3/8.9.3) with SMTP id PAA03805; Fri, 18 Jan 2002 15:00:03 +0200 (EET) Message-Id: <200201181300.PAA03805@vttmail.vtt.fi> Content-Type: text/plain; charset="iso-8859-1" From: Sami Ponkanen Organization: VTT Information Technology To: Harald Welte Subject: Re: [BUG] Kernel oops with slip+dnat Date: Fri, 18 Jan 2002 14:52:00 +0200 X-Mailer: KMail [version 1.3.2] Cc: netdev@oss.sgi.com, Netfilter Development Mailinglist References: <200201080948.LAA15338@vttmail.vtt.fi> <20020112220052.J7435@sunbeam.de.gnumonks.org> In-Reply-To: <20020112220052.J7435@sunbeam.de.gnumonks.org> MIME-Version: 1.0 X-MIME-Autoconverted: from 8bit to quoted-printable by melanieb.vtt.fi id PAA03531 Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by oss.sgi.com id g0IE0CP24160 Sender: owner-netdev@oss.sgi.com Precedence: bulk On Saturday 12 January 2002 23:00, Harald Welte wrote: > Hi, following up my previous response, here's an untested patch > implementing what I was talking about. 
Could you try this and report if it > works? Thank You Harald for the patch! I did not try it yet, but instead I tried linux-2.4.18-pre3-ac2, which fixes the same problem. Sami Pönkänen From owner-netdev@oss.sgi.com Fri Jan 18 08:36:33 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0IGaXp31623 for netdev-outgoing; Fri, 18 Jan 2002 08:36:33 -0800 Received: from steam.ssi.bg (steam.ssi.bg [212.95.166.19]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0IGaSP31620 for ; Fri, 18 Jan 2002 08:36:28 -0800 Received: (qmail 31410 invoked from network); 18 Jan 2002 15:36:12 -0000 Received: from unamed.infotel.bg (HELO alex) (@212.39.68.18) by steam.ssi.bg with SMTP; 18 Jan 2002 15:36:12 -0000 Message-ID: <002001c1a035$cd507ec0$5d28a4cd@alex.himel.bg> From: "Alexander Atanasov" To: Cc: "J Hadi Salim" , "Alexey Kuznetsov" Subject: sch_gred.c wrong error checking Date: Fri, 18 Jan 2002 17:35:42 +0200 MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="----=_NextPart_000_001E_01C1A046.8D492320" X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 4.72.3110.5 X-MimeOLE: Produced By Microsoft MimeOLE V4.72.3110.3 Sender: owner-netdev@oss.sgi.com Precedence: bulk This is a multi-part message in MIME format. ------=_NextPart_000_001E_01C1A046.8D492320 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable Hi there! A small copy&paste bug in sch_gred.c when checking for failed kmalloc. 
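A userspace sketch of this class of bug and its fix, assuming the check is meant to test the slot that was just allocated. The struct and function names below are hypothetical and calloc stands in for kmalloc; this is not the sch_gred.c code itself.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define MAX_SLOTS 16            /* hypothetical table size for this sketch */

struct slot_data {
    int dummy;
};

struct slot_table {
    struct slot_data *tab[MAX_SLOTS];
    unsigned int def;           /* index of the default slot */
};

/* Allocate the default slot if it is still empty.
 * Returns 0 on success, -ENOMEM if the allocation failed. */
int ensure_default_slot(struct slot_table *t)
{
    assert(t != NULL && t->def < MAX_SLOTS);
    if (t->tab[t->def] == NULL) {
        t->tab[t->def] = calloc(1, sizeof(struct slot_data));
        /* The check must test the slot that was just assigned, tab[def].
         * Testing some other index (the copy&paste slip) can miss a
         * failed allocation or report a bogus failure. */
        if (t->tab[t->def] == NULL)
            return -ENOMEM;
    }
    return 0;
}
```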
-- have fun, alex ------=_NextPart_000_001E_01C1A046.8D492320 Content-Type: application/octet-stream; name="sch_gred-2.4-wrongnullcheck.diff" Content-Transfer-Encoding: base64 Content-Disposition: attachment; filename="sch_gred-2.4-wrongnullcheck.diff" LS0tIHNjaF9ncmVkLmMub3JpZwlGcmkgSmFuIDE4IDE3OjIyOjExIDIwMDIKKysrIHNjaF9ncmVk LmMJRnJpIEphbiAxOCAxNzoyMjozNCAyMDAyCkBAIC00MzYsNyArNDM2LDcgQEAKIAkJaWYgKHRh YmxlLT50YWJbdGFibGUtPmRlZl0gPT0gTlVMTCkgewogCQkJdGFibGUtPnRhYlt0YWJsZS0+ZGVm XT0KIAkJCQlrbWFsbG9jKHNpemVvZihzdHJ1Y3QgZ3JlZF9zY2hlZF9kYXRhKSwgR0ZQX0tFUk5F TCk7Ci0JCQlpZiAoTlVMTCA9PSB0YWJsZS0+dGFiW2N0bC0+RFBdKQorCQkJaWYgKE5VTEwgPT0g dGFibGUtPnRhYlt0YWJsZS0+ZGVmXSkKIAkJCQlyZXR1cm4gLUVOT01FTTsKIAogCQkJbWVtc2V0 KHRhYmxlLT50YWJbdGFibGUtPmRlZl0sIDAsCg== ------=_NextPart_000_001E_01C1A046.8D492320-- From owner-netdev@oss.sgi.com Fri Jan 18 15:19:36 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0INJa713419 for netdev-outgoing; Fri, 18 Jan 2002 15:19:36 -0800 Received: from luxik.cdi.cz (root@inway106.cdi.cz [213.151.81.106]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0INJUP13409 for ; Fri, 18 Jan 2002 15:19:31 -0800 Received: from localhost ([127.0.0.1] ident=devik) by luxik.cdi.cz with esmtp (Exim 3.16 #1) id 16RhM3-0004xs-00; Fri, 18 Jan 2002 23:19:23 +0100 Date: Fri, 18 Jan 2002 23:19:22 +0100 (CET) From: Martin Devera To: hadi@nortelnetworks.com cc: netdev@oss.sgi.com Subject: gred_dump (2.4.17): bad semantic and memory leak In-Reply-To: <002001c1a035$cd507ec0$5d28a4cd@alex.himel.bg> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Hello, I found several bugs in gred_dump (net/sched/sch_gred.c) code. First there is code sample: opt=kmalloc(sizeof(struct tc_gred_qopt)*MAX_DPs, GFP_KERNEL); ... irelevant code ... 
if (!table->initd) { DPRINTK("NO GRED Queues setup!\n"); return -1; } It means that when table->initd is NULL the dump is aborted, and ALL other dumps are aborted as well. To the user it seems as if all qdiscs disappeared. Bad luck. The second problem is IMHO a leak of opt. It is NEVER deallocated. It is later used in: RTA_PUT(skb, TCA_GRED_PARMS, sizeof(struct tc_gred_qopt)*MAX_DPs, opt); and that is the end of opt's usage. Seems like a serious memory leak to me. I haven't created a fix because I'm in a hurry just now. regards, devik From owner-netdev@oss.sgi.com Fri Jan 18 16:40:37 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0J0ebV15031 for netdev-outgoing; Fri, 18 Jan 2002 16:40:37 -0800 Received: from shell.cyberus.ca (shell.cyberus.ca [216.191.240.114]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0J0eRP15028 for ; Fri, 18 Jan 2002 16:40:27 -0800 Received: from localhost (hadi@localhost) by shell.cyberus.ca (8.9.3/666/Cyberus Online Inc.) with ESMTP id SAA18885; Fri, 18 Jan 2002 18:36:07 -0500 (EST) X-Authentication-Warning: shell.cyberus.ca: hadi owned process doing -bs Date: Fri, 18 Jan 2002 18:36:06 -0500 (EST) From: jamal To: Martin Devera cc: , Alexander Atanasov Subject: Re: gred_dump (2.4.17): bad semantic and memory leak In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Ok, is it sch_gred day or what? ;-> Martin you had to get me out of hiding, didn't you? I have included Alexander Atanasov's fix as well: Please look for more bugs before i submit ;-> BTW, i don't think that is a semantic problem, it's just not clean so i cleaned that too and fixed an email address i haven't responded to for about a year now ;-> cheers, jamal --- sch_gred.c 2002/01/18 23:15:56 1.1 +++ sch_gred.c 2002/01/18 23:26:46 @@ -7,7 +7,7 @@ * as published by the Free Software Foundation; either version * 2 of the License, or (at your option) any later version. 
* - * Authors: J Hadi Salim (hadi@nortelnetworks.com) 1998,1999 + * Authors: J Hadi Salim (hadi@cyberus.ca) 1998-2002 * * 991129: - Bug fix with grio mode * - a better sing. AvgQ mode with Grio(WRED) @@ -436,7 +436,7 @@ if (table->tab[table->def] == NULL) { table->tab[table->def]= kmalloc(sizeof(struct gred_sched_data), GFP_KERNEL); - if (NULL == table->tab[ctl->DP]) + if (NULL == table->tab[table->def]) return -ENOMEM; memset(table->tab[table->def], 0, @@ -498,7 +498,7 @@ { unsigned long qave; struct rtattr *rta; - struct tc_gred_qopt *opt; + struct tc_gred_qopt *opt = NULL ; struct tc_gred_qopt *dst; struct gred_sched *table = (struct gred_sched *)sch->data; struct gred_sched_data *q; @@ -520,7 +520,7 @@ if (!table->initd) { DPRINTK("NO GRED Queues setup!\n"); - return -1; + goto rtattr_failure; } for (i=0;irta_len = skb->tail - b; + kfree(opt); + return skb->len; rtattr_failure: + if (opt) + kfree(opt); DPRINTK("gred_dump: FAILURE!!!!\n"); /* also free the opt struct here */ From owner-netdev@oss.sgi.com Sat Jan 19 09:43:06 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0JHh6X07097 for netdev-outgoing; Sat, 19 Jan 2002 09:43:06 -0800 Received: from smtp3.libero.it (smtp3.libero.it [193.70.192.53]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0JHh2P07094 for ; Sat, 19 Jan 2002 09:43:03 -0800 Received: from trantor.ferrara.linux.it (151.26.185.143) by smtp3.libero.it (6.0.032) id 3BD43E25021AD1E2 for netdev@oss.sgi.com; Sat, 19 Jan 2002 17:42:53 +0100 Received: from localhost (localhost.localdomain [127.0.0.1]) by trantor.ferrara.linux.it (Postfix) with ESMTP id E604D1FACF for ; Fri, 18 Jan 2002 15:43:05 +0100 (CET) Date: Fri, 18 Jan 2002 15:43:05 +0100 (CET) From: Mauro Tortonesi To: Subject: SCTP and IPv6 roadmap Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk what's the state of SCTP support under linux? 
i believe the last patch from lksctp is up to linux 2.4.1. has the lksctp project been abandoned? are you planning to integrate sources from the lksctp project in linux 2.5, or to rewrite SCTP support from scratch? and about IPv6? the USAGI project is doing good work. will some of their code be integrated in linux 2.5? -- Aequam memento rebus in arduis servare mentem... Mauro Tortonesi mauro@ferrara.linux.it Ferrara Linux User Group http://www.ferrara.linux.it Project6 - IPv6 for Linux http://project6.ferrara.linux.it From owner-netdev@oss.sgi.com Sat Jan 19 11:51:13 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0JJpDc08366 for netdev-outgoing; Sat, 19 Jan 2002 11:51:13 -0800 Received: from tux.rsn.bth.se (tux.rsn.bth.se [194.47.143.135]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0JJp7P08363 for ; Sat, 19 Jan 2002 11:51:07 -0800 Received: from localhost (gandalf@localhost [127.0.0.1]) by tux.rsn.bth.se (8.12.1/8.12.1/Debian -5) with ESMTP id g0JIoElx015603; Sat, 19 Jan 2002 19:50:15 +0100 Date: Sat, 19 Jan 2002 19:50:14 +0100 (CET) From: Martin Josefsson X-Sender: gandalf@tux.rsn.bth.se To: "David S. Miller" cc: netdev@oss.sgi.com Subject: [PATCH] make rt_intern_hash() don't search yet another time on UP Message-ID: X-message-flag: Get yourself a real mail client! http://www.washington.edu/pine/ MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Hi, I've been playing around a little trying to improve the performance of iptables connection tracking, and in one test (flood with random source IPs) I noticed that there are a lot of searches in the routing cache (this is because of all the cache misses). This second search in rt_intern_hash() isn't needed on UP AFAIK. No other cpu can insert entries in the routing cache while we prepare the new entry to be inserted. And it fixes what I think is a small bug on SMP. We dereference rt_hash_table[hash].chain before taking the lock. 
what if it changes before we start the search, ie. we have to wait for the lock and when we get to run it's been changed by another cpu. --- linux-2.4.18-pre3-NAPI.orig/net/ipv4/route.c Sun Jan 13 20:06:47 2002 +++ linux-2.4.18-pre3-NAPI/net/ipv4/route.c Sat Jan 19 19:35:36 2002 @@ -605,9 +605,11 @@ int attempts = !in_softirq(); restart: - rthp = &rt_hash_table[hash].chain; - write_lock_bh(&rt_hash_table[hash].lock); + +#ifdef CONFIG_SMP + rthp = &rt_hash_table[hash].chain; + while ((rth = *rthp) != NULL) { if (memcmp(&rth->key, &rt->key, sizeof(rt->key)) == 0) { /* Put it first */ @@ -627,7 +629,7 @@ rthp = &rth->u.rt_next; } - +#endif /* Try to bind route to arp only if it is output route or unicast forwarding path. */ /Martin Never argue with an idiot. They drag you down to their level, then beat you with experience. From owner-netdev@oss.sgi.com Sat Jan 19 12:15:02 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0JKF2L08752 for netdev-outgoing; Sat, 19 Jan 2002 12:15:02 -0800 Received: from luxik.cdi.cz (root@inway106.cdi.cz [213.151.81.106]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0JKEvP08745 for ; Sat, 19 Jan 2002 12:14:57 -0800 Received: from localhost ([127.0.0.1] ident=devik) by luxik.cdi.cz with esmtp (Exim 3.16 #1) id 16S0wx-00014D-00; Sat, 19 Jan 2002 20:14:47 +0100 Date: Sat, 19 Jan 2002 20:14:41 +0100 (CET) From: Martin Devera To: jamal cc: netdev@oss.sgi.com, Alexander Atanasov Subject: Re: gred_dump (2.4.17): bad semantic and memory leak In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk > Ok, is it sch_gred day or what? ;-> Martin you had to get me out hiding, > didnt you? Hehe, I have to keep you in touch with life .. 
I need you to look over my paper ;-) > Please look for more bugs before i submit ;-> > > BTW, i dont think that is a semantical problem, its just not clean so i > cleaned that too and fixed an email address i havent repsonded to for > about a year now ;-> well, look below .. > if (!table->initd) { > DPRINTK("NO GRED Queues setup!\n"); > - return -1; > + goto rtattr_failure; > } > I think that you should not fail so hardly here. I'm not sure what is table->initd for but it is possible that user configures it in way where table->initd is NULL (actualy one user did it and complained about HTB error - this way I found bug above). Dump in sch_api.c calls xxx_dump for all qdiscs on the interface until all are exhausted or -1 is returned. By returning -1 in !table->initd case you prevent all other qdisc from being displayed. IMHO the -1 value is used for hard error (eg. unrecoverable one in RTNETLINK comm) not to report error in qdisc's setup. Do you think that it makes sense ? devik From owner-netdev@oss.sgi.com Sat Jan 19 12:22:33 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0JKMXc08928 for netdev-outgoing; Sat, 19 Jan 2002 12:22:33 -0800 Received: from ms2.inr.ac.ru (minus.inr.ac.ru [193.233.7.97]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0JKMUP08924 for ; Sat, 19 Jan 2002 12:22:30 -0800 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id WAA32142; Sat, 19 Jan 2002 22:22:18 +0300 From: kuznet@ms2.inr.ac.ru Message-Id: <200201191922.WAA32142@ms2.inr.ac.ru> Subject: Re: [PATCH] make rt_intern_hash() don't search yet another time on UP To: gandalf@wlug.westbo.SE (Martin Josefsson) Date: Sat, 19 Jan 2002 22:22:18 +0300 (MSK) Cc: netdev@oss.sgi.com In-Reply-To: from "Martin Josefsson" at Jan 19, 2 10:15:06 pm X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Sender: owner-netdev@oss.sgi.com Precedence: bulk Hello! > rt_intern_hash() isn't needed on UP AFAIK. Did you notice that route_slow works with enabled softirqs? 
> And it fixes what I think is a small bug on SMP. We dereference > rt_hash_table[hash].chain > > - rthp = &rt_hash_table[hash].chain; It is not dereference. Alexey From owner-netdev@oss.sgi.com Sat Jan 19 12:43:57 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0JKhv409379 for netdev-outgoing; Sat, 19 Jan 2002 12:43:57 -0800 Received: from shell.cyberus.ca (shell.cyberus.ca [216.191.240.114]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0JKhpP09375 for ; Sat, 19 Jan 2002 12:43:51 -0800 Received: from localhost (hadi@localhost) by shell.cyberus.ca (8.9.3/666/Cyberus Online Inc.) with ESMTP id OAA19996; Sat, 19 Jan 2002 14:39:36 -0500 (EST) X-Authentication-Warning: shell.cyberus.ca: hadi owned process doing -bs Date: Sat, 19 Jan 2002 14:39:36 -0500 (EST) From: jamal To: Martin Devera cc: , Alexander Atanasov Subject: Re: gred_dump (2.4.17): bad semantic and memory leak In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk On Sat, 19 Jan 2002, Martin Devera wrote: > > Ok, is it sch_gred day or what? ;-> Martin you had to get me out hiding, > > didnt you? > > Hehe, I have to keep you in touch with life .. I need you > to look over my paper ;-) BTW, your email seems to have a lot of problems. Please send the paper via email (i have some cycles right now) > well, look below .. > > > if (!table->initd) { > > DPRINTK("NO GRED Queues setup!\n"); > > - return -1; > > + goto rtattr_failure; > > } > > > > I think that you should not fail so hardly here. I'm not sure what > is table->initd for but it is possible that user configures it in way > where table->initd is NULL (actualy one user did it and complained > about HTB error - this way I found bug above). GRED insists that table->initd is non-zero to be completely configured. It is best that they get caught (like what happened with your user); essentially this is a fatal misconfig. 
> Dump in sch_api.c calls xxx_dump for all qdiscs on the interface until > all are exhausted or -1 is returned. > By returning -1 in !table->initd case you prevent all other qdisc > from being displayed. IMHO the -1 value is used for hard error > (eg. unrecoverable one in RTNETLINK comm) not to report error in > qdisc's setup. > Do you think that it makes sense ? > In this case of GRED it might make _some_ sense but not in the case of any other qdisc ... GRED is configured in two steps; essentially initd protects to ensure that the first stage is complete; all other qdiscs configure in one atomic operation. If you can think of some atomic way to provision GRED (eg by batching the transaction) without annoying the user, and without making the whole thing utterly complex, then -1 as return would be correct 100% semantically; if not, it is the safest thing to do; so we can leave it as is and meet 99.9% of semantical meaning. Let the user suffer and learn ;-> cheers, jamal From owner-netdev@oss.sgi.com Sat Jan 19 12:54:12 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0JKsCY09624 for netdev-outgoing; Sat, 19 Jan 2002 12:54:12 -0800 Received: from ms2.inr.ac.ru (minus.inr.ac.ru [193.233.7.97]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0JKs9P09621 for ; Sat, 19 Jan 2002 12:54:10 -0800 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id WAA32293; Sat, 19 Jan 2002 22:53:59 +0300 From: kuznet@ms2.inr.ac.ru Message-Id: <200201191953.WAA32293@ms2.inr.ac.ru> Subject: Re: gred_dump (2.4.17): bad semantic and memory leak To: hadi@cyberus.CA (jamal) Date: Sat, 19 Jan 2002 22:53:59 +0300 (MSK) Cc: netdev@oss.sgi.com In-Reply-To: from "jamal" at Jan 19, 2 10:45:00 pm X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Sender: owner-netdev@oss.sgi.com Precedence: bulk Hello! > GRED insists that table->initd is non-zero to be completely configured. No matter what gred insists on, it must not return error to dump. 
And you say wrong thing about atomicity. gred may not work while being configured, but it always has some _state_ and is able to show it to dump. rtattr_failure happens only when there is no room in skb. Please, return success and an information about current state of gred. Alexey From owner-netdev@oss.sgi.com Sat Jan 19 12:56:23 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0JKuNA09766 for netdev-outgoing; Sat, 19 Jan 2002 12:56:23 -0800 Received: from tux.rsn.bth.se (tux.rsn.bth.se [194.47.143.135]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0JKuJP09763 for ; Sat, 19 Jan 2002 12:56:19 -0800 Received: from localhost (gandalf@localhost [127.0.0.1]) by tux.rsn.bth.se (8.12.1/8.12.1/Debian -5) with ESMTP id g0JJtIlx016230; Sat, 19 Jan 2002 20:55:18 +0100 Date: Sat, 19 Jan 2002 20:55:18 +0100 (CET) From: Martin Josefsson X-Sender: gandalf@tux.rsn.bth.se To: kuznet@ms2.inr.ac.ru cc: netdev@oss.sgi.com Subject: Re: [PATCH] make rt_intern_hash() don't search yet another time on UP In-Reply-To: <200201191922.WAA32142@ms2.inr.ac.ru> Message-ID: X-message-flag: Get yourself a real mail client! http://www.washington.edu/pine/ MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk On Sat, 19 Jan 2002 kuznet@ms2.inr.ac.ru wrote: > Hello! Hi, > > rt_intern_hash() isn't needed on UP AFAIK. > > Did you notice that route_slow works with enabled softirqs? Hmm can you explain this in more detail please? I didn't think a new softirq that was scheduled by a interrupthandler could "preempt" a running softirq if that's what you mean. I certainly see why that extra lookup is needed on SMP but I can't really see why it's needed on UP. > > - rthp = &rt_hash_table[hash].chain; > > It is not dereference. Gah, you are correct. (I think I need some coffee :) /Martin Never argue with an idiot. They drag you down to their level, then beat you with experience. 
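The distinction conceded above can be sketched in plain C: taking the address of a member is only address arithmetic, while the read of the chain happens when that address is dereferenced. The types and function names below are simplified stand-ins invented for illustration, not the kernel's rt_hash_table definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for a hash bucket and its chain entries. */
struct node {
    struct node *next;
    int key;
};

struct bucket {
    struct node *chain;
};

/* Computes the address of the chain pointer inside bucket `hash`.
 * Nothing is loaded from the bucket here, so no chain content is read. */
struct node **chain_slot(struct bucket *table, size_t hash)
{
    return &table[hash].chain;
}

/* Reads through the slot: this is the actual dereference, and in locked
 * code it is the step that has to happen with the lock held. */
struct node *chain_head(struct node **slot)
{
    assert(slot != NULL);
    return *slot;
}
```

A slot obtained with chain_slot() before the chain is modified still yields the current head when chain_head() is called afterwards, since only the later read observes the chain.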
From owner-netdev@oss.sgi.com Sat Jan 19 13:14:46 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0JLEk210172 for netdev-outgoing; Sat, 19 Jan 2002 13:14:46 -0800 Received: from shell.cyberus.ca (shell.cyberus.ca [216.191.240.114]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0JLEfP10169 for ; Sat, 19 Jan 2002 13:14:41 -0800 Received: from localhost (hadi@localhost) by shell.cyberus.ca (8.9.3/666/Cyberus Online Inc.) with ESMTP id PAA20028; Sat, 19 Jan 2002 15:10:29 -0500 (EST) X-Authentication-Warning: shell.cyberus.ca: hadi owned process doing -bs Date: Sat, 19 Jan 2002 15:10:29 -0500 (EST) From: jamal To: cc: Subject: Re: gred_dump (2.4.17): bad semantic and memory leak In-Reply-To: <200201191953.WAA32293@ms2.inr.ac.ru> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk On Sat, 19 Jan 2002 kuznet@ms2.inr.ac.ru wrote: > Hello! > > > GRED insists that table->initd is non-zero to be completely configured. > > No matter what gred insists on, it must not return error to dump. > > And you say wrong thing about atomicity. gred may not work while being > configured, but it always has some _state_ and is able to show it to dump. > > rtattr_failure happens only when there is no room in skb. > > Please, return success and an information about current state of gred. > It does have some state. Note it will continue to work even if half configured just using default parameters .. How about totaly removing that check? It would report accumulated state just fine ... 
i.e --- sch_gred.c 2002/01/19 20:05:59 1.2 +++ sch_gred.c 2002/01/19 20:09:05 @@ -518,10 +518,6 @@ memset(opt, 0, (sizeof(struct tc_gred_qopt))*table->DPs); - if (!table->initd) { - DPRINTK("NO GRED Queues setup!\n"); - goto rtattr_failure; - } for (i=0;i; Sat, 19 Jan 2002 13:30:19 -0800 Received: from localhost ([127.0.0.1] ident=devik) by luxik.cdi.cz with esmtp (Exim 3.16 #1) id 16S27E-0001L3-00; Sat, 19 Jan 2002 21:29:28 +0100 Date: Sat, 19 Jan 2002 21:29:28 +0100 (CET) From: Martin Devera To: jamal cc: kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com Subject: Re: gred_dump (2.4.17): bad semantic and memory leak In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk > It does have some state. Note it will continue to work even if half > configured just using default parameters .. > How about totaly removing that check? It would report accumulated state > just fine ... i.e > > --- sch_gred.c 2002/01/19 20:05:59 1.2 > +++ sch_gred.c 2002/01/19 20:09:05 > @@ -518,10 +518,6 @@ > > memset(opt, 0, (sizeof(struct tc_gred_qopt))*table->DPs); > > - if (!table->initd) { > - DPRINTK("NO GRED Queues setup!\n"); > - goto rtattr_failure; > - } It would solve it just cleanly. Another possibility would be to remove goto rtattr_failure; only and left printk here to report the mistake to an user. devik From owner-netdev@oss.sgi.com Sat Jan 19 14:01:08 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0JM18k10732 for netdev-outgoing; Sat, 19 Jan 2002 14:01:08 -0800 Received: from shell.cyberus.ca (shell.cyberus.ca [216.191.240.114]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0JM0wP10724 for ; Sat, 19 Jan 2002 14:00:58 -0800 Received: from localhost (hadi@localhost) by shell.cyberus.ca (8.9.3/666/Cyberus Online Inc.) 
with ESMTP id PAA20085; Sat, 19 Jan 2002 15:56:36 -0500 (EST) X-Authentication-Warning: shell.cyberus.ca: hadi owned process doing -bs Date: Sat, 19 Jan 2002 15:56:36 -0500 (EST) From: jamal To: Martin Devera cc: , Subject: Re: gred_dump (2.4.17): bad semantic and memory leak In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk On Sat, 19 Jan 2002, Martin Devera wrote: > > It does have some state. Note it will continue to work even if half > > configured just using default parameters .. > > How about totaly removing that check? It would report accumulated state > It would solve it just cleanly. > Another possibility would be to remove goto rtattr_failure; only > and left printk here to report the mistake to an user. Ok, Alexey; please queue this; i just tested and it looks fine --- sch_gred.c 2002/01/18 23:15:56 1.1 +++ sch_gred.c 2002/01/19 20:46:51 @@ -7,7 +7,7 @@ * as published by the Free Software Foundation; either version * 2 of the License, or (at your option) any later version. * - * Authors: J Hadi Salim (hadi@nortelnetworks.com) 1998,1999 + * Authors: J Hadi Salim (hadi@cyberus.ca) 1998-2002 * * 991129: - Bug fix with grio mode * - a better sing. 
AvgQ mode with Grio(WRED) @@ -436,7 +436,7 @@ if (table->tab[table->def] == NULL) { table->tab[table->def]= kmalloc(sizeof(struct gred_sched_data), GFP_KERNEL); - if (NULL == table->tab[ctl->DP]) + if (NULL == table->tab[table->def]) return -ENOMEM; memset(table->tab[table->def], 0, @@ -498,7 +498,7 @@ { unsigned long qave; struct rtattr *rta; - struct tc_gred_qopt *opt; + struct tc_gred_qopt *opt = NULL ; struct tc_gred_qopt *dst; struct gred_sched *table = (struct gred_sched *)sch->data; struct gred_sched_data *q; @@ -520,7 +520,6 @@ if (!table->initd) { DPRINTK("NO GRED Queues setup!\n"); - return -1; } for (i=0;irta_len = skb->tail - b; + kfree(opt); return skb->len; rtattr_failure: + if (opt) + kfree(opt); DPRINTK("gred_dump: FAILURE!!!!\n"); /* also free the opt struct here */ cheers, jamal From owner-netdev@oss.sgi.com Sat Jan 19 22:08:39 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0K68dN16102 for netdev-outgoing; Sat, 19 Jan 2002 22:08:39 -0800 Received: from marina.lowendale.com.au (neale@gw.lowendale.com.au [203.26.242.120]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0K68LP16098 for ; Sat, 19 Jan 2002 22:08:23 -0800 Received: from localhost (neale@localhost) by marina.lowendale.com.au (8.9.3/8.9.3/Debian/GNU) with ESMTP id QAA05993; Sun, 20 Jan 2002 16:31:41 +1100 Date: Sun, 20 Jan 2002 16:31:39 +1100 (EST) From: Neale Banks To: linux-kernel@vger.kernel.org cc: Hein Roehrig , netdev@oss.sgi.com Subject: [PATCH][2.2] drivers/net/net_init.c - bounds checking etc Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Greetings, Appended patch (against 2.2.21-pre2) addresses: (1) lack of bounds checking of statically-dimensioned arrays such as *ethdev_index[MAX_ETH_CARDS] (2) unnecessary initialisation if i in etherdev_get_index() I notice also that init_etherdev() can return a NULL pointer. 
In earleir 2.2 I found a few ethernet drivers which do not contemplate this possibility. Presumably this should be cleaned up too? Regards, Neale. --- linux-2.2.21-pre2-pristine/drivers/net/net_init.c Sat Nov 3 03:39:07 2001 +++ linux-2.2.21-pre2-ntb/drivers/net/net_init.c Sun Jan 20 15:04:02 2002 @@ -104,6 +104,10 @@ goto found; } } + if (i>=MAX_ETH_CARDS) { + printk("init_etherdev: FATAL - too many eth devs.\n"); + return NULL; + } alloc_size &= ~3; /* Round to dword boundary. */ @@ -224,6 +228,10 @@ goto hipfound; } } + if (i>=MAX_HIP_CARDS) { + printk("init_hippi_dev: FATAL - too many hip devs.\n"); + return NULL; + } alloc_size &= ~3; /* Round to dword boundary. */ @@ -269,6 +277,8 @@ break; } } + if (i>=MAX_HIP_CARDS) + printk("unregister_hipdev: WARNING - didn't find dev.\n"); rtnl_unlock(); } @@ -468,8 +478,7 @@ static int etherdev_get_index(struct device *dev) { - int i=MAX_ETH_CARDS; - + int i; for (i = 0; i < MAX_ETH_CARDS; ++i) { if (ethdev_index[i] == NULL) { sprintf(dev->name, "eth%d", i); @@ -490,6 +499,8 @@ break; } } + if (i>=MAX_ETH_CARDS) + printk("etherdev_put_index: WARNING - didn't find dev.\n"); } int register_netdev(struct device *dev) @@ -553,6 +564,10 @@ goto trfound; } } + if (i>=MAX_TR_CARDS) { + printk("init_trdev: FATAL - too many tr devs.\n"); + return NULL; + } alloc_size &= ~3; /* Round to dword boundary. */ dev = (struct device *)kmalloc(alloc_size, GFP_KERNEL); @@ -624,6 +639,8 @@ break; } } + if (i>=MAX_TR_CARDS) + printk("tr_freedev: WARNING - didn't find dev.\n"); } int register_trdev(struct device *dev) @@ -712,6 +729,10 @@ goto fcfound; } } + if (i>=MAX_FC_CARDS) { + printk("init_fcdev: FATAL - too many fc devs.\n"); + return NULL; + } alloc_size &= ~3; /* Round to dword boundary. 
*/ dev = (struct device *)kmalloc(alloc_size, GFP_KERNEL); @@ -747,6 +768,8 @@ break; } } + if (i>=MAX_FC_CARDS) + printk("fc_freedev: WARNING - didn't find dev.\n"); } From owner-netdev@oss.sgi.com Sun Jan 20 10:23:55 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0KINtx27557 for netdev-outgoing; Sun, 20 Jan 2002 10:23:55 -0800 Received: from u.domain.uli (ja.mac.ssi.bg [212.95.166.194]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0KINkP27554 for ; Sun, 20 Jan 2002 10:23:48 -0800 Received: from localhost (IDENT:ja@localhost [127.0.0.1]) by u.domain.uli (8.11.0/8.11.0) with ESMTP id g0KJQ6912134; Sun, 20 Jan 2002 19:26:16 GMT Date: Sun, 20 Jan 2002 19:26:06 +0000 (GMT) From: Julian Anastasov X-X-Sender: ja@u.domain.uli To: Alexey Kuznetsov cc: netdev@oss.sgi.com, , Rusty Russell Subject: [PATCH] Restore ROUTE MASQ in 2.4 Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Hello, I'm guilty, what to say more. I resurrected the route masq usage in 2.4: http://www.linuxvirtualserver.org/~julian/#rtmasq By this way the route masq has more priority when the NAT connections are setup, the Netfilter (iptables/ipchains) rules play after them. Examples (nothing new in the usage): Similar to -j MASQUERADE (but the connections don't die on netdev down event): ip rule add ... lookup TABLE nat 0 Similar to -j SNAT: ip rule add ... lookup TABLE map-to EXT_IP The first tests work but I'm not sure what is the best way to correctly stop RTCF_NAT when Netfilter's NAT plays (see the change in ip_nat_dumb.c). May be one bug: inet_rtm_delrule does not match the srcmap (RTA_GATEWAY) and by this way a wrong rule is deleted when they differ only by srcmap. Is it fixable? 
Regards -- Julian Anastasov From owner-netdev@oss.sgi.com Mon Jan 21 06:08:56 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0LE8u326722 for netdev-outgoing; Mon, 21 Jan 2002 06:08:56 -0800 Received: from marina.lowendale.com.au (neale@gw.lowendale.com.au [203.26.242.120]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0LE8eP26711 for ; Mon, 21 Jan 2002 06:08:40 -0800 Received: from localhost (neale@localhost) by marina.lowendale.com.au (8.9.3/8.9.3/Debian/GNU) with ESMTP id AAA08862; Tue, 22 Jan 2002 00:33:52 +1100 Date: Tue, 22 Jan 2002 00:33:50 +1100 (EST) From: Neale Banks To: linux-kernel@vger.kernel.org cc: Hein Roehrig , netdev@oss.sgi.com Subject: Re: [PATCH][2.2] drivers/net/net_init.c - bounds checking etc In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk On Sun, 20 Jan 2002, Neale Banks wrote: > Greetings, > > Appended patch (against 2.2.21-pre2) addresses: > > (1) lack of bounds checking of statically-dimensioned arrays > such as *ethdev_index[MAX_ETH_CARDS] > > (2) unnecessary initialisation if i in etherdev_get_index() Corrected and tested patch appended. > I notice also that init_etherdev() can return a NULL pointer. In earleir > 2.2 I found a few ethernet drivers which do not contemplate this > possibility. Presumably this should be cleaned up too? Separate patch for eepro100 to follow. Regards, Neale. --- linux-2.2.21-pre2-pristine/drivers/net/net_init.c Sat Nov 3 03:39:07 2001 +++ linux-2.2.21-pre2-ntb/drivers/net/net_init.c Mon Jan 21 22:53:42 2002 @@ -103,7 +103,12 @@ if (dev->priv) memset(dev->priv, 0, sizeof_priv); goto found; } + break; /* have found a non-initialised slot */ } + if (i>=MAX_ETH_CARDS) { + printk("init_etherdev: FATAL - too many eth devs.\n"); + return NULL; + } alloc_size &= ~3; /* Round to dword boundary. 
*/ @@ -223,7 +228,12 @@ if (dev->priv) memset(dev->priv, 0, sizeof_priv); goto hipfound; } + break; /* have found a non-initialised slot */ } + if (i>=MAX_HIP_CARDS) { + printk("init_hippi_dev: FATAL - too many hip devs.\n"); + return NULL; + } alloc_size &= ~3; /* Round to dword boundary. */ @@ -269,6 +279,8 @@ break; } } + if (i>=MAX_HIP_CARDS) + printk("unregister_hipdev: WARNING - didn't find dev.\n"); rtnl_unlock(); } @@ -468,8 +480,7 @@ static int etherdev_get_index(struct device *dev) { - int i=MAX_ETH_CARDS; - + int i; for (i = 0; i < MAX_ETH_CARDS; ++i) { if (ethdev_index[i] == NULL) { sprintf(dev->name, "eth%d", i); @@ -490,6 +501,8 @@ break; } } + if (i>=MAX_ETH_CARDS) + printk("etherdev_put_index: WARNING - didn't find dev.\n"); } int register_netdev(struct device *dev) @@ -552,7 +565,12 @@ if (dev->priv) memset(dev->priv, 0, sizeof_priv); goto trfound; } + break; /* have found a non-initialised slot */ } + if (i>=MAX_TR_CARDS) { + printk("init_trdev: FATAL - too many tr devs.\n"); + return NULL; + } alloc_size &= ~3; /* Round to dword boundary. */ dev = (struct device *)kmalloc(alloc_size, GFP_KERNEL); @@ -624,6 +642,8 @@ break; } } + if (i>=MAX_TR_CARDS) + printk("tr_freedev: WARNING - didn't find dev.\n"); } int register_trdev(struct device *dev) @@ -711,7 +731,12 @@ if (dev->priv) memset(dev->priv, 0, sizeof_priv); goto fcfound; } + break; /* have found a non-initialised slot */ } + if (i>=MAX_FC_CARDS) { + printk("init_fcdev: FATAL - too many fc devs.\n"); + return NULL; + } alloc_size &= ~3; /* Round to dword boundary. 
*/ dev = (struct device *)kmalloc(alloc_size, GFP_KERNEL); @@ -747,6 +772,8 @@ break; } } + if (i>=MAX_FC_CARDS) + printk("fc_freedev: WARNING - didn't find dev.\n"); } From owner-netdev@oss.sgi.com Mon Jan 21 15:21:05 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0LNL5a17706 for netdev-outgoing; Mon, 21 Jan 2002 15:21:05 -0800 Received: from dea.linux-mips.net (localhost [127.0.0.1]) by oss.sgi.com (8.11.2/8.11.3) with ESMTP id g0LNL4P17702 for ; Mon, 21 Jan 2002 15:21:04 -0800 Received: (from ralf@localhost) by dea.linux-mips.net (8.11.1/8.11.1) id g0JNVDJ00582 for netdev@oss.sgi.com; Sat, 19 Jan 2002 15:31:13 -0800 Received: from robur.slu.se (robur.slu.se [130.238.98.12]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0IDjuP23652 for ; Fri, 18 Jan 2002 05:45:56 -0800 Received: (from robert@localhost) by robur.slu.se (8.8.7/8.8.7) id NAA23766; Fri, 18 Jan 2002 13:48:25 +0100 From: Robert Olsson MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <15432.6553.31168.353330@robur.slu.se> Date: Fri, 18 Jan 2002 13:48:25 +0100 To: Dominic Duval Cc: netdev@oss.sgi.com, Robert.Olsson@data.slu.se Subject: Testing tools In-Reply-To: <3C45BEA4.A633749C@8d.com> References: <3C45BEA4.A633749C@8d.com> X-Mailer: VM 6.92 under Emacs 19.34.1 Sender: owner-netdev@oss.sgi.com Precedence: bulk Dominic Duval writes: > Hi everyone, > > I'm currently looking for some good, open-source, performance testing > tools such as Netpipe in order to do some stress-testing on various > network-related parts of the Kernel. What do you people use when you're > ready to test modifications to the stack or new network drivers? Hello! If you like to do stress testing in terms packets/sec or Mbit/sec there is a little program pg3.c in the iputils package. With a hacked e1000 driver it can fill a GIGE pipe in the range 256-1500 byte packets and w. 
smallest (64 byte) packets just over 1 Mpps can be sent with a PIII @ 933 MHz. Cheers. --ro From owner-netdev@oss.sgi.com Mon Jan 21 16:27:38 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0M0RcR18866 for netdev-outgoing; Mon, 21 Jan 2002 16:27:38 -0800 Received: from mg03.austin.ibm.com (mg03.austin.ibm.com [192.35.232.20]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0M0RWP18862 for ; Mon, 21 Jan 2002 16:27:32 -0800 Received: from austin.ibm.com (netmail1.austin.ibm.com [9.3.7.138]) by mg03.austin.ibm.com (AIX4.3/8.9.3/8.9.3) with ESMTP id RAA25898; Mon, 21 Jan 2002 17:24:45 -0600 Received: from austin.ibm.com (death.austin.ibm.com [9.53.216.109]) by austin.ibm.com (AIX4.3/8.9.3/8.9.3) with ESMTP id RAA03662; Mon, 21 Jan 2002 17:27:25 -0600 Message-ID: <3C4CA2E9.26253A69@austin.ibm.com> Date: Mon, 21 Jan 2002 17:23:21 -0600 From: Jon Grimm X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.4.1 i686) X-Accept-Language: en MIME-Version: 1.0 To: netdev@oss.sgi.com CC: "sctp-developers-list@cig.mot.com" , mauro@ferrara.linux.it Subject: Re: SCTP and IPv6 roadmap Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Hi. I was just forwarded Mauro's note: Mauro Tortonesi wrote: > > what's the state of SCTP support under linux? i believe the last patch > from lksctp is up to linux 2.4.1. has been the lksctp project abandoned? > are you planning to integrate sources from the lksctp project in linux > 2.5, or to rewrite SCTP support from scratch? > > and about IPv6? the USAGI project is doing a good work. will some of their > code be integrated in linux 2.5? > > -- > Aequam memento rebus in arduis servare mentem... > > Mauro Tortonesi mauro@ferrara.linux.it > Ferrara Linux User Group http://www.ferrara.linux.it > Project6 - IPv6 for Linux http://project6.ferrara.linux.it Mauro, I'm glad you ask, since it gives me a chance to put in a little plug for the lksctp project. 
No, we aren't abandoned at all. It is quite true that we've been on a base of 2.4.1 overly long. We are actively moving to a 2.4.17 base and redoing our file hierarchy to better allow us to keep up with current kernels. We will hopefully be more nimble in the future, but our current code is only available as an anonymous download from CVS. Overall, we'd love to get into 2.5, but realize we have a bit of work to focus on first. For more information, please visit the project's website at: http://www.sourceforge.net/projects/lksctp For more information on the SCTP protocol, see RFC 2960 at: http://www.ietf.org/rfc/rfc2960.txt Best Regards, Jon Grimm From owner-netdev@oss.sgi.com Tue Jan 22 02:33:59 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0MAXxO29943 for netdev-outgoing; Tue, 22 Jan 2002 02:33:59 -0800 Received: from web21201.mail.yahoo.com (web21201.mail.yahoo.com [216.136.129.59]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0MAXvP29940 for ; Tue, 22 Jan 2002 02:33:57 -0800 Message-ID: <20020122093353.29721.qmail@web21201.mail.yahoo.com> Received: from [158.144.6.192] by web21201.mail.yahoo.com via HTTP; Tue, 22 Jan 2002 01:33:53 PST Date: Tue, 22 Jan 2002 01:33:53 -0800 (PST) From: Amit Jain Subject: about "dev_queue_xmit" To: netdev@oss.sgi.com MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Sender: owner-netdev@oss.sgi.com Precedence: bulk Hi, Hope you could answer this. Also, please CC my address; I'm not a member of this group. 1) Once dev_queue_xmit(skb) is executed, is the skb buffer freed? 2) What can one expect if the buffer size is greater than the MTU? Thank you Amit
From owner-netdev@oss.sgi.com Tue Jan 22 04:21:19 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0MCLJS00415 for netdev-outgoing; Tue, 22 Jan 2002 04:21:19 -0800 Received: from iiic.ethz.ch (root@rif-giga.iiic.ethz.ch [129.132.179.2]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0MCLGP00412 for ; Tue, 22 Jan 2002 04:21:17 -0800 Received: from iiic.ethz.ch (tik-dyn45.ethz.ch [129.132.30.45]) by iiic.ethz.ch (8.9.3/8.9.3) with ESMTP id MAA18955 for ; Tue, 22 Jan 2002 12:20:59 +0100 (MET) Message-ID: <3C4D4B32.45E0A49A@iiic.ethz.ch> Date: Tue, 22 Jan 2002 12:21:22 +0100 From: Thomas Heinis X-Mailer: Mozilla 4.76 [en] (Windows NT 5.0; U) X-Accept-Language: en MIME-Version: 1.0 To: netdev@oss.sgi.com Subject: IPv6 & Linux Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Hi all, does anybody know where I can find good documentation of the IPv6 implementation in the Linux kernel? I'm just tired of guessing what a function does by looking at its name...
Thanks Thomas From owner-netdev@oss.sgi.com Tue Jan 22 06:58:44 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0MEwir06467 for netdev-outgoing; Tue, 22 Jan 2002 06:58:44 -0800 Received: from netbank.com.br (IDENT:postfix@garrincha.netbank.com.br [200.203.199.88]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0MEwfP06454 for ; Tue, 22 Jan 2002 06:58:41 -0800 Received: from brinquedo.distro.conectiva (1-052.ctame701-2.telepar.net.br [200.181.138.52]) by netbank.com.br (Postfix) with ESMTP id 97BDF4688C; Tue, 22 Jan 2002 11:51:47 -0200 (BRDT) Received: by brinquedo.distro.conectiva (Postfix, from userid 501) id 6631BC455; Tue, 22 Jan 2002 11:58:57 -0200 (BRST) Date: Tue, 22 Jan 2002 11:58:57 -0200 From: Arnaldo Carvalho de Melo To: Amit Jain Cc: netdev@oss.sgi.com Subject: Re: about "dev_queue_xmit" Message-ID: <20020122135857.GB15308@conectiva.com.br> References: <20020122093353.29721.qmail@web21201.mail.yahoo.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20020122093353.29721.qmail@web21201.mail.yahoo.com> User-Agent: Mutt/1.3.25i X-Url: http://advogato.org/person/acme Sender: owner-netdev@oss.sgi.com Precedence: bulk Em Tue, Jan 22, 2002 at 01:33:53AM -0800, Amit Jain escreveu: > Hope you could answer this...also please CC to my > address ..I m not a member of this group > > 1)Once dev_queue_xmit(skb)is executed......is the skb > buffer freed??? not necessarily but you should assume that it is not to be touched anymore. 
- Arnaldo From owner-netdev@oss.sgi.com Tue Jan 22 09:44:21 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0MHiLj17919 for netdev-outgoing; Tue, 22 Jan 2002 09:44:21 -0800 Received: from dibbler.ne.mediaone.net (IDENT:root@dibbler.ne.mediaone.net [24.218.57.139]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0MHiIP17913 for ; Tue, 22 Jan 2002 09:44:19 -0800 Received: (from rodrigc@localhost) by dibbler.ne.mediaone.net (8.11.0/8.11.0) id g0MGiCc01959; Tue, 22 Jan 2002 11:44:12 -0500 Date: Tue, 22 Jan 2002 11:44:12 -0500 From: Craig Rodrigues To: Thomas Heinis Cc: netdev@oss.sgi.com Subject: Re: IPv6 & Linux Message-ID: <20020122114412.A1952@mediaone.net> References: <3C4D4B32.45E0A49A@iiic.ethz.ch> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <3C4D4B32.45E0A49A@iiic.ethz.ch>; from theinis@iiic.ethz.ch on Tue, Jan 22, 2002 at 12:21:22PM +0100 Sender: owner-netdev@oss.sgi.com Precedence: bulk On Tue, Jan 22, 2002 at 12:21:22PM +0100, Thomas Heinis wrote: > Hi all, > does anybody know, where I can find a good documentation of the IPv6 > implementation in the Linux kernel? I'm just tired of guessing what a > function does by looking at its name... http://www.linux-ipv6.org has some documentation, but you can ask on their mailing list, since they are actively working on IPv6 for Linux. 
-- Craig Rodrigues http://www.gis.net/~craigr rodrigc@mediaone.net From owner-netdev@oss.sgi.com Tue Jan 22 10:53:20 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0MIrKZ27029 for netdev-outgoing; Tue, 22 Jan 2002 10:53:20 -0800 Received: from yamato.ccrle.nec.de (yamato.ccrle.nec.de [195.37.70.1]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0MIrDP27026 for ; Tue, 22 Jan 2002 10:53:13 -0800 Received: from wallace.heidelberg.ccrle.nec.de (root@wallace [192.168.102.1]) by yamato.ccrle.nec.de (8.11.6/8.10.1) with ESMTP id g0MHrbH93132 for ; Tue, 22 Jan 2002 18:53:37 +0100 (CET) Received: from fukuoka.mobility.ccrle.nec.de ([192.168.101.178]) by wallace.heidelberg.ccrle.nec.de (8.9.3/8.9.3/SuSE Linux 8.9.3-0.1) with SMTP id SAA16418 for ; Tue, 22 Jan 2002 18:53:02 +0100 Content-Type: text/plain; charset="iso-8859-1" From: Joerg Eggink Organization: NEC Europe Ltd. To: netdev@oss.sgi.com Subject: dev_ioctl() question ? Date: Tue, 22 Jan 2002 18:53:10 +0100 X-Mailer: KMail [version 1.2] MIME-Version: 1.0 Message-Id: <02012218531000.03441@fukuoka.mobility.ccrle.nec.de> X-MIME-Autoconverted: from 8bit to quoted-printable by yamato.ccrle.nec.de id g0MHrbH93132 Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by oss.sgi.com id g0MIrEP27027 Sender: owner-netdev@oss.sgi.com Precedence: bulk Hello all I have a problem with reading or setting the wireless settings. I want to use the function dev_ioctl(). (defined in core/dev.c ) For this I write my own function integrated in a kernel module. The function is now called with the device_name eth0. The problem is that sometimes I get "no wireless extension". But if a packet arrive (e.g. Router solicitation) I can read the wireless settings. Is there anything I forgot ? Do I need locking functions or is it not possible to use the dev_ioctl function from another kernel module ? Or has anybody another idea to read or set the wireless settings (e.g. 
the frequency or channel).
*****************************************************************************
void mho_get_wireless_info(char *dev_name)
{
        struct iwreq wrq;
        int err;

        /* set the device name */
        strncpy(wrq.ifr_name, dev_name, IFNAMSIZ);

        /* keep the return value so it can be reported on failure */
        err = dev_ioctl(SIOCGIWNAME, &wrq);
        if (err < 0) {
                printk("MHO: No wireless extension. error = %d\n", err);
                return;
        }
        printk("Wireless info: devicename= %s \n", wrq.u.name);
}
*******************************************************************************
Thank you in advance for all help Jörg -------------------------------------------------------- Joerg Eggink Network Laboratories Heidelberg NEC Europe Ltd. Adenauerplatz 6 D-69115 Heidelberg, Germany email: joerg.eggink@ccrle.nec.de http://www.ccrle.nec.de ------------------------------------------------------- From owner-netdev@oss.sgi.com Tue Jan 22 12:18:00 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0MKI0P29198 for netdev-outgoing; Tue, 22 Jan 2002 12:18:00 -0800 Received: from ms2.inr.ac.ru (minus.inr.ac.ru [193.233.7.97]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0MKHtP29195 for ; Tue, 22 Jan 2002 12:17:55 -0800 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id WAA10279; Tue, 22 Jan 2002 22:16:59 +0300 From: kuznet@ms2.inr.ac.ru Message-Id: <200201221916.WAA10279@ms2.inr.ac.ru> Subject: Re: [PATCH] Restore ROUTE MASQ in 2.4 To: ja@ssi.bg (Julian Anastasov) Date: Tue, 22 Jan 2002 22:16:59 +0300 (MSK) Cc: netdev@oss.sgi.com, netfilter@lists.samba.org, rusty@rustcorp.com.au In-Reply-To: from "Julian Anastasov" at Jan 20, 2 07:26:06 pm X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Sender: owner-netdev@oss.sgi.com Precedence: bulk Hello! > I'm guilty, what to say more. I resurrected the route > masq usage in 2.4: Does resurrection make sense?? What are the reasons to do this? iptables seems to do everything.
I made this trick in 2.2 because people (particularly, me) wanted masquerading to work and ipchains did not provide this facility: masquerading to a random address. I am afraid it is not resurrection, but rather waking up a zombie. > http://www.linuxvirtualserver.org/~julian/#rtmasq It is interesting in any case. I did not even know that this is possible. :-) > May be one bug: inet_rtm_delrule does not match the > srcmap (RTA_GATEWAY) and by this way a wrong rule is deleted > when they differ only by srcmap. Is it fixable? No, I think. Actually, I planned to kill the match against everything but priority. But the more I delayed this change, the more catastrophic it became. Well, look into ip-cref, it directly warns about this change in future and prescribes giving an explicit priority. But I will gather all my will and do it in 2.5. Alexey From owner-netdev@oss.sgi.com Tue Jan 22 14:05:44 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0MM5iY31973 for netdev-outgoing; Tue, 22 Jan 2002 14:05:44 -0800 Received: from u.domain.uli (ja.mac.ssi.bg [212.95.166.194]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0MM5XP31946 for ; Tue, 22 Jan 2002 14:05:34 -0800 Received: from localhost (IDENT:ja@localhost [127.0.0.1]) by u.domain.uli (8.11.0/8.11.0) with ESMTP id g0MN8Sm01570; Tue, 22 Jan 2002 23:08:28 GMT Date: Tue, 22 Jan 2002 23:08:28 +0000 (GMT) From: Julian Anastasov X-X-Sender: ja@u.domain.uli To: kuznet@ms2.inr.ac.ru cc: netdev@oss.sgi.com, , Subject: Re: [PATCH] Restore ROUTE MASQ in 2.4 In-Reply-To: <200201221916.WAA10279@ms2.inr.ac.ru> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Hello, On Tue, 22 Jan 2002 kuznet@ms2.inr.ac.ru wrote: > Hello! > > > I'm guilty, what to say more. I resurrected the route > > masq usage in 2.4: > > Does resurrection make sense?? > What are the reasons to do this? iptables seems to do everything.
> > I made this trick in 2.2 because people (particularly, me) wanted > masquerading to work and ipchains did not provide this facility: > masquerading to a random address. > > I am afraid it is not resurrection, but rather waking up a zombie. :) I find the route masq useful in some complex setups where many local networks exist (without NAT-ing between them), there is NAT to other networks, and the result is a complex list of iptables/ipchains NAT rules (ACCEPT exceptions, SNAT...). As we know, rtmasq selects the source per route path while Netfilter selects the source for each connection, so maybe the NAT setup will be faster for rtmasq. Even if Netfilter is smarter when setting up the NAT connections, the result can be difficult management of NAT rules and, worst of all, rules that are not in sync with the routing. I can speed up the fib_rules_policy() code by not calling inet_addr_type() for the "nat 0.0.0.0" case, which is the most used one. So, the feature should not add any performance degradation. So, nothing new as you see, only some simplification in the NAT rules (which is not visible for a small number of networks). So, rtmasq is a different way to be happy :) I don't have more ideas on this topic :) Maybe I'm too tired to add many Netfilter rules :) If the netfilter gurus don't find it useful, no problem :) > > http://www.linuxvirtualserver.org/~julian/#rtmasq > > It is interesting in any case. I did not even know that this is possible. :-) Yes, the good thing in Netfilter is that we can set up the connections in many different ways and then the code will maintain them. Of course, the SNAT-ing process may need correct routing (maybe a new "ROUTING" chain) and small routing code changes (I have them in some of my patches). > > Maybe one bug: inet_rtm_delrule does not match the > > srcmap (RTA_GATEWAY), and this way a wrong rule is deleted > > when they differ only by srcmap. Is it fixable? > > No, I think.
Actually, I planned to kill the match against everything > but priority. But the more I delayed this change, the more > catastrophic it became. Well, look into ip-cref, it directly warns > about this change in future and prescribes giving an explicit priority. Maybe I have rules with the same priority anyway :) There are so many free priorities that it does not matter so much. > But I will gather all my will and do it in 2.5. > > Alexey Regards -- Julian Anastasov From owner-netdev@oss.sgi.com Wed Jan 23 06:02:23 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0NE2NX27542 for netdev-outgoing; Wed, 23 Jan 2002 06:02:23 -0800 Received: from tiku.hut.fi (tiku.hut.fi [130.233.228.86]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0NE2JP27521 for ; Wed, 23 Jan 2002 06:02:19 -0800 Received: from kosh.hut.fi (dima@kosh.hut.fi [130.233.228.10]) by tiku.hut.fi (8.9.3/8.9.3) with ESMTP id PAA11025 for ; Wed, 23 Jan 2002 15:02:14 +0200 (EET) Received: from localhost (dima@localhost) by kosh.hut.fi (8.9.3/8.9.3) with ESMTP id PAA30053 for ; Wed, 23 Jan 2002 15:02:14 +0200 (EET) X-Authentication-Warning: kosh.hut.fi: dima owned process doing -bs Date: Wed, 23 Jan 2002 15:02:14 +0200 (EET) From: Dmitrii Tisnek To: Subject: netdev.stats change suggestion Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk hey, I've discovered that struct net_device_stats defines counters like rx_bytes and tx_bytes as unsigned long, which on x86 is, sadly, 32 bits. I think that stats are there to be useful, and if so, a 4GB limit cannot be justified, esp. when we have 64-bit file offset support ;-), and yes, a 32-bit counter does wrap around for me ;-) I understand that some architectures may not support 64-bit types at all (as opposed to natively), so perhaps what needs to be done is a data type, like int64_on_platfroms_and_compilers_which_provide_such_otherwise_32.
(although it seems uint64 is used in some headers, so perhaps it's enough to use that) Of course, other counters in the stats structure could be changed too. Say, rx_packets is only some 1000 times smaller than rx_bytes in the case of Ethernet. cheers, dima From owner-netdev@oss.sgi.com Wed Jan 23 06:23:29 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0NENTe00665 for netdev-outgoing; Wed, 23 Jan 2002 06:23:29 -0800 Received: from luxik.cdi.cz (root@inway106.cdi.cz [213.151.81.106]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0NENPP00651 for ; Wed, 23 Jan 2002 06:23:25 -0800 Received: from localhost ([127.0.0.1] ident=devik) by luxik.cdi.cz with esmtp (Exim 3.16 #1) id 16TN7y-0001G4-00; Wed, 23 Jan 2002 14:07:46 +0100 Date: Wed, 23 Jan 2002 14:07:46 +0100 (CET) From: Martin Devera To: Dmitrii Tisnek cc: netdev@oss.sgi.com Subject: Re: netdev.stats change suggestion In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk > I understand that some architectures may not support 64-bit types at all > (as opposed to natively), so perhaps what needs to be done is a data type, > like int64_on_platfroms_and_compilers_which_provide_such_otherwise_32. I'd like 64bit netstats too. Only note that "long int" is probably what you mentioned above. It is 64bit on platforms that support it. Probably you wanted a type which is 64bit if the compiler supports it, regardless of platform. Am I right?
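The distinction between the two kinds of type can be checked directly. This is a small sketch, assuming a C99 compiler with `<stdint.h>`; the helper names are made up for illustration and are not kernel interfaces. `unsigned long` follows the platform's data model (32 bits on ILP32 targets such as x86, 64 bits on LP64 targets), while `uint64_t` is 64 bits wherever the compiler provides it.

```c
#include <limits.h>
#include <stdint.h>

/* Width in bits of the type net_device_stats uses for its counters. */
static unsigned ulong_bits(void)
{
    return (unsigned)(sizeof(unsigned long) * CHAR_BIT);
}

/* Width in bits of the fixed-size alternative. */
static unsigned u64_bits(void)
{
    return (unsigned)(sizeof(uint64_t) * CHAR_BIT);
}

/* Nonzero if a counter of the given width cannot represent `bytes`. */
static int cannot_hold(unsigned bits, uint64_t bytes)
{
    return bits < 64 && (bytes >> bits) != 0;
}
```

For example, `cannot_hold(ulong_bits(), 5000000000ULL)` is nonzero on a 32-bit x86 build and zero on an LP64 build, while `cannot_hold(u64_bits(), ...)` is zero for any 64-bit value.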
regards, devik From owner-netdev@oss.sgi.com Wed Jan 23 07:53:16 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0NFrGD00679 for netdev-outgoing; Wed, 23 Jan 2002 07:53:16 -0800 Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0NFrDP00664 for ; Wed, 23 Jan 2002 07:53:13 -0800 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id GAA03074; Wed, 23 Jan 2002 06:51:56 -0800 Date: Wed, 23 Jan 2002 06:51:55 -0800 (PST) Message-Id: <20020123.065155.02303792.davem@redhat.com> To: gandalf@wlug.westbo.se Cc: netdev@oss.sgi.com Subject: Re: [PATCH] make rt_intern_hash() don't search yet another time on UP From: "David S. Miller" In-Reply-To: References: X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk From: Martin Josefsson Date: Sat, 19 Jan 2002 19:50:14 +0100 (CET) I've been playing around a little trying to improve the performance of iptables connectiontracking and in one test (flood with random source ip's) I noticed that there are alot of searches in the routingcache (this is because of all the cachemisses). this second search in rt_intern_hash() isn't needed on UP AFAIK. No other cpu can insert entries in the routingcache while we prepare the new entry to be inserted. I'm apply this part, thanks. And it fixes what I think is a small bug on SMP. We dereference rt_hash_table[hash].chain before taking the lock. what if it changes before we start the search, ie. we have to wait for the lock and when we get to run it's been changed by another cpu. It takes "address of" chain, not chain. There is no bug :) Franks a lot, David S. 
Miller davem@redhat.com

From owner-netdev@oss.sgi.com Wed Jan 23 09:24:01 2002
Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0NHO1822881 for netdev-outgoing; Wed, 23 Jan 2002 09:24:01 -0800
Received: from tux.rsn.bth.se (tux.rsn.bth.se [194.47.143.135]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0NHNuP22861 for ; Wed, 23 Jan 2002 09:23:56 -0800
Received: from localhost (gandalf@localhost [127.0.0.1]) by tux.rsn.bth.se (8.12.1/8.12.1/Debian -5) with ESMTP id g0NGMgQq003279; Wed, 23 Jan 2002 17:22:42 +0100
Date: Wed, 23 Jan 2002 17:22:42 +0100 (CET)
From: Martin Josefsson
X-Sender: gandalf@tux.rsn.bth.se
To: Martin Devera
cc: Dmitrii Tisnek , netdev@oss.sgi.com
Subject: Re: netdev.stats change suggestion
In-Reply-To:
Message-ID:
X-message-flag: Get yourself a real mail client! http://www.washington.edu/pine/
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-netdev@oss.sgi.com
Precedence: bulk

On Wed, 23 Jan 2002, Martin Devera wrote:

> > I understand that some architectures may not support 64-bit types at all
> > (as opposed to natively), so perhaps what needs to be done is a data type,
> > like int64_on_platfroms_and_compilers_which_provide_such_otherwise_32.
>
> I'd like 64bit netstats too. Only note that "long int" is probably
> what you mentioned above. It is 64bit on supported platforms.

The problem with 64bit counters on 32bit systems is that after each increase of the low 32 bits you have to check for an overflow, and if one occurred you have to increase the high 32 bits. That is slower than a simple increase of a 32bit counter. And I think DaveM, ANK and AK don't want this in the core networking code.

On a 100Mbit/s network a 32 bit counter overflows in a little over 320 seconds, so if you check the counter every 5 minutes and compensate if it has overflowed there isn't really a problem. But if you have a 1Gbit/s network on a 32bit machine it will overflow in just over 30 seconds, so you have to check much more often.
Or as I recommend if you are going to push the limits of 1Gbit/s interfaces... get a 64bit machine.

/Martin

Never argue with an idiot. They drag you down to their level, then beat you with experience.

From owner-netdev@oss.sgi.com Wed Jan 23 10:02:17 2002
Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0NI2H730871 for netdev-outgoing; Wed, 23 Jan 2002 10:02:17 -0800
Received: from luxik.cdi.cz (root@inway106.cdi.cz [213.151.81.106]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0NI2CP30853 for ; Wed, 23 Jan 2002 10:02:12 -0800
Received: from localhost ([127.0.0.1] ident=devik) by luxik.cdi.cz with esmtp (Exim 3.16 #1) id 16TQmi-00038Y-00; Wed, 23 Jan 2002 18:02:04 +0100
Date: Wed, 23 Jan 2002 18:02:04 +0100 (CET)
From: Martin Devera
To: Martin Josefsson
cc: Dmitrii Tisnek , netdev@oss.sgi.com
Subject: Re: netdev.stats change suggestion
In-Reply-To:
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-netdev@oss.sgi.com
Precedence: bulk

> the problem with 64bit counters on 32bit systems is that after each
> increase of the low 32bits you have to check for an overflow and if one
> occurred then we should increase the high 32bits.

You are right. It is probably not good to use ADC, which would generate one more memory write cycle every time, but a decent ...

        add ax,mem1
        jc 1f
        .section .text.stub
    1:  inc mem2
        jmp 2f
        .previous
    2:

should do it with low overhead. You can count about half a cycle on Pentiums for a not-taken branch. Only it will take one position in the BTB ...

Because every packet eats hundreds of cycles, AFAIK the single JC should not even be measurable. Or am I missing something ?
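A minimal C sketch of the same carry-propagation idea, for comparison with the assembly fragment. The names `counter64`, `counter64_add` and `counter64_read` are hypothetical and are not kernel interfaces, and the sketch deliberately ignores locking, which a real driver would have to provide itself.

```c
#include <stdint.h>

/* Hypothetical 64-bit statistics counter kept as two 32-bit halves,
 * as it would have to be on a 32-bit CPU. Not a kernel structure. */
struct counter64 {
    uint32_t lo;   /* low 32 bits, touched on every update */
    uint32_t hi;   /* high 32 bits, touched only on carry */
};

/* Add n to the counter. Unsigned arithmetic wraps modulo 2^32, so the
 * new low word is smaller than n exactly when the addition carried.
 * That comparison is the rarely taken branch the "jc" stands for. */
static void counter64_add(struct counter64 *c, uint32_t n)
{
    c->lo += n;
    if (c->lo < n)
        c->hi++;
}

/* Combine both halves into one 64-bit value. */
static uint64_t counter64_read(const struct counter64 *c)
{
    return ((uint64_t)c->hi << 32) | c->lo;
}
```

The common path costs one add, one compare and one untaken branch. A reader running concurrently with `counter64_add` could still see the two halves mid-update, so this sketch says nothing about atomicity.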
;) devik From owner-netdev@oss.sgi.com Wed Jan 23 12:21:13 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0NKLDx01086 for netdev-outgoing; Wed, 23 Jan 2002 12:21:13 -0800 Received: from ms2.inr.ac.ru (minus.inr.ac.ru [193.233.7.97]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0NKL5P01067 for ; Wed, 23 Jan 2002 12:21:06 -0800 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id WAA26415; Wed, 23 Jan 2002 22:19:46 +0300 From: kuznet@ms2.inr.ac.ru Message-Id: <200201231919.WAA26415@ms2.inr.ac.ru> Subject: Re: [PATCH] Restore ROUTE MASQ in 2.4 To: ja@ssi.bg (Julian Anastasov) Date: Wed, 23 Jan 2002 22:19:45 +0300 (MSK) Cc: netdev@oss.sgi.com, netfilter@lists.samba.org, rusty@rustcorp.com.au In-Reply-To: from "Julian Anastasov" at Jan 22, 2 11:08:28 pm X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Sender: owner-netdev@oss.sgi.com Precedence: bulk Hello! > rules :) If the netfilter gurus don't find it useful, no problem :) Well, this is right. I join their opinion too. :-) > Of course, the SNAT-ing process may be needs correct routing (may be > a new "ROUTING" chain) and little routing code changes (I have it in some > my patches). BTW this is puzzle for me: how do they block redirects? This was another big problem with masquerading in 2.2 and in fact another advantage of controlling masquearding via routing, when all such things went right automatically. > May be I have the rules with same priority, anyways :) This is one of the things to prohibit, priority will be handle of rule in fact. It is the only predictable way to distinguish such objects. (BTW this applies to iptables too.) Sigh, lots of scripts will break. So, it is not so bad idea to install some filter dropping panic emails to /dev/null before an attempt to sanitize this. 
:-) Alexey From owner-netdev@oss.sgi.com Wed Jan 23 12:56:06 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0NKu6l08307 for netdev-outgoing; Wed, 23 Jan 2002 12:56:06 -0800 Received: from u.domain.uli (ja.mac.ssi.bg [212.95.166.194]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0NKtuP08275 for ; Wed, 23 Jan 2002 12:55:57 -0800 Received: from localhost (IDENT:ja@localhost [127.0.0.1]) by u.domain.uli (8.11.0/8.11.0) with ESMTP id g0NLwvK08014; Wed, 23 Jan 2002 21:58:57 GMT Date: Wed, 23 Jan 2002 21:58:57 +0000 (GMT) From: Julian Anastasov X-X-Sender: ja@u.domain.uli To: kuznet@ms2.inr.ac.ru cc: netdev@oss.sgi.com, , Subject: Re: [PATCH] Restore ROUTE MASQ in 2.4 In-Reply-To: <200201231919.WAA26415@ms2.inr.ac.ru> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Hello, On Wed, 23 Jan 2002 kuznet@ms2.inr.ac.ru wrote: > Hello! > > > rules :) If the netfilter gurus don't find it useful, no problem :) > > Well, this is right. I join their opinion too. :-) OK :) I still don't have their :) > > Of course, the SNAT-ing process may be needs correct routing (may be > > a new "ROUTING" chain) and little routing code changes (I have it in some > > my patches). > > BTW this is puzzle for me: how do they block redirects? You mean the ICMP redirects? IIRC, they catch them in postrouting and drop them (icmp_reply_translation). > This was another big problem with masquerading in 2.2 and in fact another > advantage of controlling masquearding via routing, when all such things > went right automatically. The problem (even in 2.2) is for setup with multiple default gateways with distinct IP ranges. Once the the connections are bound to maddr we can't change it. But if a routing cache entry expires or someone flushes the cache the multipath routes forget the right directions that was used from this maddr to the universe. 
So, I use the trick to do proper routing from maddr to universe at routing time. By this way the multipath route is hit only for the first masqueraded packet from each connection, once bound to maddr we don't hit it. In Netfilter I do such trick by hooking at the end of prerouting (we don't have a routing chain) and calling modified ip_route_input with one additional argument named "lsrc": I load it with maddr because we know what source address manipulation is scheduled for postrouting. As result, the modified ip_route_input behaves also as ip_route_output, it is a mixed version (lsrc must be local IP, same check as in ip_route_output)... I stop the ICMP redirects, just like for the RTCF_?NAT/MASQ case. I can point you to the right place if you prefer: http://www.linuxvirtualserver.org/~julian/05_nf_reroute-2.4.14-5pre2.diff I use function net/ipv4/netfilter/ip_nat_core.c:ip_nat_route_input() that calls ip_route_input, changed in route.c. I add one arg for ip_route_output (gw) but this is different issue. The result is always one ip_route_input but smarter one. This is the reason I'm talking about ROUTING hook, the SNAT-ed traffic can be routed properly when using multipath routes. rtmasq is not immune to this problem. The best approach for rtmasq is to call fib_lookup by using the srcmap/prefsrc as source. The value 0 in srcmap always causes the masquerading (in 2.2) to call ip_route_output, the same is in my patch for 2.4, one extra call for each connection isntead of one call for each slow route lookup. But the problem with the missing lsrc functionality remains: we can misroute the masqueraded traffic. > So, it is not so bad idea to install some filter dropping panic emails > to /dev/null before an attempt to sanitize this. :-) Yes, the compatibility ... 
> Alexey Regards -- Julian Anastasov From owner-netdev@oss.sgi.com Thu Jan 24 00:58:50 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0O8woJ21871 for netdev-outgoing; Thu, 24 Jan 2002 00:58:50 -0800 Received: from u.domain.uli (ja.mac.ssi.bg [212.95.166.194]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0O8wgP21824 for ; Thu, 24 Jan 2002 00:58:43 -0800 Received: from localhost (IDENT:ja@localhost [127.0.0.1]) by u.domain.uli (8.11.0/8.11.0) with ESMTP id g0OA17N01050; Thu, 24 Jan 2002 10:01:09 GMT Date: Thu, 24 Jan 2002 10:01:07 +0000 (GMT) From: Julian Anastasov X-X-Sender: ja@u.domain.uli To: kuznet@ms2.inr.ac.ru cc: netdev@oss.sgi.com, , Subject: Re: [PATCH] Restore ROUTE MASQ in 2.4 In-Reply-To: <200201231919.WAA26415@ms2.inr.ac.ru> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Hello, On Wed, 23 Jan 2002 kuznet@ms2.inr.ac.ru wrote: > > Of course, the SNAT-ing process may be needs correct routing (may be > > a new "ROUTING" chain) and little routing code changes (I have it in some > > my patches). > > BTW this is puzzle for me: how do they block redirects? > This was another big problem with masquerading in 2.2 and in fact another > advantage of controlling masquearding via routing, when all such things > went right automatically. I forgot to mention another thing: users report that there are applications that change the tos in established state (openssh?). This change causes ip_route_input to select different path from the multipath route when masqueraded. I hope this does not happen for related ICMP packets, I see that icmp_send copies the tos from the request packets. But wait, the tos is copied from the out->in packets while the in->out packets have their own tos that can be different. I'm not sure whether we misroute these related ICMPs when masqueraded (if such ICMPs really can be generated from a masqueraded box). 
And again, ip_route_input's lsrc arg is a solution for this problem. The result: - we need a place (ROUTING chain?) where each masqueraded connection can feed ip_route_input with the desired data (lsrc). If there is no lsrc, then the RT_TOS(tos) arg must be always constant for the masqueraded conn even for the related ICMP traffic because we risk to select different path. Of course, if lsrc exists, we can feed ip_route_input with different tos values, we don't care, we don't risk to select path with distinct IP block that drops the packets with other src IPs. The masqueraded connections will behave nearly to a TCP/UDP socket in its routing usage: they are usually bound to maddr, they can change its tos. - without lsrc arg the multipath usage can easily fail on route cache flush - the usage of lsrc does not generate ICMP redirects (you will not worry about netfilter and ICMP redirects :)) - the bad thing: the ip_route_input prototype is changed :( Different issue: the Linux Virtual Server can use this ROUTING hook to call ip_route_input with different args, i.e. not always with args extracted from the iph because we can forward packets without changing address information in it. 
> Alexey Regards -- Julian Anastasov From owner-netdev@oss.sgi.com Thu Jan 24 02:33:26 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0OAXQA17838 for netdev-outgoing; Thu, 24 Jan 2002 02:33:26 -0800 Received: from tiku.hut.fi (tiku.hut.fi [130.233.228.86]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0OAXLP17809 for ; Thu, 24 Jan 2002 02:33:21 -0800 Received: from kosh.hut.fi (dima@kosh.hut.fi [130.233.228.10]) by tiku.hut.fi (8.9.3/8.9.3) with ESMTP id LAA17754; Thu, 24 Jan 2002 11:33:17 +0200 (EET) Received: from localhost (dima@localhost) by kosh.hut.fi (8.9.3/8.9.3) with ESMTP id LAA19269; Thu, 24 Jan 2002 11:33:17 +0200 (EET) X-Authentication-Warning: kosh.hut.fi: dima owned process doing -bs Date: Thu, 24 Jan 2002 11:33:17 +0200 (EET) From: Dmitrii Tisnek To: Martin Devera cc: Dmitrii Tisnek , Subject: Re: netdev.stats change suggestion In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk On Wed, 23 Jan 2002, Martin Devera wrote: > > I understand that some architectures may not support 64-bit types at all > > (as opposed to natively), so perhaps what needs to be done is a data type, > > like int64_on_platfroms_and_compilers_which_provide_such_otherwise_32. > > I'd like 64bit netstats too. Only note that "long int" is probably > what you mentioned above. It is 64bit on supported platforms. > > Probably you wanted type which is 64bit if compiler supports it regardless > of platform. Am I right ? indeed. 
>
> regards, devik
>

From owner-netdev@oss.sgi.com Thu Jan 24 04:21:26 2002
Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0OCLQV17079 for netdev-outgoing; Thu, 24 Jan 2002 04:21:26 -0800
Received: from tapu.f00f.org (tapu.cryptoapps.com [63.108.153.39]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0OCLNP17065 for ; Thu, 24 Jan 2002 04:21:24 -0800
Received: by tapu.f00f.org (Postfix, from userid 1000) id 5DC0D1A0F39; Fri, 25 Jan 2002 00:20:23 +1300 (NZDT)
Date: Thu, 24 Jan 2002 03:20:23 -0800
From: Chris Wedgwood
To: Dmitrii Tisnek
Cc: netdev@oss.sgi.com
Subject: Re: netdev.stats change suggestion
Message-ID: <20020124112023.GA31956@tapu.f00f.org>
References:
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.3.26i
X-No-Archive: Yes
Sender: owner-netdev@oss.sgi.com
Precedence: bulk

On Wed, Jan 23, 2002 at 03:02:14PM +0200, Dmitrii Tisnek wrote:

    I've discovered that struct net_device_stats defines counters like
    rx_bytes and tx_bytes as unsigned long, which on x86 is, sadly, 32
    bits.

How fast is our IO? For most everyone, 32-bits is plenty enough.
Have a daemon/whatever check it from time-to-time and detect overflow.

    (although it seems uint64 is used in some headers, so perhaps it's
    enough to use that)

In theory we could use 'long long' on 32-bit architectures, but then
we can't do atomic add/sub operations...
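The "daemon checks from time to time" approach could look roughly like the following C sketch. The `wrap_tracker` type and its functions are hypothetical names for illustration, and the sketch assumes the daemon samples often enough that the kernel counter wraps at most once between two samples.

```c
#include <stdint.h>

/* Hypothetical per-counter state kept by a monitoring daemon. */
struct wrap_tracker {
    uint32_t last;    /* previous raw 32-bit sample */
    uint64_t total;   /* accumulated 64-bit value */
};

/* Start tracking from an initial raw sample. */
static void wrap_tracker_init(struct wrap_tracker *t, uint32_t first)
{
    t->last = first;
    t->total = first;
}

/* Fold a new raw sample into the 64-bit total. The unsigned
 * subtraction is modulo 2^32, so the delta is correct even when the
 * counter wrapped once since the previous sample. Two or more wraps
 * between samples cannot be detected this way. */
static uint64_t wrap_tracker_update(struct wrap_tracker *t, uint32_t sample)
{
    uint32_t delta = sample - t->last;

    t->last = sample;
    t->total += delta;
    return t->total;
}
```

The design keeps all 64-bit arithmetic in user space and leaves the kernel counters untouched, at the price of a hard upper bound on the sampling interval.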
--cw From owner-netdev@oss.sgi.com Thu Jan 24 04:29:42 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0OCTg319550 for netdev-outgoing; Thu, 24 Jan 2002 04:29:42 -0800 Received: from sgi.com (sgi-too.SGI.COM [204.94.211.39]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0OCTdP19535 for ; Thu, 24 Jan 2002 04:29:39 -0800 Received: from www.linux.org.uk (parcelfarce.linux.theplanet.co.uk [195.92.249.252]) by sgi.com (980327.SGI.8.8.8-aspam/980304.SGI-aspam: SGI does not authorize the use of its proprietary systems or networks for unsolicited or bulk email from the Internet.) via ESMTP id DAA07211 for ; Thu, 24 Jan 2002 03:28:28 -0800 (PST) mail_from (jgarzik@mandrakesoft.com) Received: from adsl-156-52-82.asm.bellsouth.net ([66.156.52.82] helo=mandrakesoft.com) by www.linux.org.uk with esmtp (Exim 3.33 #5) id 16ThzY-0003Ce-00; Thu, 24 Jan 2002 11:24:30 +0000 Message-ID: <3C4FEEE6.6E44E6EE@mandrakesoft.com> Date: Thu, 24 Jan 2002 06:24:22 -0500 From: Jeff Garzik Organization: MandrakeSoft X-Mailer: Mozilla 4.79 [en] (X11; U; Linux 2.4.18-pre4 i686) X-Accept-Language: en MIME-Version: 1.0 To: Chris Wedgwood CC: Dmitrii Tisnek , netdev@oss.sgi.com Subject: Re: netdev.stats change suggestion References: <20020124112023.GA31956@tapu.f00f.org> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Chris Wedgwood wrote: > > On Wed, Jan 23, 2002 at 03:02:14PM +0200, Dmitrii Tisnek wrote: > > I've discovered that struct net_device_stats defines counters like > rx_bytes and tx_bytes as unsigned long, which on x86 is, sadly, 32 > bits. > > How fast is our IO? For most everyone, 32-bits is plenty enough. > Have a daemon/whatever check it from time-to-time and detect overflow. We should make them 64-bit because related SNMP MIBs use 64-bits. Jeff -- Jeff Garzik | "I went through my candy like hot oatmeal Building 1024 | through an internally-buttered weasel." 
MandrakeSoft | - goats.com From owner-netdev@oss.sgi.com Thu Jan 24 04:30:07 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0OCU7R19753 for netdev-outgoing; Thu, 24 Jan 2002 04:30:07 -0800 Received: from tapu.f00f.org (tapu.cryptoapps.com [63.108.153.39]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0OCU4P19736 for ; Thu, 24 Jan 2002 04:30:04 -0800 Received: by tapu.f00f.org (Postfix, from userid 1000) id BA4C61A1008; Fri, 25 Jan 2002 00:29:04 +1300 (NZDT) Date: Thu, 24 Jan 2002 03:29:04 -0800 From: Chris Wedgwood To: Jeff Garzik Cc: Dmitrii Tisnek , netdev@oss.sgi.com Subject: Re: netdev.stats change suggestion Message-ID: <20020124112904.GA31991@tapu.f00f.org> References: <20020124112023.GA31956@tapu.f00f.org> <3C4FEEE6.6E44E6EE@mandrakesoft.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <3C4FEEE6.6E44E6EE@mandrakesoft.com> User-Agent: Mutt/1.3.26i X-No-Archive: Yes Sender: owner-netdev@oss.sgi.com Precedence: bulk On Thu, Jan 24, 2002 at 06:24:22AM -0500, Jeff Garzik wrote: We should make them 64-bit because related SNMP MIBs use 64-bits. (1) Do we need atomic add/sub for any of these? If so, making them 64-bit sucks terribly. (2) What can't snmpd detect and deal with wrap? I know for certain SNMP operations things are supposed to be strictly increasing for the life-time the machine is up --- but is this really a big deal? SNMP albeit a very useful thing and times, is also horribly crude and has some terrible limitations, it alone doesn't seem like a good reason to me. Comments? 
--cw From owner-netdev@oss.sgi.com Thu Jan 24 04:43:30 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0OChUR23028 for netdev-outgoing; Thu, 24 Jan 2002 04:43:30 -0800 Received: from sgi.com (sgi-too.SGI.COM [204.94.211.39]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0OChPP22993 for ; Thu, 24 Jan 2002 04:43:25 -0800 Received: from www.linux.org.uk (parcelfarce.linux.theplanet.co.uk [195.92.249.252]) by sgi.com (980327.SGI.8.8.8-aspam/980304.SGI-aspam: SGI does not authorize the use of its proprietary systems or networks for unsolicited or bulk email from the Internet.) via ESMTP id DAA07339 for ; Thu, 24 Jan 2002 03:42:14 -0800 (PST) mail_from (jgarzik@mandrakesoft.com) Received: from adsl-156-52-82.asm.bellsouth.net ([66.156.52.82] helo=mandrakesoft.com) by www.linux.org.uk with esmtp (Exim 3.33 #5) id 16TiHp-0003O6-00; Thu, 24 Jan 2002 11:43:21 +0000 Message-ID: <3C4FF358.B4B35B12@mandrakesoft.com> Date: Thu, 24 Jan 2002 06:43:20 -0500 From: Jeff Garzik Organization: MandrakeSoft X-Mailer: Mozilla 4.79 [en] (X11; U; Linux 2.4.18-pre4 i686) X-Accept-Language: en MIME-Version: 1.0 To: Chris Wedgwood CC: Dmitrii Tisnek , netdev@oss.sgi.com, "David S. Miller" Subject: Re: netdev.stats change suggestion References: <20020124112023.GA31956@tapu.f00f.org> <3C4FEEE6.6E44E6EE@mandrakesoft.com> <20020124112904.GA31991@tapu.f00f.org> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Chris Wedgwood wrote: > > On Thu, Jan 24, 2002 at 06:24:22AM -0500, Jeff Garzik wrote: > > We should make them 64-bit because related SNMP MIBs use 64-bits. > > (1) Do we need atomic add/sub for any of these? If so, making them > 64-bit sucks terribly. > > (2) What can't snmpd detect and deal with wrap? I know for certain > SNMP operations things are supposed to be strictly increasing for > the life-time the machine is up --- but is this really a big deal? 
> SNMP albeit a very useful thing and times, is also horribly crude > and has some terrible limitations, it alone doesn't seem like a > good reason to me. > > Comments? With GigE you want 64-bit anyway. Heavily loaded GigE networks will turn over 32-bit counters pretty often. WRT atomicity, no we shouldn't need atomicity in struct netdev_stats. On the net driver side, the net driver is required to do its own locking, when it updates those stats. Further, yet another reason is that newer NICs store the stats in 64-bit numbers, in hardware. I don't think this was discussed with DaveM, but since the SNMP MIBs use 64-bit numbers and newer GigE cards use 64-bit numbers, we pretty much decided at the kernel meeting that netdev_stats should go to 64-bit. Jeff -- Jeff Garzik | "I went through my candy like hot oatmeal Building 1024 | through an internally-buttered weasel." MandrakeSoft | - goats.com From owner-netdev@oss.sgi.com Thu Jan 24 07:23:16 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0OFNGh05310 for netdev-outgoing; Thu, 24 Jan 2002 07:23:16 -0800 Received: from robur.slu.se (robur.slu.se [130.238.98.12]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0OFNBP05302 for ; Thu, 24 Jan 2002 07:23:11 -0800 Received: (from robert@localhost) by robur.slu.se (8.8.7/8.8.7) id PAA01157; Thu, 24 Jan 2002 15:25:39 +0100 From: Robert Olsson MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <15440.6499.131678.563214@robur.slu.se> Date: Thu, 24 Jan 2002 15:25:39 +0100 To: Jeff Garzik Cc: Chris Wedgwood , Dmitrii Tisnek , netdev@oss.sgi.com, "David S. 
Miller"
Subject: Re: netdev.stats change suggestion
In-Reply-To: <3C4FF358.B4B35B12@mandrakesoft.com>
References: <20020124112023.GA31956@tapu.f00f.org> <3C4FEEE6.6E44E6EE@mandrakesoft.com> <20020124112904.GA31991@tapu.f00f.org> <3C4FF358.B4B35B12@mandrakesoft.com>
X-Mailer: VM 6.92 under Emacs 19.34.1
Sender: owner-netdev@oss.sgi.com
Precedence: bulk

Jeff Garzik writes:

 > > Comments?
 >
 > With GigE you want 64-bit anyway. Heavily loaded GigE networks will
 > turn over 32-bit counters pretty often.

Yes, 32 bit counters with GigE are a real pain. At a 5 min sampling interval, byte counters wrap at just above 100 Mbps.

This is illustrated by a Linux GigE router now constantly at about 200 Mbit/s. The pps counters are fine, they don't wrap yet :-)

http://robur.slu.se/traffic-mrtg/archive-r1-pps.html

Compare with the output byte counters, which do wrap, so they are more or less useless.

http://robur.slu.se/traffic-mrtg/archive-r1.html

Of course one can sample more often, but that is only a short-term solution, and this overhead has to be traded against the overhead of checking for carry overflow.

 > Further, yet another reason is that newer NICs store the stats in 64-bit
 > numbers, in hardware.

Yes.

 > I don't think this was discussed with DaveM, but since the SNMP MIBs use
 > 64-bit numbers and newer GigE cards use 64-bit numbers, we pretty much
 > decided at the kernel meeting that netdev_stats should go to 64-bit.

I put my vote there too.

Cheers.
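The wrap figures quoted in this thread can be checked with a few lines of C. This is only a back-of-the-envelope sketch; the function names are hypothetical, and it assumes a byte counter of exactly 32 bits and a constant line rate.

```c
#include <assert.h>

/* Seconds until a 32-bit byte counter wraps at a sustained rate
 * given in bits per second: 2^32 bytes, 8 bits each. */
static double wrap_seconds(double bits_per_sec)
{
    assert(bits_per_sec > 0.0);
    return 4294967296.0 * 8.0 / bits_per_sec;
}

/* Rate, in bits per second, at which exactly 2^32 bytes pass during
 * one sampling interval. Above this rate a sampler can no longer
 * tell how far the counter advanced between two samples. */
static double max_unambiguous_bps(double interval_sec)
{
    assert(interval_sec > 0.0);
    return 4294967296.0 * 8.0 / interval_sec;
}
```

`wrap_seconds(100e6)` comes to about 344 seconds and `wrap_seconds(1e9)` to about 34 seconds, while `max_unambiguous_bps(300.0)` is about 114.5 Mbit/s, which matches "just above 100 Mbps" for 5 minute sampling.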
--ro From owner-netdev@oss.sgi.com Thu Jan 24 07:28:23 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0OFSN205564 for netdev-outgoing; Thu, 24 Jan 2002 07:28:23 -0800 Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0OFSKP05556 for ; Thu, 24 Jan 2002 07:28:20 -0800 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id GAA14595; Thu, 24 Jan 2002 06:26:51 -0800 Date: Thu, 24 Jan 2002 06:26:50 -0800 (PST) Message-Id: <20020124.062650.66057933.davem@redhat.com> To: Robert.Olsson@data.slu.se Cc: jgarzik@mandrakesoft.com, cw@f00f.org, dima@cc.hut.fi, netdev@oss.sgi.com Subject: Re: netdev.stats change suggestion From: "David S. Miller" In-Reply-To: <15440.6499.131678.563214@robur.slu.se> References: <20020124112904.GA31991@tapu.f00f.org> <3C4FF358.B4B35B12@mandrakesoft.com> <15440.6499.131678.563214@robur.slu.se> X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk From: Robert Olsson Date: Thu, 24 Jan 2002 15:25:39 +0100 Jeff Garzik writes: > I don't think this was discussed with DaveM, but since the SNMP MIBs use > 64-bit numbers and newer GigE cards use 64-bit numbers, we pretty much > decided at the kernel meeting that netdev_stats should go to 64-bit. I put my vote there too. I have no problems with it, as long as we don't horribly break tools that parse the values we export now. BTW, we ought to start thinking about NAPI integration for 2.5.x soon. What is the current status of the patches Robert? 
From owner-netdev@oss.sgi.com Thu Jan 24 08:28:37 2002
Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0OGSb411887 for netdev-outgoing; Thu, 24 Jan 2002 08:28:37 -0800
Received: from Cantor.suse.de (ns.suse.de [213.95.15.193]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0OGSXP11873 for ; Thu, 24 Jan 2002 08:28:33 -0800
Received: from Hermes.suse.de (Hermes.suse.de [213.95.15.136]) by Cantor.suse.de (Postfix) with ESMTP id 9CAFC1E4DB; Thu, 24 Jan 2002 16:28:25 +0100 (MET)
Date: Thu, 24 Jan 2002 16:28:25 +0100
From: Andi Kleen
To: "David S. Miller"
Cc: Robert.Olsson@data.slu.se, jgarzik@mandrakesoft.com, cw@f00f.org, dima@cc.hut.fi, netdev@oss.sgi.com
Subject: Re: netdev.stats change suggestion
Message-ID: <20020124162825.A24611@wotan.suse.de>
References: <20020124112904.GA31991@tapu.f00f.org> <3C4FF358.B4B35B12@mandrakesoft.com> <15440.6499.131678.563214@robur.slu.se> <20020124.062650.66057933.davem@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20020124.062650.66057933.davem@redhat.com>
User-Agent: Mutt/1.3.22.1i
Sender: owner-netdev@oss.sgi.com
Precedence: bulk

On Thu, Jan 24, 2002 at 06:26:50AM -0800, David S. Miller wrote:
> From: Robert Olsson
> Date: Thu, 24 Jan 2002 15:25:39 +0100
>
> Jeff Garzik writes:
> > I don't think this was discussed with DaveM, but since the SNMP MIBs use
> > 64-bit numbers and newer GigE cards use 64-bit numbers, we pretty much
> > decided at the kernel meeting that netdev_stats should go to 64-bit.
>
> I put my vote there too.
>
> I have no problems with it, as long as we don't horribly break tools
> that parse the values we export now.

Unfortunately they very likely will break. glibc scanf and strtoul() have overflow checking and will return ERANGE or stop (*scanf). nettools uses sscanf, for example.

The only way I see to do it in a compatible way is to still supply %INT_MAX values in the old fields and add new fields for the 64bit values.
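The `strtoul()` behaviour mentioned here can be sketched in C as follows. `parse_counter` is a hypothetical helper standing in for whatever an existing tool does; it is not taken from nettools.

```c
#include <errno.h>
#include <stdlib.h>

/* Parse one decimal counter field the way an existing tool might.
 * Returns 0 on success, -1 if the text is not a number or does not
 * fit in unsigned long (strtoul then sets errno to ERANGE). */
static int parse_counter(const char *text, unsigned long *out)
{
    char *end;

    errno = 0;
    *out = strtoul(text, &end, 10);
    if (end == text || errno == ERANGE)
        return -1;
    return 0;
}
```

Where `unsigned long` is 32 bits, any exported value above 4294967295 takes the `ERANGE` path, so a tool written like this would start rejecting fields as soon as the kernel printed a 64-bit counter in an old column.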
There are also a lot of broken /proc/net/dev parsers around so it may be a good idea to use a new /proc/net file and leave the old alone. -Andi From owner-netdev@oss.sgi.com Thu Jan 24 08:39:17 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0OGdH714329 for netdev-outgoing; Thu, 24 Jan 2002 08:39:17 -0800 Received: from www.linux.org.uk (IDENT:exim@parcelfarce.linux.theplanet.co.uk [195.92.249.252]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0OGdBP14321 for ; Thu, 24 Jan 2002 08:39:11 -0800 Received: from adsl-156-52-82.asm.bellsouth.net ([66.156.52.82] helo=mandrakesoft.com) by www.linux.org.uk with esmtp (Exim 3.33 #5) id 16Tlxz-0007fl-00; Thu, 24 Jan 2002 15:39:07 +0000 Message-ID: <3C502A99.4EEFFB40@mandrakesoft.com> Date: Thu, 24 Jan 2002 10:39:05 -0500 From: Jeff Garzik Organization: MandrakeSoft X-Mailer: Mozilla 4.79 [en] (X11; U; Linux 2.4.18-pre4 i686) X-Accept-Language: en MIME-Version: 1.0 To: Andi Kleen CC: "David S. Miller" , Robert.Olsson@data.slu.se, cw@f00f.org, dima@cc.hut.fi, netdev@oss.sgi.com Subject: Re: netdev.stats change suggestion References: <20020124112904.GA31991@tapu.f00f.org> <3C4FF358.B4B35B12@mandrakesoft.com> <15440.6499.131678.563214@robur.slu.se> <20020124.062650.66057933.davem@redhat.com> <20020124162825.A24611@wotan.suse.de> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Andi Kleen wrote: > > On Thu, Jan 24, 2002 at 06:26:50AM -0800, David S. Miller wrote: > > From: Robert Olsson > > Date: Thu, 24 Jan 2002 15:25:39 +0100 > > > > Jeff Garzik writes: > > > I don't think this was discussed with DaveM, but since the SNMP MIBs use > > > 64-bit numbers and newer GigE cards use 64-bit numbers, we pretty much > > > decided at the kernel meeting that netdev_stats should go to 64-bit. > > > > I put my vote there too. 
> > > > I have no problems with it, as long as we don't horribly break tools > > that parse the values we export now. > > Unfortunately they will very likely. glibc scanf and strtoul() have overflow > checking and will return ERANGE or stop (*scanf). nettools uses sscanf > for example. > > The only way I see to do it in a compatible way is to still supply %INT_MAX > values in the old fields and add new fields for the 64bit values. > There are also a lot of broken /proc/net/dev parsers around so it may be a > good idea to use a new /proc/net file and leave the old alone. Using procfs is lame in the first place, IMHO, and viro will rightly yell at us for adding new files. Whatever method we choose, though, I agree that there is not much way around creating a new method for getting that data. Maybe drivers could pass a list of 64-bit values to ethtool, along with a version number. ethtool would then use that version as a key for decoding which values belong with which names "rx errors", "rx missed pkts", etc. I have very little preference for the interface, besides -not- dumping yet another random file into procfs... Jeff -- Jeff Garzik | "I went through my candy like hot oatmeal Building 1024 | through an internally-buttered weasel." 
MandrakeSoft | - goats.com From owner-netdev@oss.sgi.com Thu Jan 24 08:48:53 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0OGmrY15182 for netdev-outgoing; Thu, 24 Jan 2002 08:48:53 -0800 Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0OGmpP15179 for ; Thu, 24 Jan 2002 08:48:51 -0800 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id HAA01339; Thu, 24 Jan 2002 07:47:29 -0800 Date: Thu, 24 Jan 2002 07:47:29 -0800 (PST) Message-Id: <20020124.074729.41631242.davem@redhat.com> To: jgarzik@mandrakesoft.com Cc: ak@suse.de, Robert.Olsson@data.slu.se, cw@f00f.org, dima@cc.hut.fi, netdev@oss.sgi.com Subject: Re: netdev.stats change suggestion From: "David S. Miller" In-Reply-To: <3C502A99.4EEFFB40@mandrakesoft.com> References: <20020124.062650.66057933.davem@redhat.com> <20020124162825.A24611@wotan.suse.de> <3C502A99.4EEFFB40@mandrakesoft.com> X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk From: Jeff Garzik Date: Thu, 24 Jan 2002 10:39:05 -0500 I have very little preference for the interface, besides -not- dumping yet another random file into procfs... Since netlink is available always now, let's use that. 
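The parser breakage Andi describes earlier in this thread can be illustrated with a small C sketch. It assumes a hypothetical helper, parse_counter_u32(), that stands in for a tool whose counters are limited to 32 bits; the name is made up for illustration and is not taken from nettools. strtoull() is used so that the 32-bit limit can be checked explicitly, regardless of how wide unsigned long is on the build machine.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper (illustration only, not from nettools): parse one
 * decimal counter field the way a tool limited to 32-bit values would.
 * Returns 0 on success, -1 if the text is not a number or does not fit.
 */
static int parse_counter_u32(const char *text, uint32_t *out)
{
    char *end;
    unsigned long long v;

    errno = 0;
    v = strtoull(text, &end, 10);
    if (end == text)        /* no digits consumed */
        return -1;
    if (errno == ERANGE)    /* too large even for unsigned long long */
        return -1;
    if (v > UINT32_MAX)     /* representable, but not in 32 bits */
        return -1;

    *out = (uint32_t)v;
    return 0;
}
```

A value such as 4294967296 (2^32) is rejected here, which is the kind of failure an existing parser could hit if the current fields started carrying 64-bit values.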
From owner-netdev@oss.sgi.com Thu Jan 24 08:52:30 2002
Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0OGqUp15651 for netdev-outgoing; Thu, 24 Jan 2002 08:52:30 -0800
Received: from robur.slu.se (robur.slu.se [130.238.98.12]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0OGqNP15644 for ; Thu, 24 Jan 2002 08:52:23 -0800
Received: (from robert@localhost) by robur.slu.se (8.8.7/8.8.7) id QAA02464; Thu, 24 Jan 2002 16:54:56 +0100
From: Robert Olsson
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Message-ID: <15440.11856.500676.908238@robur.slu.se>
Date: Thu, 24 Jan 2002 16:54:56 +0100
To: "David S. Miller"
Cc: Robert.Olsson@data.slu.se, jgarzik@mandrakesoft.com, cw@f00f.org, dima@cc.hut.fi, netdev@oss.sgi.com
Subject: Re: netdev.stats change suggestion
In-Reply-To: <20020124.062650.66057933.davem@redhat.com>
References: <20020124112904.GA31991@tapu.f00f.org> <3C4FF358.B4B35B12@mandrakesoft.com> <15440.6499.131678.563214@robur.slu.se> <20020124.062650.66057933.davem@redhat.com>
X-Mailer: VM 6.92 under Emacs 19.34.1
Sender: owner-netdev@oss.sgi.com
Precedence: bulk

David S. Miller writes:
> > I put my vote there too.
>
> I have no problems with it, as long as we don't horribly break tools
> that parse the values we export now.
>
> BTW, we ought to start thinking about NAPI integration for 2.5.x soon.
> What is the current status of the patches Robert?

First, the GigE router I pointed to has been running NAPI for about 3 months with three e1000s.

I feel the obstacle to 2.5.x inclusion and wider distribution is that Alexey wanted a cleaner interface to the poll function. No one disagrees, and Manfred and others were proposing this some time ago; of course Alexey has the last word. Jamal has written a good converting-to-NAPI document, and this too depends on the poll-call API.

NAPI code as-is today consists of:

1) ANK kernel patch. We use this in production with 2.4.16.
It applies to 2.5.2 but has not been tested there yet.
2) Tulip driver. I forked off an older kernel driver. I think Jeff has code based on his more recent kernel version. I can give Jeff a hand verifying it.
3) e1000 driver. It is not in the kernel tree today. Intel indicated some interest in having this included in 2.5.X, and I think that's the reason they are now changing the BSD-ish copyright to comply better with the GPL. I have at Intel we eventually can ask.
4) Patch for 3c59x, contributed. Lennert Bytenback, Andrew, Jamal, ANK and others. There might be later revs.
5) Documentation. Jamal's papers: Usenix paper and porting guide.

NAPI does not break any netif_rx drivers; they run untouched with virtually no performance degradation. The problem comes if we need to change the driver API more than once.

This is my view; Jamal and Alexey will comment.

I have collected the current work: robur.slu.se:/pub/Linux/net-development/NAPI/

Cheers.
--ro

From owner-netdev@oss.sgi.com Thu Jan 24 09:05:49 2002
Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0OH5nc16979 for netdev-outgoing; Thu, 24 Jan 2002 09:05:49 -0800
Received: from Cantor.suse.de (ns.suse.de [213.95.15.193]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0OH5hP16975 for ; Thu, 24 Jan 2002 09:05:44 -0800
Received: from Hermes.suse.de (Hermes.suse.de [213.95.15.136]) by Cantor.suse.de (Postfix) with ESMTP id 4A81F1E4FC; Thu, 24 Jan 2002 17:05:36 +0100 (MET)
Date: Thu, 24 Jan 2002 17:05:35 +0100
From: Andi Kleen
To: "David S.
Miller" Cc: jgarzik@mandrakesoft.com, ak@suse.de, Robert.Olsson@data.slu.se, cw@f00f.org, dima@cc.hut.fi, netdev@oss.sgi.com Subject: Re: netdev.stats change suggestion Message-ID: <20020124170535.A18315@wotan.suse.de> References: <20020124.062650.66057933.davem@redhat.com> <20020124162825.A24611@wotan.suse.de> <3C502A99.4EEFFB40@mandrakesoft.com> <20020124.074729.41631242.davem@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20020124.074729.41631242.davem@redhat.com> User-Agent: Mutt/1.3.22.1i Sender: owner-netdev@oss.sgi.com Precedence: bulk On Thu, Jan 24, 2002 at 07:47:29AM -0800, David S. Miller wrote: > From: Jeff Garzik > Date: Thu, 24 Jan 2002 10:39:05 -0500 > > I have very little preference for the interface, besides -not- > dumping yet another random file into procfs... > > Since netlink is available always now, let's use that. Advantage of netlink is that it can block -- with /proc polling is always needed. e.g. one could add setting of reporting thresholds to the interface and let netlink send a message when the counter overflows it. This way a statistic gathering tool could sleep and only wake up when something interesting happens. [it's a real problem - when you have a whole gnome or windowmaker panel of statistics reporting tools around it can chew up a not insignificant amount of CPU time because they all wake up regularly and check /proc if nothing has changed] Problem is only that sending rtnetlink messages from hard interrupt context does not work very well, but it can be handled by using queue_event(). 
-Andi From owner-netdev@oss.sgi.com Thu Jan 24 09:17:12 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0OHHCP17744 for netdev-outgoing; Thu, 24 Jan 2002 09:17:12 -0800 Received: from grok.yi.org (IDENT:6MIvG9mKpwaPbqEIhUuZFIouyerclGxo@cx97923-a.phnx3.az.home.com [24.1.197.194]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0OHH8P17740 for ; Thu, 24 Jan 2002 09:17:08 -0800 Received: from candelatech.com (IDENT:b4gLuLJFwRUpR+AE2xbQWAVoE5BBDmqn@localhost.localdomain [127.0.0.1]) by grok.yi.org (8.11.6/8.11.2) with ESMTP id g0OGG8D32749; Thu, 24 Jan 2002 09:16:08 -0700 Message-ID: <3C503347.5020608@candelatech.com> Date: Thu, 24 Jan 2002 09:16:07 -0700 From: Ben Greear Organization: Candela Technologies User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.4) Gecko/20011019 Netscape6/6.2 X-Accept-Language: en-us MIME-Version: 1.0 To: "David S. Miller" CC: jgarzik@mandrakesoft.com, ak@suse.de, Robert.Olsson@data.slu.se, cw@f00f.org, dima@cc.hut.fi, netdev@oss.sgi.com Subject: Re: netdev.stats change suggestion References: <20020124.062650.66057933.davem@redhat.com> <20020124162825.A24611@wotan.suse.de> <3C502A99.4EEFFB40@mandrakesoft.com> <20020124.074729.41631242.davem@redhat.com> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk From a quick glance at the man page, it seems netlink would be pretty heavy-weight for just wanting to get the counters from a device. I especially do not want to have to deal with the unreliable (according to the man page) nature of netlink when reading kernel counters. To me, an IOCTL seems best. By the way, how do you change a 64-bit counter to/from network-byte-order on a 32bit machine? (Perhaps you would need to do it in this case..but I'm curious :)) Thanks, Ben David S. 
Miller wrote: > From: Jeff Garzik > Date: Thu, 24 Jan 2002 10:39:05 -0500 > > I have very little preference for the interface, besides -not- > dumping yet another random file into procfs... > > Since netlink is available always now, let's use that. > > -- Ben Greear President of Candela Technologies Inc http://www.candelatech.com ScryMUD: http://scry.wanfear.com http://scry.wanfear.com/~greear From owner-netdev@oss.sgi.com Thu Jan 24 09:47:00 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0OHl0U19665 for netdev-outgoing; Thu, 24 Jan 2002 09:47:00 -0800 Received: from sgi.com (sgi-too.SGI.COM [204.94.211.39]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0OHkqP19662 for ; Thu, 24 Jan 2002 09:46:52 -0800 Received: from www.linux.org.uk (parcelfarce.linux.theplanet.co.uk [195.92.249.252]) by sgi.com (980327.SGI.8.8.8-aspam/980304.SGI-aspam: SGI does not authorize the use of its proprietary systems or networks for unsolicited or bulk email from the Internet.) via ESMTP id IAA07982 for ; Thu, 24 Jan 2002 08:46:49 -0800 (PST) mail_from (jgarzik@mandrakesoft.com) Received: from adsl-156-52-82.asm.bellsouth.net ([66.156.52.82] helo=mandrakesoft.com) by www.linux.org.uk with esmtp (Exim 3.33 #5) id 16Tmrl-0000ec-00; Thu, 24 Jan 2002 16:36:45 +0000 Message-ID: <3C50381B.3E1B602@mandrakesoft.com> Date: Thu, 24 Jan 2002 11:36:43 -0500 From: Jeff Garzik Organization: MandrakeSoft X-Mailer: Mozilla 4.79 [en] (X11; U; Linux 2.4.18-pre4 i686) X-Accept-Language: en MIME-Version: 1.0 To: Ben Greear CC: "David S. 
Miller" , ak@suse.de, Robert.Olsson@data.slu.se, cw@f00f.org, dima@cc.hut.fi, netdev@oss.sgi.com Subject: Re: netdev.stats change suggestion References: <20020124.062650.66057933.davem@redhat.com> <20020124162825.A24611@wotan.suse.de> <3C502A99.4EEFFB40@mandrakesoft.com> <20020124.074729.41631242.davem@redhat.com> <3C503347.5020608@candelatech.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Ben Greear wrote: > From a quick glance at the man page, it seems netlink would be pretty > heavy-weight for just wanting to get the counters from a device. I > especially do not want to have to deal with the unreliable > (according to the man page) nature of netlink when reading kernel > counters. To me, an IOCTL seems best. We wanna send link up/down notification via netlink too, might as well do stats too. Gives us a lot of flexibility. If you have concerns about netlink stability, IMHO we can address that separately... by fixing manpage, code, whatever. Suggestions/code review welcome. > By the way, how do you change a 64-bit counter to/from network-byte-order > on a 32bit machine? (Perhaps you would need to do it in this case..but I'm > curious :)) in the kernel, cpu_to/from_be64. in userspace, using glib at least, GUINT64_TO_BE, etc. Jeff -- Jeff Garzik | "I went through my candy like hot oatmeal Building 1024 | through an internally-buttered weasel." 
MandrakeSoft | - goats.com From owner-netdev@oss.sgi.com Thu Jan 24 09:56:10 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0OHuA120739 for netdev-outgoing; Thu, 24 Jan 2002 09:56:10 -0800 Received: from Cantor.suse.de (ns.suse.de [213.95.15.193]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0OHu5P20733 for ; Thu, 24 Jan 2002 09:56:06 -0800 Received: from Hermes.suse.de (Hermes.suse.de [213.95.15.136]) by Cantor.suse.de (Postfix) with ESMTP id 355651E507; Thu, 24 Jan 2002 17:55:57 +0100 (MET) Date: Thu, 24 Jan 2002 17:55:54 +0100 From: Andi Kleen To: Ben Greear Cc: "David S. Miller" , jgarzik@mandrakesoft.com, ak@suse.de, Robert.Olsson@data.slu.se, cw@f00f.org, dima@cc.hut.fi, netdev@oss.sgi.com Subject: Re: netdev.stats change suggestion Message-ID: <20020124175554.A9159@wotan.suse.de> References: <20020124.062650.66057933.davem@redhat.com> <20020124162825.A24611@wotan.suse.de> <3C502A99.4EEFFB40@mandrakesoft.com> <20020124.074729.41631242.davem@redhat.com> <3C503347.5020608@candelatech.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <3C503347.5020608@candelatech.com> User-Agent: Mutt/1.3.22.1i Sender: owner-netdev@oss.sgi.com Precedence: bulk On Thu, Jan 24, 2002 at 09:16:07AM -0700, Ben Greear wrote: > From a quick glance at the man page, it seems netlink would be pretty > heavy-weight for just wanting to get the counters from a device. I > especially do not want to have to deal with the unreliable > (according to the man page) nature of netlink when reading kernel > counters. To me, an IOCTL seems best. When the ioctl allocates memory for example it is not in any way more reliable than a netlink request. The only special case is when you get an netlink notification from an softirq/irq, in this case there is the possibility that the netlink packet cannot get allocated when you're out of GFP_ATOMIC memory. 
In this case the application needs to be notified in a failsafe way to request a resync.
There are various ways to do that, e.g. set a flag and wake all waiters up, or use a preallocated error skb for this case.

Note this cannot happen for a simple request, only for an async threshold-exceeded message. Simple requests happen completely in process context and are mostly equivalent to ioctls.

>
> By the way, how do you change a 64-bit counter to/from network-byte-order
> on a 32bit machine? (Perhaps you would need to do it in this case..but I'm
> curious :))

netlink packets are in host order.

-Andi

P.S.: people will of course not use netlink, but instead do popen("eth-tool"). It's therefore important that the output of eth-tool is stable and easy to parse too. In the end it'll have the same issues as /proc.

From owner-netdev@oss.sgi.com Thu Jan 24 10:36:05 2002
Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0OIa5v29979 for netdev-outgoing; Thu, 24 Jan 2002 10:36:05 -0800
Received: from grok.yi.org (IDENT:3n88JdfHtpixGAvMVLaef77HUg1kjzmv@cx97923-a.phnx3.az.home.com [24.1.197.194]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0OIa0P29954 for ; Thu, 24 Jan 2002 10:36:00 -0800
Received: from candelatech.com (IDENT:jhjZaejDxiOPBjOlPODYMkOYI3BLalcA@localhost.localdomain [127.0.0.1]) by grok.yi.org (8.11.6/8.11.2) with ESMTP id g0OHZUD00675; Thu, 24 Jan 2002 10:35:30 -0700
Message-ID: <3C5045E2.3010003@candelatech.com>
Date: Thu, 24 Jan 2002 10:35:30 -0700
From: Ben Greear
Organization: Candela Technologies
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.4) Gecko/20011019 Netscape6/6.2
X-Accept-Language: en-us
MIME-Version: 1.0
To: Jeff Garzik
CC: "David S.
Miller" , ak@suse.de, Robert.Olsson@data.slu.se, cw@f00f.org, dima@cc.hut.fi, netdev@oss.sgi.com Subject: Re: netdev.stats change suggestion References: <20020124.062650.66057933.davem@redhat.com> <20020124162825.A24611@wotan.suse.de> <3C502A99.4EEFFB40@mandrakesoft.com> <20020124.074729.41631242.davem@redhat.com> <3C503347.5020608@candelatech.com> <3C50381B.3E1B602@mandrakesoft.com> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Jeff Garzik wrote: > Ben Greear wrote: > >> From a quick glance at the man page, it seems netlink would be pretty >>heavy-weight for just wanting to get the counters from a device. I >>especially do not want to have to deal with the unreliable >>(according to the man page) nature of netlink when reading kernel >>counters. To me, an IOCTL seems best. >> > > We wanna send link up/down notification via netlink too, might as well > do stats too. Gives us a lot of flexibility. > > If you have concerns about netlink stability, IMHO we can address that > separately... by fixing manpage, code, whatever. Suggestions/code > review welcome. Note that I am not advocating NOT providing stats through netlink, I just prefer an IOCTL too. > in the kernel, cpu_to/from_be64. in userspace, using glib at least, > GUINT64_TO_BE, etc. Thanks...what does 'be' stand for? Big endian? So...erm...is network-byte-order big-endian or little-endian? 
:) Ben -- Ben Greear President of Candela Technologies Inc http://www.candelatech.com ScryMUD: http://scry.wanfear.com http://scry.wanfear.com/~greear From owner-netdev@oss.sgi.com Thu Jan 24 10:46:19 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0OIkJA04741 for netdev-outgoing; Thu, 24 Jan 2002 10:46:19 -0800 Received: from www.linux.org.uk (IDENT:exim@parcelfarce.linux.theplanet.co.uk [195.92.249.252]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0OIkFP04719 for ; Thu, 24 Jan 2002 10:46:15 -0800 Received: from adsl-156-52-82.asm.bellsouth.net ([66.156.52.82] helo=mandrakesoft.com) by www.linux.org.uk with esmtp (Exim 3.33 #5) id 16Tnwt-0002MF-00; Thu, 24 Jan 2002 17:46:07 +0000 Message-ID: <3C50485D.29CFCB48@mandrakesoft.com> Date: Thu, 24 Jan 2002 12:46:05 -0500 From: Jeff Garzik Organization: MandrakeSoft X-Mailer: Mozilla 4.79 [en] (X11; U; Linux 2.4.18-pre4 i686) X-Accept-Language: en MIME-Version: 1.0 To: Ben Greear CC: "David S. Miller" , ak@suse.de, Robert.Olsson@data.slu.se, cw@f00f.org, dima@cc.hut.fi, netdev@oss.sgi.com Subject: Re: netdev.stats change suggestion References: <20020124.062650.66057933.davem@redhat.com> <20020124162825.A24611@wotan.suse.de> <3C502A99.4EEFFB40@mandrakesoft.com> <20020124.074729.41631242.davem@redhat.com> <3C503347.5020608@candelatech.com> <3C50381B.3E1B602@mandrakesoft.com> <3C5045E2.3010003@candelatech.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Ben Greear wrote: > Jeff Garzik wrote: > > in the kernel, cpu_to/from_be64. in userspace, using glib at least, > > GUINT64_TO_BE, etc. > > Thanks...what does 'be' stand for? Big endian? yep > So...erm...is network-byte-order > big-endian or little-endian? :) Big endian. 
Sun rules.[1] Jeff [1] I'm just guessing who to blame, I don't claim to know my history -- Jeff Garzik | "I went through my candy like hot oatmeal Building 1024 | through an internally-buttered weasel." MandrakeSoft | - goats.com From owner-netdev@oss.sgi.com Thu Jan 24 13:18:56 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0OLIu911555 for netdev-outgoing; Thu, 24 Jan 2002 13:18:56 -0800 Received: from ms2.inr.ac.ru (minus.inr.ac.ru [193.233.7.97]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0OLImP11520 for ; Thu, 24 Jan 2002 13:18:48 -0800 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id XAA11228; Thu, 24 Jan 2002 23:17:58 +0300 From: kuznet@ms2.inr.ac.ru Message-Id: <200201242017.XAA11228@ms2.inr.ac.ru> Subject: Re: [PATCH] Restore ROUTE MASQ in 2.4 To: ja@ssi.bg (Julian Anastasov) Date: Thu, 24 Jan 2002 23:17:58 +0300 (MSK) Cc: netdev@oss.sgi.com, netfilter@lists.samba.org, rusty@rustcorp.com.au In-Reply-To: from "Julian Anastasov" at Jan 24, 2 10:01:07 am X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Sender: owner-netdev@oss.sgi.com Precedence: bulk Hello! > This change causes ip_route_input to select different path from > the multipath route when masqueraded. Pheew... "multipath" route + when "masqueraded" + rules introducing dependency on tos. Do not make this and live in peace. :-) > - we need a place (ROUTING chain?) where each masqueraded connection > can feed ip_route_input ip_route_input is called on a packet. It needs no more arguments. Shortly, you can understand from this my statemnet above that I have lost sync and confused a lot. :-) :-) Seems, I need to return to that your mail where "lsrc" was explained. No matter: > - without lsrc arg the multipath usage can easily fail on route > cache flush sounds like a nonsense. Multipath surely cannot fail just because all the attributes of balanced routes are equivalent. 
Or were you able to imagine situation when one of paths is masqueraded and another is not or masqueraded differently? Just stop such fantasms. NAT is _not_ permitted in environments with not trivial routing and based on notion of strict barrier. It is an axiom. Alexey From owner-netdev@oss.sgi.com Thu Jan 24 13:34:16 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0OLYG115057 for netdev-outgoing; Thu, 24 Jan 2002 13:34:16 -0800 Received: from ms2.inr.ac.ru (minus.inr.ac.ru [193.233.7.97]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0OLY9P15042 for ; Thu, 24 Jan 2002 13:34:10 -0800 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id XAA11324; Thu, 24 Jan 2002 23:33:56 +0300 From: kuznet@ms2.inr.ac.ru Message-Id: <200201242033.XAA11324@ms2.inr.ac.ru> Subject: Re: netdev.stats change suggestion To: davem@redhat.COM (David S. Miller) Date: Thu, 24 Jan 2002 23:33:56 +0300 (MSK) Cc: netdev@oss.sgi.com In-Reply-To: <20020124.062650.66057933.davem@redhat.com> from "David S. Miller" at Jan 24, 2 05:45:01 pm X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Sender: owner-netdev@oss.sgi.com Precedence: bulk Hello! > I have no problems with it, I have. 64bit counters create lots of troubles. Particularly, all the reads must be serialized wrt writes. And I really do not see _any_ legal reasons to hold this kind of statistics in the kernel. I would even prefer that 64bit architectures used not "unsigned long" but u32. In fact, all that is required of statistics is to grow monotonically. If it does, user level is more than happy. Look into iproute2, for example for ifstat, nstat and rtacct. 
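The user-level approach referred to here can be sketched roughly as follows. This is not how ifstat is implemented; it is a minimal illustration, with made-up names, of folding a 32-bit kernel counter into a 64-bit total in user space. It assumes the counter wraps at most once between two samples.

```c
#include <stdint.h>

/* Hypothetical user-space accumulator; all names are illustrative only. */
struct counter_acc {
    uint32_t last;   /* raw 32-bit value seen at the previous sample */
    uint64_t total;  /* 64-bit running total kept in user space */
};

/* Feed one raw sample and return the updated 64-bit total. */
static uint64_t counter_acc_sample(struct counter_acc *acc, uint32_t raw)
{
    /* Unsigned subtraction is modulo 2^32, so a single wrap is absorbed. */
    uint32_t delta = raw - acc->last;

    acc->last = raw;
    acc->total += delta;
    return acc->total;
}
```

For scale: a 32-bit byte counter on a saturated 1 Gbit/s link wraps after 2^32 * 8 / 10^9 seconds, roughly 34 seconds, so the sampling interval would have to stay below that for the assumption to hold.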
Alexey From owner-netdev@oss.sgi.com Thu Jan 24 13:43:17 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0OLhHL17362 for netdev-outgoing; Thu, 24 Jan 2002 13:43:17 -0800 Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0OLh9P17336 for ; Thu, 24 Jan 2002 13:43:10 -0800 Received: from localhost (IDENT:davem@localhost.localdomain [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id MAA04642; Thu, 24 Jan 2002 12:41:22 -0800 Date: Thu, 24 Jan 2002 12:41:22 -0800 (PST) Message-Id: <20020124.124122.105431755.davem@redhat.com> To: kuznet@ms2.inr.ac.ru Cc: netdev@oss.sgi.com Subject: Re: netdev.stats change suggestion From: "David S. Miller" In-Reply-To: <200201242033.XAA11324@ms2.inr.ac.ru> References: <20020124.062650.66057933.davem@redhat.com> <200201242033.XAA11324@ms2.inr.ac.ru> X-Mailer: Mew version 2.1 on Emacs 21.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk From: kuznet@ms2.inr.ac.ru Date: Thu, 24 Jan 2002 23:33:56 +0300 (MSK) 64bit counters create lots of troubles. Particularly, all the reads must be serialized wrt writes. We have this serialization at the writes already. Device is always locked in some way. All that is proposed is to formalize this. Yes, I can see why we would not want to do this. Your point has been heard. And I really do not see _any_ legal reasons to hold this kind of statistics in the kernel. I would even prefer that 64bit architectures used not "unsigned long" but u32. In fact, all that is required of statistics is to grow monotonically. If it does, user level is more than happy. Look into iproute2, for example for ifstat, nstat and rtacct. What if u32 wraps twice between snapshots? I sense that your answer imposes some requirement upon the user... 
But when I run some command line tool, I want it to tell me how many bazillion packets have gone through my terabit ethernet interface since boot :-) Aside from this, I really want to move all of these statistical things towards netlink. And I also want to do it in such a way that it is painless to provide new counters. All of this stuff is basically string+u32 tuples so it can't be that difficult. We could even expose the per-cpu nature of the counters (and even the BH'icity) with a properly designed netlink interface. From owner-netdev@oss.sgi.com Thu Jan 24 13:57:28 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0OLvSV20962 for netdev-outgoing; Thu, 24 Jan 2002 13:57:28 -0800 Received: from u.domain.uli (ja.mac.ssi.bg [212.95.166.194]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0OLvIP20920 for ; Thu, 24 Jan 2002 13:57:19 -0800 Received: from localhost (IDENT:ja@localhost [127.0.0.1]) by u.domain.uli (8.11.0/8.11.0) with ESMTP id g0ON0BH01379; Thu, 24 Jan 2002 23:00:11 GMT Date: Thu, 24 Jan 2002 23:00:11 +0000 (GMT) From: Julian Anastasov X-X-Sender: ja@u.domain.uli To: kuznet@ms2.inr.ac.ru cc: netdev@oss.sgi.com, , Subject: Re: [PATCH] Restore ROUTE MASQ in 2.4 In-Reply-To: <200201242017.XAA11228@ms2.inr.ac.ru> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Hello, On Thu, 24 Jan 2002 kuznet@ms2.inr.ac.ru wrote: > Hello! > > > This change causes ip_route_input to select different path from > > the multipath route when masqueraded. > > Pheew... "multipath" route + when "masqueraded" + rules introducing > dependency on tos. Do not make this and live in peace. :-) No, there are no rules depending on tos but ip_route_input selects different paths for masqueraded packets from same connection but with different tos. > > - we need a place (ROUTING chain?) where each masqueraded connection > > can feed ip_route_input > > ip_route_input is called on a packet. 
It needs no more arguments. > > Shortly, you can understand from this my statemnet above that > I have lost sync and confused a lot. :-) :-) Seems, I need to return > to that your mail where "lsrc" was explained. Yes, it is a complicated issue, simple setup: ip rule add prio 10 table main ... ip addr add 10.0.1.1/24 brd + dev wan0 ip addr add 10.0.2.1/24 brd + dev wan1 ip rule add prio 20 from 10.0.1.0/24 table 20 ip route add default via 10.0.1.1 dev wan0 src 10.0.1.2 table 20 ip rule add prio 30 from 10.0.2.0/24 table 30 ip route add default via 10.0.2.1 dev wan1 src 10.0.2.2 table 30 ip rule add prio 100 table 100 nat 0.0.0.0 ip route add default table 100 \ nexthop via 10.0.1.1 dev wan0 \ nexthop via 10.0.2.1 dev wan1 nothing special, only a multipath route, universe through 2 gateways > No matter: > > > - without lsrc arg the multipath usage can easily fail on route > > cache flush > > sounds like a nonsense. Multipath surely cannot fail just because > all the attributes of balanced routes are equivalent. > > Or were you able to imagine situation when one of paths is masqueraded > and another is not or masqueraded differently? Just stop such fantasms. No :) See above: two distinct IP blocks through two ISPs, flush the cache and the paths are forgotten. > NAT is _not_ permitted in environments with not trivial routing and > based on notion of strict barrier. It is an axiom. 
> > Alexey Regards -- Julian Anastasov From owner-netdev@oss.sgi.com Thu Jan 24 14:04:47 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0OM4lE22965 for netdev-outgoing; Thu, 24 Jan 2002 14:04:47 -0800 Received: from grok.yi.org (IDENT:+ymW0dWzmAfwGQw4/TX8vG0pdWx4S801@cx97923-a.phnx3.az.home.com [24.1.197.194]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0OM4eP22940 for ; Thu, 24 Jan 2002 14:04:41 -0800 Received: from candelatech.com (IDENT:VGa8JsoCaZZ6AUDJYPbCJhzCoRY76ePb@localhost.localdomain [127.0.0.1]) by grok.yi.org (8.11.6/8.11.2) with ESMTP id g0OL4JD02514; Thu, 24 Jan 2002 14:04:19 -0700 Message-ID: <3C5076D3.9040906@candelatech.com> Date: Thu, 24 Jan 2002 14:04:19 -0700 From: Ben Greear Organization: Candela Technologies User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.4) Gecko/20011019 Netscape6/6.2 X-Accept-Language: en-us MIME-Version: 1.0 To: Andi Kleen CC: "David S. Miller" , jgarzik@mandrakesoft.com, Robert.Olsson@data.slu.se, cw@f00f.org, dima@cc.hut.fi, netdev@oss.sgi.com Subject: Re: netdev.stats change suggestion References: <20020124.062650.66057933.davem@redhat.com> <20020124162825.A24611@wotan.suse.de> <3C502A99.4EEFFB40@mandrakesoft.com> <20020124.074729.41631242.davem@redhat.com> <3C503347.5020608@candelatech.com> <20020124175554.A9159@wotan.suse.de> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Andi Kleen wrote: > On Thu, Jan 24, 2002 at 09:16:07AM -0700, Ben Greear wrote: > >>From a quick glance at the man page, it seems netlink would be pretty >>heavy-weight for just wanting to get the counters from a device. I >>especially do not want to have to deal with the unreliable >>(according to the man page) nature of netlink when reading kernel >>counters. To me, an IOCTL seems best. >> > > When the ioctl allocates memory for example it is not in any way more > reliable than a netlink request. 
The only special case is when you
> get an netlink notification from an softirq/irq, in this case there
> is the possibility that the netlink packet cannot get allocated when
> you're out of GFP_ATOMIC memory.
> In this case the application needs to be notified in a failsafe way
> to request a resync.
> There are various ways to do that, e.g. set a flag and wake all waiters
> up, or use a preallocated error skb for this case.
>
> Note this cannot happen for a simple request, only for an async
> threshold-exceeded message. Simple requests happen completely
> in process context and are mostly equivalent to ioctls.

Ok, that sounds better. An IOCTL in this case should not have to allocate any memory (other than perhaps something on the stack). It also wouldn't involve having to parse any netlink headers, but could just pass back a packed structure of 64bit numbers.

>
>
>>By the way, how do you change a 64-bit counter to/from network-byte-order
>>on a 32bit machine? (Perhaps you would need to do it in this case..but I'm
>>curious :))
>>
>
> netlink packets are in host order.
>
> -Andi
> P.S.: people will of course not use netlink, but instead do popen("eth-tool")
> It's therefore important that the output of eth-tool is stable and easy
> to parse too. In the end it'll have the same issues as /proc.
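On the byte-order question quoted above: network byte order is big-endian, and a 64-bit value can be converted with plain shifts, which behaves the same on 32-bit and 64-bit hosts. A minimal sketch, assuming hypothetical helper names put_be64() and get_be64() (these are not kernel or glib functions):

```c
#include <stdint.h>

/* Hypothetical helpers; illustrative names, not a kernel or glib API. */

/* Store v into buf[0..7] in network (big-endian) byte order. */
static void put_be64(unsigned char buf[8], uint64_t v)
{
    for (int i = 7; i >= 0; i--) {
        buf[i] = (unsigned char)(v & 0xff);  /* least significant byte last */
        v >>= 8;
    }
}

/* Read a big-endian 64-bit value back from buf[0..7]. */
static uint64_t get_be64(const unsigned char buf[8])
{
    uint64_t v = 0;

    for (int i = 0; i < 8; i++)
        v = (v << 8) | buf[i];               /* most significant byte first */
    return v;
}
```

As the quoted reply says, netlink messages are in host order, so a conversion like this would only matter for data that actually leaves the machine.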
> > -- Ben Greear President of Candela Technologies Inc http://www.candelatech.com ScryMUD: http://scry.wanfear.com http://scry.wanfear.com/~greear From owner-netdev@oss.sgi.com Thu Jan 24 14:41:32 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0OMfWO31903 for netdev-outgoing; Thu, 24 Jan 2002 14:41:32 -0800 Received: from luxik.cdi.cz (root@inway106.cdi.cz [213.151.81.106]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0OMfPP31880 for ; Thu, 24 Jan 2002 14:41:26 -0800 Received: from localhost ([127.0.0.1] ident=devik) by luxik.cdi.cz with esmtp (Exim 3.16 #1) id 16TrSC-0006Du-00; Thu, 24 Jan 2002 22:30:40 +0100 Date: Thu, 24 Jan 2002 22:30:40 +0100 (CET) From: Martin Devera To: kuznet@ms2.inr.ac.ru cc: "David S. Miller" , netdev@oss.sgi.com Subject: Re: netdev.stats change suggestion In-Reply-To: <200201242033.XAA11324@ms2.inr.ac.ru> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk > In fact, all that is required of statistics is to grow monotonically. > If it does, user level is more than happy. Look into iproute2, for example > for ifstat, nstat and rtacct. I can't found ifstat, nstat in my iproute 010824 .. Maybe old one ? By the way I don't see how you get over wraparound problem. On 1G net 32 bit can wrap in 40s not speaking about bonding more of them. About write/read locking, what about: WRT L if (carry) [implicit barrier] INC H in update part and: 1:READ H barrier READ L barrier READ H -> X if (X != H) goto 1 in read stat part ? 
Fast, simple (and possibly wrong) :)
devik

From owner-netdev@oss.sgi.com Fri Jan 25 11:15:16 2002
Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0PJFGD30731 for netdev-outgoing; Fri, 25 Jan 2002 11:15:16 -0800
Received: from sgi.com (sgi-too.SGI.COM [204.94.211.39]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0PJF8P30725 for ; Fri, 25 Jan 2002 11:15:08 -0800
Received: from ms2.inr.ac.ru (minus.inr.ac.ru [193.233.7.97]) by sgi.com (980327.SGI.8.8.8-aspam/980304.SGI-aspam: SGI does not authorize the use of its proprietary systems or networks for unsolicited or bulk email from the Internet.) via SMTP id KAA06282 for ; Fri, 25 Jan 2002 10:14:50 -0800 (PST) mail_from (kuznet@ms2.inr.ac.ru)
From: kuznet@ms2.inr.ac.ru
Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id VAA27390; Fri, 25 Jan 2002 21:04:13 +0300
Message-Id: <200201251804.VAA27390@ms2.inr.ac.ru>
Subject: Re: netdev.stats change suggestion
To: davem@redhat.com (David S. Miller)
Date: Fri, 25 Jan 2002 21:04:13 +0300 (MSK)
Cc: netdev@oss.sgi.com
In-Reply-To: <20020124.124122.105431755.davem@redhat.com> from "David S. Miller" at Jan 24, 2 12:41:22 pm
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Sender: owner-netdev@oss.sgi.com
Precedence: bulk

Hello!

> We have this serialization at the writes already.
> Device is always locked in some way. All that is
> proposed is to formalize this.

This is not true. Writes are serialized wrt writes and this happens
automatically now (not always correctly though, alas). Namely, if A is
updated on device irq, no need to serialize etc.

Reads are _not_ serialized wrt writes and rely on the fact that each read
will fetch a valid counter. With non-atomic types this fails.

Actually, it is not a fatal flaw: even a new irq protected(!!!) lock is not
required. Something sort of:

    do {
        x = read_low();
        y = read_high();
        x1 = read_low();
    } while (x != x1);

(maybe wrong, but surely fixable.
I used something sort of this to get tstamp from jiffies and get_cycles()
in netif_rx() and this worked)

But it is still bizarre.

> through my terabit ethernet interface since boot :-)

_Daemon_ makes this. "ifstat -d 30" is enough to fight wraps of any kind.

But my attitude is based on another observation: to calculate rates etc.
a much shorter sampling interval is required, sort of 1-5 seconds and
"ifstat -d 5". So the wrapping problem disappears as a free side effect.

> Aside from this, I really want to move all of these statistical things
> towards netlink.

It is another question. rtnetlink really likes 32 bit numbers. But it is
another issue, explained mostly by your troubles with sparc. Well, and
because u32 is a unique point where everything works: it is identically
"unsigned int", so printf, scanf and all the shit works. :-)
100% portable, no issues.

No, I do not think that string+number is good. Mixing static and dynamic
info is 100% fault. Strings must be fetchable, but separately. Numbers are
accessible in a faster way. I would prefer mmap, like rtacct does.

BTW core stat counters are standard. They are defined by MIBs.

Scheme which I would propose is the following:

1. A virtual FS. Not /proc. Maybe driverfs, or just a new one.
2. Each device has its own directory.
3. One file maps standard identifiers (copied from MIBs) to offsets.
4. Device specific extensions are separate.
5. Data is accessible both via mmap and read.
6. Counters may be even 64 bit, but their high part is to be undefined.
7. Some set of controls to allow non-standard extensions (f.e.
   acenic dmas lots of very interesting statistics and it is stupid
   to lose it)

Alexey

From owner-netdev@oss.sgi.com Fri Jan 25 11:19:39 2002
Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0PJJd731333 for netdev-outgoing; Fri, 25 Jan 2002 11:19:39 -0800
Received: from ms2.inr.ac.ru (minus.inr.ac.ru [193.233.7.97]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0PJJXP31314 for ; Fri, 25 Jan 2002 11:19:33 -0800
Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id VAA27489; Fri, 25 Jan 2002 21:19:14 +0300
From: kuznet@ms2.inr.ac.ru
Message-Id: <200201251819.VAA27489@ms2.inr.ac.ru>
Subject: Re: netdev.stats change suggestion
To: devik@cdi.cz (Martin Devera)
Date: Fri, 25 Jan 2002 21:19:14 +0300 (MSK)
Cc: davem@redhat.COM, netdev@oss.sgi.com
In-Reply-To: from "Martin Devera" at Jan 24, 2 10:30:40 pm
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Sender: owner-netdev@oss.sgi.com
Precedence: bulk

Hello!

> I can't find ifstat or nstat in my iproute 010824 .. Maybe an old one ?

Yes, it appeared in the next snapshot together with tcpdiag.

> By the way, I don't see how you get over the wraparound problem.

It needs sampling every several seconds to calculate rates,
so that the problem just does not exist.

> in read stat part ? Fast, simple (and possibly wrong) :)

Yes. And this is exactly what I do not want to pay in addition
to storing in the kernel information which it never wants to remember.

Well, this attitude has grown as a kind of physiological reaction to
proposals sort of adding to struct net_device a Cisco-like
"interface description" string.

The reaction is: please, start from showing something really useful,
f.e. rates. After you do this, all the problems will disappear,
you have lots of space in /etc to store description strings.

Counter wrapping is a pseudoproblem of the same nature.
What will change if netstat will show numbers sort of 761586014605217?

The next step is: "Oh, well, I want to clear them!"

And this will not pass.
Clearing kernel counters is a hard bug,
which makes interaction with multiple information consumers impossible.

Alexey

From owner-netdev@oss.sgi.com Fri Jan 25 11:33:17 2002
Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0PJXH701303 for netdev-outgoing; Fri, 25 Jan 2002 11:33:17 -0800
Received: from gtf.org (IDENT:postfix@panic.ohr.gatech.edu [130.207.47.194]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0PJXCP01287 for ; Fri, 25 Jan 2002 11:33:12 -0800
Received: by gtf.org (Postfix, from userid 500) id 4787B1F67; Fri, 25 Jan 2002 12:33:08 -0600 (CST)
Date: Fri, 25 Jan 2002 13:33:08 -0500
From: Jeff Garzik
To: kuznet@ms2.inr.ac.ru
Cc: Martin Devera , davem@redhat.com, netdev@oss.sgi.com
Subject: Re: netdev.stats change suggestion
Message-ID: <20020125133308.B1978@havoc.gtf.org>
References: <200201251819.VAA27489@ms2.inr.ac.ru>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <200201251819.VAA27489@ms2.inr.ac.ru>; from kuznet@ms2.inr.ac.ru on Fri, Jan 25, 2002 at 09:19:14PM +0300
Sender: owner-netdev@oss.sgi.com
Precedence: bulk

On Fri, Jan 25, 2002 at 09:19:14PM +0300, kuznet@ms2.inr.ac.ru wrote:
> > I can't find ifstat or nstat in my iproute 010824 .. Maybe an old one ?
>
> Yes, it appeared in the next snapshot together with tcpdiag.
>
> > By the way, I don't see how you get over the wraparound problem.
>
> It needs sampling every several seconds to calculate rates,
> so that the problem just does not exist.
>
>
> > in read stat part ? Fast, simple (and possibly wrong) :)
>
> Yes. And this is exactly what I do not want to pay in addition
> to storing in the kernel information which it never wants to remember.

On the general topic of 64-bit counters, two facts weigh very heavily:
first, current and future hardware stores stats in 64-bit numbers, and
second, there are indeed MIBs which have 64-bit stats. We had the specific
info at the kernel summit meeting...
I will see if I can dig it up.

If we do -not- have a way for NICs to dump 64-bit stats, each driver is
going to have to scale stats down to 32 bits, and perhaps implement
in-driver sampling code.

Finally, IMHO the world is moving to machines that store 64-bit numbers
naturally and atomically. Don't some of the new IA32 CPUs even have
support for 64-bit integers?

So, I beg pardon for being lost in the thread and not having found the
perfect spot to reply, but there it is :)

Regards,

Jeff

From owner-netdev@oss.sgi.com Fri Jan 25 11:44:51 2002
Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0PJipL03301 for netdev-outgoing; Fri, 25 Jan 2002 11:44:51 -0800
Received: from ms2.inr.ac.ru (minus.inr.ac.ru [193.233.7.97]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0PJilP03289 for ; Fri, 25 Jan 2002 11:44:47 -0800
Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id VAA27691; Fri, 25 Jan 2002 21:43:59 +0300
From: kuznet@ms2.inr.ac.ru
Message-Id: <200201251843.VAA27691@ms2.inr.ac.ru>
Subject: Re: [PATCH] Restore ROUTE MASQ in 2.4
To: ja@ssi.bg (Julian Anastasov)
Date: Fri, 25 Jan 2002 21:43:59 +0300 (MSK)
Cc: netdev@oss.sgi.com, netfilter@lists.samba.org, rusty@rustcorp.com.au
In-Reply-To: from "Julian Anastasov" at Jan 24, 2 11:00:11 pm
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Sender: owner-netdev@oss.sgi.com
Precedence: bulk

Hello!

> No, there are no rules depending on tos but ip_route_input
> selects different paths for masqueraded packets from same connection
> but with different tos.

Hey, stop! tos has nothing to do with this. Your problem is much worse:
the same thing will happen as soon as the route disappears from the cache.

> Yes, it is a complicated issue, simple setup:

Masquerading to different sources depending on multipath selection? Right?

Well, it is exactly the situation when multipath is illegal.
It is legal only when different hands of multipath bring the same
packet to the same destination.
Please, do not try to bring statefulness of any kind to routing.

Especially, taking into account that the same thing can be made
if you sync to state internal to masquerading with an fwmark.
Seems, your "lsrc" is just a second fwmark.

Alexey

From owner-netdev@oss.sgi.com Fri Jan 25 11:58:24 2002
Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0PJwOk05679 for netdev-outgoing; Fri, 25 Jan 2002 11:58:24 -0800
Received: from ms2.inr.ac.ru (minus.inr.ac.ru [193.233.7.97]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0PJwJP05663 for ; Fri, 25 Jan 2002 11:58:19 -0800
Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id VAA27784; Fri, 25 Jan 2002 21:57:51 +0300
From: kuznet@ms2.inr.ac.ru
Message-Id: <200201251857.VAA27784@ms2.inr.ac.ru>
Subject: Re: netdev.stats change suggestion
To: garzik@havoc.gtf.org (Jeff Garzik)
Date: Fri, 25 Jan 2002 21:57:51 +0300 (MSK)
Cc: devik@cdi.cz, davem@redhat.com, netdev@oss.sgi.com
In-Reply-To: <20020125133308.B1978@havoc.gtf.org> from "Jeff Garzik" at Jan 25, 2 01:33:08 pm
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Sender: owner-netdev@oss.sgi.com
Precedence: bulk

Hello!

> If we do -not- have a way for NICs to dump 64-bit stats, each driver is
> going to have to scale stats down to 32 bits,

Look at my previous mail, by the way. It covers this.

Well, just a note: the driver has to transform in any case, f.e. to
canonicalize the number to host word order. I proposed to solve all the
issues in one shot: no transforms, direct access, set of offsets.
Well, and provided the high bits remain undefined, everyone will use only
the u32 part.

As for MIBs, they are a _user_ _space_ thing; they can be 256 bit
with the same success and equally easily.

> and perhaps implement
> in-driver sampling code.

OK, I will wait with rates until such hardware appears.
:-) Alexey From owner-netdev@oss.sgi.com Fri Jan 25 12:04:34 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0PK4Yx07095 for netdev-outgoing; Fri, 25 Jan 2002 12:04:34 -0800 Received: from l.himel.bg (IDENT:root@unamed.infotel.bg [212.39.68.18] (may be forged)) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0PK4TP07075 for ; Fri, 25 Jan 2002 12:04:29 -0800 Received: from linux.himel.bg (IDENT:ja@linux.himel.bg [127.0.0.1]) by l.himel.bg (8.9.3/8.9.3) with ESMTP id VAA21219; Fri, 25 Jan 2002 21:11:30 +0200 Date: Fri, 25 Jan 2002 21:11:30 +0200 (EET) From: Julian Anastasov X-X-Sender: To: cc: , netfilter , , Subject: Re: [PATCH] Restore ROUTE MASQ in 2.4 In-Reply-To: <200201251843.VAA27691@ms2.inr.ac.ru> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Hello, On Fri, 25 Jan 2002 kuznet@ms2.inr.ac.ru wrote: > > Yes, it is a complicated issue, simple setup: > > Masquerading to different sources depending on multipath selection? Right? > > Well, it is exactly the situation when multipath is illegal. > It is legal only when different hands of multipath bring the same > packet to the same destination. > > Please, do not try to bring statefullness of any kind to routing. > > Especially, taking into account that the same thing can be made > if you sync to state internal to masquerading with an fwmark. > Seems, your "lsrc" is just a second fwmark. In fact, the masquerade connections will have the right to call ip_route_input providing lsrc. This is the only valid way to support masquerade through different ISPs with multipath (the route has no preferred source, the first primary IP is used). You are right, the multipath route has distinct paths but the lsrc solves this problem, there are no other issues. Routers NAT-ing through different ISPs are a good thing to support. The users buy two or more ADSLs and achieve failover. 
> Alexey Regards -- Julian Anastasov From owner-netdev@oss.sgi.com Fri Jan 25 12:26:29 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0PKQTt11512 for netdev-outgoing; Fri, 25 Jan 2002 12:26:29 -0800 Received: from ms2.inr.ac.ru (minus.inr.ac.ru [193.233.7.97]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0PKQQP11492 for ; Fri, 25 Jan 2002 12:26:26 -0800 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id WAA27999; Fri, 25 Jan 2002 22:26:11 +0300 From: kuznet@ms2.inr.ac.ru Message-Id: <200201251926.WAA27999@ms2.inr.ac.ru> Subject: Re: [PATCH] Restore ROUTE MASQ in 2.4 To: ja@ssi.bg (Julian Anastasov) Date: Fri, 25 Jan 2002 22:26:11 +0300 (MSK) Cc: netdev@oss.sgi.com, netfilter@lists.samba.org, rusty@rustcorp.com.au, ja@ssi.bg In-Reply-To: from "Julian Anastasov" at Jan 25, 2 09:11:30 pm X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Sender: owner-netdev@oss.sgi.com Precedence: bulk Hello! > multipath route has distinct paths but the lsrc solves this problem, What's about fwmark? Why it does not help? > Routers NAT-ing through different ISPs are > a good thing to support. I understand this, of course. 
Alexey From owner-netdev@oss.sgi.com Fri Jan 25 12:42:34 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0PKgYx14831 for netdev-outgoing; Fri, 25 Jan 2002 12:42:34 -0800 Received: from l.himel.bg (IDENT:root@unamed.infotel.bg [212.39.68.18] (may be forged)) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0PKgSP14810 for ; Fri, 25 Jan 2002 12:42:28 -0800 Received: from linux.himel.bg (IDENT:ja@linux.himel.bg [127.0.0.1]) by l.himel.bg (8.9.3/8.9.3) with ESMTP id VAA21587; Fri, 25 Jan 2002 21:49:21 +0200 Date: Fri, 25 Jan 2002 21:49:21 +0200 (EET) From: Julian Anastasov X-X-Sender: To: cc: , netfilter , , Subject: Re: [PATCH] Restore ROUTE MASQ in 2.4 In-Reply-To: <200201251926.WAA27999@ms2.inr.ac.ru> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Hello, On Fri, 25 Jan 2002 kuznet@ms2.inr.ac.ru wrote: > Hello! > > > multipath route has distinct paths but the lsrc solves this problem, > > What's about fwmark? Why it does not help? fwmark can be used for many things. For example, exactly in such setups LVS can use it to mark the incoming traffic that should be part of a virtual service. Then we can't use it to remember the incoming path and then to route the in->out traffic based on it. But may be it is possible, I still don't know the cost. 
> Alexey Regards -- Julian Anastasov From owner-netdev@oss.sgi.com Fri Jan 25 14:56:14 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0PMuEg10871 for netdev-outgoing; Fri, 25 Jan 2002 14:56:14 -0800 Received: from vaio.greennet (battlejitney.wdhq.scyld.com [216.254.93.178]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0PMu8P10838 for ; Fri, 25 Jan 2002 14:56:08 -0800 Received: from localhost (becker@localhost) by vaio.greennet (8.9.3/8.8.7) with ESMTP id RAA01885; Fri, 25 Jan 2002 17:01:02 -0500 Date: Fri, 25 Jan 2002 17:01:02 -0500 (EST) From: Donald Becker X-Sender: becker@vaio.greennet To: kuznet@ms2.inr.ac.ru cc: Martin Devera , davem@redhat.COM, netdev@oss.sgi.com Subject: Re: netdev.stats change suggestion In-Reply-To: <200201251819.VAA27489@ms2.inr.ac.ru> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk On Fri, 25 Jan 2002 kuznet@ms2.inr.ac.ru wrote: > > By the way I don't see how you get over wraparound problem. > > It needs sampling each sevral seconds to calculate rates, > so that the problem just does not exist. This bears repeating: normal-path network statistics should be used to calculate rates, not absolute values. Think 'jiffies'. As a human you might interpret a value as "time since boot" or "number of packets since boot" because you have extra knowledge, but a program shouldn't make this assumption. Error counts are a different beast: there you might want to know the absolute count. But no one expects these to approach 32 bit overflow. > Well, this attitude has grown as a kind of physiological reaction to > proposals sort of adding to struct net_device a Cisco-like > "interface desription" string. Oooohhh, lets have a /proc/* file that describes the system in text. An entire book, with only occasional dynamic values. 

I do agree with the proposal to add to /proc/net/*
   New per-interface statistics files that have only decimal numbers
     A program that is only interested in one interface won't trigger
     re-reads of all interfaces.
   A new (static, read-once) file that contains text field labels

The idea to mmap() that statistics file has issues:
   We need a timestamp for the reader
   We need a way for the reader to trigger a hardware update
     (Currently reading /proc/net/dev does this.)
   We might need a mechanism so that multiple readers can read
     stable/synchronized values.

> Counter wrapping is a pseudoproblem of the same nature.
> What will change if netstat will show numbers sort of 761586014605217?
>
> The next step is: "Oh, well, I want to clear them!"
>
> And this will not pass. Clearing kernel counters is a hard bug,
> which makes interaction with multiple information consumers impossible.

I used to get a zillion requests (and patches) to clear the counters.
The "multiple readers" answer is the most easily understood response to
this FRF (Frequently Requested mis-Feature).

Donald Becker                       becker@scyld.com
Scyld Computing Corporation         http://www.scyld.com
410 Severn Ave.
Suite 210                           Second Generation Beowulf Clusters
Annapolis MD 21403                  410-990-9993

From owner-netdev@oss.sgi.com Fri Jan 25 15:22:30 2002
Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0PNMUr16690 for netdev-outgoing; Fri, 25 Jan 2002 15:22:30 -0800
Received: from grok.yi.org (IDENT:NZKHhz5SwElu8EHNAlbcyh3leFLOSNDf@cx97923-a.phnx3.az.home.com [24.1.197.194]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0PNMRP16674 for ; Fri, 25 Jan 2002 15:22:27 -0800
Received: from candelatech.com (IDENT:J1AST9eWk51hjOxUO8hC+UjwwFjkEzS+@localhost.localdomain [127.0.0.1]) by grok.yi.org (8.11.6/8.11.2) with ESMTP id g0PMLVD16043; Fri, 25 Jan 2002 15:21:31 -0700
Message-ID: <3C51DA6B.4090302@candelatech.com>
Date: Fri, 25 Jan 2002 15:21:31 -0700
From: Ben Greear
Organization: Candela Technologies
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.4) Gecko/20011019 Netscape6/6.2
X-Accept-Language: en-us
MIME-Version: 1.0
To: Donald Becker
CC: kuznet@ms2.inr.ac.ru, Martin Devera , davem@redhat.COM, netdev@oss.sgi.com
Subject: Re: netdev.stats change suggestion
References:
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
Sender: owner-netdev@oss.sgi.com
Precedence: bulk

Donald Becker wrote:

> I used to get a zillion requests (and patches) to clear the counters.
> The "multiple readers" answer is the most easily understood response to
> this FRF (Frequently Requested mis-Feature).

There's a fairly small difference between wrapping a 32-bit number and
clearing the counters...both ways the reader has to deal with a system
that is not strictly increasing...

You do have the crutch of knowing that if something only wraps (and is
not cleared) that you have *at least* wrapped once....
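
The wrap-versus-clear distinction can be made concrete. A minimal sketch, assuming a reader that samples a free-running 32-bit counter at a fixed interval and only wants rates, as suggested earlier in the thread; counter_delta32(), counter_rate() and wrap_time_sec() are hypothetical names, not existing kernel or iproute2 functions:

```c
#include <stdint.h>

/* Difference between two samples of a free-running 32-bit counter.
 * Unsigned subtraction is modulo 2^32, so a single wrap between the
 * two samples still gives the right delta. A cleared counter does not:
 * prev = 1000, now = 10 would come out as 2^32 - 990. */
uint32_t counter_delta32(uint32_t prev, uint32_t now)
{
    return (uint32_t)(now - prev);
}

/* Average rate in counter units per second over the sampling interval. */
double counter_rate(uint32_t prev, uint32_t now, double interval_sec)
{
    return (double)counter_delta32(prev, now) / interval_sec;
}

/* Seconds until a 32-bit byte counter wraps at a given line rate. */
double wrap_time_sec(double bits_per_sec)
{
    return 4294967296.0 / (bits_per_sec / 8.0);
}
```

With these definitions a byte counter on a saturated 1 Gbit/s link wraps after roughly 34 seconds, so a 5 second sampling interval sees at most one wrap per sample and the modular subtraction stays correct. Two or more wraps inside one interval cannot be detected this way.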
Ben -- Ben Greear President of Candela Technologies Inc http://www.candelatech.com ScryMUD: http://scry.wanfear.com http://scry.wanfear.com/~greear From owner-netdev@oss.sgi.com Fri Jan 25 15:32:28 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0PNWSD18799 for netdev-outgoing; Fri, 25 Jan 2002 15:32:28 -0800 Received: from www.linux.org.uk (IDENT:exim@parcelfarce.linux.theplanet.co.uk [195.92.249.252]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0PNWPP18789 for ; Fri, 25 Jan 2002 15:32:25 -0800 Received: from adsl-63-175-52.asm.bellsouth.net ([208.63.175.52] helo=mandrakesoft.com) by www.linux.org.uk with esmtp (Exim 3.33 #5) id 16UEtJ-0006I7-00; Fri, 25 Jan 2002 22:32:14 +0000 Message-ID: <3C51DCEC.F2C61603@mandrakesoft.com> Date: Fri, 25 Jan 2002 17:32:12 -0500 From: Jeff Garzik Organization: MandrakeSoft X-Mailer: Mozilla 4.79 [en] (X11; U; Linux 2.4.18-pre4 i686) X-Accept-Language: en MIME-Version: 1.0 To: Ben Greear CC: Donald Becker , kuznet@ms2.inr.ac.ru, Martin Devera , davem@redhat.com, netdev@oss.sgi.com Subject: Re: netdev.stats change suggestion References: <3C51DA6B.4090302@candelatech.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Ben Greear wrote: > > Donald Becker wrote: > > > I used to get a zillion requests (and patches) to clear the counters. > > The "multiple readers" answer is the most easily understood response to > > this FRF (Frequently Requested mis-Feature). > > There's a fairly small difference between wrapping a 32-bit number and > clearing the counters... Not really... the wrapping happens for free, clearing counters is an operation that (as I say) you have to think about. Jeff -- Jeff Garzik | "I went through my candy like hot oatmeal Building 1024 | through an internally-buttered weasel." 
MandrakeSoft     |             - goats.com

From owner-netdev@oss.sgi.com Fri Jan 25 15:58:00 2002
Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0PNw0j23911 for netdev-outgoing; Fri, 25 Jan 2002 15:58:00 -0800
Received: from www.linux.org.uk (IDENT:exim@parcelfarce.linux.theplanet.co.uk [195.92.249.252]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0PNvrP23885 for ; Fri, 25 Jan 2002 15:57:53 -0800
Received: from adsl-63-175-52.asm.bellsouth.net ([208.63.175.52] helo=mandrakesoft.com) by www.linux.org.uk with esmtp (Exim 3.33 #5) id 16UFI5-0006mB-00; Fri, 25 Jan 2002 22:57:49 +0000
Message-ID: <3C51E2EC.E4CECA82@mandrakesoft.com>
Date: Fri, 25 Jan 2002 17:57:48 -0500
From: Jeff Garzik
Organization: MandrakeSoft
X-Mailer: Mozilla 4.79 [en] (X11; U; Linux 2.4.18-pre4 i686)
X-Accept-Language: en
MIME-Version: 1.0
To: Donald Becker
CC: kuznet@ms2.inr.ac.ru, Martin Devera , davem@redhat.com, netdev@oss.sgi.com
Subject: Re: netdev.stats change suggestion
References:
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-netdev@oss.sgi.com
Precedence: bulk

Donald Becker wrote:
> > Well, this attitude has grown as a kind of physiological reaction to
> > proposals sort of adding to struct net_device a Cisco-like
> > "interface description" string.
>
> Oooohhh, let's have a /proc/* file that describes the system in text. An
> entire book, with only occasional dynamic values.

heh

> I do agree with the proposal to add to /proc/net/*
>    New per-interface statistics files that have only decimal numbers
>      A program that is only interested in one interface won't trigger
>      re-reads of all interfaces.

As long as these numbers can be larger than ULONG_MAX on a 32-bit
machine, sure...

>    A new (static, read-once) file that contains text field labels

Anything but procfs, really. :) It's a jumbled mess of crud, and I
would not be surprised if viro managed to kill large portions of it during
the 2.5 cycle.
On a tangent, whether viro creates it or we do, I imagine we will
end up with a 'netfs' filesystem or somesuch, which contains pretty much
the same contents as /proc/net/* now. Per-if stats files would be a
good candidate for such an fs :) Such a solution would be both forward
and backwards compatible, too [just backport netfs to 2.2, etc.]

> The idea to mmap() that statistics file has issues:
>    We need a timestamp for the reader
>    We need a way for the reader to trigger a hardware update
>      (Currently reading /proc/net/dev does this.)
>    We might need a mechanism so that multiple readers can read
>      stable/synchronized values.

mmap'ing statistics is a pretty nice idea, though I'm not sure how one
could implement this with the requirements you list, and still maintain
performance. [sure, arch-specific hacks like r/w-protecting a page
would make this possible, but such a solution would not be completely
portable]

Attaching a page to an inode [at inode creation time], and read(2)ing
through the page cache should provide near-same performance while being
able to handle the requirements you mention.

More in the response to Alexey...

Regards,

Jeff

--
Jeff Garzik      | "I went through my candy like hot oatmeal
Building 1024    |  through an internally-buttered weasel."
MandrakeSoft | - goats.com From owner-netdev@oss.sgi.com Fri Jan 25 16:09:52 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0Q09qR26384 for netdev-outgoing; Fri, 25 Jan 2002 16:09:52 -0800 Received: from tosh.netlab.uky.edu (tosh.netlab.uky.edu [204.198.76.200]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0Q09mP26369 for ; Fri, 25 Jan 2002 16:09:48 -0800 Received: from gum.netlab.uky.edu (gum.netlab.uky.edu [204.198.76.71]) by tosh.netlab.uky.edu (Postfix) with ESMTP id 80D74CE83 for ; Fri, 25 Jan 2002 18:09:43 -0500 (EST) Received: by gum.netlab.uky.edu (Postfix, from userid 1109) id 6F4822819A; Fri, 25 Jan 2002 18:09:43 -0500 (EST) Received: from localhost (localhost [127.0.0.1]) by gum.netlab.uky.edu (Postfix) with ESMTP id 6B1B12B926 for ; Fri, 25 Jan 2002 18:09:43 -0500 (EST) Date: Fri, 25 Jan 2002 18:09:43 -0500 (EST) From: Krishna Prabhala To: netdev@oss.sgi.com Subject: RSIP implementation In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Hi, I am a graduate student in the dept. of computer science at University of Kentucky. I would like to know if there is an implementation of the RSIP server that is publicly available for educational purposes. I require the RSIP server for my master's project. I am currently working on implementing IP telephony support for ad hoc networks. I would appreciate any kind of help. Thanks, Krishna. 
From owner-netdev@oss.sgi.com Fri Jan 25 16:22:37 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0Q0Mb428964 for netdev-outgoing; Fri, 25 Jan 2002 16:22:37 -0800 Received: from dibbler.ne.mediaone.net (IDENT:root@dibbler.ne.mediaone.net [24.218.57.139]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0Q0MYP28949 for ; Fri, 25 Jan 2002 16:22:35 -0800 Received: (from rodrigc@localhost) by dibbler.ne.mediaone.net (8.11.0/8.11.0) id g0PNMLs06429; Fri, 25 Jan 2002 18:22:21 -0500 Date: Fri, 25 Jan 2002 18:22:21 -0500 From: Craig Rodrigues To: Krishna Prabhala Cc: netdev@oss.sgi.com Subject: Re: RSIP implementation Message-ID: <20020125182221.A6389@mediaone.net> References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: ; from krisp@netlab.uky.edu on Fri, Jan 25, 2002 at 06:09:43PM -0500 Sender: owner-netdev@oss.sgi.com Precedence: bulk On Fri, Jan 25, 2002 at 06:09:43PM -0500, Krishna Prabhala wrote: > Hi, > I am a graduate student in the dept. of computer science at > University of Kentucky. I would like to know if there is an implementation > of the RSIP server that is publicly available for educational purposes. I > require the RSIP server for my master's project. I am currently working on > implementing IP telephony support for ad hoc networks. > I would appreciate any kind of help. 
Try searching with Google: http://www.google.com/linux?hl=en&q=RSIP&btnG=Google+Search http://openresources.info.ucl.ac.be/rsip -- Craig Rodrigues http://www.gis.net/~craigr rodrigc@mediaone.net From owner-netdev@oss.sgi.com Fri Jan 25 19:10:33 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0Q3AX824077 for netdev-outgoing; Fri, 25 Jan 2002 19:10:33 -0800 Received: from chmls05.mediaone.net (chmls05.mediaone.net [24.147.1.143]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0Q3AUP24070 for ; Fri, 25 Jan 2002 19:10:31 -0800 Received: from [192.168.1.124] (h00045ace1c92.ne.mediaone.net [24.91.198.11]) by chmls05.mediaone.net (8.11.1/8.11.1) with ESMTP id g0Q2AEu00311 for ; Fri, 25 Jan 2002 21:10:14 -0500 (EST) Subject: TCP MD5 signature option (RFC2385) From: Frank Solensky To: netdev@oss.sgi.com Content-Type: text/plain Content-Transfer-Encoding: 7bit X-Mailer: Evolution/1.0.1 Date: 25 Jan 2002 20:44:48 -0500 Message-Id: <1012009515.1850.36.camel@localhost.localdomain> Mime-Version: 1.0 Sender: owner-netdev@oss.sgi.com Precedence: bulk I noticed that Linux stack doesn't currently support for RFC2385 (MD5 signatures for TCP packets). This could be useful for the zebra project for authenticating BGP connections with other implementations. I checked various list archives and didn't see any mention of work being underway on this -- what's the best way for me to proceed, download code and just start implementing? Also: any preference as to what the API should look like? Let me know if this should go to linux-kernel list or elsewhere. 
-- Frank From owner-netdev@oss.sgi.com Fri Jan 25 19:38:11 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0Q3cBM28863 for netdev-outgoing; Fri, 25 Jan 2002 19:38:11 -0800 Received: from mail.storm.ca (storm.ca [209.87.239.69]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0Q3c8P28854 for ; Fri, 25 Jan 2002 19:38:08 -0800 Received: from storm.ca (ppp-209-87-255-134.ottawa.storm.ca [209.87.255.134]) by mail.storm.ca (8.10.2+Sun/8.10.2) with ESMTP id g0Q2c0p27579 for ; Fri, 25 Jan 2002 21:38:00 -0500 (EST) Message-ID: <3C5216EF.DE4A4A81@storm.ca> Date: Fri, 25 Jan 2002 21:39:43 -0500 From: Sandy Harris X-Mailer: Mozilla 4.76 [en] (Win98; U) X-Accept-Language: en,fr MIME-Version: 1.0 To: netdev@oss.sgi.com Subject: Re: TCP MD5 signature option (RFC2385) References: <1012009515.1850.36.camel@localhost.localdomain> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Frank Solensky wrote: > > I noticed that Linux stack doesn't currently support for RFC2385 (MD5 > signatures for TCP packets). This could be useful for the zebra project > for authenticating BGP connections with other implementations. Can you use IPsec authentication? See www.freeswan.org for the Linux implementation. > I checked various list archives and didn't see any mention of work being > underway on this -- what's the best way for me to proceed, download code > and just start implementing? I don't know how useful these are, but some things to consider: The /dev/random driver includes MD5 and some code for generating TCP sequence numbers. I'm inclined to doubt a device driver is the right place to put what you want to do, but you might want to look at that code. 
From owner-netdev@oss.sgi.com Fri Jan 25 20:18:12 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0Q4ICf03889 for netdev-outgoing; Fri, 25 Jan 2002 20:18:12 -0800 Received: from chmls20.mediaone.net (chmls20.mediaone.net [24.147.1.156]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0Q4I9P03875 for ; Fri, 25 Jan 2002 20:18:09 -0800 Received: from [192.168.1.124] (h00045ace1c92.ne.mediaone.net [24.91.198.11]) by chmls20.mediaone.net (8.11.1/8.11.1) with ESMTP id g0Q3Jix22907; Fri, 25 Jan 2002 22:19:44 -0500 (EST) Subject: Re: TCP MD5 signature option (RFC2385) From: Frank Solensky To: Sandy Harris Cc: netdev@oss.sgi.com In-Reply-To: <3C5216EF.DE4A4A81@storm.ca> References: <1012009515.1850.36.camel@localhost.localdomain> <3C5216EF.DE4A4A81@storm.ca> Content-Type: text/plain Content-Transfer-Encoding: 7bit X-Mailer: Evolution/1.0.1 Date: 25 Jan 2002 21:52:10 -0500 Message-Id: <1012013557.1850.63.camel@localhost.localdomain> Mime-Version: 1.0 Sender: owner-netdev@oss.sgi.com Precedence: bulk On Fri, 2002-01-25 at 21:39, Sandy Harris wrote: > Frank Solensky wrote: > > > > I noticed that Linux stack doesn't currently support for RFC2385 (MD5 > > signatures for TCP packets). > > Can you use IPsec authentication? > See www.freeswan.org for the Linux implementation. This is a bit different -- the RFC describes an option that would be added to the tcp options procesing while freeswan provides AH which is between the IP and TCP headers. > I don't know how useful these are, but some things to consider: > > The /dev/random driver includes MD5 and some code for generating TCP > sequence numbers. Yeah, I noticed that drivers/char/random.c has the necessary routines (though I'd have to look for what causes USE_SHA to get defined since this would lose the MD5Transform routine). 
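
To illustrate the distinction Frank draws: RFC 2385 carries the signature as an ordinary TCP option (kind 19, length 18, followed by a 16-byte MD5 digest) inside the TCP header, whereas AH is a separate header between IP and TCP. A minimal sketch of writing and locating that option in a TCP options area; the macro and function names below are chosen for illustration only and are not taken from any kernel tree discussed in this thread:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TCPOPT_EOL      0   /* end of option list */
#define TCPOPT_NOP      1   /* one-byte padding */
#define TCPOPT_MD5SIG   19  /* RFC 2385 signature option kind */
#define TCPOLEN_MD5SIG  18  /* kind + length + 16-byte digest */

/* Write the signature option at opts; returns bytes written (18).
 * The caller must provide at least TCPOLEN_MD5SIG bytes. */
size_t md5sig_option_write(uint8_t *opts, const uint8_t digest[16])
{
    opts[0] = TCPOPT_MD5SIG;
    opts[1] = TCPOLEN_MD5SIG;
    memcpy(opts + 2, digest, 16);
    return TCPOLEN_MD5SIG;
}

/* Walk a TCP options area and return a pointer to the 16-byte digest,
 * or NULL if no well-formed signature option is present. */
const uint8_t *md5sig_option_find(const uint8_t *opts, size_t len)
{
    size_t i = 0;

    while (i < len) {
        uint8_t kind = opts[i];
        uint8_t optlen;

        if (kind == TCPOPT_EOL)
            break;
        if (kind == TCPOPT_NOP) {
            i++;
            continue;
        }
        if (i + 1 >= len)
            break;                  /* truncated: no length byte */
        optlen = opts[i + 1];
        if (optlen < 2 || i + optlen > len)
            break;                  /* malformed length */
        if (kind == TCPOPT_MD5SIG && optlen == TCPOLEN_MD5SIG)
            return opts + i + 2;
        i += optlen;
    }
    return NULL;
}
```

Since the TCP options area is limited to 40 bytes, the 18-byte option leaves little room for other options such as timestamps or SACK on the same segment.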
From owner-netdev@oss.sgi.com Fri Jan 25 20:52:51 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0Q4qpo09796 for netdev-outgoing; Fri, 25 Jan 2002 20:52:51 -0800 Received: from Cantor.suse.de (ns.suse.de [213.95.15.193]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0Q4qmP09784 for ; Fri, 25 Jan 2002 20:52:48 -0800 Received: from Hermes.suse.de (Hermes.suse.de [213.95.15.136]) by Cantor.suse.de (Postfix) with ESMTP id B338E1E163; Sat, 26 Jan 2002 04:52:40 +0100 (MET) Date: Sat, 26 Jan 2002 04:52:40 +0100 From: Andi Kleen To: Frank Solensky Cc: netdev@oss.sgi.com Subject: Re: TCP MD5 signature option (RFC2385) Message-ID: <20020126045240.A30893@wotan.suse.de> References: <1012009515.1850.36.camel@localhost.localdomain> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1012009515.1850.36.camel@localhost.localdomain> User-Agent: Mutt/1.3.22.1i Sender: owner-netdev@oss.sgi.com Precedence: bulk On Fri, Jan 25, 2002 at 08:44:48PM -0500, Frank Solensky wrote: > I noticed that Linux stack doesn't currently support for RFC2385 (MD5 > signatures for TCP packets). This could be useful for the zebra project > for authenticating BGP connections with other implementations. > > I checked various list archives and didn't see any mention of work being > underway on this -- what's the best way for me to proceed, download code > and just start implementing? TCP is not very well fitted to add a new 'go over all data in packet' pass. It is heavily optimized for copy-csum-and-forget in one go. You could add a new pass for MD5, but it would not be nice. As TCP MD5 is rather obscure I think I would nearly recommend to not touch the core TCP stack for it and instead implement it in a netfilter module. 
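For readers unfamiliar with the "copy-csum-and-forget in one go" idea, the following is a simplified userspace sketch, not the kernel's own routine: the payload is copied to its destination and folded into a 16-bit one's-complement (Internet) checksum in the same loop, so the data is touched only once. The names copy_and_sum and csum_fold are chosen for this example. An MD5 signature would need the digest computed over the same bytes, which is the extra pass being objected to.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Copy len bytes from src to dst and add them, as big-endian 16-bit
 * words, to a running 32-bit sum in the same loop. The 32-bit
 * accumulator is assumed to be large enough for packet-sized buffers.
 */
uint32_t copy_and_sum(uint8_t *dst, const uint8_t *src, size_t len, uint32_t sum)
{
    size_t i;

    for (i = 0; i + 1 < len; i += 2) {
        dst[i] = src[i];
        dst[i + 1] = src[i + 1];
        sum += ((uint32_t)src[i] << 8) | src[i + 1];
    }
    if (i < len) {                  /* odd trailing byte, padded with zero */
        dst[i] = src[i];
        sum += (uint32_t)src[i] << 8;
    }
    return sum;
}

/* Fold the carries back in and return the one's-complement checksum. */
uint16_t csum_fold(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}
```

A caller would invoke copy_and_sum() once per buffer fragment, carrying the sum along, and call csum_fold() at the end.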
-Andi From owner-netdev@oss.sgi.com Fri Jan 25 21:17:45 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0Q5HjJ14268 for netdev-outgoing; Fri, 25 Jan 2002 21:17:45 -0800 Received: from www.linux.org.uk (IDENT:exim@parcelfarce.linux.theplanet.co.uk [195.92.249.252]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0Q5HfP14259 for ; Fri, 25 Jan 2002 21:17:41 -0800 Received: from pakrat by www.linux.org.uk with local (Exim 3.33 #5) id 16UKHY-00030I-00; Sat, 26 Jan 2002 04:17:36 +0000 Date: Sat, 26 Jan 2002 04:17:36 +0000 From: Chris Dukes To: Andi Kleen Cc: Frank Solensky , netdev@oss.sgi.com Subject: Re: TCP MD5 signature option (RFC2385) Message-ID: <20020126041736.Q21595@parcelfarce.linux.theplanet.co.uk> References: <1012009515.1850.36.camel@localhost.localdomain> <20020126045240.A30893@wotan.suse.de> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5.1i In-Reply-To: <20020126045240.A30893@wotan.suse.de>; from ak@suse.de on Sat, Jan 26, 2002 at 04:52:40AM +0100 Sender: owner-netdev@oss.sgi.com Precedence: bulk On Sat, Jan 26, 2002 at 04:52:40AM +0100, Andi Kleen wrote: > On Fri, Jan 25, 2002 at 08:44:48PM -0500, Frank Solensky wrote: > > I noticed that Linux stack doesn't currently support for RFC2385 (MD5 > > signatures for TCP packets). This could be useful for the zebra project > > for authenticating BGP connections with other implementations. > > > > I checked various list archives and didn't see any mention of work being > > underway on this -- what's the best way for me to proceed, download code > > and just start implementing? > > TCP is not very well fitted to add a new 'go over all data in packet' > pass. It is heavily optimized for copy-csum-and-forget in one go. > You could add a new pass for MD5, but it would not be nice. 
> As TCP MD5 is rather obscure I think I would nearly recommend to not > touch the core TCP stack for it and instead implement it in a netfilter module. Odd, NetBSD and OpenBSD provide TCP_SIGNATURE as a kernel config option. I suspect that FreeBSD, BSDI, and BSD/OS do as well. I've already asked Frank offline if what he is trying to do actually requires linux (The "I need to get this running" factor vs. the "How about a little standardization" factor). Unfortunately, I have no idea if or how AIX, HPUX, and Solaris do TCP signatures, let alone if their API is similar to the BSD interface. In any case, the average user should almost never need this feature to be enabled. -- Chris Dukes "Bert is apparently EEEEVIL, whereas Oscar is just a sysadmin^Wgrouch." -- gorski From owner-netdev@oss.sgi.com Sat Jan 26 06:28:21 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0QESLH01896 for netdev-outgoing; Sat, 26 Jan 2002 06:28:21 -0800 Received: from shell.cyberus.ca (shell.cyberus.ca [216.191.240.114]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0QESGP01884 for ; Sat, 26 Jan 2002 06:28:17 -0800 Received: from localhost (hadi@localhost) by shell.cyberus.ca (8.9.3/666/Cyberus Online Inc.) with ESMTP id IAA04590; Sat, 26 Jan 2002 08:23:41 -0500 (EST) X-Authentication-Warning: shell.cyberus.ca: hadi owned process doing -bs Date: Sat, 26 Jan 2002 08:23:41 -0500 (EST) From: jamal To: Andi Kleen cc: Frank Solensky , Subject: Re: TCP MD5 signature option (RFC2385) In-Reply-To: <20020126045240.A30893@wotan.suse.de> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk On Sat, 26 Jan 2002, Andi Kleen wrote: > On Fri, Jan 25, 2002 at 08:44:48PM -0500, Frank Solensky wrote: > > I noticed that Linux stack doesn't currently support for RFC2385 (MD5 > > signatures for TCP packets). This could be useful for the zebra project > > for authenticating BGP connections with other implementations. 
> > > > I checked various list archives and didn't see any mention of work being > > underway on this -- what's the best way for me to proceed, download code > > and just start implementing? > > TCP is not very well fitted to add a new 'go over all data in packet' > pass. It is heavily optimized for copy-csum-and-forget in one go. > You could add a new pass for MD5, but it would not be nice. > As TCP MD5 is rather obscure I think I would nearly recommend to not > touch the core TCP stack for it and instead implement it in a netfilter module. > Andi, This is a TCP option; so should fit well in the slow path. Of course it brings a whole new meaning to DoS;-> IIRC, not all packets within a flow will have this option turned on; cheers, jamal From owner-netdev@oss.sgi.com Sat Jan 26 13:51:29 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0QLpTS02904 for netdev-outgoing; Sat, 26 Jan 2002 13:51:29 -0800 Received: from chmls20.mediaone.net (chmls20.mediaone.net [24.147.1.156]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0QLpNP02889 for ; Sat, 26 Jan 2002 13:51:23 -0800 Received: from [192.168.1.127] (h00045ace1c92.ne.mediaone.net [24.91.198.11]) by chmls20.mediaone.net (8.11.1/8.11.1) with ESMTP id g0QKr0x08305; Sat, 26 Jan 2002 15:53:00 -0500 (EST) Subject: Re: TCP MD5 signature option (RFC2385) From: Frank Solensky To: Andi Kleen , Chris Dukes , jamal Cc: netdev@oss.sgi.com In-Reply-To: References: Content-Type: text/plain Content-Transfer-Encoding: 7bit X-Mailer: Evolution/1.0.1 Date: 26 Jan 2002 15:25:02 -0500 Message-Id: <1012076731.2212.52.camel@localhost.localdomain> Mime-Version: 1.0 Sender: owner-netdev@oss.sgi.com Precedence: bulk > On Sat, 26 Jan 2002, Andi Kleen wrote: > > > TCP is not very well fitted to add a new 'go over all data in packet' > > pass. It is heavily optimized for copy-csum-and-forget in one go. > > You could add a new pass for MD5, but it would not be nice. True -- as you say, it is rather obscure. 
When it is used, it's generally expected that the connection will be slower. Once the BGP table feed has completed, though, a stable connection won't send much more than periodic keepalive messages (but then we all know the difference between 'theory' and 'practice'). On Fri, 2002-01-25 at 23:17, Chris Dukes wrote: > > I've already asked Frank offline if what he is trying to do actually > requires linux (The "I need to get this running" factor vs. the "How > about a little standardization" factor). And I was probably a bit vague in my response -- more the latter. I had been doing some BGP testing a while ago and was using zebra in one of the peers but couldn't test the authentication option since it's not currently available. > Unfortunately, I have no idea if or how AIX, > HPUX, and Solaris do TCP signatures, let alone if their API > is similar to the BSD interface. I sent a query to a friend of mine at Sun earlier today to see if they do; my guess is no but we'll see. > In any case, the average user should almost never need this feature to > be enabled. Agreed; I was planning on making it configurable, off by default. On Sat, 2002-01-26 at 08:23, jamal wrote: > This is a TCP option; so should fit well in the slow path. > Of course it brings a whole new meaning to DoS;-> IIRC, not all packets > within a flow will have this option turned on; I'm pretty sure when I was looking at the OpenBSD implementation, it was an all-or-nothing approach: if a socket had enabled the option, a packet that didn't include the option would be dropped. Vice versa, also: an MD5 signed packet sent to a socket that wasn't expecting it causes a packet drop. 
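The all-or-nothing behaviour described above can be written down as a small decision function. This is only a sketch of the policy as recalled here, not code from OpenBSD or any other implementation; the names md5_policy, MD5_ACCEPT and MD5_DROP are invented for the example, and the third rule (drop on digest mismatch) is the RFC 2385 requirement rather than something stated in this message.

```c
#include <stdbool.h>

enum md5_verdict { MD5_ACCEPT, MD5_DROP };

/*
 * socket_expects_sig: the option was enabled on the receiving socket
 * segment_has_sig:    the incoming segment carries the MD5 option
 * digest_matches:     the recomputed digest equals the one in the segment
 *                     (only meaningful when segment_has_sig is true)
 */
enum md5_verdict md5_policy(bool socket_expects_sig,
                            bool segment_has_sig,
                            bool digest_matches)
{
    /* Signed segment to an unsigned socket, or the reverse: drop. */
    if (socket_expects_sig != segment_has_sig)
        return MD5_DROP;

    /* Both sides use the option, but the digest is wrong: drop. */
    if (segment_has_sig && !digest_matches)
        return MD5_DROP;

    return MD5_ACCEPT;
}
```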
-- Frank From owner-netdev@oss.sgi.com Sat Jan 26 20:14:06 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0R4E6V23962 for netdev-outgoing; Sat, 26 Jan 2002 20:14:06 -0800 Received: from Cantor.suse.de (ns.suse.de [213.95.15.193]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0R4DhP23902 for ; Sat, 26 Jan 2002 20:13:43 -0800 Received: from Hermes.suse.de (Hermes.suse.de [213.95.15.136]) by Cantor.suse.de (Postfix) with ESMTP id 760BD1E815 for ; Sun, 27 Jan 2002 04:13:35 +0100 (MET) Date: Sun, 27 Jan 2002 04:13:35 +0100 From: Andi Kleen To: netdev@oss.sgi.com Subject: [PATCH] Fix TCP EFAULT error reporting Message-ID: <20020127041335.A12250@wotan.suse.de> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.3.22.1i Sender: owner-netdev@oss.sgi.com Precedence: bulk davem doesn't seem to like this one, so it'll probably not go in. In case someone needs proper EFAULT reporting for 2.4/2.5 TCP anyways here is the patch for reference. It fixes all network related system call testcases in LTP (except for one which was a bug in LTP). Patch against 2.5.3pre, but should apply to 2.4 with at most minor changes. It also does some minor cleanups in the TCP input path, and it is very careful not to pollute any hot paths; it only adds a single new check to them.
-Andi --- linux-work/include/net/tcp.h-TCPFAULT Sat Jan 5 18:18:28 2002 +++ linux-work/include/net/tcp.h Thu Jan 10 19:57:12 2002 @@ -573,6 +573,8 @@ int (*remember_stamp) (struct sock *sk); + void (*send_reset) (struct sk_buff *skb); + __u16 net_header_len; int (*setsockopt) (struct sock *sk, @@ -637,6 +639,8 @@ extern int tcp_v4_remember_stamp(struct sock *sk); extern int tcp_v4_tw_remember_stamp(struct tcp_tw_bucket *tw); + +extern void tcp_v4_send_reset(struct sk_buff *skb); extern int tcp_sendmsg(struct sock *sk, struct msghdr *msg, int size); extern ssize_t tcp_sendpage(struct socket *sock, struct page *page, int offset, size_t size, int flags); --- linux-work/net/ipv4/tcp_input.c-TCPFAULT Fri Jan 4 10:51:50 2002 +++ linux-work/net/ipv4/tcp_input.c Thu Jan 10 20:03:17 2002 @@ -3315,7 +3315,7 @@ __set_current_state(TASK_RUNNING); if (tcp_copy_to_iovec(sk, skb, tcp_header_len)) - goto csum_error; + goto csum_error_fault; __skb_pull(skb,tcp_header_len); @@ -3413,7 +3413,8 @@ TCP_INC_STATS_BH(TcpInErrs); NET_INC_STATS_BH(TCPAbortOnSyn); tcp_reset(sk); - return 1; + tp->af_specific->send_reset(skb); + goto discard; } step5: @@ -3436,6 +3437,11 @@ discard: __kfree_skb(skb); return 0; + +csum_error_fault: + TCP_INC_STATS_BH(TcpInErrs); + __kfree_skb(skb); + return -EFAULT; } static int tcp_rcv_synsent_state_process(struct sock *sk, struct sk_buff *skb, --- linux-work/net/ipv4/tcp_ipv4.c-TCPFAULT Fri Jan 4 10:51:50 2002 +++ linux-work/net/ipv4/tcp_ipv4.c Thu Jan 10 23:35:38 2002 @@ -53,6 +53,7 @@ #include #include #include +#include #include #include @@ -1033,7 +1034,7 @@ * Exception: precedence violation. We do not implement it in any case. 
*/ -static void tcp_v4_send_reset(struct sk_buff *skb) +void tcp_v4_send_reset(struct sk_buff *skb) { struct tcphdr *th = skb->h.th; struct tcphdr rth; @@ -1546,12 +1547,12 @@ IP_INC_STATS_BH(IpInDelivers); - if (sk->state == TCP_ESTABLISHED) { /* Fast path */ + if (likely(sk->state == TCP_ESTABLISHED)) { /* Fast path */ + int err; TCP_CHECK_TIMER(sk); - if (tcp_rcv_established(sk, skb, skb->h.th, skb->len)) - goto reset; + err = tcp_rcv_established(sk, skb, skb->h.th, skb->len); TCP_CHECK_TIMER(sk); - return 0; + return err; } if (skb->len < (skb->h.th->doff<<2) || tcp_checksum_complete(skb)) @@ -1875,6 +1876,7 @@ tcp_v4_syn_recv_sock, tcp_v4_hash_connecting, tcp_v4_remember_stamp, + tcp_v4_send_reset, sizeof(struct iphdr), ip_setsockopt, --- linux-work/net/ipv4/tcp.c-TCPFAULT Fri Jan 4 21:36:38 2002 +++ linux-work/net/ipv4/tcp.c Thu Jan 10 20:01:33 2002 @@ -1359,21 +1359,23 @@ return timeo; } -static void tcp_prequeue_process(struct sock *sk) +static int tcp_prequeue_process(struct sock *sk) { struct sk_buff *skb; struct tcp_opt *tp = &(sk->tp_pinfo.af_tcp); + int err = 0; net_statistics[smp_processor_id()*2+1].TCPPrequeued += skb_queue_len(&tp->ucopy.prequeue); /* RX process wants to run with disabled BHs, though it is not necessary */ local_bh_disable(); while ((skb = __skb_dequeue(&tp->ucopy.prequeue)) != NULL) - sk->backlog_rcv(sk, skb); + err |= sk->backlog_rcv(sk, skb); local_bh_enable(); /* Clear memory counter. 
*/ tp->ucopy.memory = 0; + return err; } /* @@ -1573,7 +1575,8 @@ if (tp->rcv_nxt == tp->copied_seq && skb_queue_len(&tp->ucopy.prequeue)) { do_prequeue: - tcp_prequeue_process(sk); + if ((err = tcp_prequeue_process(sk)) < 0) + goto out; if ((chunk = len - tp->ucopy.len) != 0) { net_statistics[smp_processor_id()*2+1].TCPDirectCopyFromPrequeue += chunk; --- linux-work/net/ipv6/tcp_ipv6.c-TCPFAULT Tue Nov 27 22:58:12 2001 +++ linux-work/net/ipv6/tcp_ipv6.c Thu Jan 10 19:57:12 2002 @@ -1413,6 +1413,7 @@ struct sk_filter *filter; #endif struct sk_buff *opt_skb = NULL; + int err; /* Imagine: socket is IPv6. IPv4 packet arrives, goes to IPv4 receive handler and backlogged. @@ -1456,12 +1457,11 @@ if (sk->state == TCP_ESTABLISHED) { /* Fast path */ TCP_CHECK_TIMER(sk); - if (tcp_rcv_established(sk, skb, skb->h.th, skb->len)) - goto reset; + err = tcp_rcv_established(sk, skb, skb->h.th, skb->len); TCP_CHECK_TIMER(sk); if (opt_skb) goto ipv6_pktoptions; - return 0; + return err; } if (skb->len < (skb->h.th->doff<<2) || tcp_checksum_complete(skb)) @@ -1486,6 +1486,7 @@ } } + err = 0; TCP_CHECK_TIMER(sk); if (tcp_rcv_state_process(sk, skb, skb->h.th, skb->len)) goto reset; @@ -1531,7 +1532,7 @@ if (opt_skb) kfree_skb(opt_skb); - return 0; + return err; } int tcp_v6_rcv(struct sk_buff *skb) @@ -1763,6 +1764,7 @@ tcp_v6_syn_recv_sock, tcp_v6_hash_connecting, tcp_v6_remember_stamp, + tcp_v6_send_reset, sizeof(struct ipv6hdr), ipv6_setsockopt, @@ -1783,6 +1785,7 @@ tcp_v6_syn_recv_sock, tcp_v4_hash_connecting, tcp_v4_remember_stamp, + tcp_v4_send_reset, sizeof(struct iphdr), ipv6_setsockopt, --- linux-work/net/netsyms.c-TCPFAULT Fri Jan 4 10:51:52 2002 +++ linux-work/net/netsyms.c Thu Jan 10 19:57:12 2002 @@ -392,6 +392,7 @@ EXPORT_SYMBOL(sysctl_tcp_ecn); EXPORT_SYMBOL(tcp_cwnd_application_limited); EXPORT_SYMBOL(tcp_sendpage); +EXPORT_SYMBOL(tcp_v4_send_reset); EXPORT_SYMBOL(tcp_write_xmit); From owner-netdev@oss.sgi.com Sat Jan 26 21:38:42 2002 Received: (from 
majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0R5cgD03394 for netdev-outgoing; Sat, 26 Jan 2002 21:38:42 -0800 Received: from tapu.f00f.org (tapu.cryptoapps.com [63.108.153.39]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0R5ceP03388 for ; Sat, 26 Jan 2002 21:38:40 -0800 Received: by tapu.f00f.org (Postfix, from userid 1000) id C42DF5763; Sat, 26 Jan 2002 20:37:30 -0800 (PST) Date: Sat, 26 Jan 2002 20:37:30 -0800 From: Chris Wedgwood To: Andi Kleen Cc: netdev@oss.sgi.com Subject: Re: [PATCH] Fix TCP EFAULT error reporting Message-ID: <20020127043730.GA9892@tapu.f00f.org> References: <20020127041335.A12250@wotan.suse.de> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20020127041335.A12250@wotan.suse.de> User-Agent: Mutt/1.3.27i X-No-Archive: Yes Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 400 Lines: 14 On Sun, Jan 27, 2002 at 04:13:35AM +0100, Andi Kleen wrote: davem doesn't seem to like this one, so it'll probably not go in. In case someone needs proper EFAULT reporting for 2.4/2.5 TCP anyways here is the patch for reference. It fixes all network related system call testcases in LTP (except for one which was a bug in LTP). LTP is what and available from where? 
--cw From owner-netdev@oss.sgi.com Sun Jan 27 01:18:18 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0R9II231007 for netdev-outgoing; Sun, 27 Jan 2002 01:18:18 -0800 Received: from Cantor.suse.de (ns.suse.de [213.95.15.193]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0R9IEP30994 for ; Sun, 27 Jan 2002 01:18:14 -0800 Received: from Hermes.suse.de (Hermes.suse.de [213.95.15.136]) by Cantor.suse.de (Postfix) with ESMTP id 21F5C1E415; Sun, 27 Jan 2002 09:17:58 +0100 (MET) Received: from aj by arthur.inka.de with local (Exim 3.34 #1) id 16UkVg-00042d-00; Sun, 27 Jan 2002 09:17:56 +0100 To: Chris Wedgwood Cc: Andi Kleen , netdev@oss.sgi.com Subject: Re: [PATCH] Fix TCP EFAULT error reporting References: <20020127041335.A12250@wotan.suse.de> <20020127043730.GA9892@tapu.f00f.org> From: Andreas Jaeger Date: Sun, 27 Jan 2002 09:17:56 +0100 In-Reply-To: <20020127043730.GA9892@tapu.f00f.org> (Chris Wedgwood's message of "Sat, 26 Jan 2002 20:37:30 -0800") Message-ID: User-Agent: Gnus/5.090006 (Oort Gnus v0.06) XEmacs/21.4 (Artificial Intelligence, i386-suse-linux) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 627 Lines: 21 Chris Wedgwood writes: > On Sun, Jan 27, 2002 at 04:13:35AM +0100, Andi Kleen wrote: > > davem doesn't seem to like this one, so it'll probably not go in. > In case someone needs proper EFAULT reporting for 2.4/2.5 TCP > anyways here is the patch for reference. It fixes all network > related system call testcases in LTP (except for one which was a > bug in LTP). > > LTP is what and available from where? 
Linux Test Project, available from sourceforge: http://ltp.sourceforge.net/ Andreas -- Andreas Jaeger SuSE Labs aj@suse.de private aj@arthur.inka.de http://www.suse.de/~aj From owner-netdev@oss.sgi.com Sun Jan 27 01:59:02 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0R9x2S04122 for netdev-outgoing; Sun, 27 Jan 2002 01:59:02 -0800 Received: from coruscant.gnumonks.org (mail@coruscant.franken.de [193.174.159.226]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0R9weP04060 for ; Sun, 27 Jan 2002 01:58:40 -0800 Received: from uucp by coruscant.gnumonks.org with local-bsmtp (Exim 3.33 #1) id 16Ul92-0000mW-00 for netdev@oss.sgi.com; Sun, 27 Jan 2002 09:58:36 +0100 Received: from laforge by sunbeam.gnumonks.org with local (Exim 3.34 #1) id 16Ul7k-0006Kt-00; Sun, 27 Jan 2002 09:57:16 +0100 Date: Sun, 27 Jan 2002 09:57:16 +0100 From: Harald Welte To: kuznet@ms2.inr.ac.ru Cc: netdev@oss.sgi.com Subject: [PATCH] Make netfilter handle SACK in NAT'ed connections (was Re: Fw: oops/bug in tcp, SACK doesn't work?) Message-ID: <20020127095716.H16571@sunbeam.de.gnumonks.org> References: <20010728004447.I1240@obroa-skai.gnumonks.org> <200107291653.UAA18260@ms2.inr.ac.ru> <20010731033801.M1486@obroa-skai.gnumonks.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.3.17i In-Reply-To: <20010731033801.M1486@obroa-skai.gnumonks.org>; from laforge@gnumonks.org on Tue, Jul 31, 2001 at 03:38:01AM -0300 X-Operating-System: Linux sunbeam.de.gnumonks.org 2.4.17 X-Date: Today is Setting Orange, the 25th day of Chaos in the YOLD 3168 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 6089 Lines: 206 On Tue, Jul 31, 2001 at 03:38:01AM -0300, Harald Welte wrote: Hi Alexey & Others. I'm now following up a very old thread about netfilter deleting SACKPERM in the case of NAT'ing protocols with helpers (ftp, irc, ...) 
> On Sun, Jul 29, 2001 at 08:53:36PM +0400, Alexey Kuznetsov wrote: > > > > It is not a valid justification. It is difficult to rewrite sequence > > numbers. As soon as nat does this, rewriting sacks is easy. Even not easy, > > trivial. > > > Sad and not expected behaviour. I used to ridicule commercial firewall > > vendors, sometimes doing shit of this kind without any clear reasons. :-) > > Ok, I am willing to extend netfilter conntrack/nat in order to deal with > SACK. It is really not about being too lazy to do it. I've now implemented 'correct' SACK alteration (as correct as we can with the current codebase).. It has gotten quite some bit of code, and I fear it adds significant complexity (need to wade through all TCP options of every packet, alter all SACK's, ...). It has been tested for some time in a couple of machines, and is working so far (I have artificially removed packets from the TCP flow in order to cause the receiver generate SACK's and they have been altered correctly). The only question remaining is: Is it worth the effort? What do the core linux developers think? We could put this patch into netfilter and remove the old delete-sackperm code. We could also make it a config option (like the patch below does). Or we could just keep the current behaviour of deleting sackperm. Please _don't_ apply it to the kernel yet, this is just a proposal for discussion. --- linuxppc-031201-nfpom/net/ipv4/netfilter/ip_nat_helper.c Sun Dec 2 21:13:35 2001 +++ linuxppc-031201-nfpom-sack/net/ipv4/netfilter/ip_nat_helper.c Mon Jan 14 21:54:58 2002 @@ -1,8 +1,11 @@ /* ip_nat_mangle.c - generic support functions for NAT helpers * - * (C) 2000 by Harald Welte + * (C) 2000-2002 by Harald Welte * * distributed under the terms of GNU GPL + * + * 14 Jan 2002 Harald Welte : + * - add support for SACK adjustment */ #include #include @@ -32,6 +35,9 @@ #define DEBUGP(format, args...) 
#define DUMP_OFFSET(x) #endif + +/* FIXME */ +#define CONFIG_IP_NF_NAT_SACK 1 DECLARE_LOCK(ip_nat_seqofs_lock); @@ -182,6 +188,102 @@ return 1; } +#ifdef CONFIG_IP_NF_NAT_SACK +/* Adjust one found SACK option including checksum correction */ +static void +sack_adjust(struct tcphdr *tcph, + unsigned char *ptr, + struct ip_nat_seq *natseq) +{ + struct tcp_sack_block *sp = (struct tcp_sack_block *)(ptr+2); + int num_sacks = (ptr[1] - TCPOLEN_SACK_BASE)>>3; + int i; + + for (i = 0; i < num_sacks; i++, sp++) { + u_int32_t new_start_seq, new_end_seq; + + if (after(ntohl(sp->start_seq) - natseq->offset_before, + natseq->correction_pos)) + new_start_seq = ntohl(sp->start_seq) + - natseq->offset_after; + else + new_start_seq = ntohl(sp->start_seq) + - natseq->offset_before; + new_start_seq = htonl(new_start_seq); + + if (after(ntohl(sp->end_seq) - natseq->offset_before, + natseq->correction_pos)) + new_end_seq = ntohl(sp->end_seq) + - natseq->offset_after; + else + new_end_seq = ntohl(sp->end_seq) + - natseq->offset_before; + new_end_seq = htonl(new_end_seq); + + DEBUGP("sack_adjust: start_seq: %d->%d, end_seq: %d->%d\n", + ntohl(sp->start_seq), new_start_seq, + ntohl(sp->end_seq), new_end_seq); + + tcph->check = + ip_nat_cheat_check(~sp->start_seq, new_start_seq, + ip_nat_cheat_check(~sp->end_seq, + new_end_seq, + tcph->check)); + + sp->start_seq = new_start_seq; + sp->end_seq = new_end_seq; + } +} + + +/* TCP SACK sequence number adjustment, return 0 if sack found and adjusted */ +static int +ip_nat_sack_adjust(struct sk_buff *skb, + struct ip_conntrack *ct, + enum ip_conntrack_info ctinfo) +{ + struct iphdr *iph; + struct tcphdr *tcph; + unsigned char *ptr; + int length, dir, sack_adjusted = 0; + + iph = skb->nh.iph; + tcph = (void *)iph + iph->ihl*4; + length = (tcph->doff*4)-sizeof(struct tcphdr); + ptr = (unsigned char *)(tcph+1); + + dir = CTINFO2DIR(ctinfo); + + while (length > 0) { + int opcode = *ptr++; + int opsize; + + switch (opcode) { + case TCPOPT_EOL: + 
return !sack_adjusted; + case TCPOPT_NOP: + length--; + continue; + default: + opsize = *ptr++; + if (opsize > length) /* no partial opts */ + return !sack_adjusted; + if (opcode == TCPOPT_SACK) { + /* found SACK */ + if((opsize >= (TCPOLEN_SACK_BASE + TCPOLEN_SACK_PERBLOCK)) && + !((opsize - TCPOLEN_SACK_BASE) % TCPOLEN_SACK_PERBLOCK)) + sack_adjust(tcph, ptr-2, &ct->nat.info.seq[!dir]); + + sack_adjusted = 1; + } + ptr += opsize-2; + length -= opsize; + } + } + return !sack_adjusted; +} +#endif /* CONFIG_IP_NF_NAT_SACK */ + /* TCP sequence number adjustment */ int ip_nat_seq_adjust(struct sk_buff *skb, @@ -226,9 +328,24 @@ tcph->seq = newseq; tcph->ack_seq = newack; +#ifdef CONFIG_IP_NF_NAT_SACK + ip_nat_sack_adjust(skb, ct, ctinfo); +#endif + return 0; } +#ifdef CONFIG_IP_NF_NAT_SACK + +/* Well, no need to fuck rusty anymor. We _can_ deal correctly with SACK */ +void +ip_nat_delete_sack(struct sk_buff *skb, struct tcphdr *tcph) +{ + return; +} + +#else /* CONFIG_IP_NF_NAT_SACK */ + /* Grrr... SACK. Fuck me even harder. Don't want to fix it on the fly, so blow it away. */ void @@ -272,6 +389,8 @@ } else DEBUGP("Something wrong with SACK_PERM.\n"); } + +#endif /* CONFIG_IP_NF_NAT_SACK */ static inline int helper_cmp(const struct ip_nat_helper *helper, -- Live long and prosper - Harald Welte / laforge@gnumonks.org http://www.gnumonks.org/ ============================================================================ GCS/E/IT d- s-: a-- C+++ UL++++$ P+++ L++++$ E--- W- N++ o? K- w--- O- M- V-- PS+ PE-- Y+ PGP++ t++ 5-- !X !R tv-- b+++ DI? 
!D G+ e* h+ r% y+(*) From owner-netdev@oss.sgi.com Sun Jan 27 14:02:44 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0RM2iQ23721 for netdev-outgoing; Sun, 27 Jan 2002 14:02:44 -0800 Received: from smtp3.libero.it (smtp3.libero.it [193.70.192.53]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0RM2bP23711 for ; Sun, 27 Jan 2002 14:02:38 -0800 Received: from trantor.ferrara.linux.it (151.26.186.55) by smtp3.libero.it (6.0.032) id 3BD43E25025454C5; Sun, 27 Jan 2002 22:02:03 +0100 Received: from localhost (localhost.localdomain [127.0.0.1]) by trantor.ferrara.linux.it (Postfix) with ESMTP id B23561FAD1; Sun, 27 Jan 2002 21:02:25 +0100 (CET) Date: Sun, 27 Jan 2002 21:02:25 +0100 (CET) From: Mauro Tortonesi To: Jon Grimm Cc: , "sctp-developers-list@cig.mot.com" , Subject: Re: SCTP and IPv6 roadmap In-Reply-To: <3C4CA2E9.26253A69@austin.ibm.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1869 Lines: 50 On Mon, 21 Jan 2002, Jon Grimm wrote: > Mauro, > > I'm glad you ask, since it gives me a chance to put in a little plug for > the lksctp project. > > No, we aren't abandoned at all. > > It is quite true that we've been on a base of 2.4.1 overly long. We > are in the active process of moving to a 2.4.17 base and redoing our > file hierarchy to better allow us to stay up with current kernels. We > will hopefully be more nimble in the future, but our current code is > only available as anonymous download in CVS. > > Overall, we'd love to get into 2.5, however realize we have a bit of > work to focus on first. > > For more information, please visit the project's website at: > http://www.sourceforge.net/projects/lksctp > > For more information on the SCTP protocol, see RFC 2960 at: > http://www.ietf.org/rfc/rfc2960.txt thanks for your kind answer, jon. 
i am looking for an interesting topic for my master thesis, and i am especially interested in next generation networking protocols, like ipv6 and, of course, sctp. for my thesis, i'd really like to implement some functionality in the linux kernel which could also be useful to the opensource community. unfortunately, it seems that much of ipv6 and sctp features have already been implemented in linux, from the usagi and lksctp projects, respectively, and there's not much left to be done - at least not so much IMHO to do a thesis about it. so, if any of you has a good idea or knows an interesting networking feature which still needs to be implemented in linux (especially some kernel stuff) and deserves someone to write a thesis about it, please let me know. thank you in advance. -- Aequam memento rebus in arduis servare mentem... Mauro Tortonesi mauro@ferrara.linux.it Ferrara Linux User Group http://www.ferrara.linux.it Project6 - IPv6 for Linux http://project6.ferrara.linux.it From owner-netdev@oss.sgi.com Sun Jan 27 23:05:39 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0S75dM19408 for netdev-outgoing; Sun, 27 Jan 2002 23:05:39 -0800 Received: from web14006.mail.yahoo.com (web14006.mail.yahoo.com [216.136.175.122]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0S75ZP19401 for ; Sun, 27 Jan 2002 23:05:35 -0800 Message-ID: <20020128060533.97925.qmail@web14006.mail.yahoo.com> Received: from [156.153.255.243] by web14006.mail.yahoo.com via HTTP; Sun, 27 Jan 2002 22:05:33 PST Date: Sun, 27 Jan 2002 22:05:33 -0800 (PST) From: Cacophonix Subject: Re: SCTP and IPv6 roadmap To: Mauro Tortonesi Cc: netdev@oss.sgi.com, "sctp-developers-list@cig.mot.com" , usagi-users@linux-ipv6.org In-Reply-To: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 997 Lines: 30 How about sctp for IPv6? Or an iSCSI mapping to SCTP? 
--karthik --- Mauro Tortonesi wrote: > unfortunately, it seems that much of ipv6 and sctp features have already > been implemented in linux, from the usagi and lksctp projects, > respectively, and there's not much left to be done - at least not so much > IMHO to do a thesis about it. > > so, if any of you has a good idea or knows an interesting networking > feature which still needs to be implemented in linux (especially some > kernel stuff) and deserves someone to write a thesis about it, please let > me know. > > thank you in advance. > > -- > Aequam memento rebus in arduis servare mentem... > > Mauro Tortonesi mauro@ferrara.linux.it > Ferrara Linux User Group http://www.ferrara.linux.it > Project6 - IPv6 for Linux http://project6.ferrara.linux.it > __________________________________________________ Do You Yahoo!? Great stuff seeking new owners in Yahoo! Auctions! http://auctions.yahoo.com From owner-netdev@oss.sgi.com Mon Jan 28 04:22:37 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0SCMb322413 for netdev-outgoing; Mon, 28 Jan 2002 04:22:37 -0800 Received: from hq.pm.waw.pl (hq.pm.waw.pl [195.116.170.10]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0SCMXP22402 for ; Mon, 28 Jan 2002 04:22:34 -0800 Received: (from uucp@localhost) by hq.pm.waw.pl with UUCP id g0SBKqr08676; Mon, 28 Jan 2002 12:20:52 +0100 Received: (from uucp@localhost) by intrepid.pm.waw.pl (8.11.6/8.11.6) with UUCP id g0S1XB814942; Mon, 28 Jan 2002 02:33:11 +0100 Received: (from khc@localhost) by defiant.pm.waw.pl (8.11.6/8.11.6) id g0RLq6t13355; Sun, 27 Jan 2002 22:52:06 +0100 To: Ben Greear Cc: Donald Becker , kuznet@ms2.inr.ac.ru, Martin Devera , davem@redhat.COM, netdev@oss.sgi.com Subject: Re: netdev.stats change suggestion References: <3C51DA6B.4090302@candelatech.com> From: Krzysztof Halasa Date: 27 Jan 2002 22:52:06 +0100 In-Reply-To: <3C51DA6B.4090302@candelatech.com> Message-ID: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Sender: 
owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 527 Lines: 13 Ben Greear writes: > There's a fairly small difference between wrapping a 32-bit number and > clearing the counters...both ways the reader has to deal with a system > that is not strictly increasing... You do the the crutch of knowing that > if something only wraps (and is not cleared) that you have *at least* > wrapped once.... ... and you can't say for sure that there was no wrap if the previous value was smaller. You may have just missed the wrap. -- Krzysztof Halasa Network Administrator From owner-netdev@oss.sgi.com Mon Jan 28 10:39:04 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0SId4O11867 for netdev-outgoing; Mon, 28 Jan 2002 10:39:04 -0800 Received: from ms2.inr.ac.ru (minus.inr.ac.ru [193.233.7.97]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0SId0P11860 for ; Mon, 28 Jan 2002 10:39:00 -0800 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id UAA00878; Mon, 28 Jan 2002 20:38:32 +0300 From: kuznet@ms2.inr.ac.ru Message-Id: <200201281738.UAA00878@ms2.inr.ac.ru> Subject: Re: [PATCH] Make netfilter handle SACK in NAT'ed connections (was Re: Fw: oops/bug in tcp, SACK doesn't work?) To: laforge@gnumonks.org (Harald Welte) Date: Mon, 28 Jan 2002 20:38:32 +0300 (MSK) Cc: netdev@oss.sgi.com In-Reply-To: <20020127095716.H16571@sunbeam.de.gnumonks.org> from "Harald Welte" at Jan 27, 2 09:57:16 am X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 748 Lines: 24 Hello! > The only question remaining is: Is it worth the effort? What do the > core linux developers think? What's about me, I think it is required. There are no reasons to drop sacks, when you already have code to mangle data. About complexity... does not matter, "complexity" happens when something is logically not quite trivial. SACK mangling is just straight hand work rather than complexity. 
It is even not long looking at the patch. :-) Unlike timestamps. Timestamps are better to delete even when not mangling. BTW what is this? /* Half a match? This means a partial retransmisison. It's a cracker being funky. */ >From code I cannot guess, what does it mean. Does this mean that NAT can block some valid data? Alexey From owner-netdev@oss.sgi.com Mon Jan 28 11:21:38 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0SJLcP17926 for netdev-outgoing; Mon, 28 Jan 2002 11:21:38 -0800 Received: from coruscant.gnumonks.org (mail@coruscant.franken.de [193.174.159.226]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0SJLVP17917 for ; Mon, 28 Jan 2002 11:21:31 -0800 Received: from uucp by coruscant.gnumonks.org with local-bsmtp (Exim 3.33 #1) id 16VGPH-00087H-00 for netdev@oss.sgi.com; Mon, 28 Jan 2002 19:21:27 +0100 Received: from laforge by sunbeam.gnumonks.org with local (Exim 3.34 #1) id 16VGNH-0007Kf-00; Mon, 28 Jan 2002 19:19:23 +0100 Date: Mon, 28 Jan 2002 19:19:23 +0100 From: Harald Welte To: kuznet@ms2.inr.ac.ru Cc: netdev@oss.sgi.com Subject: Re: [PATCH] Make netfilter handle SACK in NAT'ed connections (was Re: Fw: oops/bug in tcp, SACK doesn't work?) Message-ID: <20020128191923.V26676@sunbeam.de.gnumonks.org> References: <20020127095716.H16571@sunbeam.de.gnumonks.org> <200201281738.UAA00878@ms2.inr.ac.ru> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.3.17i In-Reply-To: <200201281738.UAA00878@ms2.inr.ac.ru>; from kuznet@ms2.inr.ac.ru on Mon, Jan 28, 2002 at 08:38:32PM +0300 X-Operating-System: Linux sunbeam.de.gnumonks.org 2.4.17 X-Date: Today is Pungenday, the 28th day of Chaos in the YOLD 3168 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 2214 Lines: 54 On Mon, Jan 28, 2002 at 08:38:32PM +0300, Alexey Kuznetsov wrote: > Hello! Hi Alexey. > > The only question remaining is: Is it worth the effort? What do the > > core linux developers think? 
> > About complexity... does not matter, "complexity" happens when something > is logically not quite trivial. SACK mangling is just straight hand work > rather than complexity. It is even not long looking at the patch. :-) Mh. Ok, I will consider submitting it to the kernel after it was tested for some more time. > BTW what is this? > > /* Half a match? This means a partial retransmisison. > It's a cracker being funky. */ > > From code I cannot guess, what does it mean. Does this mean that NAT can > block some valid data? It means that the connection tracking has found something to be modified (like ftp PORT or irc DCC) by the nat module between two particular sequence number (e.g. start and end of the PORT command), but the packet arriving at the nat helper (which is called after the conntrack helper) is operating on a packet which doesn't contain the full PORT command. Because in this case the conntrack helper has already seen the full PORT command, the nat helper is definitely dealing with a partial retransmission. this partial retransmission is dropped, assuming that the next retransmission will be a retransmission of the whole packet, as we have seen it before. I'm not really sure what rusty's exact reasoning about this was, but I guess it was too complicated to replace only a part of the expectation- causing string (PORT command). We could handle this correctly, if we'd really try hard. But there are a lot of cases (i.e. PORT command split over two seperate packets) where we could deal better but it's just way more than you want to have inside your kernel. transparent proxies are better if you want to be perfect in this. > Alexey -- Live long and prosper - Harald Welte / laforge@gnumonks.org http://www.gnumonks.org/ ============================================================================ GCS/E/IT d- s-: a-- C+++ UL++++$ P+++ L++++$ E--- W- N++ o? K- w--- O- M- V-- PS+ PE-- Y+ PGP++ t++ 5-- !X !R tv-- b+++ DI? 
!D G+ e* h+ r% y+(*)

From owner-netdev@oss.sgi.com Mon Jan 28 12:02:40 2002
Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0SK2eu27380 for netdev-outgoing; Mon, 28 Jan 2002 12:02:40 -0800
Received: from ms2.inr.ac.ru (minus.inr.ac.ru [193.233.7.97]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0SK2ZP27365 for ; Mon, 28 Jan 2002 12:02:36 -0800
Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id WAA01789; Mon, 28 Jan 2002 22:02:13 +0300
From: kuznet@ms2.inr.ac.ru
Message-Id: <200201281902.WAA01789@ms2.inr.ac.ru>
Subject: Re: [PATCH] Make netfilter handle SACK in NAT'ed connections (was Re: Fw: oops/bug in tcp, SACK doesn't work?)
To: laforge@gnumonks.org (Harald Welte)
Date: Mon, 28 Jan 2002 22:02:13 +0300 (MSK)
Cc: netdev@oss.sgi.com
In-Reply-To: <20020128191923.V26676@sunbeam.de.gnumonks.org> from "Harald Welte" at Jan 28, 2 07:19:23 pm
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Content-Length: 922
Lines: 26

Hello!

> this partial retransmission is dropped, assuming that the next retransmission
> will be a retransmission of the whole packet, as we have seen it before.

The assumption can be wrong. This happens with Linux. Even if
tcp_retrans_collapse is on, collapsing may run into obstacles that
prevent it.

> a lot of cases (i.e. PORT command split over two separate packets)

What is difficult in this case? I simply do not understand this...
If you have a defined transform, there are no problems with partial rewrites.

> your kernel. transparent proxies are better if you want to be perfect in
> this.

No ack. If it were a real fault of the approach, it would be true.
But as soon as it is explained only by laziness of the author... no ack.

It is simply unpleasant. When we see a report of a Cisco director blocking
some valid data, we refer to Cisco. But when our own code does the same
shit, it is _double_ shame.
Alexey From owner-netdev@oss.sgi.com Mon Jan 28 12:31:35 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0SKVZl32572 for netdev-outgoing; Mon, 28 Jan 2002 12:31:35 -0800 Received: from coruscant.gnumonks.org (mail@coruscant.franken.de [193.174.159.226]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0SKVVP32566 for ; Mon, 28 Jan 2002 12:31:31 -0800 Received: from uucp by coruscant.gnumonks.org with local-bsmtp (Exim 3.33 #1) id 16VHV2-0008OK-00 for netdev@oss.sgi.com; Mon, 28 Jan 2002 20:31:28 +0100 Received: from laforge by sunbeam.gnumonks.org with local (Exim 3.34 #1) id 16VHQQ-0007OY-00; Mon, 28 Jan 2002 20:26:42 +0100 Date: Mon, 28 Jan 2002 20:26:42 +0100 From: Harald Welte To: Chris Wedgwood Cc: kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com Subject: Re: [PATCH] Make netfilter handle SACK in NAT'ed connections (was Re: Fw: oops/bug in tcp, SACK doesn't work?) Message-ID: <20020128202642.X26676@sunbeam.de.gnumonks.org> References: <20010728004447.I1240@obroa-skai.gnumonks.org> <200107291653.UAA18260@ms2.inr.ac.ru> <20010731033801.M1486@obroa-skai.gnumonks.org> <20020127095716.H16571@sunbeam.de.gnumonks.org> <20020127122036.GA10858@tapu.f00f.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.3.17i In-Reply-To: <20020127122036.GA10858@tapu.f00f.org>; from cw@f00f.org on Sun, Jan 27, 2002 at 04:20:36AM -0800 X-Operating-System: Linux sunbeam.de.gnumonks.org 2.4.17 X-Date: Today is Pungenday, the 28th day of Chaos in the YOLD 3168 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 870 Lines: 22 On Sun, Jan 27, 2002 at 04:20:36AM -0800, Chris Wedgwood wrote: > On Sun, Jan 27, 2002 at 09:57:16AM +0100, Harald Welte wrote: > > Hi Alexey & Others. > > I'm now following up a very old thread about netfilter deleting > SACKPERM in the case of NAT'ing protocols with helpers (ftp, irc, ...) > > Why not just strip SACK when using NAT? 
because SACK is generally a very useful extension of the TCP protocol, and we shouldn't just be arrogant and decide that our users are not allowed to use it in combination of nat. > --cw -- Live long and prosper - Harald Welte / laforge@gnumonks.org http://www.gnumonks.org/ ============================================================================ GCS/E/IT d- s-: a-- C+++ UL++++$ P+++ L++++$ E--- W- N++ o? K- w--- O- M- V-- PS+ PE-- Y+ PGP++ t++ 5-- !X !R tv-- b+++ DI? !D G+ e* h+ r% y+(*) From owner-netdev@oss.sgi.com Mon Jan 28 12:31:41 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0SKVf732614 for netdev-outgoing; Mon, 28 Jan 2002 12:31:41 -0800 Received: from coruscant.gnumonks.org (mail@coruscant.franken.de [193.174.159.226]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0SKVUP32563 for ; Mon, 28 Jan 2002 12:31:30 -0800 Received: from uucp by coruscant.gnumonks.org with local-bsmtp (Exim 3.33 #1) id 16VHV1-0008OB-00 for netdev@oss.sgi.com; Mon, 28 Jan 2002 20:31:27 +0100 Received: from laforge by sunbeam.gnumonks.org with local (Exim 3.34 #1) id 16VHKm-0007OM-00; Mon, 28 Jan 2002 20:20:52 +0100 Date: Mon, 28 Jan 2002 20:20:52 +0100 From: Harald Welte To: kuznet@ms2.inr.ac.ru Cc: netdev@oss.sgi.com Subject: Re: [PATCH] Make netfilter handle SACK in NAT'ed connections (was Re: Fw: oops/bug in tcp, SACK doesn't work?) 
Message-ID: <20020128202052.W26676@sunbeam.de.gnumonks.org> References: <20020128191923.V26676@sunbeam.de.gnumonks.org> <200201281902.WAA01789@ms2.inr.ac.ru> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.3.17i In-Reply-To: <200201281902.WAA01789@ms2.inr.ac.ru>; from kuznet@ms2.inr.ac.ru on Mon, Jan 28, 2002 at 10:02:13PM +0300 X-Operating-System: Linux sunbeam.de.gnumonks.org 2.4.17 X-Date: Today is Pungenday, the 28th day of Chaos in the YOLD 3168 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 3689 Lines: 87 On Mon, Jan 28, 2002 at 10:02:13PM +0300, Alexey Kuznetsov wrote: > > this partial retransmission is dropped, assuming that the next > > retransmission will be a retransmission of the whole packet, as we have > > seen it before. > > The assumption can be wrong. This happens with linux. Even if > tcp_retrans_collapse is on, collapcing may have obstacles not allowing > to collapse. but why would we have a partial retransmission in the first place? under which circumstances would the receiver only ACK half the packet it has received, causing a partial retransmission on the sender side? The problem the comment was referring to was only if we already have seen a full PORT command inside the first packet, but a retransmission does not contain the full PORT. > > a lot of cases (i.e. PORT command split over two seperate packets) > > What is difficult in this case? I simply do not understand this... > If you have a defined transofrm, there is no problems in partial rewrites. It is not difficult in the case above (i.e. conntrack has seen PORT command in full length, only nat is seeing retransmission). It's difficult in the case where we have a PORT command (or similar) split over two packets all the time. Not talking about retransmissions. W'd have something like tcp packet one: "PORT 192,16" packet two: "8,1,1,12,34\r" (or even more packets). 
How would we reliably match against the command? Where do you draw the line? what is about each character in one packet? To cover this (other) case, we would need to remember a certain amount of payload of every ftp connection. to be precise: strlen(MAX_PORTCMD_LEN-1) bytes. We would need to keep this like a ringbuffer. We've been discussing this on the netfilter developer workshop - but I was the only one arguing for this approach. Normally I'm arguing from your point of view and the others have to defend ;) I rarely see the 'partial port command' issue on all the systems under my adminstration - and we haven't received more than a hand full of people reporting the respective printk() during all the time this code exists. And please don't think that any previous linux version did better than we currently do. > > your kernel. transparent proxies are better if you want to be perfect in > > this. > > No ack. If it were a real fault of approach, it would be true. > But as soon as it is explained only by lazyness of author... no ack. well, don't blame me - I'm not the author ;). I think we have to find a practical trade-off between complexity and staying simple. It's always a question of how much effort do you want to spend on those things. You definitely _can_ implement the conntrack / nat / payload modification / sack handling / sequence number alteration in a way which covers all cases, and handles everything correctly. But then, nat is always ugly and un-perfect. And how much additional code complexity and performance penalty do you want to have for covering 0.00001% cases? > It is simply unpleasant. When seeing report of Cisco director blocking > some valid data, we refer to Cisco. But when our own code does the same > shit, it is _double_ shame. I know you are a perfectionist. I consider myself as a perfectionist as well, and I think the current netfilter conntrack/nat code tries quite hard to get it right in almost all cases. 
> Alexey -- Live long and prosper - Harald Welte / laforge@gnumonks.org http://www.gnumonks.org/ ============================================================================ GCS/E/IT d- s-: a-- C+++ UL++++$ P+++ L++++$ E--- W- N++ o? K- w--- O- M- V-- PS+ PE-- Y+ PGP++ t++ 5-- !X !R tv-- b+++ DI? !D G+ e* h+ r% y+(*) From owner-netdev@oss.sgi.com Mon Jan 28 13:22:02 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0SLM2I06182 for netdev-outgoing; Mon, 28 Jan 2002 13:22:02 -0800 Received: from ms2.inr.ac.ru (minus.inr.ac.ru [193.233.7.97]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0SLLtP06166 for ; Mon, 28 Jan 2002 13:21:55 -0800 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id XAA02291; Mon, 28 Jan 2002 23:21:40 +0300 From: kuznet@ms2.inr.ac.ru Message-Id: <200201282021.XAA02291@ms2.inr.ac.ru> Subject: Re: [PATCH] Make netfilter handle SACK in NAT'ed connections (was Re: Fw: oops/bug in tcp, SACK doesn't work?) To: laforge@gnumonks.org (Harald Welte) Date: Mon, 28 Jan 2002 23:21:40 +0300 (MSK) Cc: netdev@oss.sgi.com In-Reply-To: <20020128202052.W26676@sunbeam.de.gnumonks.org> from "Harald Welte" at Jan 28, 2 08:20:52 pm X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 2083 Lines: 57 Hello! > but why would we have a partial retransmission in the first place? > under which circumstances would the receiver only ACK half the packet > it has received, causing a partial retransmission on the sender side? F.e. path mtu discovery. Well, this is really marginal effect, almost impossible in real life. Especially taking into account that PORT is short. > tcp packet one: "PORT 192,16" > packet two: "8,1,1,12,34\r" > > (or even more packets). How would we reliably match against the command? > Where do you draw the line? what is about each character in one packet? "PORT xxx" is removed from stream and replacement is inserted to this point. 
No problems with location.

The problem is present, but it is different: until you have received all
the information, you cannot send subsequent packets and have to drop them.
That is already a flaw of the approach, not a question of quality.

[ Note: I really think the approach is fundamentally and badly wrong,
  with no right to survive. A ring buffer is already part of the right
  approach.

  Probably it is worth saying what I think about this. The right starting
  point is two TCP connections connected back-to-back. The whole scheme
  should be based on this. After this you start to optimize by
  shortcutting some things, but without breaking semantics. With plain NAT
  nothing but TCP state monitoring remains; with rewriting you keep more,
  or even everything in the hardest situations.
]

> And please don't think that any previous linux version did better than
> we currently do.

I do not think this. It was surely worse.

> But then, nat is always ugly and un-perfect. And how much additional
> code complexity

Well, I just came across this funny comment by accident and wanted to know
what it is, and how it was possible to classify some sender of some TCP
as a "cracker". :-)

> 0.00001% cases?

Yes, maybe even much less. Multiply by the number of users, and one day
a person from this 0.00001% will find this and bug you.

Well, when something is easy it should be done. Counting percentages
applies to the case when something is difficult.
Alexey From owner-netdev@oss.sgi.com Mon Jan 28 15:30:15 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0SNUFi20702 for netdev-outgoing; Mon, 28 Jan 2002 15:30:15 -0800 Received: from titan.bieringer.de (mail.bieringer.de [195.226.187.51]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0SNU9P20687 for ; Mon, 28 Jan 2002 15:30:09 -0800 Received: (qmail 16459 invoked from network); 28 Jan 2002 22:30:00 -0000 Received: from pd9e4e59f.dip.t-dialin.net (HELO worker.muc.bieringer.de) (217.228.229.159) by mail.bieringer.de with SMTP; 28 Jan 2002 22:30:00 -0000 Date: Mon, 28 Jan 2002 23:29:58 +0100 From: Peter Bieringer To: Harald Welte , kuznet@ms2.inr.ac.ru cc: netdev@oss.sgi.com Subject: Re: [PATCH] Make netfilter handle .. perhaps offtopic Message-ID: <18030000.1012256998@localhost> In-Reply-To: <20020128202052.W26676@sunbeam.de.gnumonks.org> References: <20020128191923.V26676@sunbeam.de.gnumonks.org> <200201281902.WAA01789@ms2.inr.ac.ru> <20020128202052.W26676@sunbeam.de.gnumonks.org> X-Mailer: Mulberry/2.1.2 (Linux/x86) X-Echelon: GRU NSA GCHQ CIA Pentagon nuclear war terror anthrax X-URL: http://www.bieringer.de/pb/ X-OS: Linux MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Content-Disposition: inline Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1793 Lines: 45 --On Monday, January 28, 2002 08:20:52 PM +0100 Harald Welte wrote: > It's difficult in the case where we have a PORT command (or > similar) split over two packets all the time. Not talking about > retransmissions. W'd have something like > > tcp packet one: "PORT 192,16" > packet two: "8,1,1,12,34\r" > > (or even more packets). How would we reliably match against the > command? Where do you draw the line? what is about each character > in one packet? > > To cover this (other) case, we would need to remember a certain > amount of payload of every ftp connection. to be precise: > strlen(MAX_PORTCMD_LEN-1) bytes. 
We would need to keep this like > a ringbuffer. Don't know if it's still the case, but some time ago (~ 2 years) reading Cisco doc of PIX firewall and their Javascript filter, they mentioned also that this isn't (wasn't) detected if the Javascript tag in HTML was splitted between two TCP packets. Therefore I think *all* "looking for interesting text in TCP streams" (FTP "PORT", Javascript tag, or something else which is interesting or important) should take care about that this string can be splitted between 2 packets. Otherwise the probability of "not hit because of splitted" will be not zero. And this is imho a security issue. Think about e.g. (don't know, if ever possible, but) a special modified web server, which checks MTU and split candidates for filtering to do unwanted things... mho: netfilter is (or should/will be hopefully) a stateful inspection engine comparable to (or better: superseed) the current market leader of commercial firewalls...therefore splitting of text between TCP packets should always be catched and no issue for perhaps later possibilities of upcoming security issues. Comments? Peter From owner-netdev@oss.sgi.com Tue Jan 29 05:31:43 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0TDVha09159 for netdev-outgoing; Tue, 29 Jan 2002 05:31:43 -0800 Received: from coruscant.gnumonks.org (mail@coruscant.franken.de [193.174.159.226]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0TDVcP09154 for ; Tue, 29 Jan 2002 05:31:38 -0800 Received: from uucp by coruscant.gnumonks.org with local-bsmtp (Exim 3.33 #1) id 16VXQD-0004GO-00 for netdev@oss.sgi.com; Tue, 29 Jan 2002 13:31:33 +0100 Received: from laforge by sunbeam.gnumonks.org with local (Exim 3.34 #1) id 16VXFA-0007xT-00; Tue, 29 Jan 2002 13:20:08 +0100 Date: Tue, 29 Jan 2002 13:20:08 +0100 From: Harald Welte To: Peter Bieringer Cc: kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com Subject: Re: [PATCH] Make netfilter handle .. 
perhaps offtopic Message-ID: <20020129132008.C26676@sunbeam.de.gnumonks.org> References: <20020128191923.V26676@sunbeam.de.gnumonks.org> <200201281902.WAA01789@ms2.inr.ac.ru> <20020128202052.W26676@sunbeam.de.gnumonks.org> <18030000.1012256998@localhost> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.3.17i In-Reply-To: <18030000.1012256998@localhost>; from pb@bieringer.de on Mon, Jan 28, 2002 at 11:29:58PM +0100 X-Operating-System: Linux sunbeam.de.gnumonks.org 2.4.17 X-Date: Today is Pungenday, the 28th day of Chaos in the YOLD 3168 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1453 Lines: 33 On Mon, Jan 28, 2002 at 11:29:58PM +0100, Peter Bieringer wrote: > Therefore I think *all* "looking for interesting text in TCP streams" > (FTP "PORT", Javascript tag, or something else which is interesting > or important) should take care about that this string can be splitted > between 2 packets. Otherwise the probability of "not hit because of > splitted" will be not zero. yes, it should. But is it worth the extra effort?? > And this is imho a security issue. Think about e.g. (don't know, if > ever possible, but) a special modified web server, which checks MTU > and split candidates for filtering to do unwanted things... In this case we are talking about NAT. it's not connection tracking. > mho: netfilter is (or should/will be hopefully) a stateful inspection > engine comparable to (or better: superseed) the current market leader > of commercial firewalls...therefore splitting of text between TCP > packets should always be catched and no issue for perhaps later > possibilities of upcoming security issues. I agree that in a perfect world we would cover those cases, yes. > Comments? 
> Peter -- Live long and prosper - Harald Welte / laforge@gnumonks.org http://www.gnumonks.org/ ============================================================================ GCS/E/IT d- s-: a-- C+++ UL++++$ P+++ L++++$ E--- W- N++ o? K- w--- O- M- V-- PS+ PE-- Y+ PGP++ t++ 5-- !X !R tv-- b+++ DI? !D G+ e* h+ r% y+(*) From owner-netdev@oss.sgi.com Tue Jan 29 06:49:24 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0TEnOG11006 for netdev-outgoing; Tue, 29 Jan 2002 06:49:24 -0800 Received: from mx2.elte.hu (mx2.elte.hu [157.181.151.9]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0TEnKP10996 for ; Tue, 29 Jan 2002 06:49:23 -0800 Received: from chiara.elte.hu (chiara.elte.hu [157.181.150.200]) by mx2.elte.hu (Postfix) with ESMTP id 8ABCF49F53 for ; Tue, 29 Jan 2002 14:49:00 +0100 (CET) Received: by chiara.elte.hu (Postfix, from userid 17806) id 5A1EC202A; Tue, 29 Jan 2002 14:47:30 +0100 (CET) To: netdev@oss.sgi.com Message-Id: <20020129134730.5A1EC202A@chiara.elte.hu> Date: Tue, 29 Jan 2002 14:47:30 +0100 (CET) From: mingo@chiara.elte.hu (MOLNAR Ingo) Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1 Lines: 1 From owner-netdev@oss.sgi.com Tue Jan 29 20:32:07 2002 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g0U4W7P11327 for netdev-outgoing; Tue, 29 Jan 2002 20:32:07 -0800 Received: from courage.cs.stevens-tech.edu (courage.cs.stevens-tech.edu [155.246.89.70]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g0U4Vwd11320 for ; Tue, 29 Jan 2002 20:32:03 -0800 Received: by courage.cs.stevens-tech.edu (Postfix, from userid 4041) id 340F59007E; Tue, 29 Jan 2002 15:15:53 -0500 (EST) Received: from localhost (localhost [127.0.0.1]) by courage.cs.stevens-tech.edu (Postfix) with ESMTP id 203569007C for ; Tue, 29 Jan 2002 15:15:53 -0500 (EST) Date: Tue, 29 Jan 2002 15:15:53 -0500 (EST) From: Marek Zawadzki To: Subject: implementing options field in the packet's header Message-ID: MIME-Version: 1.0 Content-Type: 
TEXT/PLAIN; charset=US-ASCII
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Content-Length: 926
Lines: 23

Hello,

2.4.17 kernel.

I am having trouble understanding how to implement options (of possibly
variable length) in my transport protocol (similar to TCP options).
For the fixed fields I just extend the structure describing my packet's
header, and that works fine. However, where should I place the parsing
required for a variable-length list of options? I know I'll need a function
similar to net/ipv4/tcp_input.c : tcp_parse_options.

I believe I'll have to parse 'skb->data' in my receiving function, but how
do I (if at all) describe those options in include/linux/skbuff.h : struct
sk_buff? I mean, TCP, for instance, doesn't have any options defined in
struct tcphdr, but the options _are_ part of the packet's header...
So which part of the code actually separates the options (which apparently
are not defined in the structure describing the header) from the user's data?

I'll appreciate any help.

-marek
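
[Editorial sketch, not part of the original message.] TCP answers this
question with a length field rather than a struct: the fixed header carries
a data offset (doff) giving the total header length in 32-bit words, the
options are whatever bytes lie between the end of the fixed struct and that
offset, and the payload starts at the offset. Nothing in struct sk_buff
describes the options. The user-space sketch below shows the same idea for a
made-up protocol. All names in it (myproto_hdr, hdr_words, MYOPT_*,
myproto_parse) are hypothetical and chosen only for illustration, and unlike
the kernel's tcp_parse_options it rejects a malformed option list instead of
silently ignoring it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed header. hdr_words plays the role of TCP's doff:
 * total header length (fixed part plus options) in 32-bit words. */
struct myproto_hdr {
    uint16_t source;
    uint16_t dest;
    uint8_t  hdr_words;
    uint8_t  flags;
    uint16_t check;
};

/* Hypothetical option kinds: EOL and NOP are single bytes, every
 * other option is encoded as kind, length, value. */
enum { MYOPT_EOL = 0, MYOPT_NOP = 1, MYOPT_WINDOW = 2 };

struct myproto_opts {
    int      saw_window;
    uint16_t window;
};

/* Walk the option bytes between the fixed header and hdr_words * 4.
 * Returns the offset at which user data starts, or -1 if malformed. */
static int myproto_parse(const uint8_t *pkt, size_t len,
                         struct myproto_opts *out)
{
    struct myproto_hdr h;
    size_t hdr_len;
    const uint8_t *p, *end;

    memset(out, 0, sizeof(*out));
    if (len < sizeof(h))
        return -1;
    memcpy(&h, pkt, sizeof(h));          /* copy to avoid unaligned access */

    hdr_len = (size_t)h.hdr_words * 4;
    if (hdr_len < sizeof(h) || hdr_len > len)
        return -1;

    p = pkt + sizeof(h);                 /* first option byte */
    end = pkt + hdr_len;                 /* first payload byte */
    while (p < end) {
        uint8_t kind = *p;
        uint8_t olen;

        if (kind == MYOPT_EOL)           /* rest of the area is padding */
            break;
        if (kind == MYOPT_NOP) {         /* one-byte filler */
            p++;
            continue;
        }
        if (end - p < 2)                 /* no room for a length byte */
            return -1;
        olen = p[1];
        if (olen < 2 || olen > end - p)  /* length must cover kind+len and fit */
            return -1;
        if (kind == MYOPT_WINDOW && olen == 4) {
            out->saw_window = 1;
            out->window = (uint16_t)((p[2] << 8) | p[3]);
        }
        p += olen;                       /* unknown kinds are skipped */
    }
    return (int)hdr_len;
}
```

In a receive function the same walk would start at the byte after the fixed
header inside skb->data, and the returned offset is what separates the
header from the user's data. Keeping the options out of the header struct is
what lets the list vary in length from packet to packet.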