From owner-netdev@oss.sgi.com Wed Aug 1 08:08:07 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f71F87B05018 for netdev-outgoing; Wed, 1 Aug 2001 08:08:07 -0700 Received: from colin.muc.de (root@colin.muc.de [193.149.48.1]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f71F85V05015 for ; Wed, 1 Aug 2001 08:08:05 -0700 Received: by colin.muc.de id <140576-3>; Wed, 1 Aug 2001 17:07:45 +0200 Message-ID: <20010801170742.51183@colin.muc.de> Date: Wed, 1 Aug 2001 17:07:42 +0200 From: Andi Kleen To: kuznet@ms2.inr.ac.ru Cc: Pekka Savola , therapy@endorphin.org, netdev@oss.sgi.com, Dave Miller , Andi Kleen Subject: Re: missing icmp errors for udp packets References: <200107311917.XAA10862@ms2.inr.ac.ru> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 0.88e In-Reply-To: <200107311917.XAA10862@ms2.inr.ac.ru>; from kuznet@ms2.inr.ac.ru on Tue, Jul 31, 2001 at 09:17:55PM +0200 Sender: owner-netdev@oss.sgi.com Precedence: bulk [Sorry for the late answer; I'm still travelling] On Tue, Jul 31, 2001 at 09:17:55PM +0200, kuznet@ms2.inr.ac.ru wrote: > Hello! > > > What I meant (to say) is, for people who _want_ to limit pings too, > > CBQ can do this in any way, which is possible to imagine. > > In any case, I need to get some verdict from Andi and Dave to move > in either way. > > [ For Dave and Andi: should I resume the problem? ] I think just turning the ratelimit sysctls into boolean that turn on/off if the ICMP is checked against a single ratelimit per dst_entry would be fine. I thought about splitting it into informational and error, but do not see a real benefit in it, it would also make configuration for the user harder (he would need to find out to what type an ICMP belongs). And as you note for more complicated setups there is always CBQ. 
-Andi From owner-netdev@oss.sgi.com Wed Aug 1 08:29:48 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f71FTmK05516 for netdev-outgoing; Wed, 1 Aug 2001 08:29:48 -0700 Received: from blueyonder.co.uk (pcow028o.blueyonder.co.uk [195.188.53.124]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f71FTlV05513 for ; Wed, 1 Aug 2001 08:29:47 -0700 Received: from mail pickup service by blueyonder.co.uk with Microsoft SMTPSVC; Wed, 1 Aug 2001 16:26:24 +0100 Received: from ddmi.he.net ([216.218.177.2]) by blueyonder.co.uk with Microsoft SMTPSVC(5.5.1877.687.68); Tue, 31 Jul 2001 20:10:16 +0100 Received: from vger.kernel.org (vger.kernel.org [199.183.24.194]) by ddmi.he.net (8.8.6/8.8.2) with ESMTP id MAA25329 for ; Tue, 31 Jul 2001 12:10:09 -0700 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id ; Tue, 31 Jul 2001 15:04:53 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id ; Tue, 31 Jul 2001 15:04:43 -0400 Received: from minus.inr.ac.ru ([193.233.7.97]:48401 "HELO ms2.inr.ac.ru") by vger.kernel.org with SMTP id ; Tue, 31 Jul 2001 15:04:35 -0400 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id XAA10312; Tue, 31 Jul 2001 23:04:06 +0400 From: kuznet@ms2.inr.ac.ru Message-Id: <200107311904.XAA10312@ms2.inr.ac.ru> Subject: Re: missing icmp errors for udp packets To: therapy@endorphin.org (clemens) Date: Tue, 31 Jul 2001 23:04:06 +0400 (MSK DST) Cc: pekkas@netcore.fi, therapy@endorphin.org, netdev@oss.sgi.com, linux-kernel@vger.kernel.org, davem@redhat.com In-Reply-To: <20010731205101.B8211@ghanima.endorphin.org> from "clemens" at Jul 31, 1 08:51:01 pm X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 X-Mailing-List: linux-kernel@vger.kernel.org Original-Recipient: rfc822;linux-kernel-outgoing Sender: owner-netdev@oss.sgi.com Precedence: bulk Hello! 
> your patch will not prevent the first ping to empty the token bucket, > because burst is still 0, which is larger than dst->rate_token, and since > XRLIM_BURST_FACTOR times the timeout (which is 6*0=0 in that case) is the > token maximum, it will be truncated to 0, > causing the following packets (if in time) to be dropped. Argh... I see, gap is too short and not enough of tokens are accumulated. Thank you. Damn, I see two ways: 1. to make sysctl active function and recalculate max/sum of rates over classes and fill bucket. Or to remove limiting distinguishing types, which is ideal logically. Alexey - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ From owner-netdev@oss.sgi.com Wed Aug 1 08:44:04 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f71Fi4v05921 for netdev-outgoing; Wed, 1 Aug 2001 08:44:04 -0700 Received: from grok.yi.org (IDENT:root@cx97923-a.phnx3.az.home.com [24.9.112.194]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f71FhjV05839; Wed, 1 Aug 2001 08:43:45 -0700 Received: from candelatech.com (IDENT:greear@localhost.localdomain [127.0.0.1]) by grok.yi.org (8.11.2/8.11.2) with ESMTP id f71Fhda25113; Wed, 1 Aug 2001 08:43:39 -0700 Message-ID: <3B6823AB.4D931195@candelatech.com> Date: Wed, 01 Aug 2001 08:43:39 -0700 From: Ben Greear Organization: Candela Technologies X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.4.7 i686) X-Accept-Language: en MIME-Version: 1.0 To: Ralf Baechle CC: kuznet@ms2.inr.ac.ru, Jacob Avraham , netdev@oss.sgi.com Subject: Re: conflicting alignment requirements References: <200107311712.VAA04463@ms2.inr.ac.ru> <20010801043638.A17397@bacchus.dhis.org> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Ralf Baechle wrote: > > On Tue, Jul 31, 
2001 at 09:12:22PM +0400, kuznet@ms2.inr.ac.ru wrote:
>
> > > copy the packet to a fresh skb (rx_copybreak = 0), the packet will
> > > traverse the net layer with unaligned IP header.
> >
> > Doing this for an arch which traps wrong alignment, you can expect
> > everything (except for crash, which could be bug).
>
> Afaik all such architectures have exception handlers to complete the access
> transparently in software. Such an access is very slow so where more
> frequent unaligned accesses are expected there are get_unaligned() and
> put_unaligned().

I was recently asked to remove the get/put_unaligned code from my
VLAN patch, which I did. However, I don't want to now pay a
performance penalty on Sparc, or whatever...

So, what are the drawbacks of using get/put_unaligned? If it's a
macro, it could be defined to do very little extra work on
architectures that can handle unaligned access, which might fix the
common case, and yet still be faster than catching the trap on other
hardware architectures?
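A minimal sketch of the trade-off raised above, assuming plain userspace C rather than kernel code. The helper names load_u16_unaligned(), load_u32_unaligned() and store_u16_unaligned() are hypothetical and are not the kernel's get_unaligned()/put_unaligned(). Copying through memcpy() is one portable way to access a value at a misaligned address: on architectures that tolerate unaligned loads an optimizing compiler typically reduces it to a single load, and elsewhere it becomes byte accesses instead of a trap.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers for illustration only; not the kernel macros. */

/* Read a 16-bit value (host byte order) from a possibly misaligned address. */
static inline uint16_t load_u16_unaligned(const void *p)
{
    uint16_t v;
    memcpy(&v, p, sizeof v);   /* never dereferences a misaligned uint16_t * */
    return v;
}

/* Read a 32-bit value (host byte order) from a possibly misaligned address. */
static inline uint32_t load_u32_unaligned(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Write a 16-bit value (host byte order) to a possibly misaligned address. */
static inline void store_u16_unaligned(void *p, uint16_t v)
{
    memcpy(p, &v, sizeof v);
}
```

The values are kept in host byte order, so a caller parsing network headers would still apply ntohs()/ntohl() to the result.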
Thanks, Ben > > Ralf -- Ben Greear President of Candela Technologies Inc http://www.candelatech.com ScryMUD: http://scry.wanfear.com http://scry.wanfear.com/~greear From owner-netdev@oss.sgi.com Wed Aug 1 09:02:00 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f71G20706541 for netdev-outgoing; Wed, 1 Aug 2001 09:02:00 -0700 Received: from blueyonder.co.uk (pcow028o.blueyonder.co.uk [195.188.53.124]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f71G1wV06538 for ; Wed, 1 Aug 2001 09:01:59 -0700 Received: from mail pickup service by blueyonder.co.uk with Microsoft SMTPSVC; Wed, 1 Aug 2001 16:41:21 +0100 Received: from ddmi.he.net ([216.218.177.2]) by blueyonder.co.uk with Microsoft SMTPSVC(5.5.1877.687.68); Tue, 31 Jul 2001 20:25:13 +0100 Received: from vger.kernel.org (vger.kernel.org [199.183.24.194]) by ddmi.he.net (8.8.6/8.8.2) with ESMTP id MAA29147 for ; Tue, 31 Jul 2001 12:24:58 -0700 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id ; Tue, 31 Jul 2001 15:23:22 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id ; Tue, 31 Jul 2001 15:23:12 -0400 Received: from weta.f00f.org ([203.167.249.89]:44422 "HELO weta.f00f.org") by vger.kernel.org with SMTP id ; Tue, 31 Jul 2001 15:23:06 -0400 Received: by weta.f00f.org (Postfix, from userid 1000) id 3B24815876; Wed, 1 Aug 2001 07:23:47 +1200 (NZST) Date: Wed, 1 Aug 2001 07:23:47 +1200 From: Chris Wedgwood To: kuznet@ms2.inr.ac.ru Cc: clemens , pekkas@netcore.fi, netdev@oss.sgi.com, linux-kernel@vger.kernel.org, davem@redhat.com Subject: Re: missing icmp errors for udp packets Message-ID: <20010801072347.C8228@weta.f00f.org> References: <200107311904.XAA10312@ms2.inr.ac.ru> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <200107311904.XAA10312@ms2.inr.ac.ru> User-Agent: Mutt/1.3.18i X-No-Archive: Yes X-Mailing-List: linux-kernel@vger.kernel.org Original-Recipient: rfc822;linux-kernel-outgoing Sender: 
owner-netdev@oss.sgi.com Precedence: bulk On Tue, Jul 31, 2001 at 11:04:06PM +0400, kuznet@ms2.inr.ac.ru wrote: Or to remove limiting distinguishing types, which is ideal logically. Why do we do this anyhow? --cw - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ From owner-netdev@oss.sgi.com Wed Aug 1 09:26:12 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f71GQC706998 for netdev-outgoing; Wed, 1 Aug 2001 09:26:12 -0700 Received: from blueyonder.co.uk (pcow028o.blueyonder.co.uk [195.188.53.124]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f71GQBV06995 for ; Wed, 1 Aug 2001 09:26:11 -0700 Received: from mail pickup service by blueyonder.co.uk with Microsoft SMTPSVC; Wed, 1 Aug 2001 16:54:55 +0100 Received: from ddmi.he.net ([216.218.177.2]) by blueyonder.co.uk with Microsoft SMTPSVC(5.5.1877.687.68); Tue, 31 Jul 2001 20:38:54 +0100 Received: from vger.kernel.org (vger.kernel.org [199.183.24.194]) by ddmi.he.net (8.8.6/8.8.2) with ESMTP id MAA00460 for ; Tue, 31 Jul 2001 12:38:48 -0700 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id ; Tue, 31 Jul 2001 15:34:22 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id ; Tue, 31 Jul 2001 15:34:14 -0400 Received: from weta.f00f.org ([203.167.249.89]:46726 "HELO weta.f00f.org") by vger.kernel.org with SMTP id ; Tue, 31 Jul 2001 15:34:00 -0400 Received: by weta.f00f.org (Postfix, from userid 1000) id AEAE215882; Wed, 1 Aug 2001 07:34:41 +1200 (NZST) Date: Wed, 1 Aug 2001 07:34:41 +1200 From: Chris Wedgwood To: kuznet@ms2.inr.ac.ru Cc: therapy@endorphin.org, pekkas@netcore.fi, netdev@oss.sgi.com, linux-kernel@vger.kernel.org, davem@redhat.com Subject: Re: missing icmp errors for udp packets Message-ID: <20010801073441.E8228@weta.f00f.org> References: 
<200107311925.XAA11038@ms2.inr.ac.ru> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <200107311925.XAA11038@ms2.inr.ac.ru> User-Agent: Mutt/1.3.18i X-No-Archive: Yes X-Mailing-List: linux-kernel@vger.kernel.org Original-Recipient: rfc822;linux-kernel-outgoing Sender: owner-netdev@oss.sgi.com Precedence: bulk On Tue, Jul 31, 2001 at 11:25:50PM +0400, kuznet@ms2.inr.ac.ru wrote: Anyway, it is clear that echos are to be limited differently of errors. Even then I wonder if it is worth the code. If you are rate-limiting, who cares if drop the odd echo/reply? ICMP echo/reply is a useful diagnostic tool --- but on the internet as we have it today, its limitations need to be understood by the user :) --cw - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ From owner-netdev@oss.sgi.com Wed Aug 1 09:39:54 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f71Gds707431 for netdev-outgoing; Wed, 1 Aug 2001 09:39:54 -0700 Received: from blueyonder.co.uk (pcow028o.blueyonder.co.uk [195.188.53.124]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f71GdqV07427 for ; Wed, 1 Aug 2001 09:39:53 -0700 Received: from mail pickup service by blueyonder.co.uk with Microsoft SMTPSVC; Wed, 1 Aug 2001 17:00:14 +0100 Received: from ddmi.he.net ([216.218.177.2]) by blueyonder.co.uk with Microsoft SMTPSVC(5.5.1877.687.68); Tue, 31 Jul 2001 20:44:15 +0100 Received: from vger.kernel.org (vger.kernel.org [199.183.24.194]) by ddmi.he.net (8.8.6/8.8.2) with ESMTP id MAA01877 for ; Tue, 31 Jul 2001 12:44:09 -0700 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id ; Tue, 31 Jul 2001 15:41:12 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id ; Tue, 31 Jul 2001 15:41:02 -0400 Received: from 
weta.f00f.org ([203.167.249.89]:49286 "HELO weta.f00f.org") by vger.kernel.org with SMTP id ; Tue, 31 Jul 2001 15:40:56 -0400 Received: by weta.f00f.org (Postfix, from userid 1000) id 7CB1E1588C; Wed, 1 Aug 2001 07:41:32 +1200 (NZST) Date: Wed, 1 Aug 2001 07:41:32 +1200 From: Chris Wedgwood To: kuznet@ms2.inr.ac.ru Cc: therapy@endorphin.org, pekkas@netcore.fi, netdev@oss.sgi.com, linux-kernel@vger.kernel.org, davem@redhat.com Subject: Re: missing icmp errors for udp packets Message-ID: <20010801074132.G8228@weta.f00f.org> References: <200107311937.XAA11313@ms2.inr.ac.ru> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <200107311937.XAA11313@ms2.inr.ac.ru> User-Agent: Mutt/1.3.18i X-No-Archive: Yes X-Mailing-List: linux-kernel@vger.kernel.org Original-Recipient: rfc822;linux-kernel-outgoing Sender: owner-netdev@oss.sgi.com Precedence: bulk On Tue, Jul 31, 2001 at 11:37:06PM +0400, kuznet@ms2.inr.ac.ru wrote: To bind all of them together? Sure... why not? The kernel normally does one of two things --- multiplex hardware resources for applications or --- cheap router thing "really good ping responder" is a pointless purpose. Then kernel must be shipped out without rate-limiting enabled by default, that's problem. I guess I missed something. That doesn't seem like a problem to me... and if you need to ship with a rate by default, then ship with a very-high rate. I've never managed to respond to more than 60,000 ICMP packets/second, so I suggest 60,001. 
--cw

From owner-netdev@oss.sgi.com Wed Aug 1 09:53:20 2001
Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f71GrKu07727 for netdev-outgoing; Wed, 1 Aug 2001 09:53:20 -0700
Received: from ms2.inr.ac.ru (minus.inr.ac.ru [193.233.7.97]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f71Gr0V07719 for ; Wed, 1 Aug 2001 09:53:00 -0700
Received: from mops.inr.ac.ru (mops.inr.ac.ru [193.233.7.60]) by ms2.inr.ac.ru (8.6.13/ANK) with ESMTP id UAA11455; Wed, 1 Aug 2001 20:52:52 +0400
Received: (from kuznet@localhost) by mops.inr.ac.ru (8.9.3/8.9.3) id CAA00382; Wed, 1 Aug 2001 02:14:26 +0400
Message-Id: <200107312214.CAA00382@mops.inr.ac.ru>
Subject: Re: final words on udp/ICMP dest unreach issue [+PATCH]
To: therapy@endorphin.ORG (clemens)
Date: Wed, 1 Aug 2001 02:14:26 -2000 (MSD)
Cc: netdev@oss.sgi.com
In-Reply-To: <20010730141359.A450@ghanima.endorphin.org> from "clemens" at Jul 30, 1 04:45:02 pm
From: Alexey Kuznetsov
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Sender: owner-netdev@oss.sgi.com
Precedence: bulk

Hello!

> a patch is attached.
...
> alan, please take care of that.

No need to take care of it, the patch is wrong...

There is nothing special about timeout=0 compared to other values.
Think about anything less than 100. So a check for timeout != HZ would
be more correct.
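The behaviour discussed in this thread can be modelled in userspace as follows. This is a simplified sketch, not the kernel source: struct bucket, bucket_allow() and BURST_FACTOR are hypothetical names, with BURST_FACTOR standing in for the XRLIM_BURST_FACTOR of 6 quoted earlier in the thread, and the current time is passed in explicitly instead of being read from jiffies. It illustrates why a bucket shared per destination is drained not only by timeout=0 but by any timeout small enough that the clamped bucket holds fewer tokens than the larger timeout needs (100 here, assuming HZ=100).

```c
#define BURST_FACTOR 6UL   /* assumption: mirrors XRLIM_BURST_FACTOR from the thread */

/* Hypothetical per-destination token bucket shared by all ICMP types. */
struct bucket {
    unsigned long tokens;  /* accumulated tokens, in ticks */
    unsigned long last;    /* time of the previous check, in ticks */
};

/*
 * Returns 1 if a message with the given per-type timeout may be sent at
 * time `now`, 0 if it must be dropped. Tokens accumulate with elapsed
 * time and are capped at BURST_FACTOR * timeout of the *current* caller,
 * which is what lets a small timeout truncate the shared bucket.
 */
static int bucket_allow(struct bucket *b, unsigned long now, unsigned long timeout)
{
    b->tokens += now - b->last;
    b->last = now;
    if (b->tokens > BURST_FACTOR * timeout)
        b->tokens = BURST_FACTOR * timeout;   /* cap depends on caller's timeout */
    if (b->tokens >= timeout) {
        b->tokens -= timeout;
        return 1;
    }
    return 0;
}
```

Starting from a full bucket of 600 tokens, one call with timeout 0 (or 10) followed ten ticks later by a call with timeout 100 rejects the second message, because the first call has already clamped the bucket to 0 (or 60) tokens.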
:-)

Alexey

From owner-netdev@oss.sgi.com Wed Aug 1 09:53:21 2001
Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f71GrLc07732 for netdev-outgoing; Wed, 1 Aug 2001 09:53:21 -0700
Received: from ms2.inr.ac.ru (minus.inr.ac.ru [193.233.7.97]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f71GrEV07722 for ; Wed, 1 Aug 2001 09:53:14 -0700
Received: from mops.inr.ac.ru (mops.inr.ac.ru [193.233.7.60]) by ms2.inr.ac.ru (8.6.13/ANK) with ESMTP id UAA11484; Wed, 1 Aug 2001 20:53:06 +0400
Received: (from kuznet@localhost) by mops.inr.ac.ru (8.9.3/8.9.3) id CAA00330; Wed, 1 Aug 2001 02:08:54 +0400
Message-Id: <200107312208.CAA00330@mops.inr.ac.ru>
Subject: Re: IPv6 fragmentation and IPv6 header parsing
To: kakadu@earthlink.NET (Brad Chapman)
Date: Wed, 1 Aug 2001 02:08:54 -2000 (MSD)
Cc: netdev@oss.sgi.com
In-Reply-To: <3B64B076.6090709@earthlink.net> from "Brad Chapman" at Jul 30, 1 06:15:01 am
From: Alexey Kuznetsov
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Sender: owner-netdev@oss.sgi.com
Precedence: bulk

Hello!

> I am currently completing a port of the Netfilter connection
> tracking subsystem from IPv4 to IPv6. Most of the features in this
> port are complete, except for fragment handling,

This is the last thing needed to complete the transition from IPv6 back
to IPv4 wickedness. :-)

> I would appreciate any feedback at all regarding this.

Feedback follows: make this and do not show it to anyone, especially
your mother. :-)

If you have some problem which is not solvable without defragmentation
in the middle, go to the ipng wg to discuss how to do this.

In particular, NAT rewriting ports for IPv6 is complete nonsense.
Alexey From owner-netdev@oss.sgi.com Wed Aug 1 10:00:16 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f71H0Gc08003 for netdev-outgoing; Wed, 1 Aug 2001 10:00:16 -0700 Received: from ms2.inr.ac.ru (minus.inr.ac.ru [193.233.7.97]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f71H0DV08000 for ; Wed, 1 Aug 2001 10:00:14 -0700 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id UAA12166; Wed, 1 Aug 2001 20:59:59 +0400 From: kuznet@ms2.inr.ac.ru Message-Id: <200108011659.UAA12166@ms2.inr.ac.ru> Subject: Re: Fw: oops/bug in tcp, SACK doesn't work? To: laforge@gnumonks.org (Harald Welte) Date: Wed, 1 Aug 2001 20:59:59 +0400 (MSK DST) Cc: netdev@oss.sgi.com In-Reply-To: <20010731033801.M1486@obroa-skai.gnumonks.org> from "Harald Welte" at Jul 31, 1 03:38:01 am X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Sender: owner-netdev@oss.sgi.com Precedence: bulk Hello! > The issue is, that we only keep track of the last time a tcp sequence number > was rewritten. Yes, that means that current netfilter NAT code does not > cope correctly with all cases where you have more than one packet size > alteration per window. Wow! But it is fatal bug. Just do not allow to change it more then once (not for window, you have no reliable way to estimate it, probably for 64K< So I'm not sure if enabling selective acknowledgements could make the > situation worse than it is (given this precondition). At least after > giving it some though, I cannot see how. The situation is opposite, actually. If you mangle seq/ack wrongly, it is fatal. But if you make a mistake in sack, nothing happens, sacks are soft. 
Alexey From owner-netdev@oss.sgi.com Wed Aug 1 10:05:39 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f71H5dT08309 for netdev-outgoing; Wed, 1 Aug 2001 10:05:39 -0700 Received: from blueyonder.co.uk (pcow028o.blueyonder.co.uk [195.188.53.124]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f71H5aV08306 for ; Wed, 1 Aug 2001 10:05:36 -0700 Received: from mail pickup service by blueyonder.co.uk with Microsoft SMTPSVC; Wed, 1 Aug 2001 17:21:07 +0100 Received: from ddmi.he.net ([216.218.177.2]) by blueyonder.co.uk with Microsoft SMTPSVC(5.5.1877.687.68); Tue, 31 Jul 2001 21:06:10 +0100 Received: from vger.kernel.org (vger.kernel.org [199.183.24.194]) by ddmi.he.net (8.8.6/8.8.2) with ESMTP id NAA07700 for ; Tue, 31 Jul 2001 13:06:04 -0700 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id ; Tue, 31 Jul 2001 16:00:16 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id ; Tue, 31 Jul 2001 16:00:06 -0400 Received: from netcore.fi ([193.94.160.1]:16900 "EHLO netcore.fi") by vger.kernel.org with ESMTP id ; Tue, 31 Jul 2001 16:00:00 -0400 Received: from localhost (pekkas@localhost) by netcore.fi (8.11.1/8.11.1) with ESMTP id f6VJxdk21032; Tue, 31 Jul 2001 22:59:39 +0300 Date: Tue, 31 Jul 2001 22:59:39 +0300 (EEST) From: Pekka Savola To: Chris Wedgwood cc: , , , , Subject: Re: missing icmp errors for udp packets In-Reply-To: <20010801074132.G8228@weta.f00f.org> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Mailing-List: linux-kernel@vger.kernel.org Original-Recipient: rfc822;linux-kernel-outgoing Sender: owner-netdev@oss.sgi.com Precedence: bulk On Wed, 1 Aug 2001, Chris Wedgwood wrote: > --- cheap router thing > > "really good ping responder" is a pointless purpose. bad ping responder == bad PR ;-) And anyway, who is anyone to judge what the system should be used for? I want a system to respond to ping without limitations; it's good for debugging, diagnostics, etc. 
If I want, I can just filter the requests out, or rate-limit the responses. However, ICMP error messages cannot be effectively filtered; they may happen due to TTL=0 when forwarding, legit or illegit UDP connection etc.; only way to effectively limit them is by rate-limiting. If rate-limiting with informational and error types are the same, we have an inflexible situation here. > Then kernel must be shipped out without rate-limiting enabled by > default, that's problem. > > I guess I missed something. That doesn't seem like a problem to > me... and if you need to ship with a rate by default, then ship with a > very-high rate. I've never managed to respond to more than 60,000 > ICMP packets/second, so I suggest 60,001. Yes you did. 60,000 responses/sec is effectively no protection at all, and most people would appeaciate protection for the error messages, which are crucial to the working of TCP/IP; not so with informational ICMP messages. And by the way, rate-limiting ICMP error messages is a MUST item for IPv6. -- Pekka Savola "Tell me of difficulties surmounted, Netcore Oy not those you stumble over and fall" Systems. Networks. Security. 
-- Robert Jordan: A Crown of Swords - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ From owner-netdev@oss.sgi.com Wed Aug 1 11:15:30 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f71IFUX09540 for netdev-outgoing; Wed, 1 Aug 2001 11:15:30 -0700 Received: from harrier.mail.pas.earthlink.net (harrier.mail.pas.earthlink.net [207.217.121.12]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f71IFSV09537 for ; Wed, 1 Aug 2001 11:15:29 -0700 Received: from earthlink.net (dialup-63.208.186.190.Dial1.Baltimore1.Level3.net [63.208.186.190]) by harrier.mail.pas.earthlink.net (EL-8_9_3_3/8.9.3) with ESMTP id LAA11478; Wed, 1 Aug 2001 11:15:02 -0700 (PDT) Message-ID: <3B6838AD.6010402@earthlink.net> Date: Wed, 01 Aug 2001 13:13:17 -0400 From: Brad Chapman User-Agent: Mozilla/5.0 (X11; U; Linux 2.4.7 i586; en-US; C-UPD: MaxLinux0301) Gecko/20001107 Netscape6/6.0 X-Accept-Language: en MIME-Version: 1.0 To: Alexey Kuznetsov CC: netdev@oss.sgi.com Subject: Re: IPv6 fragmentation and IPv6 header parsing References: <200107312208.CAA00330@mops.inr.ac.ru> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Mr. Kuznetsov, Alexey Kuznetsov wrote: > Hello! > >> I am currently completing a port of the Netfilter connection >> tracking subsystem from IPv4 to IPv6. Most of the features in this >> port are complete, except for fragment handling, > > > This is the last thing to complete transition from IPv6 back > to IPv4 wickedness. :-) Eeek! Sorry ;-) I have already been properly chastised about on-the-fly fragmenting and have been discussing ideas with Harald Welte that will probably appear in some form in 2.5. > > > >> I would appreciate any feedback at all regarding this. 
> > > Feedback follows: make this and do not show to anyone, especially > to your mother. :-) Well, my mother is not particularly interested in netfilter hacking, so no worry there. > > > If you have some problem, which is not solvable without defragmenation > in the middle, go to ipng wg to discuss how to make this. I was merely attempting to follow the 1:1 idea of portation I had set out for myself. If you're not familiar with the ip6_conntrack code, here is a quick answer on the question of why it would need on-the-fly fragmenting: 1.) to make it's life easier when tracking layer-3/4 headers and messing with packet data (in NAT, but that's not important anymore) and 2.) in case the idiot on the other end won't allow an MTU of 1500 ;-) > > Particularly, NAT rewriting ports for IPv6 is full non-sense. Well, I suppose now that IPv6 has about 36 bugazillion adresses, it's not a major sticking point anymore ;-) Mostly I'm doing this so that people can match packet states (NEW, ESTABLISHED, RELATED, INVALID) and maybe, later on, direction (ORIGINAL, REPLY), if anyone expresses a desire to have it. BTW: where is the nearest place where I can find the real number of addresses IPv6 supports? 
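On the closing question: an IPv6 address is 128 bits wide, so the address space holds 2^128 values. The sketch below produces that number exactly by repeatedly doubling a decimal digit array. pow2_decimal() is a hypothetical helper written for this illustration, not part of any library.

```c
#include <stddef.h>

/*
 * Hypothetical helper: write the decimal digits of 2^bits into `out`
 * as a NUL-terminated string. Returns `out`, or NULL if the result does
 * not fit in `outlen` bytes or in the 64-digit scratch array.
 */
static char *pow2_decimal(unsigned bits, char *out, size_t outlen)
{
    unsigned char digits[64] = {1};   /* little-endian decimal digits, value 1 */
    size_t n = 1;                     /* number of digits in use */

    for (unsigned b = 0; b < bits; b++) {
        unsigned carry = 0;
        for (size_t i = 0; i < n; i++) {      /* double every digit */
            unsigned d = digits[i] * 2u + carry;
            digits[i] = (unsigned char)(d % 10);
            carry = d / 10;
        }
        if (carry) {
            if (n == sizeof digits)
                return NULL;                  /* too long for this sketch */
            digits[n++] = (unsigned char)carry;
        }
    }
    if (outlen < n + 1)
        return NULL;
    for (size_t i = 0; i < n; i++)            /* emit most significant digit first */
        out[i] = (char)('0' + digits[n - 1 - i]);
    out[n] = '\0';
    return out;
}
```

Calling pow2_decimal(128, buf, sizeof buf) with a 64-byte buffer yields 340282366920938463463374607431768211456, roughly 3.4 x 10^38 addresses.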
>
>
> Alexey
>
Brad
>

From owner-netdev@oss.sgi.com Wed Aug 1 11:20:58 2001
Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f71IKw009719 for netdev-outgoing; Wed, 1 Aug 2001 11:20:58 -0700
Received: from ms2.inr.ac.ru (minus.inr.ac.ru [193.233.7.97]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f71IKtV09716 for ; Wed, 1 Aug 2001 11:20:55 -0700
Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id WAA17102; Wed, 1 Aug 2001 22:20:42 +0400
From: kuznet@ms2.inr.ac.ru
Message-Id: <200108011820.WAA17102@ms2.inr.ac.ru>
Subject: Re: IPv6 fragmentation and IPv6 header parsing
To: kakadu@earthlink.net (Brad Chapman)
Date: Wed, 1 Aug 2001 22:20:42 +0400 (MSK DST)
Cc: netdev@oss.sgi.com
In-Reply-To: <3B6838AD.6010402@earthlink.net> from "Brad Chapman" at Aug 1, 1 01:13:17 pm
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Sender: owner-netdev@oss.sgi.com
Precedence: bulk

Hello!

> answer on the question of why it would need on-the-fly fragmenting: 1.)
> to make its life easier when tracking layer-3/4 headers and messing
> with packet data (in NAT, but that's not important anymore)

You confirmed yourself that it is a meaningless purpose.

> and 2.) in case the idiot on the other end won't allow an MTU of 1500 ;-)

This just sounds like complete nonsense to me. What is special about 1500?
Alexey From owner-netdev@oss.sgi.com Wed Aug 1 11:32:21 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f71IWLO09927 for netdev-outgoing; Wed, 1 Aug 2001 11:32:21 -0700 Received: from falcon.mail.pas.earthlink.net (falcon.mail.pas.earthlink.net [207.217.120.74]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f71IWKV09924 for ; Wed, 1 Aug 2001 11:32:20 -0700 Received: from earthlink.net (dialup-63.208.186.190.Dial1.Baltimore1.Level3.net [63.208.186.190]) by falcon.mail.pas.earthlink.net (EL-8_9_3_3/8.9.3) with ESMTP id LAA04599; Wed, 1 Aug 2001 11:32:13 -0700 (PDT) Message-ID: <3B683CB2.1000508@earthlink.net> Date: Wed, 01 Aug 2001 13:30:26 -0400 From: Brad Chapman User-Agent: Mozilla/5.0 (X11; U; Linux 2.4.7 i586; en-US; C-UPD: MaxLinux0301) Gecko/20001107 Netscape6/6.0 X-Accept-Language: en MIME-Version: 1.0 To: kuznet@ms2.inr.ac.ru CC: netdev@oss.sgi.com Subject: Re: IPv6 fragmentation and IPv6 header parsing References: <200108011820.WAA17102@ms2.inr.ac.ru> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Mr. Kuznetsov, kuznet@ms2.inr.ac.ru wrote: > Hello! > >> answer on the question of why it would need on-the-fly fragmenting: 1.) >> to make >> it's life easier when tracking layer-3/4 headers and messing with packet >> data (in >> NAT, but that's not important anymore) > > > You confirmed yourself that it is meaningless purpose. Sorry. Well, enough on NAT. > > >> and 2.) in case the idiot on the >> other >> end won't allow an MTU of 1500 ;-) > >> > > This just sounds as full non-sense to me. What is special with 1500? I read somewhere that the correct size for an IPv6 link was 1500. Is this wrong? Is the correct MTU smaller? 
If it is, then sorry in advance ;-)
>
>
> Alexey
>
Brad
>

From owner-netdev@oss.sgi.com Wed Aug 1 11:48:50 2001
Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f71ImoX10147 for netdev-outgoing; Wed, 1 Aug 2001 11:48:50 -0700
Received: from ms2.inr.ac.ru (minus.inr.ac.ru [193.233.7.97]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f71ImlV10144 for ; Wed, 1 Aug 2001 11:48:48 -0700
Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id WAA18243; Wed, 1 Aug 2001 22:48:34 +0400
From: kuznet@ms2.inr.ac.ru
Message-Id: <200108011848.WAA18243@ms2.inr.ac.ru>
Subject: Re: IPv6 fragmentation and IPv6 header parsing
To: kakadu@earthlink.net (Brad Chapman)
Date: Wed, 1 Aug 2001 22:48:34 +0400 (MSK DST)
Cc: netdev@oss.sgi.com
In-Reply-To: <3B683CB2.1000508@earthlink.net> from "Brad Chapman" at Aug 1, 1 01:30:26 pm
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Sender: owner-netdev@oss.sgi.com
Precedence: bulk

Hello!

> I read somewhere that the correct size for an IPv6 link was 1500.
> Is this wrong? Is the correct MTU smaller? If it is, then sorry in
> advance ;-)

Yes, the minimum is 1280. Lower MTUs are simply prohibited for IPv6
networks by law; this differs from IPv4, where a network can have any MTU.

IPv6 is designed specifically to avoid such things. If you have some
idea of when this could be useful, the right starting point is not to
hack up something that contradicts the ideology, but to work around it
at the protocol level.

Side note: connection tracking is a serious offence even for IPv4.
I am puzzled why the code is so primitive and forces defragmentation
even when it is possible just to save the fragments and resend them.
This prevents, e.g., the use of conntrack on routers which only need to
do accounting. In fact, the most rarely occurring case is treated as the
most common one... It would be good if IPv6 did this right from the very
beginning, rather than repeating the mistakes of conntrack in IP.
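The 1280-byte minimum mentioned above can be turned into a small worked check. This is a sketch under stated assumptions, not kernel code: the constant and function names are hypothetical, the fixed IPv6 header is taken as 40 bytes and the fragment header as 8 bytes, and the unfragmentable part is assumed to contain no other extension headers. It computes how much data each non-final fragment can carry, which must be a multiple of 8 bytes.

```c
/* Hypothetical names; the values are the fixed sizes of the IPv6 base
 * header and fragment header and the minimum IPv6 link MTU. */
#define IPV6_MIN_LINK_MTU   1280u
#define IPV6_BASE_HDR_LEN     40u
#define IPV6_FRAG_HDR_LEN      8u

/* A link is usable for IPv6 only if its MTU is at least 1280 bytes. */
static int ipv6_link_mtu_ok(unsigned mtu)
{
    return mtu >= IPV6_MIN_LINK_MTU;
}

/*
 * Data bytes carried by each non-final fragment on a link with the given
 * MTU, rounded down to a multiple of 8. Returns 0 for an MTU below the
 * IPv6 minimum. Assumes no extension headers other than the fragment
 * header itself.
 */
static unsigned ipv6_frag_data_len(unsigned mtu)
{
    if (!ipv6_link_mtu_ok(mtu))
        return 0;
    return (mtu - IPV6_BASE_HDR_LEN - IPV6_FRAG_HDR_LEN) & ~7u;
}
```

For an MTU of 1280 this gives 1232 bytes per fragment, and for 1500 it gives 1448, since 1452 is not a multiple of 8.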
Alexey

From owner-netdev@oss.sgi.com Wed Aug 1 12:13:54 2001
Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f71JDsM10416 for netdev-outgoing; Wed, 1 Aug 2001 12:13:54 -0700
Received: from ms2.inr.ac.ru (minus.inr.ac.ru [193.233.7.97]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f71JDCV10413; Wed, 1 Aug 2001 12:13:12 -0700
Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id WAA17347; Wed, 1 Aug 2001 22:24:37 +0400
From: kuznet@ms2.inr.ac.ru
Message-Id: <200108011824.WAA17347@ms2.inr.ac.ru>
Subject: Re: conflicting alignment requirements
To: greearb@candelatech.com (Ben Greear)
Date: Wed, 1 Aug 2001 22:24:37 +0400 (MSK DST)
Cc: ralf@oss.sgi.com, jacoba@cisco.com, netdev@oss.sgi.com
In-Reply-To: <3B6823AB.4D931195@candelatech.com> from "Ben Greear" at Aug 1, 1 08:43:39 am
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Sender: owner-netdev@oss.sgi.com
Precedence: bulk

Hello!

> I was recently asked to remove the get/put_unaligned code from my
> VLAN patch, which I did. However, I don't want to now pay a
> performance penalty on Sparc, or whatever...

I am sorry, but your get/put_unaligned calls were 16 bit, which is
complete nonsense. :-)

Anyway, even if you fetched 32-bit values, nobody suffers, not even
those ARMs which corrupt data on unaligned accesses. 802.1q is aligned
to blue book and it has the same alignment as IP.

> So, what are the drawbacks of using get/put_unaligned?

It is prohibited for anything but IPX etc., which really need to get
unaligned values.
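The alignment argument can be checked with simple offset arithmetic. This sketch assumes the common driver practice of reserving 2 bytes in front of the received frame so that the IP header behind a 14-byte Ethernet header lands on a 4-byte boundary; RX_RESERVE and the helper names are hypothetical. Under that assumption the 4-byte 802.1Q tag moves the IP header from offset 16 to offset 20, which is still 4-byte aligned, and the 16-bit tag field sits at an even offset.

```c
/* Sizes in bytes. RX_RESERVE is an assumption about the receive buffer
 * layout, not something stated in the thread. */
enum {
    ETH_ADDRS_LEN = 12,  /* destination + source MAC addresses */
    ETH_TYPE_LEN  = 2,   /* EtherType field */
    VLAN_TPID_LEN = 2,   /* 802.1Q tag protocol identifier */
    VLAN_TAG_LEN  = 4,   /* whole 802.1Q tag: TPID + TCI */
    RX_RESERVE    = 2    /* padding reserved before the frame */
};

/* Offset of the IP header from the start of the buffer. */
static unsigned ip_header_offset(int vlan_tagged)
{
    return RX_RESERVE + ETH_ADDRS_LEN + ETH_TYPE_LEN
         + (vlan_tagged ? VLAN_TAG_LEN : 0);
}

/* Offset of the 16-bit tag control field (TCI) in a tagged frame. */
static unsigned vlan_tci_offset(void)
{
    return RX_RESERVE + ETH_ADDRS_LEN + VLAN_TPID_LEN;
}

/* Nonzero if `offset` is a multiple of `alignment`. */
static int is_aligned(unsigned offset, unsigned alignment)
{
    return offset % alignment == 0;
}
```

With these numbers neither the untagged nor the tagged layout produces a misaligned 16-bit or 32-bit header field, which matches the point that the tag does not change IP alignment.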
Alexey From owner-netdev@oss.sgi.com Wed Aug 1 16:18:52 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f71NIqg19121 for netdev-outgoing; Wed, 1 Aug 2001 16:18:52 -0700 Received: from harrier.mail.pas.earthlink.net (harrier.mail.pas.earthlink.net [207.217.121.12]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f71NIpV19116 for ; Wed, 1 Aug 2001 16:18:51 -0700 Received: from earthlink.net (dialup-63.208.222.6.Dial1.Baltimore1.Level3.net [63.208.222.6]) by harrier.mail.pas.earthlink.net (EL-8_9_3_3/8.9.3) with ESMTP id QAA29649; Wed, 1 Aug 2001 16:18:43 -0700 (PDT) Message-ID: <3B687FD8.1030102@earthlink.net> Date: Wed, 01 Aug 2001 18:16:56 -0400 From: Brad Chapman User-Agent: Mozilla/5.0 (X11; U; Linux 2.4.7 i586; en-US; C-UPD: MaxLinux0301) Gecko/20001107 Netscape6/6.0 X-Accept-Language: en MIME-Version: 1.0 To: kuznet@ms2.inr.ac.ru CC: netdev@oss.sgi.com Subject: Re: IPv6 fragmentation and IPv6 header parsing References: <200108011848.WAA18243@ms2.inr.ac.ru> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Mr. Kuznetsov, kuznet@ms2.inr.ac.ru wrote: > Hello! > >> I read somewhere that the correct size for an IPv6 link was 1500. >> Is this wrong? Is the correct MTU smaller? If it is, then sorry in >> advance ;-) > > > Yes, it is 1280. And lower MTUs are simply prohibited for IPv6 networks by law, > it is difference of IPv4, where network can have any mtu. Sorry again :-( I remember reading somewhere a while ago that an IPv6 packet was sized around 1500, but I also remember reading online that it was 1280....Guess I was wrong. Sorry :-( > > > IPv6 is designed specially to avoid such things. If you have some > idea, when this can be useful, right starting point is not to hack something > contradicting to the ideology, but to workaround this at protocol level. > > > Side note: connection tracking is serious offence even for IPv4. 
> I am puzzled, why the code is so primitive and forces defragmentation > even when it is possible just to save fragments and resent them. > This prevents f.e. usage of conntrack on routers, which need only to account. > In fact, the most rarely happening case is considered as the most > common one... It would be good if IPv6 did this right from the very beginning, > rather than repeated mistakes of conntrack in IP. Since I didn't write the original conntrack code, I'm not sure of what Rusty Russell thought when he wrote it. But, IMHO he made ip_conntrack do fragmentation- on-the-fly because it would then be easier to track the guts of the packet, and do NAT. Now that NAT, as we have both said, is not necessary anymore for IPv6; we may not even need redirection and port forwarding either. Anyway, like I said to Harald, anything regarding a system in ip6_conntrack where we save fragments and resend them and/or block them, will have to wait until 2.5. For now, just make your MTU a proper size and ip6_conntrack should work. BTW: It will probably be another month or so before ip6_conntrack is stable. 
> > > Alexey > > Brad From owner-netdev@oss.sgi.com Wed Aug 1 17:11:42 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f720Bg822336 for netdev-outgoing; Wed, 1 Aug 2001 17:11:42 -0700 Received: from dea.waldorf-gmbh.de (u-183-10.karlsruhe.ipdial.viaginterkom.de [62.180.10.183]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f720BbV22333 for ; Wed, 1 Aug 2001 17:11:37 -0700 Received: (from ralf@localhost) by dea.waldorf-gmbh.de (8.11.1/8.11.1) id f720AsU22222; Thu, 2 Aug 2001 02:10:54 +0200 Date: Thu, 2 Aug 2001 02:10:54 +0200 From: Ralf Baechle To: Ben Greear Cc: kuznet@ms2.inr.ac.ru, Jacob Avraham , netdev@oss.sgi.com Subject: Re: conflicting alignment requirements Message-ID: <20010802021054.A19983@bacchus.dhis.org> References: <200107311712.VAA04463@ms2.inr.ac.ru> <20010801043638.A17397@bacchus.dhis.org> <3B6823AB.4D931195@candelatech.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <3B6823AB.4D931195@candelatech.com>; from greearb@candelatech.com on Wed, Aug 01, 2001 at 08:43:39AM -0700 X-Accept-Language: de,en,fr Sender: owner-netdev@oss.sgi.com Precedence: bulk On Wed, Aug 01, 2001 at 08:43:39AM -0700, Ben Greear wrote: > > > > copy the packet to a fresh skb (rx_copybreak = 0), the packet will > > > > traverse the net layer with unalinged IP header. > > > > > > Doing this for an arch which traps wrong alignment, you can expect > > > everything (except for crash, which could be bug). > > > > Afaik all such architectures have exception handlers to complete the access > > transparently in software. Such an access is very slow so where more > > frequent unaligned accesses are expected there are get_unaligned() and > > put_unaligned(). > > I was recently asked to remove the get/put_unaligned code from my > VLAN patch, which I did. However, I don't want to now pay a > performance penalty on Sparc, or whatever... 
> > So, what are the drawbacks of using get/put_unaligned? If it's a > Macro, it could be defined to do very little extra work on architectures > that can handle un-aligned access, which might fix the common case, and > yet still be faster than catching the trap on other hardware architectures?? For machines that handle unaligned access in hardware, {get,put}_unaligned are exactly identical to a normal reference to the same memory location in C. As such there is never any drawback. For architectures which need software assistance for unaligned accesses, these macros expand into a different instruction sequence: on MIPS it will always be a two-instruction sequence, and on Alpha it's more complex still; there, two loads followed by a mask, shift and or sequence will be generated, which has even more overhead. The alternative to this *_unaligned overhead is taking one exception per unaligned access, which can be rather painful. So the choice between the two mechanisms is a performance tradeoff, and as always the choice was to optimize the common case at the cost of the rare case.
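A minimal userspace sketch of what such accessors amount to, not taken from the original message. It assumes a byte-wise copy is an acceptable stand-in for the kernel macros; `load_u32_unaligned` and `store_u32_unaligned` are hypothetical names, not the kernel's `get_unaligned`/`put_unaligned`.

```c
#include <stdint.h>
#include <string.h>

/* Read a 32-bit value from an address with no alignment guarantee.
 * memcpy never performs a misaligned word access itself; the compiler
 * chooses a sequence that is safe for the target. */
static uint32_t load_u32_unaligned(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Write a 32-bit value to an address with no alignment guarantee. */
static void store_u32_unaligned(void *p, uint32_t v)
{
    memcpy(p, &v, sizeof v);
}
```

On hardware that handles unaligned access, compilers typically reduce each `memcpy` to a single plain load or store; on strict-alignment targets they emit a longer byte or partial-word sequence instead of taking a trap, which is the tradeoff described above.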
Ralf From owner-netdev@oss.sgi.com Wed Aug 1 22:44:48 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f725iml11011 for netdev-outgoing; Wed, 1 Aug 2001 22:44:48 -0700 Received: from out-mx1.crosswinds.net (out-mx1.crosswinds.net [209.208.163.38]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f725ilV11006 for ; Wed, 1 Aug 2001 22:44:47 -0700 Received: from member-mx1.crosswinds.net (member-mx1.crosswinds.net [209.208.163.43]) by out-mx1.crosswinds.net (Postfix) with ESMTP id 19B3B5D043; Thu, 2 Aug 2001 01:43:16 -0400 (EDT) Received: from zombie (unknown [202.164.97.77]) by member-mx1.crosswinds.net (Postfix) with SMTP id 53CEB4CB99; Thu, 2 Aug 2001 01:43:07 -0400 (EDT) Message-ID: <003801c11b15$4fb35320$4d61a4ca@zombie> From: "Imran Patel" To: "Brad Chapman" , "Alexey Kuznetsov" Cc: References: <200107312208.CAA00330@mops.inr.ac.ru> Subject: Re: IPv6 fragmentation and IPv6 header parsing Date: Thu, 2 Aug 2001 11:07:58 +0530 MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 5.00.2014.211 X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2014.211 Sender: owner-netdev@oss.sgi.com Precedence: bulk > > I am currently completing a port of the Netfilter connection > > tracking subsystem from IPv4 to IPv6. Most of the features in this > > port are complete, except for fragment handling, > > This is the last thing to complete transition from IPv6 back > to IPv4 wickedness. :-) On the contrary, it might be useful for transition from IPv4 to IPv6 ;-) IPv6 connection tracking is useful for NAT-PT. However, other options on top of IPv6 conntrack like masquerading, v6-v6 NAT, etc look useless and silly. 
imran From owner-netdev@oss.sgi.com Wed Aug 1 23:00:58 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f7260w812347 for netdev-outgoing; Wed, 1 Aug 2001 23:00:58 -0700 Received: from localhost (CPE-61-9-150-51.vic.bigpond.net.au [61.9.150.51]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f7260uV12341 for ; Wed, 1 Aug 2001 23:00:56 -0700 Received: from localhost ([127.0.0.1] helo=rustcorp.com.au) by localhost with esmtp (Exim 3.31 #1 (Debian)) id 15SBXD-0002uf-00; Thu, 02 Aug 2001 16:00:39 +1000 From: Rusty Russell To: Alexey Kuznetsov Cc: davem@redhat.com (Dave Miller), netfilter-devel@lists.samba.org, netdev@oss.sgi.com, Marc Boucher Subject: Re: ERRATA Re: [PATCH] fix for netfilter/nat/pppoe crashes (hopefully) In-reply-to: Your message of "Wed, 01 Aug 2001 02:26:38 -2000." <200107312226.CAA00407@mops.inr.ac.ru> Date: Thu, 02 Aug 2001 16:00:24 +1000 Message-Id: Sender: owner-netdev@oss.sgi.com Precedence: bulk In message <200107312226.CAA00407@mops.inr.ac.ru> you write: > Very interesting... Paul, did he really not miss the place where it was > really checked? The bug is elsewhere. Looks like Marc forgot that we don't do NAT if the conntrack code hasn't already set skb->nfct. 8) If the packet gets here, it is an ICMP packet which the connection tracking code has marked RELATED. The only way ICMP packets get marked RELATED is in icmp_error_track. To do this, it has to get past ip_conntrack_core.c:357-359: datalen = skb->len - iph->ihl*4 - sizeof(*hdr); if (skb->len < iph->ihl * 4 + sizeof(struct icmphdr)) { DEBUGP("icmp_error_track: too short\n"); return NULL; } In my first audit, I noticed that this should strictly be ... + sizeof(struct iphdr), but we only read from it. Anyway, so datalen is always >= 0. And this is the killer: line 385 (it's redundant: we check this inside get_tuple anyway): /* Are they talking about one of our connections? 
*/ if (inner->ihl * 4 + 8 > datalen || !get_tuple(inner, datalen, &origtuple, innerproto)) { So, we will always have 8 protocol bytes in the inner packet. This is enough to contain the source and destinations ports (TCP/UDP) or ICMP id, so we're not writing over the end of the packet... Now, if Marc is seeing this, and noone can find a flaw in my logic, then I'd say that someone else is mangling a packet without doing (see ipip.c): nf_conntrack_put(skb->nfct); skb->nfct = NULL; Please find them and hit them hard... Hope that helps, Rusty. -- Premature optmztion is rt of all evl. --DK From owner-netdev@oss.sgi.com Thu Aug 2 03:55:57 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f72Atvf30162 for netdev-outgoing; Thu, 2 Aug 2001 03:55:57 -0700 Received: from do-smtp.nortel-dasa.de (do-smtp.nortel-dasa.de [193.141.241.40]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f72AtsV30156 for ; Thu, 2 Aug 2001 03:55:54 -0700 Received: from satcomnt11.ndsatcom.com (satcomcom [131.147.44.70]) by do-smtp.nortel-dasa.de (8.9.3+Sun/8.9.3) with ESMTP id MAA03133; Thu, 2 Aug 2001 12:53:54 +0200 (MET DST) From: BERND.STURM@NDSatcom.com Received: by satcomnt11.ndsatcom.com with Internet Mail Service (5.5.2653.19) id <31Z1PHN1>; Thu, 2 Aug 2001 12:52:23 +0200 Message-ID: To: ak@muc.de Cc: netdev@oss.sgi.com Subject: Linux 2.4 network performance oddities Date: Thu, 2 Aug 2001 12:52:15 +0200 MIME-Version: 1.0 X-Mailer: Internet Mail Service (5.5.2653.19) Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by oss.sgi.com id f72AttV30157 Sender: owner-netdev@oss.sgi.com Precedence: bulk Hello, I'm currently doing my diploma thesis at Nortel about TCP/IP over Satellite in Linux. In the last few weeks we did some performance tests of both Linux 2.2.16 and 2.4.1 (and 2.4.7) over a satellite simulator channel (NiSTNet) with bandwidth set to 2 MBit and delay set to 600 ms.
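For orientation, the window such a link needs can be estimated from its bandwidth-delay product. The sketch below is not from the original message; `bdp_bytes` is a hypothetical helper, "2 MBit" is assumed to mean 2,000,000 bit/s, and the message does not say whether the 600 ms is a round-trip or a one-way delay.

```c
#include <stdint.h>

/* Bytes that must be in flight to keep the link full:
 * bandwidth (bit/s) * round-trip time (ms), converted to bytes. */
static uint64_t bdp_bytes(uint64_t bits_per_sec, uint64_t rtt_ms)
{
    return bits_per_sec * rtt_ms / 1000 / 8;
}
```

`bdp_bytes(2000000, 600)` gives 150000 bytes if 600 ms is taken as the round-trip time; if it is the one-way delay, the round trip is 1200 ms and the figure doubles to 300000 bytes.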
Stacks had maximum window-sizes set to over 200 KB and SACK and RFC1323 enabled and applications set window sizes to the same values with setsockopt. All worked well in Linux 2.2.16, with throughputs up to 1.95 Mbit for long transfers and also not so bad throughputs with 10^-7 bit error rates simulated by Nistnet. But when we tested with 2.4.1 and 2.4.7 with netperf and ftp we came across some strange issues: also everything worked well as long as we only had one transfer at a time and no Bit Error rate set. But when we set a BER and later set BER back to 0 the transfers wouldn't run at the same transfer rate as before without BER but at only the speeds with BER set. But this is no satellite simulator issue (we also tried FreeBSD with dummynet and same happened). We did packet captures with tcpdump and saw that the sender wouldn't open up its tcp send window wide enough and every once in a while stall with its sending. This effect didn't occur with 2.2.16 but all the kernel settings in /proc/sys/net were the same ! Another strange effect with 2.4.x (which also didn't occur with 2.2.16) happened, when we started more than one netperf-session at a time by a script simultaneously: when we started a single netperf-session afterwards it wouldn't achieve the throughput, it achieved before the "multisession"-netperf-transfer, but only a transfer rate comparable to one of the connections from the multisession-netperf-transfer ! This also didn't happen with 2.2.16 ! All of the above effects were reproducible and vanished after a restart of the network with init 1 and afterwards init 3. Distribution used was SuSE 7.0. It seems, as strange as this may sound, that the 2.4.x-kernel-series has some kind of congestion window memory, that still remembers the congestion window of a former tcp-transfer, at least it acts this way. But this would in no way be conforming to the current tcp-standards !
I hope you can help me with this problem or at least direct me to someone else who could help me. I already searched for maybe a sysctl to change in /proc/sys/net which could change this behaviour, but none of them seems appropriate. By the way, what's this no_cong, lo_cong, mod_cong-stuff in /proc/sys/net/core ?? And something else: we use a SMC Etherpower II in one of our computers and the driver for this card seems to be broken in 2.4.7 (it locks the system as soon as you try a ping or the card is pinged from another machine) (in the newsgroup for epic100-driver on www.sycld.com some other people already reported this problem and said that the driver was still ok until 2.4.2, so I also used the epic100-driver from 2.4.1 together with 2.4.7-kernel and also tried a driver directly from SMC and both worked) Thank you very much in advance ! Yours sincerely, Bernd Sturm ND Satcom phone: 0049-7545 / 96-8847 mailto:Bernd.Sturm@NDSatcom.com From owner-netdev@oss.sgi.com Thu Aug 2 04:01:56 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f72B1uw31230 for netdev-outgoing; Thu, 2 Aug 2001 04:01:56 -0700 Received: from shell.cyberus.ca (shell.cyberus.ca [209.195.95.7]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f72B1tV31227 for ; Thu, 2 Aug 2001 04:01:55 -0700 Received: from localhost (hadi@localhost) by shell.cyberus.ca (8.9.3/666/Cyberus Online Inc.) with ESMTP id GAA18743; Thu, 2 Aug 2001 06:58:45 -0400 (EDT) X-Authentication-Warning: shell.cyberus.ca: hadi owned process doing -bs Date: Thu, 2 Aug 2001 06:58:45 -0400 (EDT) From: jamal To: Alexey Kuznetsov cc: Rusty Russell , Subject: Re: Linux 2.4 networking/routing slowdown In-Reply-To: <200107292204.CAA00325@mops.inr.ac.ru> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Sorry, I missed this ... Routing does not slow down when you don't compile in netfilter.
Upto 20% degradation if you turn it on with a single IP table rule with 2.4.7 cheers, jamal PS:- I believe this is being worked on, so the above is just a FYI. On Mon, 30 Jul 2001, Alexey Kuznetsov wrote: > Hello! > > > Yes, you're paying for full connection tracking with the compatibility > > stuff. If you just want filtering, switch to iptables (should be > > pretty easy for you). > > Paul, but he said "several seconds"! This has nothing to do with > performance and surely cannot be a payment for using an obsolete > interface... It is some loss or something sort of this. > > > > Hmmm... this I don't know. > > Here too. :-) > > Alexey > From owner-netdev@oss.sgi.com Thu Aug 2 04:07:08 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f72B78t32288 for netdev-outgoing; Thu, 2 Aug 2001 04:07:08 -0700 Received: from opium.mbsi.ca (opium.mbsi.ca [198.168.101.1]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f72B75V32278 for ; Thu, 2 Aug 2001 04:07:05 -0700 Received: (from marc@localhost) by opium.mbsi.ca (8.11.3/8.11.3) id f72B4jM11983; Thu, 2 Aug 2001 07:04:45 -0400 (EDT) Date: Thu, 2 Aug 2001 07:04:45 -0400 From: Marc Boucher To: Rusty Russell Cc: Alexey Kuznetsov , Dave Miller , netfilter-devel@lists.samba.org, netdev@oss.sgi.com Subject: Re: ERRATA Re: [PATCH] fix for netfilter/nat/pppoe crashes (hopefully) Message-ID: <20010802070445.A11923@opium.mbsi.ca> References: <200107312226.CAA00407@mops.inr.ac.ru> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.3.19i Sender: owner-netdev@oss.sgi.com Precedence: bulk On Thu, Aug 02, 2001 at 04:00:24PM +1000, Rusty Russell wrote: > In message <200107312226.CAA00407@mops.inr.ac.ru> you write: > > Very interesting... Paul, did he really not miss the place where it was > > really checked? > > The bug is elsewhere. Nope. The bug is in netfilter. 
> Looks like Marc forgot that we don't do NAT if > the conntrack code hasn't already set skb->nfct. 8) Perhaps; some of the checks added by my patch may be superfluous. I wanted to be on the safe side while hunting down the bug.. > If the packet gets here, it is an ICMP packet which the connection > tracking code has marked RELATED. > > The only way ICMP packets get marked RELATED is in icmp_error_track. > To do this, it has to get past ip_conntrack_core.c:357-359: > > datalen = skb->len - iph->ihl*4 - sizeof(*hdr); > > if (skb->len < iph->ihl * 4 + sizeof(struct icmphdr)) { > DEBUGP("icmp_error_track: too short\n"); > return NULL; > } > > In my first audit, I noticed that this should strictly be ... + > sizeof(struct iphdr), but we only read from it. > > Anyway, so datalen is always >= 0. > > And this is the killer: line 385 (it's redundant: we check this inside > get_tuple anyway): > > /* Are they talking about one of our connections? */ > if (inner->ihl * 4 + 8 > datalen > || !get_tuple(inner, datalen, &origtuple, innerproto)) { > > So, we will always have 8 protocol bytes in the inner packet. This is > enough to contain the source and destinations ports (TCP/UDP) or ICMP > id, so we're not writing over the end of the packet... Wrong, ipv4/netfilter/ip_nat_proto_tcp.c:tcp_manip_pkt() *is* writing over the end of the packet when setting tcp->check in 8 byte inner packets without checking length. Several users have reported on the netfilter mailing list that the crashes have disappeared after the patch was applied. > Now, if Marc is seeing this, and noone can find a flaw in my logic, > then I'd say that someone else is mangling a packet without doing (see > ipip.c): > > nf_conntrack_put(skb->nfct); > skb->nfct = NULL; > > Please find them and hit them hard... I'll let you take care of that! But please, don't be too harsch on yourself. 8) Marc > > Hope that helps, > Rusty. > -- > Premature optmztion is rt of all evl. 
--DK > From owner-netdev@oss.sgi.com Thu Aug 2 04:58:24 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f72BwOS07931 for netdev-outgoing; Thu, 2 Aug 2001 04:58:24 -0700 Received: from avocet.mail.pas.earthlink.net (avocet.mail.pas.earthlink.net [207.217.121.50]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f72BwMV07927 for ; Thu, 2 Aug 2001 04:58:22 -0700 Received: from earthlink.net (dialup-63.208.221.95.Dial1.Baltimore1.Level3.net [63.208.221.95]) by avocet.mail.pas.earthlink.net (EL-8_9_3_3/8.9.3) with ESMTP id EAA02070; Thu, 2 Aug 2001 04:49:45 -0700 (PDT) Message-ID: <3B692FB6.4030402@earthlink.net> Date: Thu, 02 Aug 2001 06:47:18 -0400 From: Brad Chapman User-Agent: Mozilla/5.0 (X11; U; Linux 2.4.7 i586; en-US; C-UPD: MaxLinux0301) Gecko/20001107 Netscape6/6.0 X-Accept-Language: en MIME-Version: 1.0 To: Imran Patel CC: netdev@oss.sgi.com Subject: Re: IPv6 fragmentation and IPv6 header parsing References: <200107312208.CAA00330@mops.inr.ac.ru> <004701c11b15$77c5cf00$4d61a4ca@zombie> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Mr. Patel, Well, like I told Mr. Alexey, I think NAT in its major form for IPv6 is dead. Right now, the only use ip6_conntrack would be to an IPv6 firewall implementation would be tracking packet states via -m state. However, maybe once the code is stable and people are starting to use it, I may ask Mr. Henrik if he wants to write a reduced NAT layer for IPv6 which only offers redirection-type NAT. Is this a good idea? Or is ip6_conntrack really not going to see any use except for packet state tracking? Brad P.S. BTW do you want a patch copy or a source copy of my latest work on ip6_conntrack? Imran Patel wrote: >>> I am currently completing a port of the Netfilter connection >>> tracking subsystem from IPv4 to IPv6. 
Most of the features in this >>> port are complete, except for fragment handling, >> >> This is the last thing to complete transition from IPv6 back >> to IPv4 wickedness. :-) > > > On the contrary, it might be useful for transition from IPv4 to IPv6 ;-) > IPv6 connection tracking is useful for NAT-PT. However, other options on top > of IPv6 conntrack like masquerading, v6-v6 NAT, etc look useless and silly. > > imran > > > > > > > > > From owner-netdev@oss.sgi.com Thu Aug 2 05:23:10 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f72CNAE11583 for netdev-outgoing; Thu, 2 Aug 2001 05:23:10 -0700 Received: from sgi.com (sgi.SGI.COM [192.48.153.1]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f72CN8V11579 for ; Thu, 2 Aug 2001 05:23:09 -0700 Received: from cogenit.fr (se1.cogenit.fr [195.68.53.173]) by sgi.com (980327.SGI.8.8.8-aspam/980304.SGI-aspam: SGI does not authorize the use of its proprietary systems or networks for unsolicited or bulk email from the Internet.) via ESMTP id FAA04636 for ; Thu, 2 Aug 2001 05:22:25 -0700 (PDT) mail_from (romieu@cogenit.fr) Received: (from romieu@localhost) by cogenit.fr (8.12.0.Beta7/8.12.0.Beta7) id f72CLUgp013050; Thu, 2 Aug 2001 14:21:30 +0200 Date: Thu, 2 Aug 2001 14:21:30 +0200 From: Francois Romieu To: BERND.STURM@NDSatcom.com Cc: netdev@oss.sgi.com Subject: etherpowerII - episode 57328 Message-ID: <20010802142130.A12709@se1.cogenit.fr> References: Mime-Version: 1.0 Content-Type: text/plain; charset=unknown-8bit Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: ; from BERND.STURM@NDSatcom.com on Thu, Aug 02, 2001 at 12:52:15PM +0200 X-Organisation: Marie's fan club - I X-MIME-Autoconverted: from 8bit to quoted-printable by sgi.com id FAA04636 Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by oss.sgi.com id f72CN9V11580 Sender: owner-netdev@oss.sgi.com Precedence: bulk BERND.STURM@NDSatcom.com écrit : [...] 
> And something else: we use a SMC Etherpower II in one of our computers and > the driver for this card seems to be broken in 2.4.7 (it locks the system as > soon as you try a ping or the card is pinged from another machine) (in the > newsgroup for epic100-driver on www.sycld.com some other people already > reported this problem and said that the driver was still ok until 2.4.2, so > i also used the epic100-driver from 2.4.1 together with 2.4.7-kernel and > also tried a driver directly from SMC and both worked) The hard lockup without any message is something new :o( Could you: - specify the URL of the driver from SMC you used - specify your hardware/mobo - lspci -x/dmesg/lsmod - insmod epic100 debug=5, ping again -- Ueimor From owner-netdev@oss.sgi.com Thu Aug 2 06:24:17 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f72DOHl20987 for netdev-outgoing; Thu, 2 Aug 2001 06:24:17 -0700 Received: from sabre-wulf.nvg.ntnu.no (IDENT:root@sabre-wulf.nvg.ntnu.no [129.241.210.67]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f72DOFV20983 for ; Thu, 2 Aug 2001 06:24:15 -0700 Received: from tyrell.nvg.ntnu.no ([IPv6:::ffff:129.241.210.70]:9484 "EHLO tyrell.nvg.ntnu.no" ident: "root" whoson: "-unregistered-") by sabre-wulf.nvg.ntnu.no with ESMTP id ; Thu, 2 Aug 2001 15:22:09 +0200 Received: (from venaas@localhost) by tyrell.nvg.ntnu.no (8.9.3/8.8.4) id PAA07332; Thu, 2 Aug 2001 15:22:08 +0200 Date: Thu, 2 Aug 2001 15:22:08 +0200 From: Stig Venaas To: Imran Patel Cc: Brad Chapman , Alexey Kuznetsov , netdev@oss.sgi.com Subject: Re: IPv6 fragmentation and IPv6 header parsing Message-ID: <20010802152208.A14571@nvg.ntnu.no> References: <200107312208.CAA00330@mops.inr.ac.ru> <004701c11b15$77c5cf00$4d61a4ca@zombie> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <004701c11b15$77c5cf00$4d61a4ca@zombie>; from ipatel@crosswinds.net on Thu, Aug 02, 2001 at 11:08:50AM +0530 Sender: 
owner-netdev@oss.sgi.com Precedence: bulk On Thu, Aug 02, 2001 at 11:08:50AM +0530, Imran Patel wrote: > > > I am currently completing a port of the Netfilter connection > > > tracking subsystem from IPv4 to IPv6. Most of the features in this > > > port are complete, except for fragment handling, > > > > This is the last thing to complete transition from IPv6 back > > to IPv4 wickedness. :-) > > On the contrary, it might be useful for transition from IPv4 to IPv6 ;-) > IPv6 connection tracking is useful for NAT-PT. However, other options on top > of IPv6 conntrack like masquerading, v6-v6 NAT, etc look useless and silly. I agree, only IPv6 related NAT worth thinking about is NAT-PT. But you should only need to check port numbers on the IPv4 side, on the IPv6 side you should only be interested in the IPv6 address, so no need to defragment IPv6. You may need to defragment in the other direction for two reasons I think. First of all to know the port number, secondly to stay above the minimum IPv6 MTU. 
Stig From owner-netdev@oss.sgi.com Thu Aug 2 08:45:43 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f72Fjhv07264 for netdev-outgoing; Thu, 2 Aug 2001 08:45:43 -0700 Received: from lox.sandelman.ottawa.on.ca (lox.sandelman.ottawa.on.ca [209.151.24.2]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f72FjdV07246 for ; Thu, 2 Aug 2001 08:45:41 -0700 Received: from nox.sandelman.ottawa.on.ca (nox.sandelman.ottawa.on.ca [209.151.24.6]) by lox.sandelman.ottawa.on.ca (8.8.7/8.8.8) with ESMTP id LAA00721 for ; Thu, 2 Aug 2001 11:43:40 -0400 (EDT) Received: from marajade.sandelman.ottawa.on.ca ([2001:410:402:2:204:76ff:fe2d:8c]) by nox.sandelman.ottawa.on.ca (8.11.0/8.11.0) with ESMTP id f72FiCF10323 (using TLSv1/SSLv3 with cipher EDH-RSA-DES-CBC3-SHA (168 bits) verified OK) for ; Thu, 2 Aug 2001 11:44:13 -0400 (EDT) Received: from marajade.sandelman.ottawa.on.ca (localhost [[UNIX: localhost]]) by marajade.sandelman.ottawa.on.ca (8.11.0/8.11.0) with ESMTP id f72FY0w28611 for ; Thu, 2 Aug 2001 11:34:00 -0400 (EDT) Message-Id: <200108021534.f72FY0w28611@marajade.sandelman.ottawa.on.ca> To: netdev@oss.sgi.com Subject: Re: IPv6 fragmentation and IPv6 header parsing In-reply-to: Your message of "Thu, 02 Aug 2001 11:07:58 +0530." <003801c11b15$4fb35320$4d61a4ca@zombie> Mime-Version: 1.0 (generated by tm-edit 7.108) Content-Type: text/plain; charset=US-ASCII Date: Thu, 02 Aug 2001 11:33:59 -0400 From: Michael Richardson Sender: owner-netdev@oss.sgi.com Precedence: bulk >>>>> "Imran" == Imran Patel writes: Imran> IPv6 ;-) IPv6 connection tracking is useful for NAT-PT. However, Imran> other options on top of IPv6 conntrack like masquerading, v6-v6 Imran> NAT, etc look useless and silly. connection tracking is useful for: - stateful packet inspection - IPsec - queuing/scheduling decisions The ability to make a decision for a microflow once and then remember it efficiently for that microflow is very useful. NAT/NAPT is just one situation where it is required. 
One hopes that IPv6 NAT will never be needed. ] ON HUMILITY: to err is human. To moo, bovine. | firewalls [ ] Michael Richardson, Sandelman Software Works, Ottawa, ON |net architect[ ] mcr@sandelman.ottawa.on.ca http://www.sandelman.ottawa.on.ca/ |device driver[ ] panic("Just another NetBSD/notebook using, kernel hacking, security guy"); [ From owner-netdev@oss.sgi.com Thu Aug 2 09:01:54 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f72G1sP09724 for netdev-outgoing; Thu, 2 Aug 2001 09:01:54 -0700 Received: from colin.muc.de (root@colin.muc.de [193.149.48.1]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f72G1qV09721 for ; Thu, 2 Aug 2001 09:01:52 -0700 Received: by colin.muc.de id <140555-2>; Thu, 2 Aug 2001 17:59:58 +0200 Message-ID: <20010802175956.34244@colin.muc.de> Date: Thu, 2 Aug 2001 17:59:56 +0200 From: Andi Kleen To: BERND.STURM@NDSatcom.com Cc: ak@muc.de, netdev@oss.sgi.com Subject: Re: Linux 2.4 network performance oddities References: Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: 8bit X-Mailer: Mutt 0.88e In-Reply-To: ; from BERND.STURM@NDSatcom.com on Thu, Aug 02, 2001 at 12:52:15PM +0200 Sender: owner-netdev@oss.sgi.com Precedence: bulk On Thu, Aug 02, 2001 at 12:52:15PM +0200, BERND.STURM@NDSatcom.com wrote: > We did packet captures with tcpdump and saw that the sender wouldn't open up > its tcp send window wide enough and every once in a while stall with its > sending. > This effect didn't occur with 2.2.16 but all the kernel settings in > /proc/sys/net were the same ! 2.4 has an additional global TCP memory limit; see the tcp_mem entry in /usr/src/linux/Documentation/networking/ip-sysctl.txt for details. It also has more sysctls to tune the TCP buffer management, like tcp_app_win or tcp_adv_win_scale.
> It seems, as strange as this may sound, that the 2.4.x-kernel-series has > some kind of congestion window memory, that still remembers the congestion > window of a former tcp-transfer, at least it acts this way. But this would > in no way be conforming to the current tcp-standards ! 2.4 saves cwnds and some other information in the destination cache, similar to many other stacks (e.g. Solaris). You can get rid of this information by flushing the routing cache (echo 1 > /proc/sys/net/ipv4/route/flush). > By the way, what's this no_cong, lo_cong, mod_cong-stuff in > /proc/sys/net/core ?? They define how the stack drops packets on overload. -Andi From owner-netdev@oss.sgi.com Thu Aug 2 10:13:16 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f72HDGL11651 for netdev-outgoing; Thu, 2 Aug 2001 10:13:16 -0700 Received: from dea.waldorf-gmbh.de (u-86-19.karlsruhe.ipdial.viaginterkom.de [62.180.19.86]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f72HDCV11644 for ; Thu, 2 Aug 2001 10:13:13 -0700 Received: (from ralf@localhost) by dea.waldorf-gmbh.de (8.11.1/8.11.1) id f72BcdK24622; Thu, 2 Aug 2001 13:38:39 +0200 Date: Thu, 2 Aug 2001 13:38:39 +0200 From: Ralf Baechle To: Imran Patel Cc: Brad Chapman , Alexey Kuznetsov , netdev@oss.sgi.com Subject: Re: IPv6 fragmentation and IPv6 header parsing Message-ID: <20010802133839.C24305@bacchus.dhis.org> References: <200107312208.CAA00330@mops.inr.ac.ru> <003801c11b15$4fb35320$4d61a4ca@zombie> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <003801c11b15$4fb35320$4d61a4ca@zombie>; from ipatel@crosswinds.net on Thu, Aug 02, 2001 at 11:07:58AM +0530 X-Accept-Language: de,en,fr Sender: owner-netdev@oss.sgi.com Precedence: bulk On Thu, Aug 02, 2001 at 11:07:58AM +0530, Imran Patel wrote: > On the contrary, it might be useful for transition from IPv4 to IPv6 ;-) > IPv6 connection tracking is useful for NAT-PT.
However, other options on top > of IPv6 conntrack like masquerading, v6-v6 NAT, etc look useless and silly. You forget real world. I bet ISPs will continue to only give a single IPv6 address to their dialup customers, so masquerading will stay ... Ralf From owner-netdev@oss.sgi.com Thu Aug 2 10:26:01 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f72HQ1c12196 for netdev-outgoing; Thu, 2 Aug 2001 10:26:01 -0700 Received: from lox.sandelman.ottawa.on.ca (lox.sandelman.ottawa.on.ca [209.151.24.2]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f72HPtV12193; Thu, 2 Aug 2001 10:25:55 -0700 Received: from nox.sandelman.ottawa.on.ca (nox.sandelman.ottawa.on.ca [209.151.24.6]) by lox.sandelman.ottawa.on.ca (8.8.7/8.8.8) with ESMTP id NAA02414; Thu, 2 Aug 2001 13:23:14 -0400 (EDT) Received: from marajade.sandelman.ottawa.on.ca ([2001:410:402:2:204:76ff:fe2d:8c]) by nox.sandelman.ottawa.on.ca (8.11.0/8.11.0) with ESMTP id f72HNoF10415 (using TLSv1/SSLv3 with cipher EDH-RSA-DES-CBC3-SHA (168 bits) verified OK); Thu, 2 Aug 2001 13:23:51 -0400 (EDT) Received: from marajade.sandelman.ottawa.on.ca (localhost [[UNIX: localhost]]) by marajade.sandelman.ottawa.on.ca (8.11.0/8.11.0) with ESMTP id f72HErD02185; Thu, 2 Aug 2001 13:14:53 -0400 (EDT) Message-Id: <200108021714.f72HErD02185@marajade.sandelman.ottawa.on.ca> To: Ralf Baechle cc: Brad Chapman , Alexey Kuznetsov , netdev@oss.sgi.com Subject: Re: IPv6 fragmentation and IPv6 header parsing In-reply-to: Your message of "Thu, 02 Aug 2001 13:38:39 +0200." 
<20010802133839.C24305@bacchus.dhis.org> Mime-Version: 1.0 (generated by tm-edit 7.108) Content-Type: text/plain; charset=US-ASCII Date: Thu, 02 Aug 2001 13:14:53 -0400 From: Michael Richardson Sender: owner-netdev@oss.sgi.com Precedence: bulk >>>>> "Ralf" == Ralf Baechle writes: Ralf> On Thu, Aug 02, 2001 at 11:07:58AM +0530, Imran Patel wrote: >> On the contrary, it might be useful for transition from IPv4 to IPv6 >> ;-) IPv6 connection tracking is useful for NAT-PT. However, other >> options on top of IPv6 conntrack like masquerading, v6-v6 NAT, etc >> look useless and silly. Ralf> You forget real world. I bet ISPs will continue to only give a You forget 6to4. One IPv4 address gets you 2^80 IPv6 addresses. ] ON HUMILITY: to err is human. To moo, bovine. | firewalls [ ] Michael Richardson, Sandelman Software Works, Ottawa, ON |net architect[ ] mcr@sandelman.ottawa.on.ca http://www.sandelman.ottawa.on.ca/ |device driver[ ] panic("Just another NetBSD/notebook using, kernel hacking, security guy"); [ From owner-netdev@oss.sgi.com Thu Aug 2 10:36:51 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f72Hapo12626 for netdev-outgoing; Thu, 2 Aug 2001 10:36:51 -0700 Received: from zmailer.org (mail.zmailer.org [194.252.70.162]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f72HaXV12622; Thu, 2 Aug 2001 10:36:33 -0700 Received: (mea@zmailer.org) by mail.zmailer.org id ; Thu, 2 Aug 2001 20:34:29 +0300 Date: Thu, 2 Aug 2001 20:34:29 +0300 From: Matti Aarnio To: Ralf Baechle Cc: Imran Patel , Brad Chapman , Alexey Kuznetsov , netdev@oss.sgi.com Subject: Re: IPv6 fragmentation and IPv6 header parsing Message-ID: <20010802203429.S2650@mea-ext.zmailer.org> References: <200107312208.CAA00330@mops.inr.ac.ru> <003801c11b15$4fb35320$4d61a4ca@zombie> <20010802133839.C24305@bacchus.dhis.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20010802133839.C24305@bacchus.dhis.org>; from ralf@oss.sgi.com on 
Thu, Aug 02, 2001 at 01:38:39PM +0200 Sender: owner-netdev@oss.sgi.com Precedence: bulk On Thu, Aug 02, 2001 at 01:38:39PM +0200, Ralf Baechle wrote: > On Thu, Aug 02, 2001 at 11:07:58AM +0530, Imran Patel wrote: > > On the contrary, it might be useful for transition from IPv4 to IPv6 ;-) > > IPv6 connection tracking is useful for NAT-PT. However, other options on top > > of IPv6 conntrack like masquerading, v6-v6 NAT, etc look useless and silly. > > You forget real world. I bet ISPs will continue to only give a single IPv6 > address to their dialup customers, so masquerading will stay ... But aren't the addressing specs saying that we must consider each and every router port (which I take dialup servers to be) as /64 ? Now to think of that... 64k lines of modem pools can thus be fitted into a /48, which is what addressing specs say (as I faintly recall without checking it) that ISPs should issue for each leased-line customer / corporation. Once upon a time I considered allocating something like /120 for each dialup user, but things aren't completely defined in the IETF for dialups of IPv6. At least Itojun has written a draft for it: draft-itojun-ipv6-dialup-requirement-01.txt ----------- 3.1. Address space It is desired to assign /48 address space, regardless from usage pattern or size of the downstream site. If it is apparent that the customers will have a single subnet behind them, /64 allocation may be desirable. It is to make future renumbering in downstream site easier on ISP change. /128 assignment MUST NOT be made, as it will promote IPv6-to- IPv6 NAT. The item is highly related to RIR address allocation recommendations. ----------- The mobile (3GPP) folks will have to consider things more like that /120 instead of /64, because they have so bloody many customers, but if they go to /128, that is pure stupidity. (Although, a /40 should be enough for each mobile operator in Finland for a long time to come if they issue out static /64 addresses. 
Issuing dynamic addresses saves the day, of course.) > Ralf /Matti Aarnio From owner-netdev@oss.sgi.com Thu Aug 2 10:44:04 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f72Hi4Y13005 for netdev-outgoing; Thu, 2 Aug 2001 10:44:04 -0700 Received: from netcore.fi (netcore.fi [193.94.160.1]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f72Hi0V13002; Thu, 2 Aug 2001 10:44:00 -0700 Received: from localhost (pekkas@localhost) by netcore.fi (8.11.1/8.11.1) with ESMTP id f72Hfvm04550; Thu, 2 Aug 2001 20:41:58 +0300 Date: Thu, 2 Aug 2001 20:41:57 +0300 (EEST) From: Pekka Savola To: Ralf Baechle cc: Imran Patel , Brad Chapman , Alexey Kuznetsov , Subject: Re: IPv6 fragmentation and IPv6 header parsing In-Reply-To: <20010802133839.C24305@bacchus.dhis.org> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk On Thu, 2 Aug 2001, Ralf Baechle wrote: > On Thu, Aug 02, 2001 at 11:07:58AM +0530, Imran Patel wrote: > > > On the contrary, it might be useful for transition from IPv4 to IPv6 ;-) > > IPv6 connection tracking is useful for NAT-PT. However, other options on top > > of IPv6 conntrack like masquerading, v6-v6 NAT, etc look useless and silly. > > You forget real world. I bet ISPs will continue to only give a single IPv6 > address to their dialup customers, so masquerading will stay ... This is feared by many. However, there is some very strict wording against this in IESG address allocation policy, which is being approved by RIR's. As there are no huge technical or address allocational reasons why ISP's could not give at least /64, those ISP's that do get more popular and ones dealing /128's do not, and disappear from IPv6 market. So I think this will sort itself out by "natural means" ... -- Pekka Savola "Tell me of difficulties surmounted, Netcore Oy not those you stumble over and fall" Systems. Networks. Security. 
 -- Robert Jordan: A Crown of Swords

From owner-netdev@oss.sgi.com Thu Aug 2 12:13:57 2001
Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f72JDvs25720 for netdev-outgoing; Thu, 2 Aug 2001 12:13:57 -0700
Received: from ms2.inr.ac.ru (minus.inr.ac.ru [193.233.7.97]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f72JDtV25717 for ; Thu, 2 Aug 2001 12:13:56 -0700
Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id XAA28101; Thu, 2 Aug 2001 23:13:25 +0400
From: kuznet@ms2.inr.ac.ru
Message-Id: <200108021913.XAA28101@ms2.inr.ac.ru>
Subject: Re: ERRATA Re: [PATCH] fix for netfilter/nat/pppoe crashes (hopefully)
To: rusty@rustcorp.com.au (Rusty Russell)
Date: Thu, 2 Aug 2001 23:13:25 +0400 (MSK DST)
Cc: davem@redhat.com, netfilter-devel@lists.samba.org, netdev@oss.sgi.com, marc@mbsi.ca
In-Reply-To: from "Rusty Russell" at Aug 2, 1 04:00:24 pm
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Sender: owner-netdev@oss.sgi.com
Precedence: bulk

Hello!

> Now, if Marc is seeing this, and noone can find a flaw in my logic,
> then I'd say that someone else is mangling a packet without doing (see
> ipip.c):
>
> 	nf_conntrack_put(skb->nfct);
> 	skb->nfct = NULL;
>
> Please find them and hit them hard...

Good spot. I try to force people to copy this code from tunnels,
but they prefer to copy from loopback. It is just shorter. :-)

BTW what about loopback? It does not do this! Is the nfct passed through
loopback used in some way? If it is not, it is better to add this there.
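
The two quoted lines from ipip.c can be modelled outside the kernel. What follows is only a sketch under stated assumptions: toy_conntrack, toy_skb and both helpers are hypothetical stand-ins, not the netfilter structures, and the real reference count is atomic and triggers a destructor when it reaches zero.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a connection tracking entry. */
struct toy_conntrack {
	int use;                      /* reference count (plain int in this model) */
};

/* Hypothetical stand-in for a packet buffer carrying a conntrack reference. */
struct toy_skb {
	struct toy_conntrack *nfct;   /* NULL when the packet is not tracked */
};

/* Drop one reference. In this model a NULL pointer is simply ignored. */
static void toy_conntrack_put(struct toy_conntrack *ct)
{
	if (ct) {
		assert(ct->use > 0);
		ct->use--;
	}
}

/*
 * What a device that rewrites and re-injects a packet is expected to do:
 * release the old tracking reference and forget it, so the packet is
 * looked up afresh on its next pass through the hooks.
 */
static void toy_skb_forget_conntrack(struct toy_skb *skb)
{
	toy_conntrack_put(skb->nfct);
	skb->nfct = NULL;
}
```

Calling toy_skb_forget_conntrack() twice on the same buffer is harmless in this model, because the pointer is cleared on the first call.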
Alexey From owner-netdev@oss.sgi.com Thu Aug 2 12:31:25 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f72JVPZ28743 for netdev-outgoing; Thu, 2 Aug 2001 12:31:25 -0700 Received: from netcore.fi (netcore.fi [193.94.160.1]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f72JVLV28730 for ; Thu, 2 Aug 2001 12:31:21 -0700 Received: from localhost (pekkas@localhost) by netcore.fi (8.11.1/8.11.1) with ESMTP id f72JV0H05256; Thu, 2 Aug 2001 22:31:00 +0300 Date: Thu, 2 Aug 2001 22:31:00 +0300 (EEST) From: Pekka Savola To: cc: , , Dave Miller Subject: Re: missing icmp errors for udp packets In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk On Mon, 30 Jul 2001, Pekka Savola wrote: > On Sun, 29 Jul 2001 kuznet@ms2.inr.ac.ru wrote: > > > Hello! > > > > > So in conclusion: > > > > > > with net.ipv4.icmp_echoreply_rate=0: > > > > Congratulations! That's why I do not see this, forgot to ping before. :-) > > > > The patch is enclosed. > > Alexey, there is a tiny problem with your patch. > > If you reboot the computer, the _first_ ping/scan attempt will not return > icmp dest unreachable. All of the rest do. If the network was quiet > enough, I guess there might be some circumstances where this could be > applicable again.. As this happening is rather rare, would there be resistance for adding this as an intermediate fix, to be replaced later with a bigger overhaul if that is to be decided? For 99.9% of cases, this works rather well and the 0.1% is the same as before (== acceptable). Returning ICMP unreachables after being pinged is IMO rather important. 
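
For readers following the rate-limit discussion, here is a minimal user-space sketch of the per-destination token bucket, modelled on the unpatched lines of the xrlim_allow() diff quoted in this message. The names toy_dst and toy_xrlim_allow are made up, the current time is passed in instead of read from jiffies, and the burst factor of 6 is an assumed value for illustration.

```c
#include <assert.h>

#define TOY_BURST_FACTOR 6            /* assumed value, illustration only */

/* Hypothetical stand-in for the two rate-limit fields of a dst_entry. */
struct toy_dst {
	unsigned long rate_tokens;    /* accumulated ticks available to spend */
	unsigned long rate_last;      /* time of the previous call, in ticks */
};

/*
 * Return 1 if a message may be sent now, 0 if it must be suppressed.
 * Each elapsed tick earns one token, the bucket is capped at
 * TOY_BURST_FACTOR * timeout, and every message sent costs `timeout` tokens.
 */
static int toy_xrlim_allow(struct toy_dst *dst, unsigned long now,
			   unsigned long timeout)
{
	unsigned long burst = TOY_BURST_FACTOR * timeout;

	assert(timeout > 0);
	dst->rate_tokens += now - dst->rate_last;
	dst->rate_last = now;
	if (dst->rate_tokens > burst)
		dst->rate_tokens = burst;
	if (dst->rate_tokens >= timeout) {
		dst->rate_tokens -= timeout;
		return 1;
	}
	return 0;
}
```

In this model an empty bucket refuses messages until at least `timeout` ticks have passed, and a long quiet period never earns more than TOY_BURST_FACTOR messages' worth of credit.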
> > --- ../dust/vger3-010728/linux/net/ipv4/icmp.c Thu Jun 14 22:49:44 2001 > > +++ linux/net/ipv4/icmp.c Sun Jul 29 19:52:55 2001 > > @@ -240,12 +240,15 @@ > > int xrlim_allow(struct dst_entry *dst, int timeout) > > { > > unsigned long now; > > + static int burst; > > > > now = jiffies; > > dst->rate_tokens += now - dst->rate_last; > > dst->rate_last = now; > > - if (dst->rate_tokens > XRLIM_BURST_FACTOR*timeout) > > - dst->rate_tokens = XRLIM_BURST_FACTOR*timeout; > > + if (burst < XRLIM_BURST_FACTOR*timeout) > > + burst = XRLIM_BURST_FACTOR*timeout; > > + if (dst->rate_tokens > burst) > > + dst->rate_tokens = burst; > > if (dst->rate_tokens >= timeout) { > > dst->rate_tokens -= timeout; > > return 1; > > > > -- Pekka Savola "Tell me of difficulties surmounted, Netcore Oy not those you stumble over and fall" Systems. Networks. Security. -- Robert Jordan: A Crown of Swords From owner-netdev@oss.sgi.com Thu Aug 2 14:01:02 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f72L12313358 for netdev-outgoing; Thu, 2 Aug 2001 14:01:02 -0700 Received: from coruscant.gnumonks.org (mail@coruscant.franken.de [193.174.159.226]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f72L10V13338 for ; Thu, 2 Aug 2001 14:01:00 -0700 Received: from uucp by coruscant.gnumonks.org with local-bsmtp (Exim 3.22 #1) id 15SPal-0005FB-00 for netdev@oss.sgi.com; Thu, 02 Aug 2001 23:01:15 +0200 Received: from laforge by obroa-skai.gnumonks.org with local (Exim 3.22 #1) id 15S8j8-0000sm-00; Thu, 02 Aug 2001 00:00:46 -0300 Date: Thu, 2 Aug 2001 00:00:46 -0300 From: Harald Welte To: Rusty Russell Cc: Alexey Kuznetsov , davem@redhat.com (Dave Miller), netfilter-devel@lists.samba.org, netdev@oss.sgi.com, Marc Boucher Subject: Re: ERRATA Re: [PATCH] fix for netfilter/nat/pppoe crashes (hopefully) Message-ID: <20010802000046.R1612@obroa-skai.gnumonks.org> Mail-Followup-To: Harald Welte , Rusty Russell , Alexey Kuznetsov , davem@redhat.com (Dave Miller), 
	netfilter-devel@lists.samba.org, netdev@oss.sgi.com, Marc Boucher
References: <200107312226.CAA00407@mops.inr.ac.ru>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.3.17i
In-Reply-To: ; from rusty@rustcorp.com.au on Thu, Aug 02, 2001 at 04:00:24PM +1000
X-Operating-System: Linux obroa-skai.gnumonks.org 2.4.7-nfpom
X-Date: Today is Boomtime, the 66th day of Confusion in the YOLD 3167
Sender: owner-netdev@oss.sgi.com
Precedence: bulk

On Thu, Aug 02, 2001 at 04:00:24PM +1000, Rusty Russell wrote:

> And this is the killer: line 385 (it's redundant: we check this inside
> get_tuple anyway):
>
> 	/* Are they talking about one of our connections? */
> 	if (inner->ihl * 4 + 8 > datalen
> 	    || !get_tuple(inner, datalen, &origtuple, innerproto)) {
>
> So, we will always have 8 protocol bytes in the inner packet. This is
> enough to contain the source and destinations ports (TCP/UDP) or ICMP
> id, so we're not writing over the end of the packet...

Well, Rusty, I have to agree with Marc. Look at
ip_nat_proto_tcp.c:tcp_manip_pkt(). It just assumes that we have a tcp header
with up to 18 bytes in length, as it overwrites the TCP header's checksum.

> Please find them and hit them hard...

well... next time I am in .au ;)

> Rusty.

--
Live long and prosper
- Harald Welte / laforge@gnumonks.org http://www.gnumonks.org
============================================================================
GCS/E/IT d- s-: a-- C+++ UL++++$ P+++ L++++$ E--- W- N++ o? K- w--- O- M- V-- PS+ PE-- Y+ PGP++ t++ 5-- !X !R tv-- b+++ DI?
!D G+ e* h+ r% y+(*) From owner-netdev@oss.sgi.com Thu Aug 2 20:46:31 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f733kVt08406 for netdev-outgoing; Thu, 2 Aug 2001 20:46:31 -0700 Received: from localhost ([144.137.80.71]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f733kSV08403 for ; Thu, 2 Aug 2001 20:46:29 -0700 Received: from localhost ([127.0.0.1] helo=rustcorp.com.au) by localhost with esmtp (Exim 3.31 #1 (Debian)) id 15SVuq-0004qP-00; Fri, 03 Aug 2001 13:46:24 +1000 From: Rusty Russell To: Marc Boucher Cc: Alexey Kuznetsov , Dave Miller , netfilter-devel@lists.samba.org, netdev@oss.sgi.com Subject: Re: ERRATA Re: [PATCH] fix for netfilter/nat/pppoe crashes (hopefully) In-reply-to: Your message of "Thu, 02 Aug 2001 07:04:45 -0400." <20010802070445.A11923@opium.mbsi.ca> Date: Fri, 03 Aug 2001 13:46:21 +1000 Message-Id: Sender: owner-netdev@oss.sgi.com Precedence: bulk In message <20010802070445.A11923@opium.mbsi.ca> you write: > > The bug is elsewhere. > > Nope. The bug is in netfilter. Yep... > > So, we will always have 8 protocol bytes in the inner packet. This is > > enough to contain the source and destinations ports (TCP/UDP) or ICMP > > id, so we're not writing over the end of the packet... > > Wrong, ipv4/netfilter/ip_nat_proto_tcp.c:tcp_manip_pkt() *is* writing > over the end of the packet when setting tcp->check in 8 byte inner packets > without checking length. Argh... there *is* a check in ip_conntrack_proto_tcp, but it checks against tcph->doff, not sizeof(struct tcphdr)! This patch strengthens the check, and for neatness, moves it to pkt_to_tuple where it belongs. It also adds my previously minor fix. If this solves the problem, I will move the iph sanity checks to the entry to the conntrack & NAT code, so we can set skb->h.raw and not recalculate it everywhere... Well done Marc! Rusty. -- Premature optmztion is rt of all evl. 
--DK diff -urN -I \$.*\$ --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal linux-2.4.7-official/net/ipv4/netfilter/ip_conntrack_core.c working-2.4.7-marc/net/ipv4/netfilter/ip_conntrack_core.c --- linux-2.4.7-official/net/ipv4/netfilter/ip_conntrack_core.c Sat Apr 28 07:15:01 2001 +++ working-2.4.7-marc/net/ipv4/netfilter/ip_conntrack_core.c Fri Aug 3 13:29:48 2001 @@ -356,7 +356,7 @@ inner = (struct iphdr *)(hdr + 1); datalen = skb->len - iph->ihl*4 - sizeof(*hdr); - if (skb->len < iph->ihl * 4 + sizeof(struct icmphdr)) { + if (skb->len < iph->ihl * 4 + sizeof(*hdr) + sizeof(*iph)) { DEBUGP("icmp_error_track: too short\n"); return NULL; } diff -urN -I \$.*\$ --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal linux-2.4.7-official/net/ipv4/netfilter/ip_conntrack_proto_tcp.c working-2.4.7-marc/net/ipv4/netfilter/ip_conntrack_proto_tcp.c --- linux-2.4.7-official/net/ipv4/netfilter/ip_conntrack_proto_tcp.c Sat Apr 28 07:15:01 2001 +++ working-2.4.7-marc/net/ipv4/netfilter/ip_conntrack_proto_tcp.c Fri Aug 3 13:28:50 2001 @@ -98,6 +98,10 @@ { const struct tcphdr *hdr = datah; + /* We know we have 8 bytes, but we need whole TCP header */ + if (datalen < sizeof(*hdr) || datalen < hdr->doff * 4) + return 0; + tuple->src.u.tcp.port = hdr->source; tuple->dst.u.tcp.port = hdr->dest; @@ -150,13 +154,6 @@ { enum tcp_conntrack newconntrack, oldtcpstate; struct tcphdr *tcph = (struct tcphdr *)((u_int32_t *)iph + iph->ihl); - - /* We're guaranteed to have the base header, but maybe not the - options. 
*/ - if (len < (iph->ihl + tcph->doff) * 4) { - DEBUGP("ip_conntrack_tcp: Truncated packet.\n"); - return -1; - } WRITE_LOCK(&tcp_lock); oldtcpstate = conntrack->proto.tcp.state; From owner-netdev@oss.sgi.com Thu Aug 2 21:37:28 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f734bS015239 for netdev-outgoing; Thu, 2 Aug 2001 21:37:28 -0700 Received: from coruscant.gnumonks.org (mail@coruscant.franken.de [193.174.159.226]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f734bPV15230 for ; Thu, 2 Aug 2001 21:37:25 -0700 Received: from uucp by coruscant.gnumonks.org with local-bsmtp (Exim 3.22 #1) id 15SWiR-0001VU-00 for netdev@oss.sgi.com; Fri, 03 Aug 2001 06:37:39 +0200 Received: from laforge by obroa-skai.gnumonks.org with local (Exim 3.22 #1) id 15SFqS-00028m-00; Thu, 02 Aug 2001 07:36:48 -0300 Date: Thu, 2 Aug 2001 07:36:48 -0300 From: Harald Welte To: Rusty Russell Cc: Marc Boucher , Alexey Kuznetsov , Dave Miller , netfilter-devel@lists.samba.org, netdev@oss.sgi.com Subject: Re: ERRATA Re: [PATCH] fix for netfilter/nat/pppoe crashes (hopefully) Message-ID: <20010802073648.G1612@obroa-skai.gnumonks.org> Mail-Followup-To: Harald Welte , Rusty Russell , Marc Boucher , Alexey Kuznetsov , Dave Miller , netfilter-devel@lists.samba.org, netdev@oss.sgi.com References: <20010802070445.A11923@opium.mbsi.ca> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.3.17i In-Reply-To: ; from rusty@rustcorp.com.au on Fri, Aug 03, 2001 at 01:46:21PM +1000 X-Operating-System: Linux obroa-skai.gnumonks.org 2.4.7-nfpom X-Date: Today is Boomtime, the 66th day of Confusion in the YOLD 3167 Sender: owner-netdev@oss.sgi.com Precedence: bulk On Fri, Aug 03, 2001 at 01:46:21PM +1000, Rusty Russell wrote: > Argh... there *is* a check in ip_conntrack_proto_tcp, but it checks > against tcph->doff, not sizeof(struct tcphdr)! Sorry Rusty, but check on sizeof(struct tcphdr) is IMHO wrong, again. 

- scenario a
  Imagine the case, where we have the first 18 bytes of the tcp header,
  including the checksum, but not including the urgp.

  Result: Your check decides the packet is too small to recalculate
  the checksum, so the packet will get sent out with the old checksum :(

- scenario b
  Imagine the case, where we only have the first couple of bytes of the
  inner packet, not including the checksum of the tcp header.

  Result: Your check decides the packet is too small and the code will
  thus not change the port numbers inside the tcp header. The icmp packet
  gets forwarded, reaches the destination host. The destination host
  parses the tcp header, reads out the port number, and reads the wrong,
  un-nat'ed one.

Please look at the attached patch, which is the new proposed fix, as in
current netfilter CVS patch-o-matic (tcp_manip_pkt.patch).

> Rusty.

--
Live long and prosper
- Harald Welte / laforge@gnumonks.org http://www.gnumonks.org
============================================================================
GCS/E/IT d- s-: a-- C+++ UL++++$ P+++ L++++$ E--- W- N++ o? K- w--- O- M- V-- PS+ PE-- Y+ PGP++ t++ 5-- !X !R tv-- b+++ DI?
!D G+ e* h+ r% y+(*) --- linux-2.4.7-mb/net/ipv4/netfilter/ip_nat_proto_tcp.c 2001/07/31 15:37:45 1.1 +++ linux-2.4.7-mb/net/ipv4/netfilter/ip_nat_proto_tcp.c 2001/07/31 17:35:20 @@ -92,10 +104,17 @@ oldip = iph->daddr; portptr = &hdr->dest; } - hdr->check = ip_nat_cheat_check(~oldip, manip->ip, + + /* this could be a inner header returned in icmp packet; in such + cases we cannot update the checksum field since it is outside of + the 64 bits of transport layer headers typically included */ + if(((void *)&hdr->check + sizeof(hdr->check) - (void *)iph) <= len) { + hdr->check = ip_nat_cheat_check(~oldip, manip->ip, ip_nat_cheat_check(*portptr ^ 0xFFFF, manip->u.tcp.port, hdr->check)); + } + *portptr = manip->u.tcp.port; } From owner-netdev@oss.sgi.com Thu Aug 2 22:37:20 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f735bKL22520 for netdev-outgoing; Thu, 2 Aug 2001 22:37:20 -0700 Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f735bJV22516 for ; Thu, 2 Aug 2001 22:37:19 -0700 Received: (from davem@localhost) by pizda.ninka.net (8.9.3/8.9.3) id WAA23222; Thu, 2 Aug 2001 22:36:46 -0700 From: "David S. Miller" MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <15210.14446.81297.26145@pizda.ninka.net> Date: Thu, 2 Aug 2001 22:36:46 -0700 (PDT) To: Harald Welte Cc: Rusty Russell , Marc Boucher , Alexey Kuznetsov , netfilter-devel@lists.samba.org, netdev@oss.sgi.com Subject: Re: ERRATA Re: [PATCH] fix for netfilter/nat/pppoe crashes (hopefully) In-Reply-To: <20010802073648.G1612@obroa-skai.gnumonks.org> References: <20010802070445.A11923@opium.mbsi.ca> <20010802073648.G1612@obroa-skai.gnumonks.org> X-Mailer: VM 6.75 under 21.1 (patch 13) "Crater Lake" XEmacs Lucid Sender: owner-netdev@oss.sgi.com Precedence: bulk Harald Welte writes: > Sorry Rusty, but check on sizeof(struct tcphdr) is IMHO wrong, again. 
I think there is no way you can validly drop an ICMP packet just because the TCP checksum field is not there in the embedded header. So I think I basically agree with Harald. Nobody verifies the checksum of the TCP header included in the ICMP anyways, and in fact most of the time you can't simply because you'd need all the data part there to do so. This code should just verify that the ports are there and fix them up, and do nothing more, for the TCP in ICMP packet case. And pretty much this is what Harald's patch does if I read it correctly. Later, David S. Miller davem@redhat.com From owner-netdev@oss.sgi.com Fri Aug 3 00:19:08 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f737J8w29394 for netdev-outgoing; Fri, 3 Aug 2001 00:19:08 -0700 Received: from localhost ([144.137.81.83]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f737J5V29390 for ; Fri, 3 Aug 2001 00:19:06 -0700 Received: from localhost ([127.0.0.1] helo=rustcorp.com.au) by localhost with esmtp (Exim 3.31 #1 (Debian)) id 15SZEN-0005tT-00; Fri, 03 Aug 2001 17:18:47 +1000 From: Rusty Russell To: Harald Welte Cc: Marc Boucher , Alexey Kuznetsov , Dave Miller , netfilter-devel@lists.samba.org, netdev@oss.sgi.com Subject: Re: ERRATA Re: [PATCH] fix for netfilter/nat/pppoe crashes (hopefully) In-reply-to: Your message of "Thu, 02 Aug 2001 07:36:48 -0300." <20010802073648.G1612@obroa-skai.gnumonks.org> Date: Fri, 03 Aug 2001 17:18:45 +1000 Message-Id: Sender: owner-netdev@oss.sgi.com Precedence: bulk In message <20010802073648.G1612@obroa-skai.gnumonks.org> you write: > - scenario a > Imagine the case, where we have the first 18 bytes of the tcp header, ACK... Your patch is correct. Was still not thinking about ICMP packets, and I'm supposed to be working on work stuff at the moment. I think it's pretty clear to everyone that I don't have time or resources to maintain this stuff any more. Have appended my other minor fix. Dave, please apply... Rusty. 
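
The rule this thread settles on is to rewrite the ports whenever they are present and to touch the checksum only if its bytes were actually included in the ICMP payload. It can be sketched as a pair of bounds checks. This is an illustration, not the applied patch: toy_tcphdr and the helpers are hypothetical, and offsets are measured from the start of the embedded TCP header, whereas the patch computes its offset from the inner IP header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal 20-byte TCP header layout without options; a stand-in for struct tcphdr. */
struct toy_tcphdr {
	uint16_t source;
	uint16_t dest;
	uint32_t seq;
	uint32_t ack_seq;
	uint16_t flags;     /* data offset and flag bits packed together */
	uint16_t window;
	uint16_t check;
	uint16_t urg_ptr;
};

/* True if the field occupying [off, off + size) lies within `avail` bytes. */
static int toy_field_present(size_t avail, size_t off, size_t size)
{
	assert(off + size >= off);    /* guard against wrap-around */
	return off + size <= avail;
}

/* Both ports sit in the first 4 bytes, so 8 guaranteed bytes always cover them. */
static int toy_can_fix_ports(size_t avail)
{
	return toy_field_present(avail, offsetof(struct toy_tcphdr, dest),
				 sizeof(uint16_t));
}

/* The checksum occupies bytes 16..17, so at least 18 bytes are needed. */
static int toy_can_fix_check(size_t avail)
{
	return toy_field_present(avail, offsetof(struct toy_tcphdr, check),
				 sizeof(uint16_t));
}
```

With only the 8 guaranteed bytes, toy_can_fix_ports() succeeds and toy_can_fix_check() fails, which matches scenario b from Harald's message: the ports still get translated and the missing checksum is left alone.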
-- Premature optmztion is rt of all evl. --DK diff -urN -I \$.*\$ --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal linux-2.4.7-official/net/ipv4/netfilter/ip_conntrack_core.c working-2.4.7-marc/net/ipv4/netfilter/ip_conntrack_core.c --- linux-2.4.7-official/net/ipv4/netfilter/ip_conntrack_core.c Sat Apr 28 07:15:01 2001 +++ working-2.4.7-marc/net/ipv4/netfilter/ip_conntrack_core.c Fri Aug 3 13:29:48 2001 @@ -356,7 +356,7 @@ inner = (struct iphdr *)(hdr + 1); datalen = skb->len - iph->ihl*4 - sizeof(*hdr); - if (skb->len < iph->ihl * 4 + sizeof(struct icmphdr)) { + if (skb->len < iph->ihl * 4 + sizeof(*hdr) + sizeof(*iph)) { DEBUGP("icmp_error_track: too short\n"); return NULL; } --- linux-2.4.7-mb/net/ipv4/netfilter/ip_nat_proto_tcp.c 2001/07/31 15:37:45 1.1 +++ linux-2.4.7-mb/net/ipv4/netfilter/ip_nat_proto_tcp.c 2001/07/31 17:35:20 @@ -92,10 +104,17 @@ oldip = iph->daddr; portptr = &hdr->dest; } - hdr->check = ip_nat_cheat_check(~oldip, manip->ip, + + /* this could be a inner header returned in icmp packet; in such + cases we cannot update the checksum field since it is outside of + the 8 bytes of transport layer headers we are guaranteed */ + if(((void *)&hdr->check + sizeof(hdr->check) - (void *)iph) <= len) { + hdr->check = ip_nat_cheat_check(~oldip, manip->ip, ip_nat_cheat_check(*portptr ^ 0xFFFF, manip->u.tcp.port, hdr->check)); + } + *portptr = manip->u.tcp.port; } From owner-netdev@oss.sgi.com Fri Aug 3 01:56:51 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f738upx00475 for netdev-outgoing; Fri, 3 Aug 2001 01:56:51 -0700 Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f738unV00472 for ; Fri, 3 Aug 2001 01:56:49 -0700 Received: (from davem@localhost) by pizda.ninka.net (8.9.3/8.9.3) id BAA01095; Fri, 3 Aug 2001 01:56:37 -0700 From: "David S. 
Miller" MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <15210.26437.501185.294773@pizda.ninka.net> Date: Fri, 3 Aug 2001 01:56:37 -0700 (PDT) To: Rusty Russell Cc: Harald Welte , Marc Boucher , Alexey Kuznetsov , netfilter-devel@lists.samba.org, netdev@oss.sgi.com Subject: Re: ERRATA Re: [PATCH] fix for netfilter/nat/pppoe crashes (hopefully) In-Reply-To: References: <20010802073648.G1612@obroa-skai.gnumonks.org> X-Mailer: VM 6.75 under 21.1 (patch 13) "Crater Lake" XEmacs Lucid Sender: owner-netdev@oss.sgi.com Precedence: bulk Rusty Russell writes: > I think it's pretty clear to everyone that I don't have time or > resources to maintain this stuff any more. Any takers? :-) > Have appended my other minor fix. Dave, please apply... Done. Later, David S. Miller davem@redhat.com From owner-netdev@oss.sgi.com Fri Aug 3 01:59:01 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f738x1n00723 for netdev-outgoing; Fri, 3 Aug 2001 01:59:01 -0700 Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f738x0V00717 for ; Fri, 3 Aug 2001 01:59:00 -0700 Received: (from davem@localhost) by pizda.ninka.net (8.9.3/8.9.3) id BAA01347; Fri, 3 Aug 2001 01:58:54 -0700 From: "David S. Miller" MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <15210.26574.374615.6434@pizda.ninka.net> Date: Fri, 3 Aug 2001 01:58:54 -0700 (PDT) To: Pekka Savola Cc: , , Subject: Re: missing icmp errors for udp packets In-Reply-To: References: X-Mailer: VM 6.75 under 21.1 (patch 13) "Crater Lake" XEmacs Lucid Sender: owner-netdev@oss.sgi.com Precedence: bulk Pekka Savola writes: > As this happening is rather rare, would there be resistance for adding > this as an intermediate fix, to be replaced later with a bigger overhaul > if that is to be decided? 
>
> For 99.9% of cases, this works rather well and the 0.1% is the same as
> before (== acceptable). Returning ICMP unreachables after being pinged is
> IMO rather important.

Please people, just make some decision and send me the final
patch :-)

Later,
David S. Miller
davem@redhat.com

From owner-netdev@oss.sgi.com Fri Aug 3 04:42:18 2001
Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f73BgIA08229 for netdev-outgoing; Fri, 3 Aug 2001 04:42:18 -0700
Received: from ghanima.endorphin.org ([62.116.8.197]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f73BgCV08226 for ; Fri, 3 Aug 2001 04:42:12 -0700
Received: (qmail 740 invoked by uid 1000); 3 Aug 2001 11:42:06 -0000
Date: Fri, 3 Aug 2001 13:42:06 +0200
From: clemens
To: netdev@oss.sgi.com
Subject: [PATCH] global icmp rate limiting
Message-ID: <20010803134206.A653@ghanima.endorphin.org>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="ReaqsoxgOBHFXBhH"
Content-Disposition: inline
User-Agent: Mutt/1.3.18i
Sender: owner-netdev@oss.sgi.com
Precedence: bulk

--ReaqsoxgOBHFXBhH
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

this patch introduces global icmp rate limiting
(/proc/sys/net/ipv4/icmp_ratelimit) with the ability to arbitrarily rate limit
or unlimit certain icmp types (/proc/sys/net/ipv4/icmp_ratemask, but you had
better have a look at icmp.c before changing this).

please test.
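
A small sketch of how a type bitmask like icmp_ratemask selects which ICMP types are rate limited, assuming the (1 << type) test used in the patch. The helper names are hypothetical. Note that the mask is indexed by the decimal ICMP type number, so time exceeded is bit 11 and parameter problem is bit 12, and the default 0x1818 corresponds to types 3, 4, 11 and 12.

```c
#include <assert.h>

#define TOY_NR_ICMP_TYPES 18          /* highest type in the table, as in the patch */

/* Build a mask with one bit set per listed ICMP type number. */
static unsigned int toy_ratemask(const int *types, int n)
{
	unsigned int mask = 0;

	for (int i = 0; i < n; i++) {
		assert(types[i] >= 0 && types[i] <= TOY_NR_ICMP_TYPES);
		mask |= 1u << types[i];
	}
	return mask;
}

/* Return 1 if messages of this type are subject to the limiter, else 0. */
static int toy_type_is_limited(unsigned int mask, int type)
{
	assert(type >= 0);
	if (type > TOY_NR_ICMP_TYPES)
		return 0;             /* unknown types are never limited here */
	return (mask >> type) & 1u;
}
```

To stop limiting one type, clear its bit: for example 0x1818 & ~(1u << 4) leaves source quench unlimited while keeping the other three defaults.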
clemens --ReaqsoxgOBHFXBhH Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename="icmp-global-rate2.patch" diff -ur linux-sane/include/linux/sysctl.h linux/include/linux/sysctl.h --- linux-sane/include/linux/sysctl.h Fri Jul 20 21:52:18 2001 +++ linux/include/linux/sysctl.h Fri Aug 3 13:22:56 2001 @@ -251,11 +251,18 @@ NET_IPV4_LOCAL_PORT_RANGE=56, NET_IPV4_ICMP_ECHO_IGNORE_ALL=57, NET_IPV4_ICMP_ECHO_IGNORE_BROADCASTS=58, - NET_IPV4_ICMP_SOURCEQUENCH_RATE=59, - NET_IPV4_ICMP_DESTUNREACH_RATE=60, - NET_IPV4_ICMP_TIMEEXCEED_RATE=61, - NET_IPV4_ICMP_PARAMPROB_RATE=62, - NET_IPV4_ICMP_ECHOREPLY_RATE=63, + +/* obsolet, replaced by global icmp limiting. + + NET_IPV4_ICMP_SOURCEQUENCH_RATE, + NET_IPV4_ICMP_DESTUNREACH_RATE, + NET_IPV4_ICMP_TIMEEXCEED_RATE, + NET_IPV4_ICMP_PARAMPROB_RATE, + NET_IPV4_ICMP_ECHOREPLY_RATE, + + use NET_IPV4_ICMP_RATELIMIT, NET_IPV4_ICMP_RATEMASK instead + +*/ NET_IPV4_ICMP_IGNORE_BOGUS_ERROR_RESPONSES=64, NET_IPV4_IGMP_MAX_MEMBERSHIPS=65, NET_TCP_TW_RECYCLE=66, @@ -281,6 +288,8 @@ NET_TCP_APP_WIN=86, NET_TCP_ADV_WIN_SCALE=87, NET_IPV4_NONLOCAL_BIND=88, + NET_IPV4_ICMP_RATELIMIT=89, + NET_IPV4_ICMP_RATEMASK=90 }; enum { diff -ur linux-sane/net/ipv4/icmp.c linux/net/ipv4/icmp.c --- linux-sane/net/ipv4/icmp.c Thu Jun 21 06:00:55 2001 +++ linux/net/ipv4/icmp.c Fri Aug 3 13:29:46 2001 @@ -16,6 +16,9 @@ * Other than that this module is a complete rewrite. * * Fixes: + * Clemens Fruhwirth : introduce global icmp rate limiting + * with icmp type masking ability instead + * of broken per type icmp timeouts. * Mike Shaver : RFC1122 checks. * Alan Cox : Multicast ping reply as self. * Alan Cox : Fix atomicity lockup in ip_build_xmit @@ -145,6 +148,23 @@ /* Control parameter - ignore bogus broadcast responses? */ int sysctl_icmp_ignore_bogus_error_responses; +/* + * Configurable global rate limit. 
+ * + * ratelimit defines token/tick for dst->rate_token bucket + * ratemask defines which icmp types are ratelimited by setting + * it's bit position. + * + * FIXME: verify if the defaults are reasonable + * + * default: + * dest unreachable (0x03), source quench (0x04), + * time exceeded (0x11), parameter problem (0x12) + */ + +int sysctl_icmp_ratelimit = 1*HZ; +int sysctl_icmp_ratemask = 0x1818; + /* * ICMP control array. This specifies what to do with each ICMP. */ @@ -155,7 +175,6 @@ unsigned long *input; /* Address to increment on input */ void (*handler)(struct sk_buff *skb); short error; /* This ICMP is classed as an error message */ - int *timeout; /* Rate limit */ }; static struct icmp_control icmp_pointers[NR_ICMP_TYPES+1]; @@ -257,22 +276,21 @@ { struct dst_entry *dst = &rt->u.dst; - if (type > NR_ICMP_TYPES || !icmp_pointers[type].timeout) + if (type > NR_ICMP_TYPES) return 1; /* Don't limit PMTU discovery. */ if (type == ICMP_DEST_UNREACH && code == ICMP_FRAG_NEEDED) return 1; - /* Redirect has its own rate limit mechanism */ - if (type == ICMP_REDIRECT) - return 1; - /* No rate limit on loopback */ if (dst->dev && (dst->dev->flags&IFF_LOOPBACK)) return 1; - return xrlim_allow(dst, *(icmp_pointers[type].timeout)); + if((1 << type) & sysctl_icmp_ratemask) + return xrlim_allow(dst,sysctl_icmp_ratelimit); + else + return 1; } /* @@ -929,18 +947,7 @@ } -/* - * Configurable rate limits. - * Someone should check if these default values are correct. - * Note that these values interact with the routing cache GC timeout. - * If you chose them too high they won't take effect, because the - * dst_entry gets expired too early. The same should happen when - * the cache grows too big. - */ -int sysctl_icmp_destunreach_time = 1*HZ; -int sysctl_icmp_timeexceed_time = 1*HZ; -int sysctl_icmp_paramprob_time = 1*HZ; -int sysctl_icmp_echoreply_time; /* don't limit it per default. */ + /* * This table is the definition of how we handle ICMP. 
@@ -948,37 +955,37 @@ static struct icmp_control icmp_pointers[NR_ICMP_TYPES+1] = { /* ECHO REPLY (0) */ - { &icmp_statistics[0].IcmpOutEchoReps, &icmp_statistics[0].IcmpInEchoReps, icmp_discard, 0, &sysctl_icmp_echoreply_time}, - { &icmp_statistics[0].dummy, &icmp_statistics[0].IcmpInErrors, icmp_discard, 1, }, - { &icmp_statistics[0].dummy, &icmp_statistics[0].IcmpInErrors, icmp_discard, 1, }, + { &icmp_statistics[0].IcmpOutEchoReps, &icmp_statistics[0].IcmpInEchoReps, icmp_discard, 0 }, + { &icmp_statistics[0].dummy, &icmp_statistics[0].IcmpInErrors, icmp_discard, 1 }, + { &icmp_statistics[0].dummy, &icmp_statistics[0].IcmpInErrors, icmp_discard, 1 }, /* DEST UNREACH (3) */ - { &icmp_statistics[0].IcmpOutDestUnreachs, &icmp_statistics[0].IcmpInDestUnreachs, icmp_unreach, 1, &sysctl_icmp_destunreach_time }, + { &icmp_statistics[0].IcmpOutDestUnreachs, &icmp_statistics[0].IcmpInDestUnreachs, icmp_unreach, 1 }, /* SOURCE QUENCH (4) */ - { &icmp_statistics[0].IcmpOutSrcQuenchs, &icmp_statistics[0].IcmpInSrcQuenchs, icmp_unreach, 1, }, + { &icmp_statistics[0].IcmpOutSrcQuenchs, &icmp_statistics[0].IcmpInSrcQuenchs, icmp_unreach, 1 }, /* REDIRECT (5) */ - { &icmp_statistics[0].IcmpOutRedirects, &icmp_statistics[0].IcmpInRedirects, icmp_redirect, 1, }, - { &icmp_statistics[0].dummy, &icmp_statistics[0].IcmpInErrors, icmp_discard, 1, }, - { &icmp_statistics[0].dummy, &icmp_statistics[0].IcmpInErrors, icmp_discard, 1, }, + { &icmp_statistics[0].IcmpOutRedirects, &icmp_statistics[0].IcmpInRedirects, icmp_redirect, 1 }, + { &icmp_statistics[0].dummy, &icmp_statistics[0].IcmpInErrors, icmp_discard, 1 }, + { &icmp_statistics[0].dummy, &icmp_statistics[0].IcmpInErrors, icmp_discard, 1 }, /* ECHO (8) */ - { &icmp_statistics[0].IcmpOutEchos, &icmp_statistics[0].IcmpInEchos, icmp_echo, 0, }, - { &icmp_statistics[0].dummy, &icmp_statistics[0].IcmpInErrors, icmp_discard, 1, }, - { &icmp_statistics[0].dummy, &icmp_statistics[0].IcmpInErrors, icmp_discard, 1, }, + { 
&icmp_statistics[0].IcmpOutEchos, &icmp_statistics[0].IcmpInEchos, icmp_echo, 0 }, + { &icmp_statistics[0].dummy, &icmp_statistics[0].IcmpInErrors, icmp_discard, 1 }, + { &icmp_statistics[0].dummy, &icmp_statistics[0].IcmpInErrors, icmp_discard, 1 }, /* TIME EXCEEDED (11) */ - { &icmp_statistics[0].IcmpOutTimeExcds, &icmp_statistics[0].IcmpInTimeExcds, icmp_unreach, 1, &sysctl_icmp_timeexceed_time }, + { &icmp_statistics[0].IcmpOutTimeExcds, &icmp_statistics[0].IcmpInTimeExcds, icmp_unreach, 1 }, /* PARAMETER PROBLEM (12) */ - { &icmp_statistics[0].IcmpOutParmProbs, &icmp_statistics[0].IcmpInParmProbs, icmp_unreach, 1, &sysctl_icmp_paramprob_time }, + { &icmp_statistics[0].IcmpOutParmProbs, &icmp_statistics[0].IcmpInParmProbs, icmp_unreach, 1 }, /* TIMESTAMP (13) */ - { &icmp_statistics[0].IcmpOutTimestamps, &icmp_statistics[0].IcmpInTimestamps, icmp_timestamp, 0, }, + { &icmp_statistics[0].IcmpOutTimestamps, &icmp_statistics[0].IcmpInTimestamps, icmp_timestamp, 0 }, /* TIMESTAMP REPLY (14) */ - { &icmp_statistics[0].IcmpOutTimestampReps, &icmp_statistics[0].IcmpInTimestampReps, icmp_discard, 0, }, + { &icmp_statistics[0].IcmpOutTimestampReps, &icmp_statistics[0].IcmpInTimestampReps, icmp_discard, 0 }, /* INFO (15) */ - { &icmp_statistics[0].dummy, &icmp_statistics[0].dummy, icmp_discard, 0, }, + { &icmp_statistics[0].dummy, &icmp_statistics[0].dummy, icmp_discard, 0 }, /* INFO REPLY (16) */ - { &icmp_statistics[0].dummy, &icmp_statistics[0].dummy, icmp_discard, 0, }, + { &icmp_statistics[0].dummy, &icmp_statistics[0].dummy, icmp_discard, 0 }, /* ADDR MASK (17) */ - { &icmp_statistics[0].IcmpOutAddrMasks, &icmp_statistics[0].IcmpInAddrMasks, icmp_address, 0, }, + { &icmp_statistics[0].IcmpOutAddrMasks, &icmp_statistics[0].IcmpInAddrMasks, icmp_address, 0 }, /* ADDR MASK REPLY (18) */ - { &icmp_statistics[0].IcmpOutAddrMaskReps, &icmp_statistics[0].IcmpInAddrMaskReps, icmp_address_reply, 0, } + { &icmp_statistics[0].IcmpOutAddrMaskReps, 
&icmp_statistics[0].IcmpInAddrMaskReps, icmp_address_reply, 0 } }; void __init icmp_init(struct net_proto_family *ops) diff -ur linux-sane/net/ipv4/sysctl_net_ipv4.c linux/net/ipv4/sysctl_net_ipv4.c --- linux-sane/net/ipv4/sysctl_net_ipv4.c Mon Mar 26 04:14:25 2001 +++ linux/net/ipv4/sysctl_net_ipv4.c Fri Aug 3 12:44:28 2001 @@ -32,10 +32,8 @@ extern int sysctl_ip_dynaddr; /* From icmp.c */ -extern int sysctl_icmp_destunreach_time; -extern int sysctl_icmp_timeexceed_time; -extern int sysctl_icmp_paramprob_time; -extern int sysctl_icmp_echoreply_time; +extern int sysctl_icmp_ratelimit; +extern int sysctl_icmp_ratemask; /* From igmp.c */ extern int sysctl_igmp_max_memberships; @@ -178,14 +176,6 @@ {NET_IPV4_ICMP_IGNORE_BOGUS_ERROR_RESPONSES, "icmp_ignore_bogus_error_responses", &sysctl_icmp_ignore_bogus_error_responses, sizeof(int), 0644, NULL, &proc_dointvec}, - {NET_IPV4_ICMP_DESTUNREACH_RATE, "icmp_destunreach_rate", - &sysctl_icmp_destunreach_time, sizeof(int), 0644, NULL, &proc_dointvec}, - {NET_IPV4_ICMP_TIMEEXCEED_RATE, "icmp_timeexceed_rate", - &sysctl_icmp_timeexceed_time, sizeof(int), 0644, NULL, &proc_dointvec}, - {NET_IPV4_ICMP_PARAMPROB_RATE, "icmp_paramprob_rate", - &sysctl_icmp_paramprob_time, sizeof(int), 0644, NULL, &proc_dointvec}, - {NET_IPV4_ICMP_ECHOREPLY_RATE, "icmp_echoreply_rate", - &sysctl_icmp_echoreply_time, sizeof(int), 0644, NULL, &proc_dointvec}, {NET_IPV4_ROUTE, "route", NULL, 0, 0555, ipv4_route_table}, #ifdef CONFIG_IP_MULTICAST {NET_IPV4_IGMP_MAX_MEMBERSHIPS, "igmp_max_memberships", @@ -227,6 +217,10 @@ &sysctl_tcp_app_win, sizeof(int), 0644, NULL, &proc_dointvec}, {NET_TCP_ADV_WIN_SCALE, "tcp_adv_win_scale", &sysctl_tcp_adv_win_scale, sizeof(int), 0644, NULL, &proc_dointvec}, + {NET_IPV4_ICMP_RATELIMIT, "icmp_ratelimit", + &sysctl_icmp_ratelimit, sizeof(int), 0644, NULL, &proc_dointvec}, + {NET_IPV4_ICMP_RATEMASK, "icmp_ratemask", + &sysctl_icmp_ratemask, sizeof(int), 0644, NULL, &proc_dointvec}, {0} }; --ReaqsoxgOBHFXBhH-- From 
owner-netdev@oss.sgi.com Fri Aug 3 05:57:08 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f73Cv8409368 for netdev-outgoing; Fri, 3 Aug 2001 05:57:08 -0700 Received: from opium.mbsi.ca (opium.mbsi.ca [198.168.101.1]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f73Cv6V09365 for ; Fri, 3 Aug 2001 05:57:06 -0700 Received: from opium.mbsi.ca (marc@localhost [127.0.0.1]) by opium.mbsi.ca (8.11.3/8.11.3) with ESMTP id f73Cua521760; Fri, 3 Aug 2001 08:56:36 -0400 (EDT) Message-Id: <200108031256.f73Cua521760@opium.mbsi.ca> X-Mailer: exmh version 2.2 2001/03/06 with nmh-1.0.4+dev To: "David S. Miller" cc: Rusty Russell , Harald Welte , Alexey Kuznetsov , netfilter-devel@lists.samba.org, netdev@oss.sgi.com Subject: Re: ERRATA Re: [PATCH] fix for netfilter/nat/pppoe crashes (hopefully) In-Reply-To: Your message of "Fri, 03 Aug 2001 01:56:37 PDT." <15210.26437.501185.294773@pizda.ninka.net> References: <20010802073648.G1612@obroa-skai.gnumonks.org> <15210.26437.501185.294773@pizda.ninka.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Fri, 03 Aug 2001 08:56:36 -0400 From: Marc Boucher Sender: owner-netdev@oss.sgi.com Precedence: bulk Hi Dave, > Rusty Russell writes: > > I think it's pretty clear to everyone that I don't have time or > > resources to maintain this stuff any more. > > Any takers? :-) There is the netfilter coreteam, remember? We had some confusion a while ago when acceptance of critical security patches was delayed due to Rusty being on vacation and some folks insisting on accepting patches only from him, despite the fact that he had explicitly delegated the power to submit patches/release software to the whole coreteam. Harald is the most active team member these days and he will be submitting most patches in the foreseeable future I guess, but you should still accept fixes from other coreteam members (James Morris, Rusty, and myself) especially when urgent and the changes are judged adequate (ie. 
we have reached consensus, after discussion/revision like in this case). > > Have appended my other minor fix. Dave, please apply... > > Done. Thanks! Marc > > Have appended my other minor fix. Dave, please apply... > > Done. > > Later, > David S. Miller > davem@redhat.com > From owner-netdev@oss.sgi.com Fri Aug 3 06:10:10 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f73DAAa09604 for netdev-outgoing; Fri, 3 Aug 2001 06:10:10 -0700 Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f73DA9V09601 for ; Fri, 3 Aug 2001 06:10:09 -0700 Received: (from davem@localhost) by pizda.ninka.net (8.9.3/8.9.3) id GAA20940; Fri, 3 Aug 2001 06:10:05 -0700 From: "David S. Miller" MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <15210.41645.786815.800356@pizda.ninka.net> Date: Fri, 3 Aug 2001 06:10:05 -0700 (PDT) To: Marc Boucher Cc: Rusty Russell , Harald Welte , Alexey Kuznetsov , netfilter-devel@lists.samba.org, netdev@oss.sgi.com Subject: Re: ERRATA Re: [PATCH] fix for netfilter/nat/pppoe crashes (hopefully) In-Reply-To: <200108031256.f73Cua521760@opium.mbsi.ca> References: <20010802073648.G1612@obroa-skai.gnumonks.org> <15210.26437.501185.294773@pizda.ninka.net> <200108031256.f73Cua521760@opium.mbsi.ca> X-Mailer: VM 6.75 under 21.1 (patch 13) "Crater Lake" XEmacs Lucid Sender: owner-netdev@oss.sgi.com Precedence: bulk Marc Boucher writes: > Harald is the most active team member these days and he will be > submitting most patches in the foreseeable future I guess, but you > should still accept fixes from other coreteam members (James Morris, > Rusty, and myself) especially when urgent and the changes are judged > adequate (ie. we have reached consensus, after discussion/revision like > in this case). I would prefer to limit the interaction to one, maybe two members of the core team. 
Can you guys pick no more than two people for sending patches to me? Thanks. Later, David S. Miller davem@redhat.com From owner-netdev@oss.sgi.com Fri Aug 3 10:22:15 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f73HMFF16191 for netdev-outgoing; Fri, 3 Aug 2001 10:22:15 -0700 Received: from ms2.inr.ac.ru (minus.inr.ac.ru [193.233.7.97]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f73HM8V16188 for ; Fri, 3 Aug 2001 10:22:11 -0700 Received: from mops.inr.ac.ru (mops.inr.ac.ru [193.233.7.60]) by ms2.inr.ac.ru (8.6.13/ANK) with ESMTP id VAA20058; Fri, 3 Aug 2001 21:21:54 +0400 Received: (from kuznet@localhost) by mops.inr.ac.ru (8.9.3/8.9.3) id CAA00311; Fri, 3 Aug 2001 02:10:05 +0400 Message-Id: <200108022210.CAA00311@mops.inr.ac.ru> Subject: Re: Linux 2.4 network performance oddities To: BERND.STURM@NDSatcom.COM Date: Fri, 3 Aug 2001 02:10:05 +0400 (MSD) Cc: netdev@oss.sgi.com In-Reply-To: from "BERND.STURM@NDSatcom.COM" at Aug 2, 1 03:15:06 pm From: Alexey Kuznetsov X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Content-Type: text/plain; charset=X-RU_RU.KOI8-R Content-Transfer-Encoding: 8bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Hello! > transfer at a time and no Bit Error rate set. But when we set a BER and > later set BER back to 0 the transfers wouldn't run at the same transfer rate > as before without BER but at only the speeds with BER set. TCP remembers that this path is lossy and does not try to stress it again with slow start, doing congestion avoidance instead. It will disappear for logarithmic time, exactly like it happens when you disable BER _during_ one connection. > of the connections from the multisession-netperf-transfer ! The same thing. > else who could help me. Nothing to help really. It should recover itself following congestion avoidance. If it does not, prepare tcpdump of failing session.
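[Illustration, not part of the original message.] The effect described above can be pictured with a deliberately simplified model. It assumes that what TCP "remembers" about the path amounts to a lowered slow-start threshold, so that a new connection grows its window by one segment per round trip (congestion avoidance) instead of doubling it (slow start). The function name rtts_to_reach and the one-step-per-RTT model are hypothetical and are not taken from the kernel; losses are ignored entirely.

```c
/*
 * Toy model: count round trips until the congestion window reaches
 * `target` segments. Below ssthresh the window doubles per RTT (slow
 * start); at or above it the window grows by one segment per RTT
 * (congestion avoidance). Hypothetical helper, not kernel code.
 */
static unsigned rtts_to_reach(unsigned cwnd, unsigned ssthresh, unsigned target)
{
	unsigned rtts = 0;

	if (cwnd == 0)
		cwnd = 1;	/* avoid a window that can never grow */

	while (cwnd < target) {
		if (cwnd < ssthresh) {
			cwnd *= 2;		/* slow start */
			if (cwnd > ssthresh)
				cwnd = ssthresh;
		} else {
			cwnd += 1;		/* congestion avoidance */
		}
		rtts++;
	}
	return rtts;
}
```

In this model, starting from 2 segments with a threshold of 64 reaches a 64-segment window in 5 round trips, while starting with a remembered threshold of 2 needs 62. That is the kind of gap that would show up as a transfer which stays slow after the bit errors have been switched off.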
Alexey From owner-netdev@oss.sgi.com Fri Aug 3 10:41:44 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f73Hfid16721 for netdev-outgoing; Fri, 3 Aug 2001 10:41:44 -0700 Received: from ms2.inr.ac.ru (minus.inr.ac.ru [193.233.7.97]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f73HfgV16718 for ; Fri, 3 Aug 2001 10:41:42 -0700 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id VAA21968; Fri, 3 Aug 2001 21:40:56 +0400 From: kuznet@ms2.inr.ac.ru Message-Id: <200108031740.VAA21968@ms2.inr.ac.ru> Subject: Re: ERRATA Re: [PATCH] fix for netfilter/nat/pppoe crashes (hopefully) To: davem@redhat.com (David S. Miller) Date: Fri, 3 Aug 2001 21:40:56 +0400 (MSK DST) Cc: laforge@gnumonks.org, rusty@rustcorp.com.au, marc@mbsi.ca, netfilter-devel@lists.samba.org, netdev@oss.sgi.com In-Reply-To: <15210.14446.81297.26145@pizda.ninka.net> from "David S. Miller" at Aug 2, 1 10:36:46 pm X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Sender: owner-netdev@oss.sgi.com Precedence: bulk Hello! > > Sorry Rusty, but check on sizeof(struct tcphdr) is IMHO wrong, again. > > I think there is no way you can validly drop an ICMP packet just > because the TCP checksum field is not there in the embedded header. > > So I think I basically agree with Harald. Reminder to Paul: 99% of icmp errors have only 8 bytes of tcp header enough to get ports and sequence number and that's all. All the rest is an option, which is not respected by the most of routers and even host OSes. 
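[Illustration, not part of the original message.] A minimal sketch of what can be read from those 8 bytes, assuming the standard TCP header layout (16-bit source port, 16-bit destination port, 32-bit sequence number, all in network byte order). The struct tcp_quote type and the parse_tcp_quote function are hypothetical names invented for this example; this is not the netfilter code under discussion.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the fields that fit in 8 quoted bytes. */
struct tcp_quote {
	uint16_t sport;	/* source port, host byte order */
	uint16_t dport;	/* destination port, host byte order */
	uint32_t seq;	/* sequence number, host byte order */
};

/*
 * Parse the start of a TCP header quoted inside an ICMP error.
 * Only the first 8 bytes are required; nothing beyond them is read,
 * so a quote that stops there is still accepted.
 * Returns 0 on success, -1 if fewer than 8 bytes are available.
 */
static int parse_tcp_quote(const uint8_t *p, size_t len, struct tcp_quote *out)
{
	if (len < 8)
		return -1;

	out->sport = (uint16_t)((p[0] << 8) | p[1]);
	out->dport = (uint16_t)((p[2] << 8) | p[3]);
	out->seq = ((uint32_t)p[4] << 24) | ((uint32_t)p[5] << 16) |
		   ((uint32_t)p[6] << 8) | (uint32_t)p[7];
	return 0;
}
```

The length check is against 8, not against the size of a full TCP header, which is the point being made: the checksum, flags and window fields simply are not there in most quoted headers.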
Alexey From owner-netdev@oss.sgi.com Fri Aug 3 10:56:54 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f73Husb17048 for netdev-outgoing; Fri, 3 Aug 2001 10:56:54 -0700 Received: from ms2.inr.ac.ru (minus.inr.ac.ru [193.233.7.97]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f73HuqV17045 for ; Fri, 3 Aug 2001 10:56:53 -0700 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id VAA22896; Fri, 3 Aug 2001 21:56:14 +0400 From: kuznet@ms2.inr.ac.ru Message-Id: <200108031756.VAA22896@ms2.inr.ac.ru> Subject: Re: ERRATA Re: [PATCH] fix for netfilter/nat/pppoe crashes (hopefully) To: marc@mbsi.ca (Marc Boucher) Date: Fri, 3 Aug 2001 21:56:14 +0400 (MSK DST) Cc: davem@redhat.com, rusty@rustcorp.com.au, laforge@gnumonks.org, netfilter-devel@lists.samba.org, netdev@oss.sgi.com In-Reply-To: <200108031256.f73Cua521760@opium.mbsi.ca> from "Marc Boucher" at Aug 3, 1 08:56:36 am X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Sender: owner-netdev@oss.sgi.com Precedence: bulk Hello! > There is the netfilter coreteam, remember? TseKa? (=Central Committee) Well, provided you elect some Tovarisch Gensec, this has chances not to degenerate to total irresponsibility. If Paul has problems with time, elect some Politburo at least.
:-) Alexey From owner-netdev@oss.sgi.com Fri Aug 3 11:50:57 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f73Iov918170 for netdev-outgoing; Fri, 3 Aug 2001 11:50:57 -0700 Received: from raq299.uk2net.com (raq299.uk2net.com [213.239.42.132]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f73IotV18167 for ; Fri, 3 Aug 2001 11:50:56 -0700 Received: from mark by raq299.uk2net.com with local (Exim 3.16 #2) id 15Sk27-0004M1-00; Fri, 03 Aug 2001 19:50:51 +0100 Date: Fri, 3 Aug 2001 19:50:51 +0100 From: Mark Baker To: Pekka Savola Cc: netdev@oss.sgi.com Subject: Re: IPv6 fragmentation and IPv6 header parsing Message-ID: <20010803195051.B15145@raq299.uk2net.com> Mail-Followup-To: Pekka Savola , netdev@oss.sgi.com References: <20010802133839.C24305@bacchus.dhis.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0.1i In-Reply-To: ; from pekkas@netcore.fi on Thu, Aug 02, 2001 at 08:41:57PM +0300 Sender: owner-netdev@oss.sgi.com Precedence: bulk On Thu, Aug 02, 2001 at 08:41:57PM +0300, Pekka Savola wrote: > As there are no huge technical or address allocational reasons why ISP's > could not give at least /64, those ISP's that do get more popular and ones > dealing /128's do not, and disappear from IPv6 market. There are, however, technical reasons why ISPs might want to use dynamic IPs (if they have lots of dial-up hardware in different locations, routing issues make static IP difficult), so although their customers would get a /64, it might be a different one every time they dial up. In that situation, since I wouldn't want addresses on my local network to keep changing, I would want to use NAT to translate the address block assigned by the ISP onto some site local address space. 
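[Illustration, not part of the original message.] A minimal sketch of the address rewrite such a translation would perform, assuming it only swaps the leading prefix bits of a 128-bit address and leaves the interface identifier alone. The function name rewrite_prefix is hypothetical. Transport checksum adjustment and protocols that carry addresses in their payload are not handled here.

```c
#include <stdint.h>
#include <string.h>

/*
 * Overwrite the top `prefixlen` bits of `addr` with the corresponding
 * bits of `prefix`; the remaining low bits of `addr` are kept.
 * Both arguments are 16-byte IPv6 addresses in network byte order.
 * Hypothetical helper for illustration only.
 */
static void rewrite_prefix(uint8_t addr[16], const uint8_t prefix[16],
			   unsigned prefixlen)
{
	unsigned full, rem;

	if (prefixlen > 128)
		prefixlen = 128;

	full = prefixlen / 8;	/* whole bytes covered by the prefix */
	rem = prefixlen % 8;	/* leftover bits in the next byte */

	memcpy(addr, prefix, full);

	if (rem) {
		uint8_t mask = (uint8_t)(0xff << (8 - rem));

		addr[full] = (uint8_t)((prefix[full] & mask) |
				       (addr[full] & ~mask));
	}
}
```

With a /64, this maps 2001:db8:1:2::1 under the prefix fec0:0:0:1:: to fec0:0:0:1::1; mapping in the other direction is the same call with the ISP-assigned prefix.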
From owner-netdev@oss.sgi.com Fri Aug 3 12:00:57 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f73J0vq18794 for netdev-outgoing; Fri, 3 Aug 2001 12:00:57 -0700 Received: from lust.cs.ohiou.edu (adsl-dynamic1-8.cleveland.oh.ameritech.net [64.108.88.8]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f73J0sV18791 for ; Fri, 3 Aug 2001 12:00:54 -0700 Received: (from elb@localhost) by lust.cs.ohiou.edu (8.11.2/8.11.2) id f73J0ck30161; Fri, 3 Aug 2001 15:00:38 -0400 Date: Fri, 3 Aug 2001 15:00:38 -0400 From: Ethan Blanton To: Mark Baker Cc: Pekka Savola , netdev@oss.sgi.com Subject: Re: IPv6 fragmentation and IPv6 header parsing Message-ID: <20010803150038.G29494@localhost.localdomain> Mail-Followup-To: Mark Baker , Pekka Savola , netdev@oss.sgi.com References: <20010802133839.C24305@bacchus.dhis.org> <20010803195051.B15145@raq299.uk2net.com> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="r5lq+205vWdkqwtk" Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <20010803195051.B15145@raq299.uk2net.com>; from mark@mnb.org.uk on Fri, Aug 03, 2001 at 07:50:51PM +0100 X-Operating-System: Linux X-GnuPG-Fingerprint: A290 14A8 C682 5C88 AE51 4787 AFD9 00F4 883C 1C14 Sender: owner-netdev@oss.sgi.com Precedence: bulk --r5lq+205vWdkqwtk Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable Mark Baker spake unto us the following wisdom: > > As there are no huge technical or address allocational reasons why ISP's > > could not give at least /64, those ISP's that do get more popular and ones > > dealing /128's do not, and disappear from IPv6 market.
> > There are, however, technical reasons why ISPs might want to use dynamic IPs > (if they have lots of dial-up hardware in different locations, routing > issues make static IP difficult), so although their customers would get a > /64, it might be a different one every time they dial up. > > In that situation, since I wouldn't want addresses on my local network to > keep changing, I would want to use NAT to translate the address block > assigned by the ISP onto some site local address space. IIRC, there are discussions in one of the IPv6 RFCs of exactly this. This type of NAT is obscenely simple for protocols that are not internally address-aware (i.e. not FTP, DCC, etc.), as you simply replace the top N bits (where N is your prefixlen) and adjust the TCP/UDP checksum as necessary. 'Fraid I can't help for address-aware protocols, but the salient point here is that that type of NAT is officially blessed somewhere, I believe. Ethan -- If I've told you once, I've told you once And once is all that you needed.
-- The Refreshments, "Carefree" --r5lq+205vWdkqwtk Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.0.6 (GNU/Linux) Comment: For info see http://www.gnupg.org iD8DBQE7avTWr9kA9Ig8HBQRAgtPAJ9obVKo8bmiKMwd6ImVeLiFw8Qi4wCfatPz nWm5hBUwbJJM7pcQxM9P8yo= =S4T8 -----END PGP SIGNATURE----- --r5lq+205vWdkqwtk-- From owner-netdev@oss.sgi.com Fri Aug 3 12:33:15 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f73JXF819501 for netdev-outgoing; Fri, 3 Aug 2001 12:33:15 -0700 Received: from netcore.fi (netcore.fi [193.94.160.1]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f73JXDV19498 for ; Fri, 3 Aug 2001 12:33:13 -0700 Received: from localhost (pekkas@localhost) by netcore.fi (8.11.1/8.11.1) with ESMTP id f73JX0013779; Fri, 3 Aug 2001 22:33:01 +0300 Date: Fri, 3 Aug 2001 22:33:00 +0300 (EEST) From: Pekka Savola To: Mark Baker cc: Subject: Re: IPv6 fragmentation and IPv6 header parsing In-Reply-To: <20010803195051.B15145@raq299.uk2net.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk On Fri, 3 Aug 2001, Mark Baker wrote: > On Thu, Aug 02, 2001 at 08:41:57PM +0300, Pekka Savola wrote: > > > As there are no huge technical or address allocational reasons why ISP's > > could not give at least /64, those ISP's that do get more popular and ones > > dealing /128's do not, and disappear from IPv6 market. > > There are, however, technical reasons why ISPs might want to use dynamic IPs > (if they have lots of dial-up hardware in different locations, routing > issues make static IP difficult), so although their customers would get a > /64, it might be a different one every time they dial up. > > In that situation, since I wouldn't want addresses on my local network to > keep changing, I would want to use NAT to translate the address block > assigned by the ISP onto some site local address space. 
This is an "illegal" use of NAT for IPv6. The use of autoconfiguration/router renumbering is encouraged in situations like these. If the prefix isn't static, dyndns _could_ be used but I don't see this as an issue (if you don't get a static prefix, you probably shouldn't have sensible dns records either). -- Pekka Savola "Tell me of difficulties surmounted, Netcore Oy not those you stumble over and fall" Systems. Networks. Security. -- Robert Jordan: A Crown of Swords From owner-netdev@oss.sgi.com Fri Aug 3 12:56:54 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f73Jusd19823 for netdev-outgoing; Fri, 3 Aug 2001 12:56:54 -0700 Received: from coruscant.gnumonks.org (mail@coruscant.franken.de [193.174.159.226]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f73JuqV19818 for ; Fri, 3 Aug 2001 12:56:52 -0700 Received: from uucp by coruscant.gnumonks.org with local-bsmtp (Exim 3.22 #1) id 15Sl4C-0001Ol-00 for netdev@oss.sgi.com; Fri, 03 Aug 2001 21:57:04 +0200 Received: from laforge by obroa-skai.gnumonks.org with local (Exim 3.22 #1) id 15SO2x-0002GB-00; Thu, 02 Aug 2001 16:22:15 -0300 Date: Thu, 2 Aug 2001 16:22:15 -0300 From: Harald Welte To: clemens Cc: netdev@oss.sgi.com Subject: Re: [PATCH] global icmp rate limiting Message-ID: <20010802162214.O1612@obroa-skai.gnumonks.org> References: <20010803134206.A653@ghanima.endorphin.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.3.17i In-Reply-To: <20010803134206.A653@ghanima.endorphin.org>; from therapy@endorphin.org on Fri, Aug 03, 2001 at 01:42:06PM +0200 X-Operating-System: Linux obroa-skai.gnumonks.org 2.4.7-nfpom X-Date: Today is Boomtime, the 66th day of Confusion in the YOLD 3167 Sender: owner-netdev@oss.sgi.com Precedence: bulk On Fri, Aug 03, 2001 at 01:42:06PM +0200, clemens wrote: > this patch introduces global icmp rate limiting > (/proc/sys/net/ipv4/icmp_ratelimit) with the ability to arbitary > rate limit or unlimit 
certain icmp types (/proc/sys/net/ipv4/icmp_ratemask, > but you better have a look at icmp.c before changing this). If somebody is going to change the icmp rate limiting code, please take into consideration fixing the kernel/userspace interface as well. There was a thread about this on linux-kernel some months ago. The basic problem is that the values in /proc/sys/net/ipv4/icmp_xxx_rate are dependent on HZ. This is bad, because there is no way to read out HZ from userspace (yes, there is code which tries to guess it, but that's a bad hack). So either we have a) HZ is not exposed to userspace _AND_ all interfaces are HZ-independent b) HZ is exposed to userspace But the current situation, where every sysctl.conf including icmp rate limits just has to guess what HZ is, is from my point of view a broken interface. And then of course I have to add (as a comment) that the functionality of generic icmp rate limiting is replicated in iptables currently (icmp match + limit match)... but yes, I understand that there are reasons why you don't want to load iptables. > clemens -- Live long and prosper - Harald Welte / laforge@gnumonks.org http://www.gnumonks.org ============================================================================ GCS/E/IT d- s-: a-- C+++ UL++++$ P+++ L++++$ E--- W- N++ o? K- w--- O- M- V-- PS+ PE-- Y+ PGP++ t++ 5-- !X !R tv-- b+++ DI?
!D G+ e* h+ r% y+(*) From owner-netdev@oss.sgi.com Fri Aug 3 12:56:59 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f73JuxK19831 for netdev-outgoing; Fri, 3 Aug 2001 12:56:59 -0700 Received: from coruscant.gnumonks.org (mail@coruscant.franken.de [193.174.159.226]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f73JupV19816 for ; Fri, 3 Aug 2001 12:56:52 -0700 Received: from uucp by coruscant.gnumonks.org with local-bsmtp (Exim 3.22 #1) id 15Sl4C-0001OY-00 for netdev@oss.sgi.com; Fri, 03 Aug 2001 21:57:04 +0200 Received: from laforge by obroa-skai.gnumonks.org with local (Exim 3.22 #1) id 15SNgL-0002Es-00; Thu, 02 Aug 2001 15:58:53 -0300 Date: Thu, 2 Aug 2001 15:58:53 -0300 From: Harald Welte To: Rusty Russell Cc: Marc Boucher , Alexey Kuznetsov , Dave Miller , netfilter-devel@lists.samba.org, netdev@oss.sgi.com Subject: Re: ERRATA Re: [PATCH] fix for netfilter/nat/pppoe crashes (hopefully) Message-ID: <20010802155853.K1612@obroa-skai.gnumonks.org> Mail-Followup-To: Harald Welte , Rusty Russell , Marc Boucher , Alexey Kuznetsov , Dave Miller , netfilter-devel@lists.samba.org, netdev@oss.sgi.com References: <20010802073648.G1612@obroa-skai.gnumonks.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.3.17i In-Reply-To: ; from rusty@rustcorp.com.au on Fri, Aug 03, 2001 at 05:18:45PM +1000 X-Operating-System: Linux obroa-skai.gnumonks.org 2.4.7-nfpom X-Date: Today is Boomtime, the 66th day of Confusion in the YOLD 3167 Sender: owner-netdev@oss.sgi.com Precedence: bulk On Fri, Aug 03, 2001 at 05:18:45PM +1000, Rusty Russell wrote: > In message <20010802073648.G1612@obroa-skai.gnumonks.org> you write: > > - scenario a > > Imagine the case, where we have the first 18 bytes of the tcp header, > > ACK... Your patch is correct. Well, maybe I should have stated that more explicitly, but the patch was the one proposed by Marc, as he checked this version into CVS last evening. 
As Marc went to bed last night, he asked me to refer to (or attach) the patch to any email I'm going to send ;) So credits for the fix fully go to Marc, I just agreed with his fix and promoted it :) > Was still not thinking about ICMP packets, and I'm supposed to be working on > work stuff at the moment. > I think it's pretty clear to everyone that I don't have time or > resources to maintain this stuff any more. Netfilter was and is mainly your project, it would be a big loss if you'd give up on it. > Rusty. -- Live long and prosper - Harald Welte / laforge@gnumonks.org http://www.gnumonks.org ============================================================================ GCS/E/IT d- s-: a-- C+++ UL++++$ P+++ L++++$ E--- W- N++ o? K- w--- O- M- V-- PS+ PE-- Y+ PGP++ t++ 5-- !X !R tv-- b+++ DI? !D G+ e* h+ r% y+(*) From owner-netdev@oss.sgi.com Fri Aug 3 12:57:06 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f73Jv6f19842 for netdev-outgoing; Fri, 3 Aug 2001 12:57:06 -0700 Received: from coruscant.gnumonks.org (mail@coruscant.franken.de [193.174.159.226]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f73Jv4V19839 for ; Fri, 3 Aug 2001 12:57:04 -0700 Received: from uucp by coruscant.gnumonks.org with local-bsmtp (Exim 3.22 #1) id 15Sl4A-0001OG-00 for netdev@oss.sgi.com; Fri, 03 Aug 2001 21:57:02 +0200 Received: from laforge by obroa-skai.gnumonks.org with local (Exim 3.22 #1) id 15SNqS-0002Fe-00; Thu, 02 Aug 2001 16:09:20 -0300 Date: Thu, 2 Aug 2001 16:09:20 -0300 From: Harald Welte To: "David S. Miller" Cc: Rusty Russell , Marc Boucher , Alexey Kuznetsov , netfilter-devel@lists.samba.org, netdev@oss.sgi.com Subject: Re: ERRATA Re: [PATCH] fix for netfilter/nat/pppoe crashes (hopefully) Message-ID: <20010802160920.N1612@obroa-skai.gnumonks.org> Mail-Followup-To: Harald Welte , "David S. 
Miller" , Rusty Russell , Marc Boucher , Alexey Kuznetsov , netfilter-devel@lists.samba.org, netdev@oss.sgi.com References: <20010802073648.G1612@obroa-skai.gnumonks.org> <15210.26437.501185.294773@pizda.ninka.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.3.17i In-Reply-To: <15210.26437.501185.294773@pizda.ninka.net>; from davem@redhat.com on Fri, Aug 03, 2001 at 01:56:37AM -0700 X-Operating-System: Linux obroa-skai.gnumonks.org 2.4.7-nfpom X-Date: Today is Boomtime, the 66th day of Confusion in the YOLD 3167 Sender: owner-netdev@oss.sgi.com Precedence: bulk On Fri, Aug 03, 2001 at 01:56:37AM -0700, David Miller wrote: > > Rusty Russell writes: > > I think it's pretty clear to everyone that I don't have time or > > resources to maintain this stuff any more. > > Any takers? :-) Well, we have a core team consisting out of four people. Rusty created that once, to divide the task of maintainership between people who have committed themselves to netfilter/iptables development over a long time. This is also what you find respectively in the MAINTAINERS file. But as I understood from the past, you prefer to receive fixes / updates through a single person (let's say with a gateway functionality). If this is what you are asking for, I'd be happy to volunteer. > > Have appended my other minor fix. Dave, please apply... > Done. Thanks. > David S. Miller -- Live long and prosper - Harald Welte / laforge@gnumonks.org http://www.gnumonks.org ============================================================================ GCS/E/IT d- s-: a-- C+++ UL++++$ P+++ L++++$ E--- W- N++ o? K- w--- O- M- V-- PS+ PE-- Y+ PGP++ t++ 5-- !X !R tv-- b+++ DI? 
!D G+ e* h+ r% y+(*) From owner-netdev@oss.sgi.com Fri Aug 3 14:06:37 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f73L6bR26076 for netdev-outgoing; Fri, 3 Aug 2001 14:06:37 -0700 Received: from lox.sandelman.ottawa.on.ca (IDENT:root@lox.sandelman.ottawa.on.ca [209.151.24.2]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f73L6ZV26070 for ; Fri, 3 Aug 2001 14:06:35 -0700 Received: from nox.sandelman.ottawa.on.ca (nox.sandelman.ottawa.on.ca [209.151.24.6]) by lox.sandelman.ottawa.on.ca (8.8.7/8.8.8) with ESMTP id RAA22762 for ; Fri, 3 Aug 2001 17:06:32 -0400 (EDT) Received: from marajade.sandelman.ottawa.on.ca (marajade.sandelman.ottawa.on.ca [209.151.24.20]) by nox.sandelman.ottawa.on.ca (8.11.0/8.11.0) with ESMTP id f73L7BF11820 (using TLSv1/SSLv3 with cipher EDH-RSA-DES-CBC3-SHA (168 bits) verified OK) for ; Fri, 3 Aug 2001 17:07:12 -0400 (EDT) Received: from marajade.sandelman.ottawa.on.ca (localhost [[UNIX: localhost]]) by marajade.sandelman.ottawa.on.ca (8.11.0/8.11.0) with ESMTP id f73Kw3j18054 for ; Fri, 3 Aug 2001 16:58:04 -0400 (EDT) Message-Id: <200108032058.f73Kw3j18054@marajade.sandelman.ottawa.on.ca> To: netdev@oss.sgi.com Subject: Re: IPv6 fragmentation and IPv6 header parsing In-reply-to: Your message of "Fri, 03 Aug 2001 19:50:51 BST." <20010803195051.B15145@raq299.uk2net.com> Mime-Version: 1.0 (generated by tm-edit 7.108) Content-Type: text/plain; charset=US-ASCII Date: Fri, 03 Aug 2001 16:58:02 -0400 From: Michael Richardson Sender: owner-netdev@oss.sgi.com Precedence: bulk -----BEGIN PGP SIGNED MESSAGE----- >>>>> "Mark" == Mark Baker writes: >> As there are no huge technical or address allocational reasons why >> ISP's could not give at least /64, those ISP's that do get more >> popular and ones dealing /128's do not, and disappear from IPv6 >> market. 
Mark> There are, however, technical reasons why ISPs might want to use Mark> dynamic IPs (if they have lots of dial-up hardware in different Mark> locations, routing issues make static IP difficult), so although Mark> their customers would get a /64, it might be a different one every Mark> time they dial up. Mark> In that situation, since I wouldn't want addresses on my local Mark> network to keep changing, I would want to use NAT to translate the Mark> address block assigned by the ISP onto some site local address Mark> space. You might use 1:1 NAT (not NAPT), but that would be a mistake. 1) you use site local addresses to talk between hosts locally. 2) you use router advertisements to get a public address for your network. 3) you use TSIG signed updates to a DNS server to get your addresses into a DNS server. (possibly, just updating a single A6 record) This is all possible *NOW*. No need for NAT. ] ON HUMILITY: to err is human. To moo, bovine. | firewalls [ ] Michael Richardson, Sandelman Software Works, Ottawa, ON |net architect[ ] mcr@sandelman.ottawa.on.ca http://www.sandelman.ottawa.on.ca/ |device driver[ ] panic("Just another NetBSD/notebook using, kernel hacking, security guy"); [ -----BEGIN PGP SIGNATURE----- Version: 2.6.3ia Charset: latin1 Comment: Processed by Mailcrypt 3.5.6, an Emacs/PGP interface iQCVAwUBO2sQWYqHRg3pndX9AQEifQP9H8NzYUBi6Sgif+7Ut8UP7A3B6tSZBZ5G JuZjdZXZx6c4+lHSCFLqHaHu3Zl5cEBq8ckEfNMO8C76KUVo5JTEk3C45kQ/SyPp JITSVAHPMuDZZ87ubbWYLKHOlt02BbiCi6dqDQcmOqmjKj1OkBZ15VDLpQcg+hQ5 tiaXaOUiQAg= =5l5w -----END PGP SIGNATURE----- From owner-netdev@oss.sgi.com Fri Aug 3 15:08:29 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f73M8TX31879 for netdev-outgoing; Fri, 3 Aug 2001 15:08:29 -0700 Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f73M8RV31871 for ; Fri, 3 Aug 2001 15:08:27 -0700 Received: (from davem@localhost) by pizda.ninka.net 
(8.9.3/8.9.3) id PAA12012; Fri, 3 Aug 2001 15:08:23 -0700 From: "David S. Miller" MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <15211.8406.989733.977905@pizda.ninka.net> Date: Fri, 3 Aug 2001 15:08:22 -0700 (PDT) To: Harald Welte Cc: Rusty Russell , Marc Boucher , Alexey Kuznetsov , netfilter-devel@lists.samba.org, netdev@oss.sgi.com Subject: Re: ERRATA Re: [PATCH] fix for netfilter/nat/pppoe crashes (hopefully) In-Reply-To: <20010802160920.N1612@obroa-skai.gnumonks.org> References: <20010802073648.G1612@obroa-skai.gnumonks.org> <15210.26437.501185.294773@pizda.ninka.net> <20010802160920.N1612@obroa-skai.gnumonks.org> X-Mailer: VM 6.75 under 21.1 (patch 13) "Crater Lake" XEmacs Lucid Sender: owner-netdev@oss.sgi.com Precedence: bulk Harald Welte writes: > But as I understood from the past, you prefer to receive fixes / updates > through a single person (let's say with a gateway functionality). If > this is what you are asking for, I'd be happy to volunteer. If this is fine with the rest of the core team, great. It is the ideal situation. Later, David S. Miller davem@redhat.com From owner-netdev@oss.sgi.com Fri Aug 3 18:01:33 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f7411Xj14885 for netdev-outgoing; Fri, 3 Aug 2001 18:01:33 -0700 Received: from opium.mbsi.ca (opium.mbsi.ca [198.168.101.1]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f7411WV14881 for ; Fri, 3 Aug 2001 18:01:32 -0700 Received: from opium.mbsi.ca (marc@localhost [127.0.0.1]) by opium.mbsi.ca (8.11.3/8.11.3) with ESMTP id f74111626361; Fri, 3 Aug 2001 21:01:01 -0400 (EDT) Message-Id: <200108040101.f74111626361@opium.mbsi.ca> X-Mailer: exmh version 2.2 2001/03/06 with nmh-1.0.4+dev To: "David S. 
Miller" cc: Harald Welte , Rusty Russell , Alexey Kuznetsov , netfilter-devel@lists.samba.org, netdev@oss.sgi.com Subject: Re: ERRATA Re: [PATCH] fix for netfilter/nat/pppoe crashes (hopefully) In-Reply-To: Your message of "Fri, 03 Aug 2001 15:08:22 PDT." <15211.8406.989733.977905@pizda.ninka.net> References: <20010802073648.G1612@obroa-skai.gnumonks.org> <15210.26437.501185.294773@pizda.ninka.net> <20010802160920.N1612@obroa-skai.gnumonks.org> <15211.8406.989733.977905@pizda.ninka.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Fri, 03 Aug 2001 21:01:01 -0400 From: Marc Boucher Sender: owner-netdev@oss.sgi.com Precedence: bulk > > Harald Welte writes: > > But as I understood from the past, you prefer to receive fixes / updates > > through a single person (let's say with a gateway functionality). If > > this is what you are asking for, I'd be happy to volunteer. > > If this is fine with the rest of the core team, great. It is the > ideal situation. Agreed. However in exceptional circumstances (such as discovery of a really critical problem) when Harald is away for a prolonged period, decent urgent fixes should be accepted from any other sane core-team member.. Marc From owner-netdev@oss.sgi.com Fri Aug 3 18:20:33 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f741KXS16998 for netdev-outgoing; Fri, 3 Aug 2001 18:20:33 -0700 Received: from localhost ([144.137.81.73]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f741KVV16986 for ; Fri, 3 Aug 2001 18:20:31 -0700 Received: from localhost ([127.0.0.1] helo=rustcorp.com.au) by localhost with esmtp (Exim 3.31 #1 (Debian)) id 15Sq5k-0006kL-00; Sat, 04 Aug 2001 11:19:00 +1000 From: Rusty Russell To: jamal Cc: Alexey Kuznetsov , netdev@oss.sgi.com Subject: Re: Linux 2.4 networking/routing slowdown In-reply-to: Your message of "Thu, 02 Aug 2001 06:58:45 EDT." 
Date: Sat, 04 Aug 2001 11:18:59 +1000 Message-Id: Sender: owner-netdev@oss.sgi.com Precedence: bulk In message you writ e: > Sorry, I missed this ... > Routing does not slow down when you dont compile in netfilter. > Upto 20% degradation if you turn it on with a single IP table rule > with 2.4.7 Hmmm... I missed the start of this. You're saying the overhead of CONFIG_NETFILTER + iptables module + one rule is 20%? Wow. The question is, which of these is the culprit? I'd like some idea of where to look! BTW, I have no 2.4 machine at the moment, so I'd only be looking... Rusty. -- Premature optmztion is rt of all evl. --DK From owner-netdev@oss.sgi.com Sat Aug 4 06:54:48 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f74DsmO15852 for netdev-outgoing; Sat, 4 Aug 2001 06:54:48 -0700 Received: from robur.slu.se (robur.slu.se [130.238.98.12]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f74DskV15848 for ; Sat, 4 Aug 2001 06:54:47 -0700 Received: (from robert@localhost) by robur.slu.se (8.8.7/8.8.7) id PAA25897; Sat, 4 Aug 2001 15:55:31 +0200 From: Robert Olsson MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <15211.65235.369977.774321@robur.slu.se> Date: Sat, 4 Aug 2001 15:55:31 +0200 To: Rusty Russell Cc: jamal , Alexey Kuznetsov , netdev@oss.sgi.com Subject: Re: Linux 2.4 networking/routing slowdown In-Reply-To: References: X-Mailer: VM 6.92 under Emacs 19.34.1 Sender: owner-netdev@oss.sgi.com Precedence: bulk Rusty Russell writes: > In message you writ > e: > > Sorry, I missed this ... > > Routing does not slow down when you dont compile in netfilter. > > Upto 20% degradation if you turn it on with a single IP table rule > > with 2.4.7 > > Hmmm... I missed the start of this. You're saying the overhead of > CONFIG_NETFILTER + iptables module + one rule is 20%? Rusty here are some numbers for connection tracking... Forwarding from eth0 to eth1. 
One million packets injected into eth0 at 890.000 pkts/s. Kernel 2.4.7 UP PII @ 933 MHz and hacked e1000 driver.

First with run without ipchains.o. Ignore the RX-ERR, RX-DRP Intel does something weird when the counters roll over.

Iface   MTU Met  RX-OK RX-ERR RX-DRP RX-OVR  TX-OK TX-ERR TX-DRP TX-OVR Flags
eth0   1500   0 367023 782941 782941 632997     30      0      0      0 BRU
eth1   1500   0     15      0      0      0 366830      0      0      0 BRU

A throughput of 0.37 * 890.000 = 329.000 pkts/s

With ipchains.o insmoded but *no* filters at all...

Iface   MTU Met  RX-OK RX-ERR RX-DRP RX-OVR  TX-OK TX-ERR TX-DRP TX-OVR Flags
eth0   1500   0 249153 842948 842948 750870     31      0      0      0 BRU
eth1   1500   0      8      0      0      0 249094      0      0      0 BRU

A throughput of 0.25 * 890.000 = 222.000 pkts/s

And it seems like its the connecting tracking that takes the resources
iptables without connection tracking modules loaded was fine.

The moral of this... Use iptables and don't load connection tracking unless you really need it.

Cheers.

--ro

From owner-netdev@oss.sgi.com Sat Aug 4 20:07:47 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f7537lt07324 for netdev-outgoing; Sat, 4 Aug 2001 20:07:47 -0700 Received: from localhost (CPE-61-9-151-240.vic.bigpond.net.au [61.9.151.240]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f7537iV07318 for ; Sat, 4 Aug 2001 20:07:45 -0700 Received: from localhost ([127.0.0.1] helo=rustcorp.com.au) by localhost with esmtp (Exim 3.31 #1 (Debian)) id 15TEGf-0008QW-00; Sun, 05 Aug 2001 13:07:53 +1000 From: Rusty Russell To: Robert Olsson Cc: jamal , Alexey Kuznetsov , netdev@oss.sgi.com Subject: Re: Linux 2.4 networking/routing slowdown In-reply-to: Your message of "Sat, 04 Aug 2001 15:55:31 +0200." <15211.65235.369977.774321@robur.slu.se> Date: Sun, 05 Aug 2001 13:07:44 +1000 Message-Id: Sender: owner-netdev@oss.sgi.com Precedence: bulk

In message <15211.65235.369977.774321@robur.slu.se> you write:
> Rusty here are some numbers for connection tracking...
>
> Forwarding from eth0 to eth1.
One million packets injected into eth0 at > 890.000 pkts/s. Kernel 2.4.7 UP PII @ 933 MHz and hacked e1000 driver. First > with run without ipchains.o. Ah... What are you using as a traffic generator? Creating a new connection is expensive (but could probably be optimized): given that usually < 1 in 10 packets is a new connection, this seemed a reasonable optimization strategy. If you are sending random packets, you are trying to create 1 million connections (well, some will timeout). You *can* help a bit by enlarging the hash tables: try: insmod ipchains hashsize=100000 You could also try sending the *same* packet 1,000,000 times, and see if we do better there... Very interesting, Rusty. -- Premature optmztion is rt of all evl. --DK From owner-netdev@oss.sgi.com Sun Aug 5 03:29:55 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f75ATtN15491 for netdev-outgoing; Sun, 5 Aug 2001 03:29:55 -0700 Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f75AToV15475 for ; Sun, 5 Aug 2001 03:29:50 -0700 Received: (from davem@localhost) by pizda.ninka.net (8.9.3/8.9.3) id DAA09097; Sun, 5 Aug 2001 03:29:06 -0700 From: "David S. 
Miller" MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <15213.8178.895951.411152@pizda.ninka.net> Date: Sun, 5 Aug 2001 03:29:06 -0700 (PDT) To: Marc Boucher Cc: Harald Welte , Rusty Russell , Alexey Kuznetsov , netfilter-devel@lists.samba.org, netdev@oss.sgi.com Subject: Re: ERRATA Re: [PATCH] fix for netfilter/nat/pppoe crashes (hopefully) In-Reply-To: <200108040101.f74111626361@opium.mbsi.ca> References: <20010802073648.G1612@obroa-skai.gnumonks.org> <15210.26437.501185.294773@pizda.ninka.net> <20010802160920.N1612@obroa-skai.gnumonks.org> <15211.8406.989733.977905@pizda.ninka.net> <200108040101.f74111626361@opium.mbsi.ca> X-Mailer: VM 6.75 under 21.1 (patch 13) "Crater Lake" XEmacs Lucid Sender: owner-netdev@oss.sgi.com Precedence: bulk Marc Boucher writes: > Agreed. However in exceptional circumstances (such as discovery of a > really critical problem) when Harald is away for a prolonged period, > decent urgent fixes should be accepted from any other sane core-team > member.. Sure, not a problem. Later, David S. 
Miller davem@redhat.com From owner-netdev@oss.sgi.com Sun Aug 5 08:57:45 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f75Fvju14845 for netdev-outgoing; Sun, 5 Aug 2001 08:57:45 -0700 Received: from mailhost.iitb.ac.in (mailhost.iitb.ac.in [203.197.74.142]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f75FvhV14835 for ; Sun, 5 Aug 2001 08:57:43 -0700 Received: (qmail 18599 invoked from network); 5 Aug 2001 15:51:38 -0000 Received: from jeeves.cse.iitb.ernet.in (HELO jeeves.cse.iitb.ac.in) (root@144.16.111.15) by mailhost.iitb.ac.in with SMTP; 5 Aug 2001 15:51:38 -0000 Received: from chandra.cse.iitb.ernet.in (chandra.cse.iitb.ernet.in [144.16.111.7]) by jeeves.cse.iitb.ac.in (8.11.0/8.11.0) with ESMTP id f75FvEX03366 for ; Sun, 5 Aug 2001 21:27:14 +0530 Received: from localhost (aswin@localhost) by chandra.cse.iitb.ernet.in (8.8.8/8.8.8) with SMTP id VAA16806 for ; Sun, 5 Aug 2001 21:25:47 +0530 (IST) Date: Sun, 5 Aug 2001 21:25:47 +0530 (IST) From: Aswani Kumar T To: netdev@oss.sgi.com Subject: Problem with tunneling in linux..(from student of IIT Bombay) Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Dear Sir, I got ur mail id from ALan COX... the following is my problem..hope u will help me. i am doing project in the field of active networking. I am using linux box as my router. I have to manually tunnel the packet, that is after receiving an ip-packet through ip_rcv() , i have add a header to that packet with different dest ip address. For that iam using skb_push function. am i wrong in doing so?? it seems it is not working. i will be very gald if can tell me whether iam doing right thing in calling skb_push for extending the header. awaiting a quicker response. 
regards, aswin@cse.iitb.ac.in ---------------------------------------------- THE BEST DAY FOR STARTING A GOOD WORK IS TODAY THE BEST DAY FOR STOPPING A BAD WORK IS TODAY ---------------------------------------------- Aswani Kumar T, Aswani Kumar T M.Tech(CSE), IITBombay, 9-27, Amaravathi Nagar, Room no 154, Hostel-1, Tirupati-517502. powai, Mumbai-76. Ph: (08574) 41626 Ph (022) 5721017 46769 E-Mail : aswin@cse.iitb.ac.in aswin_t4@rediffmail.com Home Page : www.cse.iitb.ac.in/~aswin From owner-netdev@oss.sgi.com Sun Aug 5 13:42:09 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f75Kg9c24266 for netdev-outgoing; Sun, 5 Aug 2001 13:42:09 -0700 Received: from ghanima.endorphin.org ([62.116.8.197]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f75Kg2V24247 for ; Sun, 5 Aug 2001 13:42:02 -0700 Received: (qmail 4151 invoked by uid 1000); 5 Aug 2001 20:22:53 -0000 Date: Sun, 5 Aug 2001 22:22:52 +0200 From: clemens To: Harald Welte Cc: netdev@oss.sgi.com Subject: [PATCH] Re: [PATCH] global icmp rate limiting Message-ID: <20010805222252.A4012@ghanima.endorphin.org> References: <20010803134206.A653@ghanima.endorphin.org> <20010802162214.O1612@obroa-skai.gnumonks.org> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="ZPt4rx8FFjLCG7dd" Content-Disposition: inline In-Reply-To: <20010802162214.O1612@obroa-skai.gnumonks.org> User-Agent: Mutt/1.3.18i Sender: owner-netdev@oss.sgi.com Precedence: bulk --ZPt4rx8FFjLCG7dd Content-Type: text/plain; charset=us-ascii Content-Disposition: inline On Thu, Aug 02, 2001 at 04:22:15PM -0300, Harald Welte wrote: > > this patch introduces global icmp rate limiting > > (/proc/sys/net/ipv4/icmp_ratelimit) with the ability to arbitary > > rate limit or unlimit certain icmp types (/proc/sys/net/ipv4/icmp_ratemask, > > but you better have a look at icmp.c before changing this). 
> > If somebody is going to change the icmp rate limiting code, please take > into consideration fixing the kernel/userspace interface as well. you're absolutly right. please consider patch attached. unit for icmp_ratelimit will be [packets/second]. HZ multiplication is cached in icmpv4_xrlim_allow. networking code maintainers please take note of this patch. i haven't got any response by official maintainers. clemens --ZPt4rx8FFjLCG7dd Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename="icmp-global-rate3.patch" diff -ur linux-sane/include/linux/sysctl.h linux/include/linux/sysctl.h --- linux-sane/include/linux/sysctl.h Fri Jul 20 21:52:18 2001 +++ linux/include/linux/sysctl.h Fri Aug 3 13:22:56 2001 @@ -251,11 +251,18 @@ NET_IPV4_LOCAL_PORT_RANGE=56, NET_IPV4_ICMP_ECHO_IGNORE_ALL=57, NET_IPV4_ICMP_ECHO_IGNORE_BROADCASTS=58, - NET_IPV4_ICMP_SOURCEQUENCH_RATE=59, - NET_IPV4_ICMP_DESTUNREACH_RATE=60, - NET_IPV4_ICMP_TIMEEXCEED_RATE=61, - NET_IPV4_ICMP_PARAMPROB_RATE=62, - NET_IPV4_ICMP_ECHOREPLY_RATE=63, + +/* obsolet, replaced by global icmp limiting. + + NET_IPV4_ICMP_SOURCEQUENCH_RATE, + NET_IPV4_ICMP_DESTUNREACH_RATE, + NET_IPV4_ICMP_TIMEEXCEED_RATE, + NET_IPV4_ICMP_PARAMPROB_RATE, + NET_IPV4_ICMP_ECHOREPLY_RATE, + + use NET_IPV4_ICMP_RATELIMIT, NET_IPV4_ICMP_RATEMASK instead + +*/ NET_IPV4_ICMP_IGNORE_BOGUS_ERROR_RESPONSES=64, NET_IPV4_IGMP_MAX_MEMBERSHIPS=65, NET_TCP_TW_RECYCLE=66, @@ -281,6 +288,8 @@ NET_TCP_APP_WIN=86, NET_TCP_ADV_WIN_SCALE=87, NET_IPV4_NONLOCAL_BIND=88, + NET_IPV4_ICMP_RATELIMIT=89, + NET_IPV4_ICMP_RATEMASK=90 }; enum { diff -ur linux-sane/net/ipv4/icmp.c linux/net/ipv4/icmp.c --- linux-sane/net/ipv4/icmp.c Thu Jun 21 06:00:55 2001 +++ linux/net/ipv4/icmp.c Sun Aug 5 22:14:34 2001 @@ -16,6 +16,9 @@ * Other than that this module is a complete rewrite. * * Fixes: + * Clemens Fruhwirth : introduce global icmp rate limiting + * with icmp type masking ability instead + * of broken per type icmp timeouts. 
* Mike Shaver : RFC1122 checks. * Alan Cox : Multicast ping reply as self. * Alan Cox : Fix atomicity lockup in ip_build_xmit @@ -145,6 +148,23 @@ /* Control parameter - ignore bogus broadcast responses? */ int sysctl_icmp_ignore_bogus_error_responses; +/* + * Configurable global rate limit. + * + * ratelimit defines token/tick for dst->rate_token bucket + * ratemask defines which icmp types are ratelimited by setting + * its bit position. + * + * FIXME: verify if the defaults are reasonable + * + * default: + * dest unreachable (0x03), source quench (0x04), + * time exceeded (0x0b), parameter problem (0x0c) + */ + +int sysctl_icmp_ratelimit = 1; +int sysctl_icmp_ratemask = 0x1818; + /* * ICMP control array. This specifies what to do with each ICMP. */ @@ -155,7 +175,6 @@ unsigned long *input; /* Address to increment on input */ void (*handler)(struct sk_buff *skb); short error; /* This ICMP is classed as an error message */ - int *timeout; /* Rate limit */ }; static struct icmp_control icmp_pointers[NR_ICMP_TYPES+1]; @@ -223,11 +242,6 @@ * Note that the same dst_entry fields are modified by functions in * route.c too, but these work for packet destinations while xrlim_allow * works for icmp destinations. This means the rate limiting information - * for one "ip object" is shared. - * - * Note that the same dst_entry fields are modified by functions in - * route.c too, but these work for packet destinations while xrlim_allow - * works for icmp destinations. This means the rate limiting information * for one "ip object" is shared - and these ICMPs are twice limited: * by source and by destination. * @@ -257,22 +271,27 @@ { struct dst_entry *dst = &rt->u.dst; - if (type > NR_ICMP_TYPES || !icmp_pointers[type].timeout) + static int icmp_ratelimit_cache; // mul is much slower than + static int icmp_ratelimit_last; // cmp + branch + + if (type > NR_ICMP_TYPES) return 1; /* Don't limit PMTU discovery.
*/ if (type == ICMP_DEST_UNREACH && code == ICMP_FRAG_NEEDED) return 1; - /* Redirect has its own rate limit mechanism */ - if (type == ICMP_REDIRECT) - return 1; - /* No rate limit on loopback */ if (dst->dev && (dst->dev->flags&IFF_LOOPBACK)) return 1; - return xrlim_allow(dst, *(icmp_pointers[type].timeout)); + if((1 << type) & sysctl_icmp_ratemask) { + if(sysctl_icmp_ratelimit != icmp_ratelimit_last) + icmp_ratelimit_cache = sysctl_icmp_ratelimit*HZ; + return xrlim_allow(dst,icmp_ratelimit_cache); + } + else + return 1; } /* @@ -929,18 +948,7 @@ } -/* - * Configurable rate limits. - * Someone should check if these default values are correct. - * Note that these values interact with the routing cache GC timeout. - * If you chose them too high they won't take effect, because the - * dst_entry gets expired too early. The same should happen when - * the cache grows too big. - */ -int sysctl_icmp_destunreach_time = 1*HZ; -int sysctl_icmp_timeexceed_time = 1*HZ; -int sysctl_icmp_paramprob_time = 1*HZ; -int sysctl_icmp_echoreply_time; /* don't limit it per default. */ + /* * This table is the definition of how we handle ICMP. 
@@ -948,37 +956,37 @@ static struct icmp_control icmp_pointers[NR_ICMP_TYPES+1] = { /* ECHO REPLY (0) */ - { &icmp_statistics[0].IcmpOutEchoReps, &icmp_statistics[0].IcmpInEchoReps, icmp_discard, 0, &sysctl_icmp_echoreply_time}, - { &icmp_statistics[0].dummy, &icmp_statistics[0].IcmpInErrors, icmp_discard, 1, }, - { &icmp_statistics[0].dummy, &icmp_statistics[0].IcmpInErrors, icmp_discard, 1, }, + { &icmp_statistics[0].IcmpOutEchoReps, &icmp_statistics[0].IcmpInEchoReps, icmp_discard, 0 }, + { &icmp_statistics[0].dummy, &icmp_statistics[0].IcmpInErrors, icmp_discard, 1 }, + { &icmp_statistics[0].dummy, &icmp_statistics[0].IcmpInErrors, icmp_discard, 1 }, /* DEST UNREACH (3) */ - { &icmp_statistics[0].IcmpOutDestUnreachs, &icmp_statistics[0].IcmpInDestUnreachs, icmp_unreach, 1, &sysctl_icmp_destunreach_time }, + { &icmp_statistics[0].IcmpOutDestUnreachs, &icmp_statistics[0].IcmpInDestUnreachs, icmp_unreach, 1 }, /* SOURCE QUENCH (4) */ - { &icmp_statistics[0].IcmpOutSrcQuenchs, &icmp_statistics[0].IcmpInSrcQuenchs, icmp_unreach, 1, }, + { &icmp_statistics[0].IcmpOutSrcQuenchs, &icmp_statistics[0].IcmpInSrcQuenchs, icmp_unreach, 1 }, /* REDIRECT (5) */ - { &icmp_statistics[0].IcmpOutRedirects, &icmp_statistics[0].IcmpInRedirects, icmp_redirect, 1, }, - { &icmp_statistics[0].dummy, &icmp_statistics[0].IcmpInErrors, icmp_discard, 1, }, - { &icmp_statistics[0].dummy, &icmp_statistics[0].IcmpInErrors, icmp_discard, 1, }, + { &icmp_statistics[0].IcmpOutRedirects, &icmp_statistics[0].IcmpInRedirects, icmp_redirect, 1 }, + { &icmp_statistics[0].dummy, &icmp_statistics[0].IcmpInErrors, icmp_discard, 1 }, + { &icmp_statistics[0].dummy, &icmp_statistics[0].IcmpInErrors, icmp_discard, 1 }, /* ECHO (8) */ - { &icmp_statistics[0].IcmpOutEchos, &icmp_statistics[0].IcmpInEchos, icmp_echo, 0, }, - { &icmp_statistics[0].dummy, &icmp_statistics[0].IcmpInErrors, icmp_discard, 1, }, - { &icmp_statistics[0].dummy, &icmp_statistics[0].IcmpInErrors, icmp_discard, 1, }, + { 
&icmp_statistics[0].IcmpOutEchos, &icmp_statistics[0].IcmpInEchos, icmp_echo, 0 }, + { &icmp_statistics[0].dummy, &icmp_statistics[0].IcmpInErrors, icmp_discard, 1 }, + { &icmp_statistics[0].dummy, &icmp_statistics[0].IcmpInErrors, icmp_discard, 1 }, /* TIME EXCEEDED (11) */ - { &icmp_statistics[0].IcmpOutTimeExcds, &icmp_statistics[0].IcmpInTimeExcds, icmp_unreach, 1, &sysctl_icmp_timeexceed_time }, + { &icmp_statistics[0].IcmpOutTimeExcds, &icmp_statistics[0].IcmpInTimeExcds, icmp_unreach, 1 }, /* PARAMETER PROBLEM (12) */ - { &icmp_statistics[0].IcmpOutParmProbs, &icmp_statistics[0].IcmpInParmProbs, icmp_unreach, 1, &sysctl_icmp_paramprob_time }, + { &icmp_statistics[0].IcmpOutParmProbs, &icmp_statistics[0].IcmpInParmProbs, icmp_unreach, 1 }, /* TIMESTAMP (13) */ - { &icmp_statistics[0].IcmpOutTimestamps, &icmp_statistics[0].IcmpInTimestamps, icmp_timestamp, 0, }, + { &icmp_statistics[0].IcmpOutTimestamps, &icmp_statistics[0].IcmpInTimestamps, icmp_timestamp, 0 }, /* TIMESTAMP REPLY (14) */ - { &icmp_statistics[0].IcmpOutTimestampReps, &icmp_statistics[0].IcmpInTimestampReps, icmp_discard, 0, }, + { &icmp_statistics[0].IcmpOutTimestampReps, &icmp_statistics[0].IcmpInTimestampReps, icmp_discard, 0 }, /* INFO (15) */ - { &icmp_statistics[0].dummy, &icmp_statistics[0].dummy, icmp_discard, 0, }, + { &icmp_statistics[0].dummy, &icmp_statistics[0].dummy, icmp_discard, 0 }, /* INFO REPLY (16) */ - { &icmp_statistics[0].dummy, &icmp_statistics[0].dummy, icmp_discard, 0, }, + { &icmp_statistics[0].dummy, &icmp_statistics[0].dummy, icmp_discard, 0 }, /* ADDR MASK (17) */ - { &icmp_statistics[0].IcmpOutAddrMasks, &icmp_statistics[0].IcmpInAddrMasks, icmp_address, 0, }, + { &icmp_statistics[0].IcmpOutAddrMasks, &icmp_statistics[0].IcmpInAddrMasks, icmp_address, 0 }, /* ADDR MASK REPLY (18) */ - { &icmp_statistics[0].IcmpOutAddrMaskReps, &icmp_statistics[0].IcmpInAddrMaskReps, icmp_address_reply, 0, } + { &icmp_statistics[0].IcmpOutAddrMaskReps, 
&icmp_statistics[0].IcmpInAddrMaskReps, icmp_address_reply, 0 } }; void __init icmp_init(struct net_proto_family *ops) diff -ur linux-sane/net/ipv4/sysctl_net_ipv4.c linux/net/ipv4/sysctl_net_ipv4.c --- linux-sane/net/ipv4/sysctl_net_ipv4.c Mon Mar 26 04:14:25 2001 +++ linux/net/ipv4/sysctl_net_ipv4.c Fri Aug 3 12:44:28 2001 @@ -32,10 +32,8 @@ extern int sysctl_ip_dynaddr; /* From icmp.c */ -extern int sysctl_icmp_destunreach_time; -extern int sysctl_icmp_timeexceed_time; -extern int sysctl_icmp_paramprob_time; -extern int sysctl_icmp_echoreply_time; +extern int sysctl_icmp_ratelimit; +extern int sysctl_icmp_ratemask; /* From igmp.c */ extern int sysctl_igmp_max_memberships; @@ -178,14 +176,6 @@ {NET_IPV4_ICMP_IGNORE_BOGUS_ERROR_RESPONSES, "icmp_ignore_bogus_error_responses", &sysctl_icmp_ignore_bogus_error_responses, sizeof(int), 0644, NULL, &proc_dointvec}, - {NET_IPV4_ICMP_DESTUNREACH_RATE, "icmp_destunreach_rate", - &sysctl_icmp_destunreach_time, sizeof(int), 0644, NULL, &proc_dointvec}, - {NET_IPV4_ICMP_TIMEEXCEED_RATE, "icmp_timeexceed_rate", - &sysctl_icmp_timeexceed_time, sizeof(int), 0644, NULL, &proc_dointvec}, - {NET_IPV4_ICMP_PARAMPROB_RATE, "icmp_paramprob_rate", - &sysctl_icmp_paramprob_time, sizeof(int), 0644, NULL, &proc_dointvec}, - {NET_IPV4_ICMP_ECHOREPLY_RATE, "icmp_echoreply_rate", - &sysctl_icmp_echoreply_time, sizeof(int), 0644, NULL, &proc_dointvec}, {NET_IPV4_ROUTE, "route", NULL, 0, 0555, ipv4_route_table}, #ifdef CONFIG_IP_MULTICAST {NET_IPV4_IGMP_MAX_MEMBERSHIPS, "igmp_max_memberships", @@ -227,6 +217,10 @@ &sysctl_tcp_app_win, sizeof(int), 0644, NULL, &proc_dointvec}, {NET_TCP_ADV_WIN_SCALE, "tcp_adv_win_scale", &sysctl_tcp_adv_win_scale, sizeof(int), 0644, NULL, &proc_dointvec}, + {NET_IPV4_ICMP_RATELIMIT, "icmp_ratelimit", + &sysctl_icmp_ratelimit, sizeof(int), 0644, NULL, &proc_dointvec}, + {NET_IPV4_ICMP_RATEMASK, "icmp_ratemask", + &sysctl_icmp_ratemask, sizeof(int), 0644, NULL, &proc_dointvec}, {0} }; --ZPt4rx8FFjLCG7dd-- From 
owner-netdev@oss.sgi.com Sun Aug 5 17:33:18 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f760XIG15003 for netdev-outgoing; Sun, 5 Aug 2001 17:33:18 -0700 Received: from vasquez.zip.com.au (vasquez.zip.com.au [203.12.97.41]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f760XHV14999 for ; Sun, 5 Aug 2001 17:33:17 -0700 Received: from zip.com.au (root@zipperii.zip.com.au [61.8.0.87]) by vasquez.zip.com.au (8.9.3/8.9.3/Debian 8.9.3-21) with ESMTP id KAA22490; Mon, 6 Aug 2001 10:32:15 +1000 X-Authentication-Warning: vasquez.zip.com.au: Host root@zipperii.zip.com.au [61.8.0.87] claimed to be zip.com.au Message-ID: <3B6DE6F2.65A1A335@zip.com.au> Date: Sun, 05 Aug 2001 17:38:10 -0700 From: Andrew Morton X-Mailer: Mozilla 4.77 [en] (X11; U; Linux 2.4.7 i686) X-Accept-Language: en MIME-Version: 1.0 To: Aswani Kumar T CC: netdev@oss.sgi.com Subject: Re: Problem with tunneling in linux..(from student of IIT Bombay) References: Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Aswani Kumar T wrote: > > Dear Sir, > I got ur mail id from ALan COX... the following is my problem..hope u will > help me. > i am doing project in the field of active networking. I am using linux box > as my router. I have to manually tunnel the packet, that is after > receiving an ip-packet through ip_rcv() , i have add a header to that > packet with different > dest ip address. Have you looked at net/ipv4/ipip.c:ipip_tunnel_xmit()? 
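
For readers of the archive, the point behind that pointer can be sketched in user space. skb_push() only moves the data pointer backwards, so it needs spare bytes (headroom) in front of the packet, and as far as I can tell the tunnel transmit path checks the headroom and reallocates when it is short before pushing the outer header. The model below is only an illustration under that assumption: struct toy_buf and all toy_* functions are hypothetical stand-ins, not kernel APIs, and a real sk_buff has more to worry about (cloned or shared buffers, tailroom, checksums).

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for struct sk_buff (hypothetical, user space only):
 * one allocation, with the packet bytes starting at `data`. */
struct toy_buf {
    unsigned char *head;   /* start of the allocation */
    unsigned char *data;   /* start of the packet bytes */
    size_t len;            /* number of packet bytes */
};

/* Allocate a buffer with `headroom` spare bytes in front of the payload. */
static int toy_init(struct toy_buf *b, size_t headroom,
                    const void *payload, size_t len)
{
    b->head = malloc(headroom + len);
    if (b->head == NULL)
        return -1;
    b->data = b->head + headroom;
    b->len = len;
    memcpy(b->data, payload, len);
    return 0;
}

/* Spare bytes available in front of the packet. */
static size_t toy_headroom(const struct toy_buf *b)
{
    return (size_t)(b->data - b->head);
}

/* Ensure at least `need` bytes of headroom, copying to a new buffer if short. */
static int toy_realloc_headroom(struct toy_buf *b, size_t need)
{
    unsigned char *nhead;

    if (toy_headroom(b) >= need)
        return 0;
    nhead = malloc(need + b->len);
    if (nhead == NULL)
        return -1;
    memcpy(nhead + need, b->data, b->len);
    free(b->head);
    b->head = nhead;
    b->data = nhead + need;
    return 0;
}

/* Move `data` back by n bytes; refuses (returns NULL) when there is no room. */
static unsigned char *toy_push(struct toy_buf *b, size_t n)
{
    if (toy_headroom(b) < n)
        return NULL;
    b->data -= n;
    b->len += n;
    return b->data;
}

/* Prepend an outer header: make room first, then push and fill it in. */
static int toy_encapsulate(struct toy_buf *b, const void *hdr, size_t hdr_len)
{
    unsigned char *p;

    if (toy_realloc_headroom(b, hdr_len) != 0)
        return -1;
    p = toy_push(b, hdr_len);
    if (p == NULL)
        return -1;
    memcpy(p, hdr, hdr_len);
    return 0;
}
```

In this model a bare toy_push() on a buffer that was created with no headroom fails, while toy_encapsulate() succeeds because it grows the headroom first. A push that "seems not to work" on a received packet may be running into the same kind of limit, though that is a guess and not something the thread confirms.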
From owner-netdev@oss.sgi.com Sun Aug 5 21:41:43 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f764fhg24869 for netdev-outgoing; Sun, 5 Aug 2001 21:41:43 -0700 Received: from titan.bieringer.de (mail.bieringer.de [195.226.187.51]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f764ffV24866 for ; Sun, 5 Aug 2001 21:41:41 -0700 Received: (qmail 28961 invoked from network); 6 Aug 2001 04:41:34 -0000 Received: from pd950fce5.dip.t-dialin.net (HELO gate.muc.bieringer.de) (217.80.252.229) by mail.bieringer.de with SMTP; 6 Aug 2001 04:41:34 -0000 Date: Mon, 06 Aug 2001 06:41:50 +0200 From: Peter Bieringer To: clemens cc: netdev@oss.sgi.com Subject: Re: [PATCH] Re: [PATCH] global icmp rate limiting Message-ID: <50230000.997072910@localhost> In-Reply-To: <20010805222252.A4012@ghanima.endorphin.org> References: <20010803134206.A653@ghanima.endorphin.org> <20010802162214.O1612@obroa-skai.gnumonks.org> <20010805222252.A4012@ghanima.endorphin.org> X-Mailer: Mulberry/2.1.0b3 (Linux/x86) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Content-Disposition: inline Sender: owner-netdev@oss.sgi.com Precedence: bulk --On Sunday, August 05, 2001 10:22:52 PM +0200 clemens wrote: > On Thu, Aug 02, 2001 at 04:22:15PM -0300, Harald Welte wrote: > >> > this patch introduces global icmp rate limiting >> > (/proc/sys/net/ipv4/icmp_ratelimit) with the ability to arbitary >> > rate limit or unlimit certain icmp types >> > (/proc/sys/net/ipv4/icmp_ratemask, but you better have a look at >> > icmp.c before changing this). >> >> If somebody is going to change the icmp rate limiting code, please >> take into consideration fixing the kernel/userspace interface as >> well. > > you're absolutly right. > please consider patch attached. > > unit for icmp_ratelimit will be [packets/second]. > HZ multiplication is cached in icmpv4_xrlim_allow. > > networking code maintainers please take note of this patch. 
i > haven't got any response by official maintainers. Please, is it possible that there is a "signal" somewhere in the /proc-FS to recognize, whether HZ or [packets/second] are used. Because of firewall scripts can be made able to recognize it and change their values the'll applied on rates using sysctl or something else. Otherwise, script applies e.g. "100" which isn't good anymore using new (and really better) unit. Peter From owner-netdev@oss.sgi.com Mon Aug 6 05:56:38 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f76CucN22427 for netdev-outgoing; Mon, 6 Aug 2001 05:56:38 -0700 Received: from do-smtp.nortel-dasa.de (do-smtp.nortel-dasa.de [193.141.241.40]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f76CuVV22406 for ; Mon, 6 Aug 2001 05:56:31 -0700 Received: from satcomnt11.ndsatcom.com (satcomcom [131.147.44.70]) by do-smtp.nortel-dasa.de (8.9.3+Sun/8.9.3) with ESMTP id OAA08833 for ; Mon, 6 Aug 2001 14:56:23 +0200 (MET DST) From: Ece.Aksu@NDSatcom.com Received: by satcomnt11.ndsatcom.com with Internet Mail Service (5.5.2653.19) id <31Z1PLHR>; Mon, 6 Aug 2001 14:54:49 +0200 Message-ID: To: netdev@oss.sgi.com Subject: sync serial comm Date: Mon, 6 Aug 2001 14:54:45 +0200 MIME-Version: 1.0 X-Mailer: Internet Mail Service (5.5.2653.19) Content-Type: text/plain; charset="iso-8859-1" Sender: owner-netdev@oss.sgi.com Precedence: bulk hello, I am writing to you to ask for your help about syncronous serial communications and cards. As I dont have much knowledge about this kind of hardware and communications, I am a bit confused. first of all, what I want to do is to connect two suse linux 6.2 (kernel 2.2.14) PCs using 2 sattelite modems between them. PC-satmodem-satmodem-PC I had made this configuration before, using com2 port,slip and ppp protocols and it worked.but the speed was not what I had desired.so I wanted to try it usin sync. serial cards. I have 2 sealevel ACB-IV 2 port ssc. 
sealevel doesnot support linux drivers but I think there are generic drivers in linux kernel 2.2.14 like alan cox's driver for these cards. however, so far, I coulnot succeed installing this card's driver and couldnt configure the interfaces. what I did step by step is written below.I dont understand what is wrong or missing. #insmod syncppp (is this hdlc driver??) #insmod z85230 #insmod sealevel when I type ifconfig afterwards, I see hdlc0 and hdlc1 interfaces but they look a little strange to me. because link encaps is unsepc. and hwadress is 0000...?? and also, though it is the same configuration on both PCs, what ifconfig shows is very different. I didnt understand what are these recieved packets and transfer packet values? I havent sent anything yet? ifconfig screen shots are attached. later, I type: ifconfig hdlc0 PC1ipaddr pointopoint PC2ipaddr up ip addresses are assigned but there is no connection when I try to ping! now, how can I configure a sync serial card? maybe I better buy a new card which is better for my purpose but which one? I need documentation about sync communication configuration but I couldnt find so far? do I need X.25, V.35? can I connect PCs back to back to check if the cards work?and then connect to the modems? what kind of cable should I use? does the cross cable that I used to connect com2 ports using slip protocol works for sync card interface also? another important thing is that all sync cards are used for connection to a server via a router.but I want to use it to connect PCs via sat modems. what should I do? I will be so glad if you help.I cannot get enough support from anybody and I cannot find enough documentation. regards, ece. 
From owner-netdev@oss.sgi.com Mon Aug 6 10:48:22 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f76HmM518893 for netdev-outgoing; Mon, 6 Aug 2001 10:48:22 -0700 Received: from dr.ea.ms (root@dr.ea.ms [24.169.103.185]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f76HmLV18890 for ; Mon, 6 Aug 2001 10:48:21 -0700 Received: (from dr.ea.ms) by dr.ea.ms (8.8.6/8.6.9) id NAA04242 for netdev@oss.sgi.com; Mon, 6 Aug 2001 13:48:33 -0400 Message-Id: <200108061748.NAA04242@dr.ea.ms> Subject: 2.4.x kernel bug To: netdev@oss.sgi.com Date: Mon, 6 Aug 2001 13:48:33 -0400 (EDT) From: "AppleUni Author" X-Mailer: ELM [version 2.4 PL25 PGP3 *ALPHA*] Content-Type: text Sender: owner-netdev@oss.sgi.com Precedence: bulk Hello, Alan Cox suggested I email you about a bug that I have found which has manifested it'sself within an application that I use. the following code reports how large the queue is in 2.2.x, but NOT in 2.4 ioctl(sock, TIOCOUTQ, &tcpbufsize); on a 2.2.x kernel it tells me how large the tcp outbound (and inbound) queue size is, but not in 2.4. This is breaking applications that scale depending on how full and empty the queue is. Since it breaks at that point, I've not even bothered to check the other functions that check the read queues. Yours, -- http://dr.ea.ms http://startrek.off.net http://CPM.doa.org ________________ -= Andrew Kroll =---------------\ /----------------------------- Tired of Bill Gates? LL \ /Think Bill is getting MY CASH?? Win '95 sucks! DOS is OK. LL II NNNNN UU UU XX XX Linux! A free Un*x Want to turn your PC intoLL II NN NN UU UU XXX clone for 386/486/P5's a powerful workstation? LLLLL II NN NN UUUUU XX XX FINALLY A -=REAL=- OS! -------------------------------------\ /--= =------------- \ / \/ !FREE! At your favorite FTP site! 
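
The ioctl call quoted in the report above can be tried in isolation. What follows is a minimal sketch, assuming a Linux host with loopback TCP available; loopback_outq() is a hypothetical helper name and not something from the report. It connects a TCP socket to a listener in the same process, writes a few bytes and asks the sending socket for its TIOCOUTQ value. It makes no claim about what the number means on a particular kernel, which is the question this thread is about.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: send nbytes over a loopback TCP connection and
 * query TIOCOUTQ on the sending socket.
 * Returns 0 and stores the value in *outq, or -1 on any failure. */
static int loopback_outq(size_t nbytes, int *outq)
{
    struct sockaddr_in addr;
    socklen_t alen = sizeof(addr);
    char buf[1024];
    int lsock = -1, csock = -1, asock = -1, rc = -1;

    if (nbytes > sizeof(buf))
        return -1;                      /* keep the example to one small write */
    memset(buf, 'x', sizeof(buf));

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                  /* let the kernel pick a free port */

    lsock = socket(AF_INET, SOCK_STREAM, 0);
    if (lsock < 0)
        goto out;
    if (bind(lsock, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto out;
    if (listen(lsock, 1) < 0)
        goto out;
    /* Find out which port was assigned so the client can connect to it. */
    if (getsockname(lsock, (struct sockaddr *)&addr, &alen) < 0)
        goto out;

    csock = socket(AF_INET, SOCK_STREAM, 0);
    if (csock < 0)
        goto out;
    if (connect(csock, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto out;
    asock = accept(lsock, NULL, NULL);
    if (asock < 0)
        goto out;

    if (write(csock, buf, nbytes) != (ssize_t)nbytes)
        goto out;
    /* The call from the report: ask the kernel about the output queue. */
    if (ioctl(csock, TIOCOUTQ, outq) < 0)
        goto out;
    rc = 0;
out:
    if (asock >= 0)
        close(asock);
    if (csock >= 0)
        close(csock);
    if (lsock >= 0)
        close(lsock);
    return rc;
}
```

Calling loopback_outq(100, &q) and printing q on a 2.2.x and a 2.4.x box side by side would be one way to see the difference being reported. The value depends on timing (how much the peer has acknowledged by the time of the ioctl), so no expected output is given here.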
From owner-netdev@oss.sgi.com Mon Aug 6 10:55:22 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f76HtMw19111 for netdev-outgoing; Mon, 6 Aug 2001 10:55:22 -0700 Received: from colin.muc.de (root@colin.muc.de [193.149.48.1]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f76HtLV19108 for ; Mon, 6 Aug 2001 10:55:21 -0700 Received: by colin.muc.de id <140561-1>; Mon, 6 Aug 2001 19:54:44 +0200 Message-ID: <20010806195435.14868@colin.muc.de> Date: Mon, 6 Aug 2001 19:54:35 +0200 From: Andi Kleen To: AppleUni Author Cc: netdev@oss.sgi.com Subject: Re: 2.4.x kernel bug References: <200108061748.NAA04242@dr.ea.ms> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 0.88e In-Reply-To: <200108061748.NAA04242@dr.ea.ms>; from AppleUni Author on Mon, Aug 06, 2001 at 07:48:33PM +0200 Sender: owner-netdev@oss.sgi.com Precedence: bulk On Mon, Aug 06, 2001 at 07:48:33PM +0200, AppleUni Author wrote: > the following code reports how large the queue is in 2.2.x, but NOT in 2.4 > > ioctl(sock, TIOCOUTQ, &tcpbufsize); > > on a 2.2.x kernel it tells me how large the tcp outbound (and inbound) queue > size is, but not in 2.4. This is breaking applications that scale depending > on how full and empty the queue is. It doesn't tell you that on 2.2. It tells you there how many bytes are still free in the send queue (send buffer size - allocated bytes including metadata), which was clearly a bug. 2.4 instead tells you how many bytes are unacked which makes a lot more sense. If the connection is not established that number is zero. 
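A minimal sketch of how an application might read this value, assuming a Linux 2.4 or later kernel where TIOCOUTQ on a TCP socket reports the bytes not yet acknowledged by the peer. The helper name send_queue_pending() is made up for illustration; only ioctl() and TIOCOUTQ come from the report above.

```c
#include <assert.h>
#include <netinet/in.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the original mail): ask the kernel how
 * many bytes are still pending in the send queue of `sock`.
 * Returns 0 on success and stores the count in *pending;
 * returns -1 on error, with errno set by ioctl().
 */
int send_queue_pending(int sock, int *pending)
{
    int value = 0;

    assert(pending != NULL);
    if (ioctl(sock, TIOCOUTQ, &value) < 0)
        return -1;
    *pending = value;
    return 0;
}
```

An application that paces itself on queue depth would compare this count against a threshold of its own choosing rather than treating it as free buffer space. On a socket that is not connected, the count should come back as zero, as described above.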
-Andi From owner-netdev@oss.sgi.com Mon Aug 6 17:37:25 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f770bPu03418 for netdev-outgoing; Mon, 6 Aug 2001 17:37:25 -0700 Received: from heron.ucsd.edu (heron.ucsd.edu [132.239.95.163]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f770bHV03405 for ; Mon, 6 Aug 2001 17:37:17 -0700 Received: from cs.ucsd.edu (localhost.localdomain [127.0.0.1]) by heron.ucsd.edu (Postfix) with ESMTP id 314345E900; Mon, 6 Aug 2001 17:30:18 -0700 (PDT) Message-ID: <3B6F369A.A546140D@cs.ucsd.edu> Date: Mon, 06 Aug 2001 17:30:18 -0700 From: Federico David Sacerdoti Organization: UCSD X-Mailer: Mozilla 4.77 [en] (X11; U; Linux 2.4.3-tcphealth i686) X-Accept-Language: en MIME-Version: 1.0 To: ak@muc.de, kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com, linux-net@vger.kernel.org, "David S. Miller" , "Perches, Joe" Subject: A TCP monitoring /proc/net file Content-Type: multipart/mixed; boundary="------------1917B4CC359163C6AFA6D60A" Sender: owner-netdev@oss.sgi.com Precedence: bulk This is a multi-part message in MIME format. --------------1917B4CC359163C6AFA6D60A Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit After a long delay due to my research schedule here at UCSD, I have made a patch that creates the /proc/net/tcphealth file. This file monitors all established TCP connections and reports some health metrics on them. See http://heron.ucsd.edu/tcphealth.php for more information. This patch is for kernel version 2.4.3 and was made with the command 'diff -Naur pristine-linux-2.4.3 linux-2.4.3'. Patches for other versions are available upon request. I believe this patch would make a useful addition to the Linux kernel. Below are the correspondance with all of you this past spring, with the most recent first. Sincerely, Federico David Sacerdoti, UCSD CSE department, San Diego CA ------------- Hello! > Would a patch for 2.4.2 be helpful? Yes, of course. 
The tool is useful not depending on any curcumstances. Alexey ------------- On Fri, Mar 23, 2001 at 09:19:14PM +0100, Federico David Sacerdoti wrote: > The external monitoring made possible by the /proc/net/tcphealth is > interesting because the SRTT is proportional to the speed of one's > network connection, and duplicate acks indicate that packets are being > lost (or reordered, less likely) somewhere in the network. 2.4 has a special state machine to detect reordering when the connection supports timestamps. I guess some long term statistics (currently TCP_INFO only dumps current state) would be useful too, but it's David's call if he want to put in the few cycles that'll cost (probably only in slow paths anyways) I guess it would be better if you would put it into the existing TCP_INFO framework, perhaps with an additional /proc frontend to TCP_INFO. Having two ways to do a similar thing is not good. -Andi -------------- Date: Fri, 23 Mar 2001 12:19:14 -0800 From: Federico David Sacerdoti The external monitoring made possible by the /proc/net/tcphealth is interesting because the SRTT is proportional to the speed of one's network connection, and duplicate acks indicate that packets are being lost (or reordered, less likely) somewhere in the network. These are things we want to know about a connection we are trying to communicate on - its individual latency and how often packets are being lost over it. Would a patch for 2.4.2 be helpful? -------------- On Fri, Mar 23, 2001 at 01:57:11AM +0100, David S. Miller wrote: > > See the TCP_INFO socket option we added to 2.4.x Sadly TCP_INFO can not be used for external monitoring currently (at least not without very bad and racy hacks to allow /proc to open sockets in /proc/pid/fd) -Andi -------------- Date: Thu, 22 Mar 2001 16:53:44 -0800 From: Federico David Sacerdoti For a graduate network class at UCSD I implemented some TCP performance monitors in the Linux TCP stack (ipv4). 
I have added a file to the proc filesystem (/proc/net/tcphealth) that monitors the "health" of all tcp connections on a machine. The tcphealth file tracks smoothed Round-Trip-Times, duplicate acks, and duplicate incoming packets for each established tcp connection. I believe that there is lots of good monitoring information that can be gleaned from this file. It works on all TCP connections without the cooperation of the remote server. In the code I have taken care not to disrupt the fast path in tcp_rcv_established(), and generally have tried to step lightly. I have patched kernel versions 2.2.14 and 2.2.16, and tested it on an ix86, a SUN, and a PowerPC. If there is any interest, I will submit the patch to the appropriate maintainer. --------------1917B4CC359163C6AFA6D60A Content-Type: text/plain; charset=us-ascii; name="patch-2.4.3-tcphealth" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="patch-2.4.3-tcphealth" diff -Naur pristine-linux-2.4.3/Makefile linux-2.4.3/Makefile --- pristine-linux-2.4.3/Makefile Thu Aug 2 15:46:19 2001 +++ linux-2.4.3/Makefile Thu Aug 2 16:06:53 2001 @@ -1,7 +1,7 @@ VERSION = 2 PATCHLEVEL = 4 SUBLEVEL = 3 -EXTRAVERSION = +EXTRAVERSION = -tcphealth KERNELRELEASE=$(VERSION).$(PATCHLEVEL).$(SUBLEVEL)$(EXTRAVERSION) diff -Naur pristine-linux-2.4.3/include/net/sock.h linux-2.4.3/include/net/sock.h --- pristine-linux-2.4.3/include/net/sock.h Thu Aug 2 15:47:08 2001 +++ linux-2.4.3/include/net/sock.h Thu Aug 2 16:02:36 2001 @@ -24,6 +24,7 @@ * Alan Cox : Eliminate low level recv/recvfrom * David S. Miller : New socket lookup architecture. * Steve Whitehouse: Default routines for sock_ops + * Federico David Sacerdoti : Added TCP health counters. 
* * This program is free software; you can redistribute it and/or * modify it under the terms of the GNU General Public License @@ -272,7 +273,8 @@ unsigned long timeout; /* Currently scheduled timeout */ __u32 lrcvtime; /* timestamp of last received data packet*/ __u16 last_seg_size; /* Size of last incoming segment */ __u16 rcv_mss; /* MSS used for delayed ACK decisions */ + __u32 last_ack_sent; /* sequence number of the last ack we sent. */ } ack; /* Data for direct copy to user */ @@ -411,9 +413,18 @@ unsigned int keepalive_time; /* time before keep alive takes place */ unsigned int keepalive_intvl; /* time interval between keep alive probes */ int linger2; + + /* + * TCP health monitoring counters. + */ + __u32 dup_acks_sent; + __u32 dup_pkts_recv; + __u32 acks_sent; + __u32 pkts_recv; + }; - + /* * This structure really needs to be cleaned up. * Most of it is for TCP, and not used by any of diff -Naur pristine-linux-2.4.3/net/ipv4/af_inet.c linux-2.4.3/net/ipv4/af_inet.c --- pristine-linux-2.4.3/net/ipv4/af_inet.c Thu Aug 2 15:47:15 2001 +++ linux-2.4.3/net/ipv4/af_inet.c Thu Aug 2 16:02:36 2001 @@ -54,6 +54,7 @@ * Some other random speedups. * Cyrus Durgin : Cleaned up file for kmod hacks. * Andi Kleen : Fix inet_stream_connect TCP race. + * Federico David Sacerdoti : Added tcphealth proc file * * This program is free software; you can redistribute it and/or * modify it under the terms of the GNU General Public License @@ -128,6 +129,7 @@ extern int afinet_get_info(char *, char **, off_t, int); extern int tcp_get_info(char *, char **, off_t, int); extern int udp_get_info(char *, char **, off_t, int); +extern int tcp_health_get_info(char *, char **, off_t, int); extern void ip_mc_drop_socket(struct sock *sk); #ifdef CONFIG_DLCI @@ -474,7 +476,7 @@ * (ie. 
your servers still start up even if your ISDN link * is temporarily down) */ - if (sysctl_ip_nonlocal_bind == 0 && + if (sysctl_ip_nonlocal_bind == 0 && sk->protinfo.af_inet.freebind == 0 && addr->sin_addr.s_addr != INADDR_ANY && chk_addr_ret != RTN_LOCAL && @@ -1054,6 +1056,7 @@ proc_net_create ("sockstat", 0, afinet_get_info); proc_net_create ("tcp", 0, tcp_get_info); proc_net_create ("udp", 0, udp_get_info); + proc_net_create ("tcphealth", 0, tcp_health_get_info); #endif /* CONFIG_PROC_FS */ return 0; } diff -Naur pristine-linux-2.4.3/net/ipv4/proc.c linux-2.4.3/net/ipv4/proc.c --- pristine-linux-2.4.3/net/ipv4/proc.c Thu Aug 2 15:47:16 2001 +++ linux-2.4.3/net/ipv4/proc.c Thu Aug 2 16:02:36 2001 @@ -26,6 +26,7 @@ * Andi Kleen : Add support for open_requests and * split functions for more readibility. * Andi Kleen : Add support for /proc/net/netstat + * Federico David Sacerdoti : Added support for /proc/net/tcphealth * * This program is free software; you can redistribute it and/or * modify it under the terms of the GNU General Public License @@ -155,7 +156,7 @@ if (len > length) len = length; if (len < 0) - len = 0; + len = 0; return len; } @@ -212,3 +213,97 @@ len = 0; return len; } + +/* + * Output /proc/net/tcphealth + */ +#define LINESZ 128 + +int tcp_health_get_info(char *buffer, char **start, off_t offset, int length) +{ + int len=0, i=0, num=0; + off_t pos=0, begin=0; + char tmpbuf[LINESZ+1], srcIP[32], destIP[32]; + + unsigned long dest, src, SmoothedRttEstimate, + AcksSent, DupAcksSent, PktsRecv, DupPktsRecv; + unsigned short destp, srcp; + + len = sprintf(buffer, + "TCP Health Monitoring (established connections only)\n" + " -Duplicate ACKs indicate lost/reordered packets on the connection.\n" + " -Duplicate Packets Received show you should be using SACK (rare).\n" + " -RttEst estimates how long a packet takes on a round trip over the connection.\n" + "id Local Address Remote Address RttEst(ms) AcksSent " + "DupAcksSent PktsRecv DupPktsRecv\n"); + 
pos=len; + + /* Loop through established TCP connections */ + local_bh_disable(); + for (i=0; i < tcp_ehash_size; i++) { + struct tcp_ehash_bucket *head = &tcp_ehash[i]; + struct sock *sk; + struct tcp_opt *tp; + + read_lock(&head->lock); + for (sk=head->chain; sk; sk=sk->next) { + if (!TCP_INET_FAMILY(sk->family)) + continue; + pos+=LINESZ; + if (pos <= offset) + continue; + + dest = ntohl(sk->daddr); + src = ntohl(sk->rcv_saddr); + destp = ntohs(sk->dport); + srcp = ntohs(sk->sport); + + tp = &(sk->tp_pinfo.af_tcp); + SmoothedRttEstimate = (tp->srtt >> 3); + AcksSent = tp->acks_sent; + DupAcksSent = tp->dup_acks_sent; + PktsRecv = tp->pkts_recv; + DupPktsRecv = tp->dup_pkts_recv; + + sprintf(srcIP, "%lu.%lu.%lu.%lu:%u", + ((src >> 24) & 0xFF), ((src >> 16) & 0xFF), ((src >> 8) & 0xFF), (src & 0xFF), + srcp); + sprintf(destIP, "%lu.%lu.%lu.%lu:%u", + ((dest >> 24) & 0xFF), ((dest >> 16) & 0xFF), ((dest >> 8) & 0xFF), (dest & 0xFF), + destp); + + sprintf(tmpbuf, "%d: %-21s %-21s " + "%8lu %8lu %8lu %8lu %8lu", + num, + srcIP, + destIP, + SmoothedRttEstimate, + AcksSent, + DupAcksSent, + PktsRecv, + DupPktsRecv + ); + + len += sprintf(buffer+len, "%-*s\n", LINESZ-1, tmpbuf); + if(pos >= offset+length) { + read_unlock(&head->lock); + goto out; + } + num++; + } + read_unlock(&head->lock); + } + +out: + local_bh_enable(); + + begin = len - (pos - offset); + *start = buffer + begin; + len -= begin; + if(len>length) + len = length; + if (len<0) + len = 0; + return len; +} + diff -Naur pristine-linux-2.4.3/net/ipv4/tcp_input.c linux-2.4.3/net/ipv4/tcp_input.c --- pristine-linux-2.4.3/net/ipv4/tcp_input.c Thu Aug 2 15:47:16 2001 +++ linux-2.4.3/net/ipv4/tcp_input.c Thu Aug 2 16:02:36 2001 @@ -60,6 +60,7 @@ * Pasi Sarolahti, * Panu Kuhlberg: Experimental audit of TCP (re)transmission * engine. Lots of bugs are found. 
+ * Federico David Sacerdoti : Added TCP health monitoring */ #include @@ -2489,6 +2490,8 @@ } if (!after(TCP_SKB_CB(skb)->end_seq, tp->rcv_nxt)) { + /* Course retransmit inefficiency- this packet has been received twice. [tcphealth] */ + tp->dup_pkts_recv++; SOCK_DEBUG(sk, "ofo packet was already received \n"); __skb_unlink(skb, skb->list); __kfree_skb(skb); @@ -2584,6 +2587,10 @@ return; } + /* A packet is a "duplicate" if it contains bytes we have already received. [tcphealth] */ + if (before(TCP_SKB_CB(skb)->seq, tp->rcv_nxt)) + tp->dup_pkts_recv++; + if (!after(TCP_SKB_CB(skb)->end_seq, tp->rcv_nxt)) { /* A retransmit, 2nd most common case. Force an immediate ack. */ NET_INC_STATS_BH(DelayedACKLost); @@ -3180,6 +3187,14 @@ */ tp->saw_tstamp = 0; + + /* + * Tcp health monitoring is interested in + * total per connection packet arrivals. + * There is no way to avoid putting this in the fast + * path. + */ + tp->pkts_recv++; /* pred_flags is 0xS?10 << 16 + snd_wnd * if header_predition is to be made diff -Naur pristine-linux-2.4.3/net/ipv4/tcp_output.c linux-2.4.3/net/ipv4/tcp_output.c --- pristine-linux-2.4.3/net/ipv4/tcp_output.c Thu Aug 2 15:47:16 2001 +++ linux-2.4.3/net/ipv4/tcp_output.c Thu Aug 2 16:05:54 2001 @@ -33,6 +33,7 @@ * Andrea Arcangeli: SYNACK carry ts_recent in tsecr. * Cacophonix Gaul : draft-minshall-nagle-01 * J Hadi Salim : ECN support + * Federico David Sacerdoti : Added TCP health monitoring * */ @@ -1269,9 +1270,16 @@ TCP_SKB_CB(buff)->flags = TCPCB_FLAG_ACK; TCP_SKB_CB(buff)->sacked = 0; + /* If the rcv_nxt has not advanced since sending our last ACK, this is a duplicate. [tcphealth] */ + if (tp->rcv_nxt == tp->ack.last_ack_sent) + tp->dup_acks_sent++; + /* Record the total number of acks sent on this connection [tcphealth]. */ + tp->acks_sent++; + /* Send it off, this clears delayed acks for us. 
*/ TCP_SKB_CB(buff)->seq = TCP_SKB_CB(buff)->end_seq = tcp_acceptable_seq(sk, tp); TCP_SKB_CB(buff)->when = tcp_time_stamp; + tp->ack.last_ack_sent = tp->rcv_nxt; tcp_transmit_skb(sk, buff); } } --------------1917B4CC359163C6AFA6D60A--

From owner-netdev@oss.sgi.com Tue Aug 7 04:12:49 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f77BCnO07493 for netdev-outgoing; Tue, 7 Aug 2001 04:12:49 -0700 Received: from robur.slu.se (robur.slu.se [130.238.98.12]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f77BClV07489 for ; Tue, 7 Aug 2001 04:12:48 -0700 Received: (from robert@localhost) by robur.slu.se (8.8.7/8.8.7) id NAA18913; Tue, 7 Aug 2001 13:13:39 +0200 From: Robert Olsson MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <15215.52579.537646.121252@robur.slu.se> Date: Tue, 7 Aug 2001 13:13:39 +0200 To: Rusty Russell Cc: Robert Olsson , jamal , Alexey Kuznetsov , netdev@oss.sgi.com Subject: Re: Linux 2.4 networking/routing slowdown In-Reply-To: References: <15211.65235.369977.774321@robur.slu.se> X-Mailer: VM 6.92 under Emacs 19.34.1 Sender: owner-netdev@oss.sgi.com Precedence: bulk

Rusty Russell writes:
>
> Ah... What are you using as a traffic generator?
>
> Creating a new connection is expensive (but could probably be
> optimized): given that usually < 1 in 10 packets is a new connection,
> this seemed a reasonable optimization strategy. If you are sending
> random packets, you are trying to create 1 million connections (well,
> some will timeout).
>
> You *can* help a bit by enlarging the hash tables: try:
> insmod ipchains hashsize=100000
>
> You could also try sending the *same* packet 1,000,000 times, and see
> if we do better there...
>
> Very interesting,

Hello!

Yes, a packet generator is used, and it is sending the "same" packet a million times. So there shouldn't be too many connections unless something weird is going on. If you have any ideas, it is easy to test again.

Cheers.
--ro From owner-netdev@oss.sgi.com Wed Aug 8 09:45:41 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f78GjfQ22673 for netdev-outgoing; Wed, 8 Aug 2001 09:45:41 -0700 Received: from caligula (host-208-134-128-5.amicus.com [208.134.128.5]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f78GjdV22668 for ; Wed, 8 Aug 2001 09:45:39 -0700 Received: from caligula (caligula [127.0.0.1]) by caligula (Postfix) with ESMTP id B0B3D132E5 for ; Wed, 8 Aug 2001 11:45:38 -0500 (CDT) Subject: C and/or ioctl question From: Stephen Waters To: netdev@oss.sgi.com Content-Type: text/plain Content-Transfer-Encoding: 7bit X-Mailer: Evolution/0.12 (Preview Release) Date: 08 Aug 2001 11:45:38 -0500 Message-Id: <997289138.21407.20.camel@caligula> Mime-Version: 1.0 Sender: owner-netdev@oss.sgi.com Precedence: bulk hi, I've been studying the net-tools ifconfig implementation a bit in my efforts to better learn C and ioctl for Linux. I've noticed that the following snippet of code compiles and runs just fine with gcc-2.95.4 but gcc-3.0.1 compile returns error. CFLAGS: -Wall -g Error message in question: SIOCGIFCONF: Bad address Code Snippet: #include #include #include #include #include #include #include #include int main () { struct ifconf ifc; int skfd; skfd = socket(AF_INET, SOCK_DGRAM, 0); if (ioctl(skfd, SIOCGIFCONF, &ifc) < 0) { perror("SIOCGIFCONF"); close(skfd); return -1; } close(skfd); return 0; } Thanks for any help! 
-s From owner-netdev@oss.sgi.com Wed Aug 8 10:43:28 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f78HhSU25404 for netdev-outgoing; Wed, 8 Aug 2001 10:43:28 -0700 Received: from blackbird.intercode.com.au (blackbird.intercode.com.au [203.32.101.10]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f78HhPV25399 for ; Wed, 8 Aug 2001 10:43:25 -0700 Received: from localhost (jmorris@localhost) by blackbird.intercode.com.au (8.9.3/8.9.3) with ESMTP id DAA18867; Thu, 9 Aug 2001 03:43:13 +1000 X-Authentication-Warning: blackbird.intercode.com.au: jmorris owned process doing -bs Date: Thu, 9 Aug 2001 03:43:13 +1000 (EST) From: James Morris To: Stephen Waters cc: Subject: Re: C and/or ioctl question In-Reply-To: <997289138.21407.20.camel@caligula> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk On 8 Aug 2001, Stephen Waters wrote: > hi, I've been studying the net-tools ifconfig implementation a bit in my > efforts to better learn C and ioctl for Linux. > > int main () { > struct ifconf ifc; > int skfd; > skfd = socket(AF_INET, SOCK_DGRAM, 0); > if (ioctl(skfd, SIOCGIFCONF, &ifc) < 0) { You need to set up the ifconf structure first and allocate enough memory in it for a payload of ifreq structures to be returned from the kernel. 
Have a look at if_readconf() in net-tools/lib/interface.c - James -- James Morris From owner-netdev@oss.sgi.com Thu Aug 9 07:05:49 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f79E5nj31093 for netdev-outgoing; Thu, 9 Aug 2001 07:05:49 -0700 Received: from caligula (host-208-134-128-5.amicus.com [208.134.128.5] (may be forged)) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f79E5mV31089 for ; Thu, 9 Aug 2001 07:05:48 -0700 Received: from caligula (caligula [127.0.0.1]) by caligula (Postfix) with ESMTP id 96A42132E5; Thu, 9 Aug 2001 09:05:47 -0500 (CDT) Subject: Re: C and/or ioctl question From: Stephen Waters To: James Morris Cc: netdev@oss.sgi.com In-Reply-To: References: Content-Type: text/plain Content-Transfer-Encoding: 7bit X-Mailer: Evolution/0.12 (Preview Release) Date: 09 Aug 2001 09:05:47 -0500 Message-Id: <997365947.32535.5.camel@caligula> Mime-Version: 1.0 Sender: owner-netdev@oss.sgi.com Precedence: bulk On 09 Aug 2001 03:43:13 +1000, James Morris wrote: > On 8 Aug 2001, Stephen Waters wrote: > > > hi, I've been studying the net-tools ifconfig implementation a bit in my > > efforts to better learn C and ioctl for Linux. > > > > > int main () { > > struct ifconf ifc; > > int skfd; > > skfd = socket(AF_INET, SOCK_DGRAM, 0); > > if (ioctl(skfd, SIOCGIFCONF, &ifc) < 0) { > > > You need to set up the ifconf structure first and allocate enough memory > in it for a payload of ifreq structures to be returned from the kernel. > > Have a look at if_readconf() in net-tools/lib/interface.c Thanks! 
-sw

From owner-netdev@oss.sgi.com Sun Aug 12 20:27:09 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f7D3R9628354 for netdev-outgoing; Sun, 12 Aug 2001 20:27:09 -0700 Received: from grok.yi.org (IDENT:root@[24.9.112.194]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f7D3R5j28351 for ; Sun, 12 Aug 2001 20:27:07 -0700 Received: from candelatech.com (IDENT:greear@localhost.localdomain [127.0.0.1]) by grok.yi.org (8.11.2/8.11.2) with ESMTP id f7D3Paa13990; Sun, 12 Aug 2001 20:25:39 -0700 Message-ID: <3B7748B0.C897121A@candelatech.com> Date: Sun, 12 Aug 2001 20:25:36 -0700 From: Ben Greear Organization: Candela Technologies X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.4.7 i686) X-Accept-Language: en MIME-Version: 1.0 To: linux-net , "netdev@oss.sgi.com" Subject: IPTOS_LOWDELAY in netinet/ip.h is incorrect?? Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk

After reading this page:
http://www.hut.fi/~msisomak/diffserv.html
and
http://www.cis.ohio-state.edu/cgi-bin/rfc/rfc1349.html#sec-3

It appears that the IP TOS values in netinet/ip.h are incorrect:

/*
 * Definitions for IP type of service (ip_tos)
 */
#define IPTOS_TOS_MASK 0x1E
#define IPTOS_TOS(tos) ((tos) & IPTOS_TOS_MASK)
#define IPTOS_LOWDELAY 0x10
#define IPTOS_THROUGHPUT 0x08
#define IPTOS_RELIABILITY 0x04
#define IPTOS_LOWCOST 0x02
#define IPTOS_MINCOST IPTOS_LOWCOST

I think each value should be left-shifted two times to make room for the 3 precedence bits. Am I missing something??

Also, will the kernel take any 8-bit value I try to stuff in there with:
setsockopt(dev_socket, SOL_IP, IP_TOS, (char*)&val, sizeof(int))
or will it normalize values according to its internal rules?
Thanks, Ben -- Ben Greear President of Candela Technologies Inc http://www.candelatech.com ScryMUD: http://scry.wanfear.com http://scry.wanfear.com/~greear From owner-netdev@oss.sgi.com Mon Aug 13 07:14:52 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f7DEEq409076 for netdev-outgoing; Mon, 13 Aug 2001 07:14:52 -0700 Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f7DEEmj09072 for ; Mon, 13 Aug 2001 07:14:48 -0700 Received: from localhost (IDENT:davem@pizda.ninka.net [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id HAA00995; Mon, 13 Aug 2001 07:14:36 -0700 Date: Mon, 13 Aug 2001 07:14:36 -0700 (PDT) Message-Id: <20010813.071436.74751236.davem@redhat.com> To: greearb@candelatech.com Cc: linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: IPTOS_LOWDELAY in netinet/ip.h is incorrect?? From: "David S. Miller" In-Reply-To: <3B7748B0.C897121A@candelatech.com> References: <3B7748B0.C897121A@candelatech.com> X-Mailer: Mew version 2.0 on Emacs 21.0 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk From: Ben Greear Date: Sun, 12 Aug 2001 20:25:36 -0700 It appears that the IP TOS values in netinet/ip.h are incorrect: I'll be using the page you even quoted: http://www.hut.fi/~msisomak/diffserv.html to show that the values are correct. This page, in figure 2, shows the following bit layout: ---------------------------------------- | Precedence | Type of Service | 0 | ---------------------------------------- 8 7 6 5 4 3 2 1 0 The bit numbering above is mine, the page labels them backwards for whatever reason. #define IPTOS_TOS_MASK 0x1E #define IPTOS_TOS(tos) ((tos) & IPTOS_TOS_MASK) TOS field, bits 1 thru 5 #define IPTOS_LOWDELAY 0x10 Minimize delay, binary 1000 in TOS field. #define IPTOS_THROUGHPUT 0x08 Minimize thruput, binary 0100 in TOS field. 
   #define IPTOS_RELIABILITY 0x04

Maximize reliability, binary 0010 in TOS field.

   #define IPTOS_LOWCOST 0x02
   #define IPTOS_MINCOST IPTOS_LOWCOST

Minimize monetary (misspelled on page) cost, binary 0001 in TOS field.

   Am I missing something??

I think the mislabelled bit numbering in the figure of that web page has misled your thinking, that's all.

At least, Linux and BSD's header files agree perfectly for these values :-)

   Also, will the kernel take any 8-bit value I try to stuff in there with:
   setsockopt(dev_socket, SOL_IP, IP_TOS, (char*)&val, sizeof(int))
   or will it normalize values according to its internal rules?

1) If you have any bits outside of the precedence or type of service field set, you will get -EINVAL. (Basically, if the lowest bit is set, it is reserved and thus invalid to pass to this sockopt.)

2) If this is a TCP connection, and CONFIG_INET_ECN is enabled, the low two bits you pass will be replaced with the current ECN bit settings. The lowest two TOS byte bits are used for ECN.

3) If the precedence in the TOS value passed is greater than or equal to IPTOS_PREC_CRITIC_ECP, and the user does not have CAP_NET_ADMIN capabilities, you will get an -EPERM failure.

Basically, I just read aloud the code in net/ipv4/ip_sockglue.c, which you could just as easily have done by grepping for IPTOS under net/ipv4/*.c. But I'll let you be lazy just this one time :-)

Later,
David S.
Miller davem@redhat.com From owner-netdev@oss.sgi.com Mon Aug 13 08:44:01 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f7DFi1111390 for netdev-outgoing; Mon, 13 Aug 2001 08:44:01 -0700 Received: from grok.yi.org (IDENT:root@cx97923-a.phnx3.az.home.com [24.9.112.194]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f7DFhvj11387 for ; Mon, 13 Aug 2001 08:43:57 -0700 Received: from candelatech.com (IDENT:greear@localhost.localdomain [127.0.0.1]) by grok.yi.org (8.11.2/8.11.2) with ESMTP id f7DFhqa10196; Mon, 13 Aug 2001 08:43:52 -0700 Message-ID: <3B77F5B8.B43EA7CD@candelatech.com> Date: Mon, 13 Aug 2001 08:43:52 -0700 From: Ben Greear Organization: Candela Technologies X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.4.7 i686) X-Accept-Language: en MIME-Version: 1.0 To: "David S. Miller" CC: linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: IPTOS_LOWDELAY in netinet/ip.h is incorrect?? References: <3B7748B0.C897121A@candelatech.com> <20010813.071436.74751236.davem@redhat.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk "David S. Miller" wrote: > > From: Ben Greear > Date: Sun, 12 Aug 2001 20:25:36 -0700 > > It appears that the IP TOS values in netinet/ip.h are > incorrect: > > I'll be using the page you even quoted: > > http://www.hut.fi/~msisomak/diffserv.html > > to show that the values are correct. This page, in figure > 2, shows the following bit layout: > > ---------------------------------------- > | Precedence | Type of Service | 0 | > ---------------------------------------- > 8 7 6 5 4 3 2 1 0 > > The bit numbering above is mine, the page labels them > backwards for whatever reason. 
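The layout in the quoted figure can be checked with a few lines of C. This is a sketch under the assumption that the TOS byte is split into 3 precedence bits, 4 Type of Service bits, and one reserved low bit; the constant and function names are made up for illustration, while the numeric values are the ones from netinet/ip.h quoted in this thread.

```c
#include <assert.h>

/* Illustrative names; values match the header excerpt quoted in this thread. */
enum {
    TOS_FIELD_MASK  = 0x1E,  /* same value as IPTOS_TOS_MASK */
    PRECEDENCE_MASK = 0xE0   /* assumed: the top three bits */
};

/* Extract the 4-bit Type of Service field (0..15) from a TOS byte. */
unsigned tos_field(unsigned char tos)
{
    return (tos & TOS_FIELD_MASK) >> 1;
}

/* Extract the 3-bit precedence (0..7) from a TOS byte. */
unsigned tos_precedence(unsigned char tos)
{
    return (tos & PRECEDENCE_MASK) >> 5;
}

/* Build a TOS byte from a precedence and a 4-bit field; the low bit stays 0. */
unsigned char make_tos(unsigned precedence, unsigned field)
{
    assert(precedence <= 7 && field <= 15);
    return (unsigned char)((precedence << 5) | (field << 1));
}
```

With this split, tos_field(0x10) yields binary 1000, which is why a value such as IPTOS_LOWDELAY already sits in the right position and needs no further shifting before it is passed to setsockopt().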
RFC 1349 labels them differently:
http://www.cis.ohio-state.edu/cgi-bin/rfc/rfc1349.html#sec-3

Also, from doing a Google search on setsockopt and IP_TOS, I was able to find exactly zero pieces of source code that do this, so I may be the first one to ever try it :)

>
> #define IPTOS_TOS_MASK 0x1E
> #define IPTOS_TOS(tos) ((tos) & IPTOS_TOS_MASK)
>
> TOS field, bits 1 thru 5
>
> #define IPTOS_LOWDELAY 0x10
>
> Minimize delay, binary 1000 in TOS field.
>
> #define IPTOS_THROUGHPUT 0x08
>
> Minimize thruput, binary 0100 in TOS field.
>
> #define IPTOS_RELIABILITY 0x04
>
> Maximize reliability, binary 0010 in TOS field.
>
> #define IPTOS_LOWCOST 0x02
> #define IPTOS_MINCOST IPTOS_LOWCOST
>
> Minimize monetary (misspelled on page) cost, binary 0001 in TOS
> field.
>
> Am I missing something??
>
> I think the mislabelled bit numbering in the figure of that web page
> has misled your thinking, that's all.
>
> At least, Linux and BSD's header files agree perfectly for
> these values :-)
>
> Also, will the kernel take any 8-bit value I try to stuff in
> there with:
> setsockopt(dev_socket, SOL_IP, IP_TOS, (char*)&val, sizeof(int))
> or will it normalize values according to its internal rules?
>
> 1) If you have any bits outside of the precedence or
> type of service field set, you will get -EINVAL.
> (Basically, if the lowest bit is set, it is reserved
> and thus invalid to pass to this sockopt.)

I didn't seem to be getting errors on the system call, but I should check more closely to make sure...

>
> 2) If this is a TCP connection, and CONFIG_INET_ECN is
> enabled, the low two bits you pass will be replaced
> with the current ECN bit settings. The lowest two
> TOS byte bits are used for ECN.

I think that #ifdef code should be changed to check whether ECN is enabled at run time. Also, is there a way to turn ECN on/off for a specific socket only?
>
> 3) If the precedence in the TOS value passed is greater than
> or equal to IPTOS_PREC_CRITIC_ECP, and the user does not
> have CAP_NET_ADMIN capabilities, you will get an -EPERM
> failure.
>
> Basically, I just read aloud the code in net/ipv4/ip_sockglue.c,
> which you could just as easily have done by grepping for
> IPTOS under net/ipv4/*.c. But I'll let you be lazy just this
> one time :-)

I eventually found it, but couldn't easily explain what I was seeing. The code generally looks correct, depending on the bit-ordering.

I wasn't passing values to setsockopt in network byte order, but it doesn't look like I should. Do you agree?

Thanks,
Ben

>
> Later,
> David S. Miller
> davem@redhat.com

-- Ben Greear President of Candela Technologies Inc http://www.candelatech.com ScryMUD: http://scry.wanfear.com http://scry.wanfear.com/~greear

From owner-netdev@oss.sgi.com Mon Aug 13 08:46:42 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f7DFkgr11581 for netdev-outgoing; Mon, 13 Aug 2001 08:46:42 -0700 Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f7DFkfj11578 for ; Mon, 13 Aug 2001 08:46:41 -0700 Received: from localhost (IDENT:davem@pizda.ninka.net [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id IAA13671; Mon, 13 Aug 2001 08:46:39 -0700 Date: Mon, 13 Aug 2001 08:46:38 -0700 (PDT) Message-Id: <20010813.084638.41644986.davem@redhat.com> To: greearb@candelatech.com Cc: linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: IPTOS_LOWDELAY in netinet/ip.h is incorrect?? From: "David S.
Miller" In-Reply-To: <3B77F5B8.B43EA7CD@candelatech.com> References: <3B7748B0.C897121A@candelatech.com> <20010813.071436.74751236.davem@redhat.com> <3B77F5B8.B43EA7CD@candelatech.com> X-Mailer: Mew version 2.0 on Emacs 21.0 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk From: Ben Greear Date: Mon, 13 Aug 2001 08:43:52 -0700 I wasn't passing values to setsockopt in network byte order, but it doesn't look like I should: Do you agree? Ummm... "byte order" doesn't matter when you're passing in just a byte. :-) As for examples of programs which set a specific TOS, you might want to look into the sources of obscure unknown programs such as telnet et al. Later, David S. Miller davem@redhat.com From owner-netdev@oss.sgi.com Mon Aug 13 08:48:17 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f7DFmHE11664 for netdev-outgoing; Mon, 13 Aug 2001 08:48:17 -0700 Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f7DFmGj11661 for ; Mon, 13 Aug 2001 08:48:16 -0700 Received: from localhost (IDENT:davem@pizda.ninka.net [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id IAA13812; Mon, 13 Aug 2001 08:48:14 -0700 Date: Mon, 13 Aug 2001 08:48:14 -0700 (PDT) Message-Id: <20010813.084814.71098952.davem@redhat.com> To: greearb@candelatech.com Cc: linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: IPTOS_LOWDELAY in netinet/ip.h is incorrect?? From: "David S. 
Miller" In-Reply-To: <3B77F5B8.B43EA7CD@candelatech.com> References: <3B7748B0.C897121A@candelatech.com> <20010813.071436.74751236.davem@redhat.com> <3B77F5B8.B43EA7CD@candelatech.com> X-Mailer: Mew version 2.0 on Emacs 21.0 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk From: Ben Greear Date: Mon, 13 Aug 2001 08:43:52 -0700 I think that #ifdef code should be changed to check for the run-time enabled-ness of ECN. Also, is there a way to turn ECN on/off for a specific socket only? Wrong, a host adhering to the ECN rfc designates the new meanings of these bits. In fact I would argue that the lowest two bits should be zapped out for TCP sockets even when CONFIG_INET_ECN is not set. Later, David S. Miller davem@redhat.com From owner-netdev@oss.sgi.com Mon Aug 13 09:11:24 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f7DGBOZ12403 for netdev-outgoing; Mon, 13 Aug 2001 09:11:24 -0700 Received: from Cantor.suse.de (ns.suse.de [213.95.15.193]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f7DGBMj12400 for ; Mon, 13 Aug 2001 09:11:22 -0700 Received: from Hermes.suse.de (Hermes.suse.de [213.95.15.136]) by Cantor.suse.de (Postfix) with ESMTP id 033281E6E1; Mon, 13 Aug 2001 18:11:17 +0200 (MEST) Date: Mon, 13 Aug 2001 18:11:04 +0200 From: Andi Kleen To: Ben Greear Cc: "David S. Miller" , linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: IPTOS_LOWDELAY in netinet/ip.h is incorrect?? 
Message-ID: <20010813181104.A12904@gruyere.muc.suse.de> References: <3B7748B0.C897121A@candelatech.com> <20010813.071436.74751236.davem@redhat.com> <3B77F5B8.B43EA7CD@candelatech.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <3B77F5B8.B43EA7CD@candelatech.com>; from greearb@candelatech.com on Mon, Aug 13, 2001 at 08:43:52AM -0700 Sender: owner-netdev@oss.sgi.com Precedence: bulk On Mon, Aug 13, 2001 at 08:43:52AM -0700, Ben Greear wrote: > Also, from doing a google search on setsockopt and IP_TOS, I was able > to find exactly zero pieces of source code that do this, so I may be > the first one to every try it :) A search for IP_TOS on http://www.codecatalog.com will give you 54hits, and it is missing a lot of stuff. For example sshd has a long standing bug where it enables high priority TOS when $DISPLAY is set, which causes rsync over ssh to kill your lines badly when run from X. -Andi From owner-netdev@oss.sgi.com Mon Aug 13 09:13:13 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f7DGDDI12502 for netdev-outgoing; Mon, 13 Aug 2001 09:13:13 -0700 Received: from grok.yi.org (IDENT:root@cx97923-a.phnx3.az.home.com [24.9.112.194]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f7DGDCj12499 for ; Mon, 13 Aug 2001 09:13:12 -0700 Received: from candelatech.com (IDENT:greear@localhost.localdomain [127.0.0.1]) by grok.yi.org (8.11.2/8.11.2) with ESMTP id f7DGDAa10373; Mon, 13 Aug 2001 09:13:10 -0700 Message-ID: <3B77FC96.FF58C65E@candelatech.com> Date: Mon, 13 Aug 2001 09:13:10 -0700 From: Ben Greear Organization: Candela Technologies X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.4.7 i686) X-Accept-Language: en MIME-Version: 1.0 To: "David S. Miller" CC: linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: IPTOS_LOWDELAY in netinet/ip.h is incorrect?? 
References: <3B7748B0.C897121A@candelatech.com> <20010813.071436.74751236.davem@redhat.com> <3B77F5B8.B43EA7CD@candelatech.com> <20010813.084814.71098952.davem@redhat.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk "David S. Miller" wrote: > > From: Ben Greear > Date: Mon, 13 Aug 2001 08:43:52 -0700 > > I think that #ifdef code should be changed to check for the > run-time enabled-ness of ECN. Also, is there a way to turn > ECN on/off for a specific socket only? > > Wrong, a host adhering to the ECN rfc designates the new meanings of > these bits. Fair enough, but when I specifically disable ECN through the /proc/ interface, then I should be able to set the bits as specified in 1349 or whatever. It's not critical path code, so the extra if () check is meaningless, and the flexibility would be welcomed by me :) Is there a way to specifically turn on/off ECN for a given socket, like through setsockopt? > In fact I would argue that the lowest two bits should be zapped out > for TCP sockets even when CONFIG_INET_ECN is not set. I would like to be able to simulate all kinds of RFC's for testing purposes. If you zap the two bits then it just makes my (and anyone else who wants to try out old RFC behaviour) lives harder, without helping anyone that is actively running ECN. Did you look at the different bit-ordering in RFC 1349? For what it's worth, Ethereal also seems to use the bit-ordering as described in 1349. 
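To make the bit overlap under discussion concrete, here is a minimal sketch of how the TOS byte divides up. The macro and function names are hypothetical (they are not the kernel's or glibc's names); the mask values follow the RFC 791 precedence bits, the RFC 1349 TOS field, and the two low-order bits that ECN uses.

```c
#include <stdint.h>

/* Hypothetical names, for illustration only. */
#define TOS_PREC_MASK    0xE0  /* top three bits: precedence (RFC 791) */
#define TOS_RFC1349_MASK 0x1E  /* four TOS bits as defined by RFC 1349 */
#define TOS_ECN_MASK     0x03  /* two low-order bits, used by ECN */

/* Clear the two low-order bits, leaving precedence and the rest intact. */
uint8_t tos_strip_ecn(uint8_t tos)
{
    return (uint8_t)(tos & ~TOS_ECN_MASK);
}

/* Return 1 if the value sets a bit that ECN also uses, else 0. */
int tos_collides_with_ecn(uint8_t tos)
{
    return (tos & TOS_ECN_MASK) != 0;
}

/* Extract the precedence bits, still in their position in the byte. */
uint8_t tos_precedence(uint8_t tos)
{
    return (uint8_t)(tos & TOS_PREC_MASK);
}
```

With these masks, the RFC 1349 minimize-cost value 0x02 falls inside the ECN bits, while low-delay (0x10), throughput (0x08) and reliability (0x04) do not.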
Ben -- Ben Greear President of Candela Technologies Inc http://www.candelatech.com ScryMUD: http://scry.wanfear.com http://scry.wanfear.com/~greear From owner-netdev@oss.sgi.com Mon Aug 13 09:17:03 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f7DGH3112684 for netdev-outgoing; Mon, 13 Aug 2001 09:17:03 -0700 Received: from grok.yi.org (IDENT:root@cx97923-a.phnx3.az.home.com [24.9.112.194]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f7DGH2j12681 for ; Mon, 13 Aug 2001 09:17:02 -0700 Received: from candelatech.com (IDENT:greear@localhost.localdomain [127.0.0.1]) by grok.yi.org (8.11.2/8.11.2) with ESMTP id f7DGH0a10393; Mon, 13 Aug 2001 09:17:00 -0700 Message-ID: <3B77FD7C.52C34609@candelatech.com> Date: Mon, 13 Aug 2001 09:17:00 -0700 From: Ben Greear Organization: Candela Technologies X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.4.7 i686) X-Accept-Language: en MIME-Version: 1.0 To: "David S. Miller" CC: linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: IPTOS_LOWDELAY in netinet/ip.h is incorrect?? References: <3B7748B0.C897121A@candelatech.com> <20010813.071436.74751236.davem@redhat.com> <3B77F5B8.B43EA7CD@candelatech.com> <20010813.084638.41644986.davem@redhat.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk "David S. Miller" wrote: > > From: Ben Greear > Date: Mon, 13 Aug 2001 08:43:52 -0700 > > I wasn't passing values to setsockopt in network byte order, but > it doesn't look like I should: Do you agree? > > Ummm... "byte order" doesn't matter when you're passing in > just a byte. :-) I was passing a whole integer. I will try passing in just a single byte to see if that changes anything. Maybe an addition could be made to the ip man page to describe the different things that void* should be for different get/setsockopt calls? If there is such a thing, I couldn't find it... 
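For reference, a minimal sketch of setting IP_TOS with an int-sized option value, assuming a Linux host that accepts a plain host-order int for this option. The helper names set_tos() and get_tos() are hypothetical.

```c
#include <netinet/in.h>
#include <netinet/ip.h>
#include <sys/socket.h>

/* Set the TOS byte used for packets sent on fd.
 * Returns 0 on success, -1 on error (errno is set by setsockopt). */
int set_tos(int fd, int tos)
{
    /* The option value is passed as a host-order int; no htonl(). */
    return setsockopt(fd, IPPROTO_IP, IP_TOS, &tos, sizeof(tos));
}

/* Read the current TOS value back; returns -1 on error. */
int get_tos(int fd)
{
    int tos = 0;
    socklen_t len = sizeof(tos);

    if (getsockopt(fd, IPPROTO_IP, IP_TOS, &tos, &len) < 0)
        return -1;
    return tos;
}
```

A caller would do something like set_tos(fd, IPTOS_LOWDELAY) and check the return value; reading the option back with get_tos() is a quick way to see what the kernel actually stored.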
-- Ben Greear President of Candela Technologies Inc http://www.candelatech.com ScryMUD: http://scry.wanfear.com http://scry.wanfear.com/~greear From owner-netdev@oss.sgi.com Mon Aug 13 09:29:09 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f7DGT9t13139 for netdev-outgoing; Mon, 13 Aug 2001 09:29:09 -0700 Received: from grok.yi.org (IDENT:root@cx97923-a.phnx3.az.home.com [24.9.112.194]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f7DGT7j13136 for ; Mon, 13 Aug 2001 09:29:07 -0700 Received: from candelatech.com (IDENT:greear@localhost.localdomain [127.0.0.1]) by grok.yi.org (8.11.2/8.11.2) with ESMTP id f7DGT2a10470; Mon, 13 Aug 2001 09:29:02 -0700 Message-ID: <3B78004E.46B23ADE@candelatech.com> Date: Mon, 13 Aug 2001 09:29:02 -0700 From: Ben Greear Organization: Candela Technologies X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.4.7 i686) X-Accept-Language: en MIME-Version: 1.0 To: Andi Kleen CC: "David S. Miller" , linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: IPTOS_LOWDELAY in netinet/ip.h is incorrect?? References: <3B7748B0.C897121A@candelatech.com> <20010813.071436.74751236.davem@redhat.com> <3B77F5B8.B43EA7CD@candelatech.com> <20010813181104.A12904@gruyere.muc.suse.de> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Andi Kleen wrote: > > On Mon, Aug 13, 2001 at 08:43:52AM -0700, Ben Greear wrote: > > Also, from doing a google search on setsockopt and IP_TOS, I was able > > to find exactly zero pieces of source code that do this, so I may be > > the first one to every try it :) > > A search for IP_TOS on http://www.codecatalog.com will give you 54hits, > and it is missing a lot of stuff. For example sshd has a long standing > bug where it enables high priority TOS when $DISPLAY is set, which causes > rsync over ssh to kill your lines badly when run from X. 
Ahh, found an example in the mysql code: http://www.codecatalog.com/showFile.html?project=mysql&version=3.23.21-beta&fileid=193180&startline=264&endline=284&line=274 They use an integer instead of a byte and no htonl, so my code should be close to correct (or broken in the same manner, at least!)... Thanks for the search site... Ben > > -Andi -- Ben Greear President of Candela Technologies Inc http://www.candelatech.com ScryMUD: http://scry.wanfear.com http://scry.wanfear.com/~greear From owner-netdev@oss.sgi.com Mon Aug 13 16:02:24 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f7DN2Op22925 for netdev-outgoing; Mon, 13 Aug 2001 16:02:24 -0700 Received: from e21.nc.us.ibm.com (e21.nc.us.ibm.com [32.97.136.227]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f7DN2Nj22922 for ; Mon, 13 Aug 2001 16:02:23 -0700 Received: from southrelay01.raleigh.ibm.com (southrelay01.raleigh.ibm.com [9.37.3.208]) by e21.nc.us.ibm.com (8.9.3/8.9.3) with ESMTP id SAA111204; Mon, 13 Aug 2001 18:00:11 -0500 Received: from w-sridhar2.des.beaverton.ibm.com (w-sridhar2.des.beaverton.ibm.com [9.47.18.20]) by southrelay01.raleigh.ibm.com (8.11.1m3/NCO v4.97) with ESMTP id f7DN2Ga60812; Mon, 13 Aug 2001 19:02:16 -0400 Date: Mon, 13 Aug 2001 16:02:14 -0700 (PDT) From: Sridhar Samudrala X-Sender: sridhar@w-sridhar2.des.sequent.com To: linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: disabling tcp quickacks Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk After the 3-way handshake is completed, i am interested in getting the ack for the first request to be delayed so that it can be piggybacked with the response. I expected that setsockopt() TCP_QUICKACK option with a value of 0 will disable quickacks. This should set tp->ack.pingpong to 1 and cause the ack to be delayed. But looks like somehow pingpong value is reset to 0 and the ack is sent immediately. What is the reason for this behaviour? 
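The call in question can be sketched as follows, assuming Linux, where TCP_QUICKACK is defined in <netinet/tcp.h>. The helper name set_quickack() is hypothetical.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Request quickack mode (enable != 0) or delayed acks (enable == 0)
 * on a TCP socket. Returns 0 on success, -1 on error. */
int set_quickack(int fd, int enable)
{
    return setsockopt(fd, IPPROTO_TCP, TCP_QUICKACK,
                      &enable, sizeof(enable));
}
```

As far as the tcp(7) man page describes it, this flag is not permanent: it only switches the socket into or out of quickack mode at the time of the call, and later protocol processing may change the mode again.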
I noticed a couple of places where pingpong can be reset to 0, for ex. while sending a dupack or retransmission. But i am not sure why it is being reset to 0 at such an early stage of the connection. Thanks Sridhar From owner-netdev@oss.sgi.com Mon Aug 13 17:11:56 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f7E0BuS24364 for netdev-outgoing; Mon, 13 Aug 2001 17:11:56 -0700 Received: from shell.cyberus.ca (shell.cyberus.ca [209.195.95.7]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f7E0Bsj24361 for ; Mon, 13 Aug 2001 17:11:54 -0700 Received: from localhost (hadi@localhost) by shell.cyberus.ca (8.9.3/666/Cyberus Online Inc.) with ESMTP id UAA02180; Mon, 13 Aug 2001 20:09:20 -0400 (EDT) X-Authentication-Warning: shell.cyberus.ca: hadi owned process doing -bs Date: Mon, 13 Aug 2001 20:09:20 -0400 (EDT) From: jamal To: Ben Greear cc: "David S. Miller" , , Subject: Re: IPTOS_LOWDELAY in netinet/ip.h is incorrect?? In-Reply-To: <3B77FC96.FF58C65E@candelatech.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk On Mon, 13 Aug 2001, Ben Greear wrote: > "David S. Miller" wrote: > > > > From: Ben Greear > > Date: Mon, 13 Aug 2001 08:43:52 -0700 > > > > I think that #ifdef code should be changed to check for the > > run-time enabled-ness of ECN. Also, is there a way to turn > > ECN on/off for a specific socket only? > > > > Wrong, a host adhering to the ECN rfc designates the new meanings of > > these bits. > > Fair enough, but when I specifically disable ECN through the /proc/ interface, > then I should be able to set the bits as specified in 1349 or whatever. It's > not critical path code, so the extra if () check is meaningless, and the > flexibility would be welcomed by me :) um, RFC 1349 is obsoleted by RFC 2474, so those ECN bits stay. Since they have now been officially allocated to ECN. and Alexey fixed the code just fine many many moons ago. 
Infact i would say Linux was probably the first ever to conform to the RFC. So recheck your coordinates. If you are trying to get 802.1p to precedence/DSCP bit settings, i would strongly recommend you use the CISCO mapping. cheers, jamal From owner-netdev@oss.sgi.com Mon Aug 13 19:26:44 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f7E2Qis27182 for netdev-outgoing; Mon, 13 Aug 2001 19:26:44 -0700 Received: from grok.yi.org (IDENT:root@cx97923-a.phnx3.az.home.com [24.9.112.194]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f7E2Qgj27179 for ; Mon, 13 Aug 2001 19:26:42 -0700 Received: from candelatech.com (IDENT:greear@localhost.localdomain [127.0.0.1]) by grok.yi.org (8.11.2/8.11.2) with ESMTP id f7E2QWa13482; Mon, 13 Aug 2001 19:26:37 -0700 Message-ID: <3B788C57.FE030B3F@candelatech.com> Date: Mon, 13 Aug 2001 19:26:31 -0700 From: Ben Greear Organization: Candela Technologies X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.4.7 i686) X-Accept-Language: en MIME-Version: 1.0 To: jamal CC: "David S. Miller" , linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: IPTOS_LOWDELAY in netinet/ip.h is incorrect?? References: Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk jamal wrote: > > On Mon, 13 Aug 2001, Ben Greear wrote: > > > "David S. Miller" wrote: > > > > > > From: Ben Greear > > > Date: Mon, 13 Aug 2001 08:43:52 -0700 > > > > > > I think that #ifdef code should be changed to check for the > > > run-time enabled-ness of ECN. Also, is there a way to turn > > > ECN on/off for a specific socket only? > > > > > > Wrong, a host adhering to the ECN rfc designates the new meanings of > > > these bits. > > > > Fair enough, but when I specifically disable ECN through the /proc/ interface, > > then I should be able to set the bits as specified in 1349 or whatever. 
It's > > not critical path code, so the extra if () check is meaningless, and the > > flexibility would be welcomed by me :) > > um, RFC 1349 is obsoleted by RFC 2474, so those ECN bits stay. Since > they have now been officially allocated to ECN. OK. What happens if a linux box is connected to something that is still using RFC 1349 and gets sent a packet with one of the ECN bits set? > and Alexey fixed the code just fine many many moons ago. > Infact i would say Linux was probably the first ever to conform > to the RFC. > So recheck your coordinates. > If you are trying to get 802.1p to precedence/DSCP bit settings, i would > strongly recommend you use the CISCO mapping. I haven't gotten that far yet..I'm just trying to figure out ToS and QoS in general... Ben > > cheers, > jamal -- Ben Greear President of Candela Technologies Inc http://www.candelatech.com ScryMUD: http://scry.wanfear.com http://scry.wanfear.com/~greear From owner-netdev@oss.sgi.com Tue Aug 14 00:01:05 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f7E715R00416 for netdev-outgoing; Tue, 14 Aug 2001 00:01:05 -0700 Received: from e1.ny.us.ibm.com (e1.ny.us.ibm.com [32.97.182.101]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f7E711j00410 for ; Tue, 14 Aug 2001 00:01:02 -0700 Received: from northrelay01.pok.ibm.com (northrelay01.pok.ibm.com [9.117.200.21]) by e1.ny.us.ibm.com (8.9.3/8.9.3) with ESMTP id CAA179204; Tue, 14 Aug 2001 02:58:36 -0400 Received: from gateway.beaverton.ibm.com (gateway.sequent.com [138.95.180.1]) by northrelay01.pok.ibm.com (8.11.1m3/NCO v4.97) with ESMTP id f7E70ZC81170; Tue, 14 Aug 2001 03:00:35 -0400 Received: from eng4.beaverton.ibm.com (eng4.sequent.com [138.95.7.64]) by gateway.beaverton.ibm.com (8.10.0.Beta10/8.8.5) with ESMTP id f7E6tXK25892; Mon, 13 Aug 2001 23:55:33 -0700 (PDT) Received: (from nivedita@localhost) by eng4.beaverton.ibm.com (8.8.5/8.8.5/token.aware-1.2) id XAA14160; Mon, 13 Aug 2001 23:55:32 -0700 (PDT) From: Nivedita 
Singhvi Message-Id: <200108140655.XAA14160@eng4.beaverton.ibm.com> Subject: Re: disabling tcp quickacks To: samudrala@us.ibm.com (Sridhar Samudrala) Date: Mon, 13 Aug 2001 23:55:31 -0700 (PDT) Cc: linux-net@vger.kernel.org, netdev@oss.sgi.com In-Reply-To: from "Sridhar Samudrala" at Aug 13, 2001 03:02:14 PM PST X-Mailer: ELM [version 2.5 PL3] MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk > After the 3-way handshake is completed, i am interested in getting > the ack for the first request to be delayed so that it can be > piggybacked with the response. > > I expected that setsockopt() TCP_QUICKACK option with a value of > 0 will disable quickacks. > This should set tp->ack.pingpong to 1 and cause the ack to be delayed. > But looks like somehow pingpong value is reset to 0 and the ack is sent > immediately. What is the reason for this behaviour? > > I noticed a couple of places where pingpong can be reset to 0, for ex. > while sending a dupack or retransmission. But i am not sure why it is > being reset to 0 at such an early stage of the connection. > > Thanks > Sridhar In tcp_rcv_synsent_state_process(), we have the following code: if (tp->write_pending || tp->defer_accept || tp->ack.pingpong) { /* Save one ACK. Data will be ready after * several ticks, if write_pending is set. * * It may be deleted, but with this feature tcpdumps * look so _wonderfully_ clever, that I was not able * to stand against the temptation 8) --ANK */ tcp_schedule_ack(tp); tp->ack.lrcvtime = tcp_time_stamp; tp->ack.ato = TCP_ATO_MIN; tcp_incr_quickack(tp); tcp_enter_quickack_mode(tp); tcp_reset_xmit_timer(sk,TCP_TIME_DACK, TCP_DELACK_MAX); discard: __kfree_skb(skb); return 0; If the client has disabled TCP_QUICKACKS via setsockopt() on this socket (i.e. tp->ack.pingpong = 1), we'll fall through to this code when completing the 3 way handshake from TCP_SYN_SENT state. 
However, tcp_enter_quickack_mode(tp) unconditionally resets tp->ack.pingpong to 0, of course. Subsequent acks will be quick acks, rather than the delayed acks we hoped for. Or what am I missing here? Does (tp->write_pending || tp->defer_accept || !(tp->ack.pingpong)) make more sense? What was intended here? Is tp->ack.pingpong not intended to store the user choice of "don't/do quick ack" as set by TCP_QUICKACK? We reset it (pingpong) when we receive data that fills our out of order queue, or receive out of order/window or retransmitted data, so it doesn't seem to be the case. Any clarification here would be appreciated! On an unconnected note, why are there 2 mailing lists, linux-net and netdev? Is one deprecated, or preferred? thanks, Nivedita From owner-netdev@oss.sgi.com Tue Aug 14 09:22:07 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f7EGM7d13827 for netdev-outgoing; Tue, 14 Aug 2001 09:22:07 -0700 Received: from e24.nc.us.ibm.com (e24.nc.us.ibm.com [32.97.136.230]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f7EGM1j13823 for ; Tue, 14 Aug 2001 09:22:02 -0700 Received: from southrelay02.raleigh.ibm.com (southrelay02.raleigh.ibm.com [9.37.3.209]) by e24.nc.us.ibm.com (8.9.3/8.9.3) with ESMTP id LAA13368; Tue, 14 Aug 2001 11:19:50 -0500 Received: from d04nm106.raleigh.ibm.com (d04nm106.raleigh.ibm.com [9.67.228.133]) by southrelay02.raleigh.ibm.com (8.11.1m3/NCO v4.97) with ESMTP id f7EGKcx89034; Tue, 14 Aug 2001 12:20:39 -0400 Importance: Normal Subject: To: netdev@oss.sgi.com, davem@redhat.com Cc: chad_tindel@users.sourceforge.net X-Mailer: Lotus Notes Release 5.0.5 September 22, 2000 Message-ID: From: "Janice Girouard" Date: Tue, 14 Aug 2001 11:20:51 -0500 X-MIMETrack: Serialize by Router on D04NM106/04/M/IBM(Release 5.0.6 |December 14, 2000) at 08/14/2001 12:20:52 PM MIME-Version: 1.0 Content-type: text/plain; charset=us-ascii Sender: owner-netdev@oss.sgi.com Precedence: bulk There are a number of patches out at the 
www.sourceforge.net/projects/bonding site for release 2.4.4 at: http://sourceforge.net/project/showfiles.php?group_id=24692&release_id=47592 These patches represent work since 9/30/2000 from various individuals for some nice improvements in the bonding.c code. A detailed list of these changes is included at the bottom of this note. I was hoping that we could receive feedback on these changes to facilitate having them accepted. If you have a moment, could you take a look at this work and provide input? Thanks Janice Girouard girouard@us.ibm.com 512-838-7981 ________________________________________________________________________________ Summary of changes since 9/30/2000 2000/09/30 - Willy Tarreau - added trivial code to release a slave device. - fixed security bug (CAP_NET_ADMIN not checked) - implemented MII link monitoring to disable dead links : All MII capable slaves are checked every milliseconds (100 ms seems good). This value can be changed by passing it to insmod. A value of zero disables the monitoring (default). - fixed an infinite loop in bond_xmit_roundrobin() when there's no good slave. - made the code hopefully SMP safe 2000/10/03 - Willy Tarreau - optimized slave lists based on relevant suggestions from Thomas Davis - implemented active-backup method to obtain HA with two switches: stay as long as possible on the same active interface, while we also monitor the backup one (MII link status) because we want to know if we are able to switch at any time. ( pass "mode=1" to insmod ) - lots of stress testing because we need it to be more robust than the wires ! :-> 2000/10/09 - Willy Tarreau - added up and down delays after link state change. - optimized the slaves chaining so that when we run forward, we never repass through the bond itself, but we can find it by searching backwards. Renders the deletion more difficult, but accelerates the scan. - smarter enslaving and releasing. 
- finer and more robust SMP locking 2000/10/17 - Willy Tarreau - fixed two potential SMP race conditions 2000/10/18 - Willy Tarreau - small fixes to the monitoring FSM in case of zero delays 2000/11/01 - Willy Tarreau - fixed first slave not automatically used in trunk mode. 2000/11/10 : spelling of "EtherChannel" corrected. 2000/11/13 : fixed a race condition in case of concurrent accesses to ioctl(). 2000/12/16 : fixed improper usage of rtnl_exlock_nowait(). 2001/1/3 - Chad N. Tindel - The bonding driver now simulates MII status monitoring, just like a normal network device. It will show that the link is down iff every slave in the bond shows that their links are down. If at least one slave is up, the bond's MII status will appear as up. 2001/2/7 - Chad N. Tindel - Applications can now query the bond from user space to get information which may be useful. They do this by calling the BOND_INFO_QUERY ioctl. Once the app knows how many slaves are in the bond, it can call the BOND_SLAVE_INFO_QUERY ioctl to get slave specific information (# link failures, etc). See for more details. The structs of interest are ifbond and ifslave. 2001/4/5 - Chad N. Tindel - Ported to 2.4 Kernel 2001/5/2 - Jeffrey E. Mast - When a device is detached from a bond, the slave device is no longer left thinking that is has a master. 2001/5/16 - Jeffrey E. Mast - memset did not appropriately initialized the bond rw_locks. Used rwlock_init to initialize to unlocked state to prevent deadlock when first attempting a lock - Called SET_MODULE_OWNER for bond device 2001/5/17 - Tim Anderson - 2 paths for releasing for slave release; 1 through ioctl and 2) through close. Both paths need to release the same way. - the free slave in bond release is changing slave status before the free. The netdev_set_master() is intended to change slave state so it should not be done as part of the release process. - Simple rule for slave state at release: only the active in A/B and only one in the trunked case. 
2001/6/01 - Tim Anderson - Now call dev_close when releasing a slave so it doesn't screw up out routing table. 2001/6/01 - Chad N. Tindel - Added /proc support for getting bond and slave information. Information is in /proc/net//info. - Changed the locking when calling bond_close to prevent deadlock. 2001/8/05 - Janice Girouard - correct problem where refcnt of slave is not incremented in bond_ioctl so the system hangs when halting. - correct locking problem when unable to malloc in bond_enslave. - added bond_xmit_xor logic. 
From owner-netdev@oss.sgi.com Tue Aug 14 10:51:33 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f7EHpXv15635 for netdev-outgoing; Tue, 14 Aug 2001 10:51:33 -0700 Received: from e21.nc.us.ibm.com (e21.nc.us.ibm.com [32.97.136.227]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f7EHncj15616 for ; Tue, 14 Aug 2001 10:49:38 -0700 Received: from southrelay02.raleigh.ibm.com (southrelay02.raleigh.ibm.com [9.37.3.209]) by e21.nc.us.ibm.com (8.9.3/8.9.3) with ESMTP id MAA144340; Tue, 14 Aug 2001 12:46:04 -0500 Received: from w-sridhar2.des.beaverton.ibm.com (w-sridhar2.des.beaverton.ibm.com [9.47.18.20]) by southrelay02.raleigh.ibm.com (8.11.1m3/NCO v4.97) with ESMTP id f7EHjOx269950; Tue, 14 Aug 2001 13:45:24 -0400 Date: Tue, 14 Aug 2001 10:45:24 -0700 (PDT) From: Sridhar Samudrala X-Sender: sridhar@w-sridhar2.des.sequent.com To: Nivedita Singhvi cc: Sridhar Samudrala , linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: disabling tcp quickacks In-Reply-To: <200108140655.XAA14160@eng4.beaverton.ibm.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk I was more interested from a server point of view. After looking into the code some more i found that tcp_listen_start() is calling tcp_delack_init() which is zeroing the tp->ack structure that includes the pingpong field. A workaround for this problem is to call TCP_QUICKACK setsockopt() after the call to listen(). There may be some issues with the code that nivedita listed. 
* tcp_incr_quickack() gets called twice.(tcp_enter_quickack_mode() also calls this routine) * we enter quickack mode and reset pingpong even when user has disabled quickacks by setting pingpong to 1. * Can this code be replaced with a simple call to tcp_ack_snd_check()? Thanks Sridhar On Mon, 13 Aug 2001, Nivedita Singhvi wrote: > > After the 3-way handshake is completed, i am interested in getting > > the ack for the first request to be delayed so that it can be > > piggybacked with the response. > > > > I expected that setsockopt() TCP_QUICKACK option with a value of > > 0 will disable quickacks. > > This should set tp->ack.pingpong to 1 and cause the ack to be delayed. > > But looks like somehow pingpong value is reset to 0 and the ack is sent > > immediately. What is the reason for this behaviour? > > > > I noticed a couple of places where pingpong can be reset to 0, for ex. > > while sending a dupack or retransmission. But i am not sure why it is > > being reset to 0 at such an early stage of the connection. > > > > Thanks > > Sridhar > > In tcp_rcv_synsent_state_process(), we have the following code: > > if (tp->write_pending || tp->defer_accept || tp->ack.pingpong) { > /* Save one ACK. Data will be ready after > * several ticks, if write_pending is set. > * > * It may be deleted, but with this feature tcpdumps > * look so _wonderfully_ clever, that I was not able > * to stand against the temptation 8) --ANK > */ > tcp_schedule_ack(tp); > tp->ack.lrcvtime = tcp_time_stamp; > tp->ack.ato = TCP_ATO_MIN; > tcp_incr_quickack(tp); > tcp_enter_quickack_mode(tp); > tcp_reset_xmit_timer(sk,TCP_TIME_DACK, TCP_DELACK_MAX); > discard: > __kfree_skb(skb); > return 0; > > If the client has disabled TCP_QUICKACKS via setsockopt() > on this socket (i.e. tp->ack.pingpong = 1), we'll fall > through to this code when completing the 3 way handshake > from TCP_SYN_SENT state. However, tcp_enter_quickack_mode(tp) > unconditionally resets tp->ack.pingpong to 0, of course. 
> Subsequent acks will be quick acks, rather than delayed > acks, as hoped. Or what am I missing here? > > Does > (tp->write_pending || tp->defer_accept || !(tp->ack.pingpong)) > > make more sense? What was intended here? > > Is tp->ack.pingpong not intended to store the user choice > of "dont/do quick ack" as set by TCP_QUICKACKS? We reset > it (pingpong) when we receive data that fills our > out of order queue, or receive out of order/window or retransmitted > data, so it doesnt seem to be the case.. > > Any clarification here would be appreciated! > > On an unconnected note, why are there 2 mailing lists, linux-net > and netdev? Is one deprecated, or preferred? > > thanks, > Nivedita > > > From owner-netdev@oss.sgi.com Tue Aug 14 11:14:41 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f7EIEfW16159 for netdev-outgoing; Tue, 14 Aug 2001 11:14:41 -0700 Received: from colin.muc.de (root@colin.muc.de [193.149.48.1]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f7EIEcj16152 for ; Tue, 14 Aug 2001 11:14:39 -0700 Received: by colin.muc.de id <140554-3>; Tue, 14 Aug 2001 20:15:02 +0200 Message-ID: <20010814201459.01156@colin.muc.de> Date: Tue, 14 Aug 2001 20:14:59 +0200 From: Andi Kleen To: Janice Girouard Cc: netdev@oss.sgi.com, davem@redhat.com, chad_tindel@users.sourceforge.net Subject: Re: your mail References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 0.88e In-Reply-To: ; from Janice Girouard on Tue, Aug 14, 2001 at 06:20:51PM +0200 Sender: owner-netdev@oss.sgi.com Precedence: bulk On Tue, Aug 14, 2001 at 06:20:51PM +0200, Janice Girouard wrote: > There are a number of patches out at the > www.sourceforge.net/projects/bonding > site for release 2.4.4 at: > > http://sourceforge.net/project/showfiles.php?group_id=24692&release_id=47592 > > > These patches represent work since 9/30/2000 from various individuals for > some nice improvements in the bonding.c code. 
A detailed list of these
> changes is included at the bottom of this note.
>
> I was hoping that we could receive feedback on these changes to facilitate
> having them accepted. If you have a moment, could you take a look at this
> work and provide input.

One big problem in my opinion is how the MII monitoring is implemented. It uses ioctls that are marked for removal already in vger and hardcodes data structures in an ugly and non-portable way. I think it should use new device functions instead.

In addition it is a bit useless in my opinion. MII monitoring alone is never enough to assure HA, because there can be lots of other reasons why the other host can go belly up without losing the ethernet link (e.g. a software crash). For these a higher level heartbeat is required anyways (or at least the neighbour states in the kernel should be used, which maintain similar information at least up to L2). I don't see any interface for such a higher level heartbeat, and if it exists the MII monitoring is not really needed anymore.

The other stuff doesn't look too bad.

Of course there is the fundamental problem of the bonding device that it reorders packets when more than a single interface is used in parallel and therefore kills performance in most network protocols.
If you care about that you should use multipath routing instead (which has no [useless] mii monitoring, but an easy to use interface for higher level heartbeat)

-Andi (who thinks multipath routing is superior to bonding devices)

From owner-netdev@oss.sgi.com Tue Aug 14 13:04:31 2001
Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f7EK4V119207 for netdev-outgoing; Tue, 14 Aug 2001 13:04:31 -0700
Received: from vaio.greennet (tcp50.ens.ornl.gov [128.219.176.50]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f7EK4Oj19200 for ; Tue, 14 Aug 2001 13:04:24 -0700
Received: from localhost (becker@localhost) by vaio.greennet (8.9.3/8.8.7) with ESMTP id PAA09253; Tue, 14 Aug 2001 15:59:21 -0400
Date: Tue, 14 Aug 2001 15:59:21 -0400 (EDT)
From: Donald Becker
X-Sender: becker@vaio.greennet
To: Andi Kleen
cc: Janice Girouard , netdev@oss.sgi.com
Subject: Re: your mail
In-Reply-To: <20010814201459.01156@colin.muc.de>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-netdev@oss.sgi.com
Precedence: bulk

On Tue, 14 Aug 2001, Andi Kleen wrote:
> One big problem in my opinion is how the MII monitoring is implemented.
...
> It uses ioctls that are marked for removal already in vger..

The MII ioctls are not marked for removal. Instead something far more destructive was done: the ioctl constant was changed without changing the name. This breaks both forward and backwards compatibility. That's a very nasty way to get rid of an interface. (Or should I go with the other end of the malice/incompetence tradeoff..)

> In addition it is a bit useless in my opinion. MII monitoring alone
> is never enough to assure HA, because there can be lots of other reasons why
> the other host can go belly up without losing the ethernet link

You should be precise: monitoring link beat (generally, not just MII management data specifically) covers only a small percentage of network problems.
For instance machines crash and switches / routers usually fail while still generating link beat. The MII ioctls provide information about the physical link layer, so you can diagnose if the failure was due to a local cable problem. Some transceivers will even report approximately how far away a cable break/kink is. Donald Becker becker@scyld.com Scyld Computing Corporation http://www.scyld.com 410 Severn Ave. Suite 210 Second Generation Beowulf Clusters Annapolis MD 21403 410-990-9993 From owner-netdev@oss.sgi.com Tue Aug 14 16:15:45 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f7ENFjc25074 for netdev-outgoing; Tue, 14 Aug 2001 16:15:45 -0700 Received: from e31.bld.us.ibm.com (e31.co.us.ibm.com [32.97.110.129]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f7ENFhj25071 for ; Tue, 14 Aug 2001 16:15:43 -0700 Received: from westrelay02.boulder.ibm.com (westrelay02.boulder.ibm.com [9.99.140.23]) by e31.bld.us.ibm.com (8.9.3/8.9.3) with ESMTP id TAA54876; Tue, 14 Aug 2001 19:13:35 -0400 Received: from d03nm104.boulder.ibm.com (d03nm104.boulder.ibm.com [9.99.140.96]) by westrelay02.boulder.ibm.com (8.11.1m3/NCO v4.97.1) with ESMTP id f7ENFf4109488; Tue, 14 Aug 2001 17:15:41 -0600 From: "David Stevens" Importance: Normal Subject: IPv6 multicast address leak (2.4.*) To: linux-net@vger.kernel.org, netdev@oss.sgi.com X-Mailer: Lotus Notes Release 5.0.4a July 24, 2000 Message-ID: Date: Tue, 14 Aug 2001 16:19:39 -0700 X-MIMETrack: Serialize by Router on D03NM104/03/M/IBM(Release 5.0.6 |December 14, 2000) at 08/14/2001 05:19:40 PM MIME-Version: 1.0 Content-type: text/plain; charset=us-ascii Sender: owner-netdev@oss.sgi.com Precedence: bulk When joining a multicast group with ifindex == 0, we get to ipv6_sock_mc_join() which uses the routing table to find an interface when the index doesn't specify one. The code also saves the interface index (as passed by the user) in mc_list. 
When leaving a multicast group, we call ipv6_sock_mc_drop(), which has this code > if ((dev = dev_get_by_index(mc_lst->ifindex)) != NULL) { > ipv6_dev_mc_dec(dev, &mc_lst->addr); > dev_put(dev); > } In the case where ifindex passed was 0, this won't find the interface we added the multicast address to, won't remove the reference, etc. +-DLS Fix: In ipv6_sock_mc_join(), set the mc_list index based on the device we actually use: diff -urN linux/net/ipv6/mcast.c linux.NEW/net/ipv6/mcast.c --- linux/net/ipv6/mcast.c Thu Apr 26 22:17:26 2001 +++ linux.NEW/net/ipv6/mcast.c Tue Aug 14 17:01:42 2001 @@ -90,7 +90,6 @@ mc_lst->next = NULL; memcpy(&mc_lst->addr, addr, sizeof(struct in6_addr)); - mc_lst->ifindex = ifindex; if (ifindex == 0) { struct rt6_info *rt; @@ -107,6 +106,8 @@ sock_kfree_s(sk, mc_lst, sizeof(*mc_lst)); return -ENODEV; } + + mc_lst->ifindex = dev->ifindex; /* * now add/increase the group membership on the device From owner-netdev@oss.sgi.com Wed Aug 15 00:37:11 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f7F7bBH04108 for netdev-outgoing; Wed, 15 Aug 2001 00:37:11 -0700 Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f7F7b9j04104 for ; Wed, 15 Aug 2001 00:37:09 -0700 Received: from localhost (IDENT:davem@pizda.ninka.net [127.0.0.1]) by pizda.ninka.net (8.9.3/8.9.3) with ESMTP id AAA01929; Wed, 15 Aug 2001 00:37:02 -0700 Date: Wed, 15 Aug 2001 00:37:01 -0700 (PDT) Message-Id: <20010815.003701.74752956.davem@redhat.com> To: dlstevens@us.ibm.com Cc: linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: IPv6 multicast address leak (2.4.*) From: "David S. 
Miller" In-Reply-To: References: X-Mailer: Mew version 2.0 on Emacs 21.0 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk From: "David Stevens" Date: Tue, 14 Aug 2001 16:19:39 -0700 Fix: In ipv6_sock_mc_join(), set the mc_list index based on the device we actually use: Thanks a lot, patch applied. Later, David S. Miller davem@redhat.com From owner-netdev@oss.sgi.com Wed Aug 15 01:12:25 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f7F8CP804779 for netdev-outgoing; Wed, 15 Aug 2001 01:12:25 -0700 Received: from Cantor.suse.de (ns.suse.de [213.95.15.193]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f7F8CNj04776 for ; Wed, 15 Aug 2001 01:12:23 -0700 Received: from Hermes.suse.de (Hermes.suse.de [213.95.15.136]) by Cantor.suse.de (Postfix) with ESMTP id C71131E127; Wed, 15 Aug 2001 10:12:14 +0200 (MEST) Date: Tue, 14 Aug 2001 22:20:10 +0200 From: Andi Kleen To: Nivedita Singhvi Cc: linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: disabling tcp quickacks Message-ID: <20010814222010.A6329@gruyere.muc.suse.de> References: <200108140655.XAA14160@eng4.beaverton.ibm.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <200108140655.XAA14160@eng4.beaverton.ibm.com>; from nivedita@sequent.com on Mon, Aug 13, 2001 at 11:55:31PM -0700 Sender: owner-netdev@oss.sgi.com Precedence: bulk On Mon, Aug 13, 2001 at 11:55:31PM -0700, Nivedita Singhvi wrote: > If the client has disabled TCP_QUICKACKS via setsockopt() > on this socket (i.e. tp->ack.pingpong = 1), we'll fall > through to this code when completing the 3 way handshake > from TCP_SYN_SENT state. However, tcp_enter_quickack_mode(tp) > unconditionally resets tp->ack.pingpong to 0, of course. > Subsequent acks will be quick acks, rather than delayed > acks, as hoped. Or what am I missing here? 
The quickack should only be sent on the next incoming packet, which is effectively the delayed ack needed.

> Is tp->ack.pingpong not intended to store the user choice
> of "dont/do quick ack" as set by TCP_QUICKACKS? We reset
> it (pingpong) when we receive data that fills our
> out of order queue, or receive out of order/window or retransmitted
> data, so it doesnt seem to be the case..

I think TCP_QUICKACKS was an afterthought which never completely worked. The real purpose of pingpong is to save the state of the pingpong heuristic. Best would probably be to either remove TCP_QUICKACKS or make it a separate flag.

> Any clarification here would be appreciated!
>
> On an unconnected note, why are there 2 mailing lists, linux-net
> and netdev? Is one deprecated, or preferred?

linux-net has historically been more user support oriented and included non-kernel things, while netdev is strictly aimed at kernel development stuff only. These days linux-net traffic is so low (it used to have much more "support" traffic) that it doesn't make much difference anymore; but it is probably still a good idea to e.g. send patches to netdev instead of linux-net.
-Andi From owner-netdev@oss.sgi.com Wed Aug 15 11:53:43 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f7FIrhE01714 for netdev-outgoing; Wed, 15 Aug 2001 11:53:43 -0700 Received: from web10901.mail.yahoo.com (web10901.mail.yahoo.com [216.136.131.37]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f7FIrdj01697 for ; Wed, 15 Aug 2001 11:53:39 -0700 Message-ID: <20010815185338.76654.qmail@web10901.mail.yahoo.com> Received: from [63.112.207.180] by web10901.mail.yahoo.com; Wed, 15 Aug 2001 11:53:38 PDT Date: Wed, 15 Aug 2001 11:53:38 -0700 (PDT) From: Brad Chapman Subject: Bad performance when using IPv4 and IPv6 together To: netdev@oss.sgi.com MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Sender: owner-netdev@oss.sgi.com Precedence: bulk Everyone, Whenever I compile the 2.4.8 kernel's IPv6 stack monolithically, or when I load it as a module, the system's networking speed slows dramatically. TCP and UDP connections take an immensely long time, and whenever I attempt to ping through my gateway, most of the ICMP packets have huge ms times (10000+) or get dropped altogether. Is this a bug? I need to have access to the IPv6 stack so that I can test my IPv6 netfilter programming efforts. Is there a way to include both stacks and prevent performance degradation? Thanks, Brad ===== Brad Chapman Permanent e-mail: kakadu_croc@yahoo.com Current e-mail: kakadu@adelphia.net Reply to the address I used in the message to you, please! __________________________________________________ Do You Yahoo!? Make international calls for as low as $.04/minute with Yahoo! 
Messenger http://phonecard.yahoo.com/ From owner-netdev@oss.sgi.com Wed Aug 15 14:29:39 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f7FLTdV07870 for netdev-outgoing; Wed, 15 Aug 2001 14:29:39 -0700 Received: from dea.waldorf-gmbh.de (u-246-19.karlsruhe.ipdial.viaginterkom.de [62.180.19.246]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f7FLTaj07857 for ; Wed, 15 Aug 2001 14:29:36 -0700 Received: (from ralf@localhost) by dea.waldorf-gmbh.de (8.11.1/8.11.1) id f7FLRrm14987; Wed, 15 Aug 2001 23:27:53 +0200 Date: Wed, 15 Aug 2001 23:27:53 +0200 From: Ralf Baechle To: Brad Chapman Cc: netdev@oss.sgi.com Subject: Re: Bad performance when using IPv4 and IPv6 together Message-ID: <20010815232753.A14025@bacchus.dhis.org> References: <20010815185338.76654.qmail@web10901.mail.yahoo.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <20010815185338.76654.qmail@web10901.mail.yahoo.com>; from kakadu_croc@yahoo.com on Wed, Aug 15, 2001 at 11:53:38AM -0700 X-Accept-Language: de,en,fr Sender: owner-netdev@oss.sgi.com Precedence: bulk On Wed, Aug 15, 2001 at 11:53:38AM -0700, Brad Chapman wrote: > Whenever I compile the 2.4.8 kernel's IPv6 stack monolithically, or > when I load it as a module, the system's networking speed slows dramatically. > TCP and UDP connections take an immensely long time, and whenever I attempt to > ping through my gateway, most of the ICMP packets have huge ms times (10000+) > or get dropped altogether. > Is this a bug? I need to have access to the IPv6 stack so that I > can test my IPv6 netfilter programming efforts. Is there a way to include > both stacks and prevent performance degradation? Have you only tested ICMP using ping? I was observing odd behaviour of ping6 also but suspect it's actually ping6. 
So for example ping6 behaves different for me depending if I give it a hostname or an IP address - even though both refer to the same address of a local interface. Ralf From owner-netdev@oss.sgi.com Thu Aug 16 05:28:45 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f7GCSjL26608 for netdev-outgoing; Thu, 16 Aug 2001 05:28:45 -0700 Received: from web10905.mail.yahoo.com (web10905.mail.yahoo.com [216.136.131.41]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f7GCSij26605 for ; Thu, 16 Aug 2001 05:28:44 -0700 Message-ID: <20010816122841.35363.qmail@web10905.mail.yahoo.com> Received: from [63.112.207.180] by web10905.mail.yahoo.com; Thu, 16 Aug 2001 05:28:41 PDT Date: Thu, 16 Aug 2001 05:28:41 -0700 (PDT) From: Brad Chapman Subject: RE: Bad performance when using IPv4 and IPv6 together To: d.robles@codetel.net.do Cc: netdev@oss.sgi.com, netfilter@lists.samba.org In-Reply-To: <6f7701c1260f$3985f5d0$525103c4@codetel.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Sender: owner-netdev@oss.sgi.com Precedence: bulk --- d.robles@codetel.net.do wrote: > Off topic question, also maybe a dumb question, > > Why would anyone run ipv6 now ? > > > Elías Mr. Elias, AFAIK, an IPv6-only network called the 6bone does exist; one of the coreteam members talked about it early on. Right now, though, the only two reasons why anyone would run IPv6 is to develop stuff using it (ip6tables) or to debug and test it. Brad ===== Brad Chapman Permanent e-mail: kakadu_croc@yahoo.com Current e-mail: kakadu@adelphia.net Reply to the address I used in the message to you, please! __________________________________________________ Do You Yahoo!? Make international calls for as low as $.04/minute with Yahoo! 
Messenger http://phonecard.yahoo.com/ From owner-netdev@oss.sgi.com Thu Aug 16 06:39:49 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f7GDdnx28895 for netdev-outgoing; Thu, 16 Aug 2001 06:39:49 -0700 Received: from dea.waldorf-gmbh.de (u-107-20.karlsruhe.ipdial.viaginterkom.de [62.180.20.107]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f7GDdkj28892 for ; Thu, 16 Aug 2001 06:39:47 -0700 Received: (from ralf@localhost) by dea.waldorf-gmbh.de (8.11.1/8.11.1) id f7GDbtK30310; Thu, 16 Aug 2001 15:37:55 +0200 Date: Thu, 16 Aug 2001 15:37:55 +0200 From: Ralf Baechle To: Brad Chapman Cc: d.robles@codetel.net.do, netdev@oss.sgi.com, netfilter@lists.samba.org Subject: Re: Bad performance when using IPv4 and IPv6 together Message-ID: <20010816153755.A30079@bacchus.dhis.org> References: <6f7701c1260f$3985f5d0$525103c4@codetel.net> <20010816122841.35363.qmail@web10905.mail.yahoo.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <20010816122841.35363.qmail@web10905.mail.yahoo.com>; from kakadu_croc@yahoo.com on Thu, Aug 16, 2001 at 05:28:41AM -0700 X-Accept-Language: de,en,fr Sender: owner-netdev@oss.sgi.com Precedence: bulk On Thu, Aug 16, 2001 at 05:28:41AM -0700, Brad Chapman wrote: > AFAIK, an IPv6-only network called the 6bone does exist; one of the > coreteam members talked about it early on. Right now, though, the only two > reasons why anyone would run IPv6 is to develop stuff using it (ip6tables) > or to debug and test it. 6bone has been found to survive routing problems in the underlying IPv4 structure used to interconnect IPv6 islands rather well so some people already (try to ...) rely on it. 
Ralf From owner-netdev@oss.sgi.com Thu Aug 16 10:24:28 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f7GHOSp02353 for netdev-outgoing; Thu, 16 Aug 2001 10:24:28 -0700 Received: from ms2.inr.ac.ru (minus.inr.ac.ru [193.233.7.97]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f7GHOMj02349; Thu, 16 Aug 2001 10:24:22 -0700 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id VAA21459; Thu, 16 Aug 2001 21:24:08 +0400 From: kuznet@ms2.inr.ac.ru Message-Id: <200108161724.VAA21459@ms2.inr.ac.ru> Subject: Re: Bad performance when using IPv4 and IPv6 together To: ralf@oss.sgi.com (Ralf Baechle) Date: Thu, 16 Aug 2001 21:24:08 +0400 (MSK DST) Cc: netdev@oss.sgi.com In-Reply-To: <20010815232753.A14025@bacchus.dhis.org> from "Ralf Baechle" at Aug 16, 1 01:45:07 am X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Sender: owner-netdev@oss.sgi.com Precedence: bulk Hello! > Have you only tested ICMP using ping? I was observing odd behaviour of > ping6 also but suspect it's actually ping6. So for example ping6 behaves > different for me depending if I give it a hostname or an IP address - > even though both refer to the same address of a local interface. DNS. Reversed resolutions for IPv6 are _damnly_ slow. By default ping6 from iputils disables reversed resolution, when you give numeric address. No matter: net results obtained with ping/ping6 from iputils do not depend on (mis)configuration of DNS. It may delay waiting for DNS, but the results for latency are right anyway. 
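
[Editorial sketch of the policy described above, not the iputils source.] The rule is: if the target was given as a numeric address, skip reverse lookups of reply addresses; otherwise do them unless the user asked for numeric output. The helper names `is_numeric_host` and `want_reverse_lookup` are hypothetical and exist only for this illustration; `inet_pton()` is the standard call used to detect a literal address.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: returns 1 if `host` is a literal IPv4 or IPv6
 * address, 0 if it looks like a name that would need a DNS lookup. */
int is_numeric_host(const char *host)
{
    struct in_addr  a4;
    struct in6_addr a6;

    if (inet_pton(AF_INET, host, &a4) == 1)
        return 1;
    if (inet_pton(AF_INET6, host, &a6) == 1)
        return 1;
    return 0;
}

/* Hypothetical policy function: decide whether reply addresses should be
 * reverse-resolved. A numeric target (or an explicit "numeric only"
 * request from the user) turns the slow reverse lookups off. */
int want_reverse_lookup(const char *target, int user_forced_numeric)
{
    if (user_forced_numeric)
        return 0;
    return !is_numeric_host(target);
}
```

Either way, as the message says, a slow resolver only delays the output; the round-trip times themselves are measured independently of DNS.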
Alexey From owner-netdev@oss.sgi.com Thu Aug 16 10:47:26 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f7GHlQj02851 for netdev-outgoing; Thu, 16 Aug 2001 10:47:26 -0700 Received: from ms2.inr.ac.ru (minus.inr.ac.ru [193.233.7.97]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f7GHlMj02841 for ; Thu, 16 Aug 2001 10:47:24 -0700 Received: from mops.inr.ac.ru (mops.inr.ac.ru [193.233.7.60]) by ms2.inr.ac.ru (8.6.13/ANK) with ESMTP id VAA21551; Thu, 16 Aug 2001 21:47:15 +0400 Received: (from kuznet@localhost) by mops.inr.ac.ru (8.9.3/8.9.3) id PAA00213 for "nivedita@sequent.COM"; Wed, 15 Aug 2001 15:54:14 +0400 Message-Id: <200108151154.PAA00213@mops.inr.ac.ru> Subject: Re: disabling tcp quickacks To: nivedita@sequent.COM (Nivedita Singhvi) Date: Wed, 15 Aug 2001 15:54:14 +0400 (MSD) In-Reply-To: <200108140655.XAA14160@eng4.beaverton.ibm.com> from "Nivedita Singhvi" at Aug 14, 1 11:15:01 am From: Alexey Kuznetsov X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Sender: owner-netdev@oss.sgi.com Precedence: bulk Hello! > Subsequent acks will be quick acks, rather than delayed > acks, as hoped. Or what am I missing here? Clearing TCP_QUICKACKS you give to kernel hint that bare ack can be delayed, because you are going to send some data soon (for delack timeout). If you break this contract or quick acking is required by protocol, your advice is ignored being invalidated by more strong and evident hints. > Does > (tp->write_pending || tp->defer_accept || !(tp->ack.pingpong)) > > make more sense? This does not make any sense at all. You propose to delay ACK _always_, which will stall any connection except for http for delack timeout. :-) > Is tp->ack.pingpong not intended to store the user choice > of "dont/do quick ack" It is not. It means the thing described above. 
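
[Editorial sketch, assuming a Linux system where TCP_QUICKACK is available in <netinet/tcp.h>.] Since the option is only a hint that the kernel may invalidate, user code that wants delayed ACKs has to re-apply it rather than set it once. The wrapper names `set_quickack_hint` and `listen_then_hint_delayed_ack` are hypothetical; the second one follows the workaround mentioned earlier in the thread (set the option after listen(), because listen() resets the ack state).

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical wrapper around the Linux-specific TCP_QUICKACK option.
 * on == 0 hints that a bare ACK may be delayed (data will follow soon);
 * on != 0 asks for immediate ACKs. The kernel is free to ignore or
 * later invalidate the hint, so callers may need to set it again. */
int set_quickack_hint(int fd, int on)
{
    int val = on ? 1 : 0;

    return setsockopt(fd, IPPROTO_TCP, TCP_QUICKACK, &val, sizeof(val));
}

/* Hypothetical server-side helper: call listen() first, then apply the
 * hint, so that the state reset done by listen() does not discard it. */
int listen_then_hint_delayed_ack(int fd, int backlog)
{
    if (listen(fd, backlog) < 0)
        return -1;
    return set_quickack_hint(fd, 0);
}
```

A caller that depends on the hint for every request/response exchange would typically call `set_quickack_hint(fd, 0)` again before each read, since the next sent or received packet may clear it.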
Alexey From owner-netdev@oss.sgi.com Thu Aug 16 10:47:34 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f7GHlY202874 for netdev-outgoing; Thu, 16 Aug 2001 10:47:34 -0700 Received: from ms2.inr.ac.ru (minus.inr.ac.ru [193.233.7.97]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f7GHlLj02840 for ; Thu, 16 Aug 2001 10:47:24 -0700 Received: from mops.inr.ac.ru (mops.inr.ac.ru [193.233.7.60]) by ms2.inr.ac.ru (8.6.13/ANK) with ESMTP id VAA21547; Thu, 16 Aug 2001 21:47:13 +0400 Received: (from kuznet@localhost) by mops.inr.ac.ru (8.9.3/8.9.3) id PAA00206 for "samudrala@us.ibm.COM"; Wed, 15 Aug 2001 15:43:23 +0400 Message-Id: <200108151143.PAA00206@mops.inr.ac.ru> Subject: Re: disabling tcp quickacks To: samudrala@us.ibm.COM (Sridhar Samudrala) Date: Wed, 15 Aug 2001 15:43:23 +0400 (MSD) In-Reply-To: from "Sridhar Samudrala" at Aug 14, 1 10:15:59 pm From: Alexey Kuznetsov X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Sender: owner-netdev@oss.sgi.com Precedence: bulk Hello! > * tcp_incr_quickack() gets called twice.(tcp_enter_quickack_mode() also > calls this routine) Why twice? It is called each time when quickacks are required. > * we enter quickack mode and reset pingpong even when user has > disabled quickacks by setting pingpong to 1. User is not allowed disable or to force quickacks, when the opposite behaviour is required by protocol. TCP_QUICKACK is _hint_ to use in the cases, when kernel cannot predict further behaviour and user is able to give it. Logically, this hint _must_ be invalidated by the next sent/received packet. > * Can this code be replaced with a simple call to tcp_ack_snd_check()? What "this code"? 
:-) Alexey From owner-netdev@oss.sgi.com Thu Aug 16 10:47:34 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f7GHlYS02875 for netdev-outgoing; Thu, 16 Aug 2001 10:47:34 -0700 Received: from ms2.inr.ac.ru (minus.inr.ac.ru [193.233.7.97]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f7GHlNj02843 for ; Thu, 16 Aug 2001 10:47:24 -0700 Received: from mops.inr.ac.ru (mops.inr.ac.ru [193.233.7.60]) by ms2.inr.ac.ru (8.6.13/ANK) with ESMTP id VAA21560; Thu, 16 Aug 2001 21:47:16 +0400 Received: (from kuznet@localhost) by mops.inr.ac.ru (8.9.3/8.9.3) id GAA00288; Tue, 14 Aug 2001 06:14:57 +0400 Message-Id: <200108140214.GAA00288@mops.inr.ac.ru> Subject: Re: IPTOS_LOWDELAY in netinet/ip.h is incorrect?? To: greearb@candelatech.COM (Ben Greear) Date: Tue, 14 Aug 2001 06:14:57 +0400 (MSD) Cc: netdev@oss.sgi.com In-Reply-To: <3B77F5B8.B43EA7CD@candelatech.com> from "Ben Greear" at Aug 13, 1 08:15:01 pm From: Alexey Kuznetsov X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Sender: owner-netdev@oss.sgi.com Precedence: bulk Hello! > Also, from doing a google search on setsockopt and IP_TOS, I was able > to find exactly zero pieces of source code that do this, so I may be > the first one to every try it :) Really? Indeed, telnet(d), ftp(d) et al. are too modernistic and still not published applications. :-) > Did you look at the different bit-ordering in RFC 1349? It is not different. Well, try to read some other RFC yet. Starting from 791. Compare them to our header files. Assume only that all of them are right and you are not the first person using IP. :-) > I think that #ifdef code should be changed to check for the > run-time enabled-ness of ECN. "run-time ECN" is non-sense. If ECN is disabled, ECN bits must be zero, otherwise you break ECN for others. Even option CONFIG_INET_ECN is not _our_ choice, it means the only thing: whether ECN exists in nature or it does not. 
If it exists, the user of tcp does not have control over the ECN bits, exactly like he is not allowed to send bogus SEQ, ACK etc. And Dave noticed the right thing: it would be better to zap these bits completely and not create options for things which are not optional.

> when I specifically disable ECN through the /proc/ interface,
> then I should be able to set the bits as specified in 1349 or whatever.

You cannot disable ECN with sysctl. tcp_ecn controls negotiation of ECN capability. If it is disabled, ECN bits must be zero. Exactly like setting tcp_sack to zero does not allow you to send some random crap under the tag of a sack option. See?

> I eventually found it, but couldn't easily explain what I was seeing.

This is not very wonderful. Try to improve your test to use COBOL instead of C++, you will lose not only vision, audition too. :-)

> What happens if a linux box is connected to something that is still using
> RFC 1349 and gets sent a packet with one of the ECN bits set?

What happens if a linux box sends spam mails following the corresponding RFC? Nothing happens, but a bit of abuse. :-)

The story is very simple: when ECN was proposed people scanned internet traffic (seems, on the east coast backbone) and discovered that bit 1 is not used at all. (BTW reserved bit 0 was used by someone :-)). Certainly, some person may wake up in August 2001 and start to use it for something different from ECN.
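
[Editorial sketch of the bit layout under discussion.] ECN takes the two low-order bits of the old TOS byte, which is why a value such as the RFC 1349 "minimize cost" bit (0x02) collides with it while "low delay" (0x10) does not. The constant and function names below are hypothetical and defined locally so the example does not depend on any particular header; an application would apply the mask before passing the value to setsockopt() with IP_TOS.

```c
#include <stdint.h>

/* Assumed layout: the two low-order bits of the TOS byte carry ECN. */
#define EX_TOS_ECN_MASK  0x03u
#define EX_TOS_LOWDELAY  0x10u   /* RFC 1349 "minimize delay" */
#define EX_TOS_MINCOST   0x02u   /* RFC 1349 "minimize cost", overlaps ECN */

/* Clear the ECN bits from a user-supplied TOS value. */
uint8_t tos_strip_ecn(uint8_t tos)
{
    return (uint8_t)(tos & (uint8_t)~EX_TOS_ECN_MASK);
}

/* Returns 1 if the value would touch the ECN bits, 0 otherwise. */
int tos_has_ecn_bits(uint8_t tos)
{
    return (tos & EX_TOS_ECN_MASK) != 0;
}
```

With this mask, a request for low delay passes through unchanged, while a request that sets only the old "minimize cost" bit is reduced to zero.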
:-) Alexey From owner-netdev@oss.sgi.com Fri Aug 17 04:44:56 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f7HBiuq31863 for netdev-outgoing; Fri, 17 Aug 2001 04:44:56 -0700 Received: from dea.waldorf-gmbh.de (u-214-10.karlsruhe.ipdial.viaginterkom.de [62.180.10.214]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f7HBibj31855 for ; Fri, 17 Aug 2001 04:44:37 -0700 Received: (from ralf@localhost) by dea.waldorf-gmbh.de (8.11.1/8.11.1) id f7HBgok05311 for netdev@oss.sgi.com; Fri, 17 Aug 2001 13:42:50 +0200 Received: from alhmailsrv.alhsys (mail.alhsys.com [194.69.248.4]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f7H9Lkj28333 for ; Fri, 17 Aug 2001 02:21:46 -0700 Received: by ALHMAILSRV with Internet Mail Service (5.5.2653.19) id <3WTNXVDZ>; Fri, 17 Aug 2001 11:23:59 +0200 Message-ID: <1D23DFB85346D3118CA400A0C9E9872201A9DC95@ALHMAILSRV> From: Javier Castillo Alcibar To: "'netdev@oss.sgi.com'" Subject: Help needed with routing problems Date: Fri, 17 Aug 2001 11:23:57 +0200 MIME-Version: 1.0 X-Mailer: Internet Mail Service (5.5.2653.19) Content-Type: multipart/alternative; boundary="----_=_NextPart_001_01C126FE.57B78C40" Sender: owner-netdev@oss.sgi.com Precedence: bulk This message is in MIME format. Since your mail reader does not understand this format, some or all of this message may not be legible. ------_=_NextPart_001_01C126FE.57B78C40 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable Hi there, I have a very, very strange problem. 
My English is very poor, so I am going to be very brief:

1) I insert a new route:
linux-rt:/proc/net# route add -host 194.69.249.21 gw 195.53.58.97

2) I ping the new host:
linux-rt:/proc/net# ping 194.69.249.21 &
[1] 8904
PING 194.69.249.21 (194.69.249.21): 56 data bytes

3) And I want to see what happens:
linux-rt:/proc/net# tcpdump -v -n -i eth5 src or dst 194.69.249.21
tcpdump: listening on eth5
10:58:28.348284 195.53.58.98 > 194.69.249.21: icmp: echo request (ttl 64, id 43415)
10:58:28.349563 195.53.58.98 > 194.69.249.21: icmp: echo request (ttl 63, id 43415) ????
10:58:29.348275 195.53.58.98 > 194.69.249.21: icmp: echo request (ttl 64, id 43419)
10:58:29.349578 195.53.58.98 > 194.69.249.21: icmp: echo request (ttl 63, id 43419) ?????
10:58:30.348258 195.53.58.98 > 194.69.249.21: icmp: echo request (ttl 64, id 43421)
10:58:30.349546 195.53.58.98 > 194.69.249.21: icmp: echo request (ttl 63, id 43421) ?????

It seems that the packets are duplicated, but with a lower ttl ¿?¿?
Ok, the ping (icmp) is crazy, let's try with tracepath tool (udp):

4) Thx Kuznet for your tool:
linux-rt:/proc/net# tracepath 194.69.249.21 &
[1] 9177
 1?: [LOCALHOST] pmtu 1500

5) Let's see:
linux-rt:/proc/net# tcpdump -v -n -i eth5 src or dst 194.69.249.21 1: no reply
tcpdump: listening on eth5
11:06:16.188295 195.53.58.98.1475 > 194.69.249.21.44448: udp 1472 (DF) (ttl 2, id 44275)
11:06:17.188281 195.53.58.98.1475 > 194.69.249.21.44449: udp 1472 (DF) (ttl 2, id 44278)
 2: no reply
11:06:18.188368 195.53.58.98.1475 > 194.69.249.21.44450: udp 1472 (DF) (ttl 3, id 44280)
11:06:19.188284 195.53.58.98.1475 > 194.69.249.21.44451: udp 1472 (DF) (ttl 3, id 44282)
11:06:20.188279 195.53.58.98.1475 > 194.69.249.21.44452: udp 1472 (DF) (ttl 3, id 44284)
 3: no reply
11:06:21.188352 195.53.58.98.1475 > 194.69.249.21.44453: udp 1472 (DF) (ttl 4, id 44286)
11:06:22.188280 195.53.58.98.1475 > 194.69.249.21.44454: udp 1472 (DF) (ttl 4, id 44288)
11:06:23.188280 195.53.58.98.1475 > 194.69.249.21.44455: udp 1472 (DF) (ttl 4, id 44292)
 4: no reply
11:06:24.188340 195.53.58.98.1475 > 194.69.249.21.44456: udp 1472 (DF) (ttl 5, id 44294)
11:06:25.188279 195.53.58.98.1475 > 194.69.249.21.44457: udp 1472 (DF) (ttl 5, id 44300)

This time, the packets are not duplicated.............¿?¿?¿?

I have a kernel 2.2.16

Thx all.

Javier Castillo Alcíbar - castillo@alhsys.com
Alhambra Systems, S.A. - www.alhsys.com
c/Albasanz 14, 28037 Madrid
Tel.: +34 91 787 23 00
Fax.: +34 91 787 23 01

------_=_NextPart_001_01C126FE.57B78C40
Content-Type: text/html; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

Help needed with routing problems

------_=_NextPart_001_01C126FE.57B78C40-- From owner-netdev@oss.sgi.com Fri Aug 17 18:45:00 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f7I1j0g18976 for netdev-outgoing; Fri, 17 Aug 2001 18:45:00 -0700 Received: from netbank.com.br (IDENT:postfix@garrincha.netbank.com.br [200.203.199.88]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f7I1itj18969; Fri, 17 Aug 2001 18:44:55 -0700 Received: from 2-166.ctame701-1.telepar.net.br (1-166.cwb-adsl.brasiltelecom.net.br [200.193.160.166]) by netbank.com.br (Postfix) with ESMTP id B731046818; Fri, 17 Aug 2001 22:44:04 -0300 (BRST) Received: from localhost ([IPv6:::ffff:127.0.0.1]:20372 "EHLO localhost") by imladris.surriel.com with ESMTP id ; Fri, 17 Aug 2001 22:44:26 -0300 Date: Fri, 17 Aug 2001 22:44:23 -0300 (BRST) From: Rik van Riel X-X-Sender: To: Ralf Baechle Cc: Brad Chapman , , , Subject: Re: Bad performance when using IPv4 and IPv6 together In-Reply-To: <20010816153755.A30079@bacchus.dhis.org> Message-ID: X-spambait: aardvark@kernelnewbies.org X-spammeplease: aardvark@nl.linux.org MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk On Thu, 16 Aug 2001, Ralf Baechle wrote: > 6bone has been found to survive routing problems in the underlying > IPv4 structure used to interconnect IPv6 islands rather well so some > people already (try to ...) rely on it. I rely on it ;) At times I have as much as 30% packet loss between two hosts on ipv4, or the routing table entry in one of the 20 in-between hops is gone completely. In these situations, ipv6 takes me to the other host reliably. It's worth it having a partial mesh inside your pTLA, trust me... Rik -- IA64: a worthy successor to i860. 
http://www.surriel.com/ http://distro.conectiva.com/ Send all your spam to aardvark@nl.linux.org (spam digging piggy) From owner-netdev@oss.sgi.com Sat Aug 18 09:57:01 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f7IGv1B01718 for netdev-outgoing; Sat, 18 Aug 2001 09:57:01 -0700 Received: from sgi.com (sgi.SGI.COM [192.48.153.1]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f7IGuqj01714 for ; Sat, 18 Aug 2001 09:56:52 -0700 Received: from localhost (pc3-oxfo3-0-cust1.oxf.cable.ntl.com [213.107.68.1]) by sgi.com (980327.SGI.8.8.8-aspam/980304.SGI-aspam: SGI does not authorize the use of its proprietary systems or networks for unsolicited or bulk email from the Internet.) via ESMTP id JAB02174 for ; Sat, 18 Aug 2001 09:56:51 -0700 (PDT) mail_from (ian@lynagh.demon.co.uk) Received: from ian by localhost with local (Exim 3.22 #1 (Debian)) id 15Y9FK-0005gU-00; Sat, 18 Aug 2001 17:46:50 +0100 Date: Sat, 18 Aug 2001 17:46:50 +0100 From: Ian Lynagh To: netdev@oss.sgi.com Subject: Problems with corruption sending packets from a module Message-ID: <20010818174650.A21733@stu163.keble.ox.ac.uk> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="azLHFNyN32YCQGCU" Content-Disposition: inline User-Agent: Mutt/1.3.15i Sender: owner-netdev@oss.sgi.com Precedence: bulk --azLHFNyN32YCQGCU Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Hi all To learn how things work I am experimenting with writing kernel code. At the moment I am trying to send an ICMP packet from a kernel module. I have pasted my current code below. I suspect some of the lengths are wrong and a few small things like that, but the first time I insmod the module it all seems to basically go OK and send the packet. However, if I remove the module and reinsert it then somewhere in the ip_send function the sk_buff seems to get corrupted. Subsequent insertions corrupt the packet in the same way. 
Here are some messages logged during correct and corrupted packet sending showing that it is during the ip_send call:

kernel: Sending now src 1678246666...
kernel: Sent...now src 1678246666...
kernel: Sending now src 1678246666...
kernel: Sent...now src 268435456...

Is there something I'm missing? Something to do with locking or something perhaps? I have attached a correct and corrupt packet in tcpdump capture format.

void foo(void) {
    struct sk_buff *skb;
    struct pckt *p;
    struct rtable *rt = NULL;

    skb = alloc_skb(sizeof(struct pckt), GFP_KERNEL);
    if (!skb) goto end;
    p = (struct pckt *) skb_put(skb, sizeof(struct pckt));
    memset(p, 0, sizeof(struct pckt));

    skb->mac.ethernet = &p->ethernet;
    skb->nh.iph = &p->iph;
    skb->h.icmph = &p->icmph;

    p->iph.saddr = (((((100 << 8) + 8) << 8) + 3) << 8) + 10;
    p->iph.daddr = (((((128 << 8) + 8) << 8) + 3) << 8) + 10;

    if (ip_route_output(&rt, p->iph.daddr, p->iph.saddr, RT_TOS(0), 0)) {
        printk(LOGAT "ip_route_output failed\n");
    }
    skb->dst = &rt->u.dst;
    skb->pkt_type = PACKET_HOST;
    skb->protocol = __constant_htons(ETH_P_IP);

    p->data[0] = 'A';
    p->data[1] = 'B';
    p->data[2] = 'C';
    p->data[3] = 'D';

    p->icmph.type = ICMP_ECHOREPLY;
    p->icmph.code = 0;
    p->icmph.checksum = 0;

    p->iph.ihl = sizeof(struct iphdr) >> 2;
    p->iph.ttl = 64;
    p->iph.version = 4;
    p->iph.frag_off = __constant_htons(IP_DF);
    p->iph.protocol = IPPROTO_ICMP;
    // p->iph.tot_len = htons(sizeof(struct pckt));
    p->iph.tot_len = htons(sizeof(struct iphdr) + sizeof(struct icmphdr) + 4);
    ip_send_check(&p->iph);

    skb->len = 4;
    // skb->len = sizeof(struct pckt);
    skb->csum = 0;
    skb->data = p->data;

    printk(LOGAT "Sending now src %u...\n", p->iph.saddr);
    ip_send(skb);
    printk(LOGAT "Sent...now src %u...\n", p->iph.saddr);
}

BTW, if there is a source of good documentation that I might have missed then please point it out to me!
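A side note on the address constants in foo(): the shifts build a host-order integer, while iph.saddr holds the bytes as they go on the wire. The following is a minimal userspace sketch of that arithmetic, not kernel code; the function names are made up for illustration, and the byte order noted afterwards assumes a little-endian host.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Same arithmetic as the saddr line in foo(): a host-order integer. */
uint32_t saddr_from_shifts(void)
{
    return (((((100u << 8) + 8u) << 8) + 3u) << 8) + 10u;
}

/* Copy the value's in-memory bytes. This is what ends up in the IP
 * header when the integer is stored into saddr without htonl(). */
void bytes_as_stored(uint32_t v, uint8_t out[4])
{
    assert(out != NULL);
    memcpy(out, &v, 4);
}

/* Endian-independent alternative: write a.b.c.d in wire order directly. */
void bytes_for_quad(uint8_t a, uint8_t b, uint8_t c, uint8_t d, uint8_t out[4])
{
    assert(out != NULL);
    out[0] = a;
    out[1] = b;
    out[2] = c;
    out[3] = d;
}
```

saddr_from_shifts() returns 1678246666 (0x6408030A), the value in the first three log lines. On a little-endian machine bytes_as_stored() gives 10, 3, 8, 100 for it, so the packet would carry source 10.3.8.100 rather than 100.8.3.10.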
Thanks Ian --azLHFNyN32YCQGCU Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: attachment; filename="correct.txt" Content-Transfer-Encoding: quoted-printable =D4=C3=B2=A1=02=00=04=00=00=00=00=00=00=00=00=00=FF=FF=00=00=01=00=00=00_|~= ;=AF=D9 =00<=00=00=00<=00=00=00=00=10K=C1=A2=08=00=C0=DF=10=BD*=08=00E=00=00 =00=00= @=00@=01=15=F4 =03=08d =03=08=80=00=00=00=00=00=00=00=00ABCD=00=00=00=00=00=00=00=02=00=01=86=A3= =00=00 --azLHFNyN32YCQGCU Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: attachment; filename="corrupt.txt" Content-Transfer-Encoding: quoted-printable =D4=C3=B2=A1=02=00=04=00=00=00=00=00=00=00=00=00=FF=FF=00=00=01=00=00=00=A1= |~;O=85=02=00<=00=00=00<=00=00=00=00=10K=C1=A2=08=00=C0=DF=10=BD*=08=00E=00= =00 =00=00@=00@=01=15=F4=00=00=00=10K=C1=A2=08=00=C0=DF=10=BD*=08=00ABCD=00= =00=00=00=00=00=00=02=00=01=86=A3=00=00 --azLHFNyN32YCQGCU-- From owner-netdev@oss.sgi.com Mon Aug 20 00:09:24 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f7K79Ok08742 for netdev-outgoing; Mon, 20 Aug 2001 00:09:24 -0700 Received: from sink.san.rr.com (24-25-197-107.san.rr.com [24.25.197.107]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f7K79Mj08739 for ; Mon, 20 Aug 2001 00:09:22 -0700 Received: (qmail 23895 invoked by uid 500); 20 Aug 2001 07:09:23 -0000 Date: Mon, 20 Aug 2001 00:09:23 -0700 From: acmay@acmay.homeip.net To: netdev@oss.sgi.com Subject: Writting a "trusted" network device Message-ID: <20010820000923.G1165@sink.san.rr.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i Sender: owner-netdev@oss.sgi.com Precedence: bulk I am working a network device for a custom hardware device and I can't really go into the details on the hardware. So I won't go into too much detail on why I want to do things this way. I want to have the net device get a packet and call netif_rx() with a skbuff. 
In one situation the packets will just get routed out normally, but at other times the packet will already be formed with an IP source that is for one of the local interfaces. In this case ip_input drops the packet since fib_validate_source fails. It seems that I am stuck either: 1. changing ip_input/fib_validate_source or 2. filling out skb->dst and calling ip_forward myself in the driver. Are there any other options? For #1 I am not too fond of it unless other people want this feature as well. So are there other people that see any value in doing this? I can't follow the fib code yet, so I don't have a patch. I would think it would involve adding some sort of flag that marks the device as "trusted" to set the source IP to any value. From owner-netdev@oss.sgi.com Mon Aug 20 05:10:12 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f7KCACl16110 for netdev-outgoing; Mon, 20 Aug 2001 05:10:12 -0700 Received: from mailweb18.rediffmail.com (IDENT:qmailr@[203.199.83.142]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f7KCA9j16107 for ; Mon, 20 Aug 2001 05:10:09 -0700 Received: (qmail 30412 invoked by uid 510); 20 Aug 2001 12:12:21 -0000 Date: 20 Aug 2001 12:12:21 -0000 Message-ID: <20010820121221.30411.qmail@mailweb18.rediffmail.com> Received: from unknown (203.199.155.136) by rediffmail.com via HTTP; 20 Aug 2001 12:12:21 -0000 MIME-Version: 1.0 To: netdev@oss.sgi.com Subject: Project Query From: "kedar sovani" Content-ID: Content-type: text/plain Content-Description: Body Content-Transfer-Encoding: quoted-printable Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 594 Lines: 28 Hi, We are a group of three students set out to do a project as a part of our curriculum. We wish to have a project which is challenging as well as educative. We are thinking of implementing the network driver ideas discussed in the kernel summit.
We would be very glad to have your precious advice in this regard. Can we know the feasibility and the amount of effort that may go in doing this? We would also be very glad to have any more ideas or suggestions regarding the same. Waiting eagerly in response. Jayesh. Kedar. Nitin. (India) From owner-netdev@oss.sgi.com Mon Aug 20 09:04:25 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f7KG4P024023 for netdev-outgoing; Mon, 20 Aug 2001 09:04:25 -0700 Received: from localhost (pc3-oxfo3-0-cust1.oxf.cable.ntl.com [213.107.68.1]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f7KG4Nj24019 for ; Mon, 20 Aug 2001 09:04:24 -0700 Received: from ian by localhost with local (Exim 3.22 #1 (Debian)) id 15YrXM-0000C4-00; Mon, 20 Aug 2001 17:04:24 +0100 Date: Mon, 20 Aug 2001 17:04:24 +0100 From: Ian Lynagh To: netdev@oss.sgi.com Subject: Re: Problems with corruption sending packets from a module Message-ID: <20010820170424.A713@stu163.keble.ox.ac.uk> References: <20010818174650.A21733@stu163.keble.ox.ac.uk> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.3.15i In-Reply-To: <20010818174650.A21733@stu163.keble.ox.ac.uk>; from igloo@earth.li on Sat, Aug 18, 2001 at 05:46:50PM +0100 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 427 Lines: 14 On Sat, Aug 18, 2001 at 05:46:50PM +0100, Ian Lynagh wrote: > > skb->data = p->data; In case someone is reading the archives and has a similar problem, this appears to be the cause of the trouble. If I do skb->data += sizeof(struct ethhdr); then it's fine. I guess I ought to be using skb_reserve really and not have the ethhdr in my struct. It seems to work regardless if the destination is not in the arp cache.
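The layout Ian describes can be modelled in userspace. The sketch below is a toy, not the kernel API: struct toy_skb, toy_reserve() and toy_put() are invented stand-ins that only mimic how skb_reserve() and skb_put() move the data and tail pointers, and the header sizes (14-byte Ethernet, 20-byte IP, 8-byte ICMP) are assumptions about a frame without options.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_ETH_HLEN 14  /* assumed Ethernet header length */

/* Toy stand-in for an sk_buff: one buffer plus data/tail cursors. */
struct toy_skb {
    uint8_t buf[128];
    uint8_t *data;  /* start of valid packet bytes */
    uint8_t *tail;  /* end of valid packet bytes */
};

void toy_init(struct toy_skb *skb)
{
    skb->data = skb->buf;
    skb->tail = skb->buf;
}

/* Mimics skb_reserve(): leave headroom in a still-empty buffer. */
void toy_reserve(struct toy_skb *skb, size_t len)
{
    assert(skb->data == skb->tail);  /* only meaningful while empty */
    assert(len <= sizeof skb->buf);
    skb->data += len;
    skb->tail += len;
}

/* Mimics skb_put(): extend the data area at the tail, return its start. */
uint8_t *toy_put(struct toy_skb *skb, size_t len)
{
    uint8_t *old_tail = skb->tail;
    assert(len <= (size_t)(skb->buf + sizeof skb->buf - skb->tail));
    skb->tail += len;
    return old_tail;
}

/* Bytes currently between data and tail. */
size_t toy_len(const struct toy_skb *skb)
{
    return (size_t)(skb->tail - skb->data);
}

/* Free space in front of data, where a link-layer header could go. */
size_t toy_headroom(const struct toy_skb *skb)
{
    return (size_t)(skb->data - skb->buf);
}
```

Calling toy_reserve(&skb, TOY_ETH_HLEN) and then toy_put(&skb, 20 + 8 + 4) leaves data pointing at the IP header with 14 bytes of headroom in front of it. That appears to be the same arrangement Ian reaches with skb->data += sizeof(struct ethhdr), without carrying the Ethernet header inside the packet struct.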
Ian From owner-netdev@oss.sgi.com Mon Aug 20 09:50:44 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f7KGoi325218 for netdev-outgoing; Mon, 20 Aug 2001 09:50:44 -0700 Received: from web11508.mail.yahoo.com (web11508.mail.yahoo.com [216.136.172.40]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f7KGohj25213 for ; Mon, 20 Aug 2001 09:50:43 -0700 Message-ID: <20010820165042.57283.qmail@web11508.mail.yahoo.com> Received: from [216.221.201.55] by web11508.mail.yahoo.com; Mon, 20 Aug 2001 09:50:42 PDT Date: Mon, 20 Aug 2001 09:50:42 -0700 (PDT) From: Guilhem Tardy Subject: ip_input.c / time measurements To: netdev@oss.sgi.com MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1024 Lines: 23 Hi! I am trying to make some measurement of the delays in the IP stack by adding time values inside UDP packets received on a special port. I intend to hack into the xmit function of a particular net driver, as in parts of the ip_input.c file. What is in your opinion the first function of ip_input invoked upon reception of a packet, and the last before passing to higher protocols? I thought of ip_rcv(), where incidentally iph = skb->nh.iph; is performed twice (unnecessarily?), and ip_local_deliver_finish(). Besides, I was a bit confused by the choice of a common time measure for the kernel parts and the application. Using jiffies limits me to 1/100 of a second measurement, unless I change that to 1/1000 or less in param.h ! Is there a more precise variable or function that I could invoke to get exact timings? Thanks for your help! Guilhem. __________________________________________________ Do You Yahoo!? Make international calls for as low as $.04/minute with Yahoo! 
Messenger http://phonecard.yahoo.com/ From owner-netdev@oss.sgi.com Mon Aug 20 11:24:40 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f7KIOep27503 for netdev-outgoing; Mon, 20 Aug 2001 11:24:40 -0700 Received: from schmee.sfgoth.com (schmee.sfgoth.com [63.205.85.133]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f7KIOc927500 for ; Mon, 20 Aug 2001 11:24:38 -0700 Received: (from mitch@localhost) by schmee.sfgoth.com (8.9.3/8.9.3) id LAA75750; Mon, 20 Aug 2001 11:24:33 -0700 (PDT) Date: Mon, 20 Aug 2001 11:24:33 -0700 From: Mitchell Blank Jr To: Dave Jones Cc: netdev@oss.sgi.com Subject: Re: Aula2 Message-ID: <20010820112433.D74316@sfgoth.com> References: <200108201205.KAA24974@unisul.rct-sc.br> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0i In-Reply-To: ; from davej@suse.de on Mon, Aug 20, 2001 at 03:14:33PM +0200 Sender: owner-netdev@oss.sgi.com Precedence: bulk Dave Jones wrote: > Can something be done to limit the size of attachments to the list ? A better idea would be to front the list with procmail and install the sanitizer: http://www.impsec.org/email-tools/procmail-security.html We've had it here for awhile, and all this sircam crap is bouncing off it nicely. I've also yet to see a false positive. Great stuff. 
-Mitch From owner-netdev@oss.sgi.com Mon Aug 20 11:26:18 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f7KIQIB27531 for netdev-outgoing; Mon, 20 Aug 2001 11:26:18 -0700 Received: from e24.nc.us.ibm.com (e24.nc.us.ibm.com [32.97.136.230]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f7KIQC927525 for ; Mon, 20 Aug 2001 11:26:14 -0700 Received: from southrelay03.raleigh.ibm.com (southrelay03.raleigh.ibm.com [9.37.3.210]) by e24.nc.us.ibm.com (8.9.3/8.9.3) with ESMTP id NAA65766; Mon, 20 Aug 2001 13:23:34 -0500 Received: from gateway1.beaverton.ibm.com (gateway1.sequent.com [138.95.180.2]) by southrelay03.raleigh.ibm.com (8.11.1m3/NCO v4.97) with ESMTP id f7KIPaK69152; Mon, 20 Aug 2001 14:25:37 -0400 Received: from eng4.beaverton.ibm.com (eng4.sequent.com [138.95.7.64]) by gateway1.beaverton.ibm.com (8.11.2/8.11.2) with ESMTP id f7KIPa908601; Mon, 20 Aug 2001 11:25:36 -0700 Received: (from nivedita@localhost) by eng4.beaverton.ibm.com (8.8.5/8.8.5/token.aware-1.2) id LAA27324; Mon, 20 Aug 2001 11:25:35 -0700 (PDT) From: Nivedita Singhvi Message-Id: <200108201825.LAA27324@eng4.beaverton.ibm.com> Subject: Re: ip_input.c / time measurements To: guilhem_tardy@yahoo.com (Guilhem Tardy) Date: Mon, 20 Aug 2001 11:25:35 -0700 (PDT) Cc: netdev@oss.sgi.com In-Reply-To: <20010820165042.57283.qmail@web11508.mail.yahoo.com> from "Guilhem Tardy" at Aug 20, 2001 08:50:42 AM PST X-Mailer: ELM [version 2.5 PL3] MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Yep, ip_rcv() would be the first IP function, and you might even want to stick the measuring end at the start of tcp_v4_rcv(), rather than at the end of ip_local_deliver_finish(). I'm trying to do similar measuring, when I get time, using Andrew Morton's time pegs. Still trying to actually do it right.. 
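To put numbers on the jiffies granularity raised in the question: with HZ at 100 one tick is 10 ms, so anything shorter reads as zero or one tick. A small userspace sketch of that arithmetic follows; the function names are made up for illustration and nothing here is kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Microseconds represented by one timer tick at a given HZ. */
uint32_t usec_per_tick(uint32_t hz)
{
    assert(hz > 0);
    return 1000000u / hz;
}

/* Elapsed time implied by the difference of two tick-counter readings.
 * Unsigned subtraction stays correct across a single counter wrap. */
uint64_t ticks_to_usec(uint32_t start, uint32_t end, uint32_t hz)
{
    uint32_t delta = end - start;
    return (uint64_t)delta * usec_per_tick(hz);
}
```

A 3 ms delay in the stack would therefore be reported as either 0 or 10000 microseconds at HZ=100, which is why a finer-grained clock is needed for this kind of measurement.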
There is a reason for the duplicate iph = skb->nh.iph; skb might have changed after the call to skb_share_check(). thanks, Nivedita --- Nivedita Singhvi (503) 578-4580 Linux Technology Center nivedita@us.ibm.com IBM Beaverton, OR nivedita@sequent.com > Hi! > > I am trying to make some measurement of the delays in the IP stack by adding > time values inside UDP packets received on a special port. I intend to hack > into the xmit function of a particular net driver, as in parts of the > ip_input.c file. What is in your opinion the first function of ip_input invoked > upon reception of a packet, and the last before passing to higher protocols? I > thought of ip_rcv(), where incidentally iph = skb->nh.iph; is performed twice > (unnecessarily?), and ip_local_deliver_finish(). > Besides, I was a bit confused by the choice of a common time measure for the > kernel parts and the application. Using jiffies limits me to 1/100 of a second > measurement, unless I change that to 1/1000 or less in param.h ! Is there a > more precise variable or function that I could invoke to get exact timings? > > Thanks for your help! > > Guilhem. > > > __________________________________________________ > Do You Yahoo!? > Make international calls for as low as $.04/minute with Yahoo! 
Messenger > http://phonecard.yahoo.com/ > From owner-netdev@oss.sgi.com Mon Aug 20 14:10:20 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f7KLAKh31097 for netdev-outgoing; Mon, 20 Aug 2001 14:10:20 -0700 Received: from yoda.planetinternet.be (yoda.planetinternet.be [195.95.30.146]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f7KLAH931089 for ; Mon, 20 Aug 2001 14:10:17 -0700 Received: from dialup.planetinternet.be (postfix@u212-239-145-55.dialup.planetinternet.be [212.239.145.55]) by yoda.planetinternet.be (8.11.3/8.11.1) with ESMTP id f7KLADu16452; Mon, 20 Aug 2001 23:10:14 +0200 Received: by dialup.planetinternet.be (Postfix, from userid 501) id 4BEC726132; Mon, 20 Aug 2001 23:10:10 +0200 (CEST) Date: Mon, 20 Aug 2001 23:10:09 +0200 From: Kurt Roeckx To: Pekka Savola Cc: netdev@oss.sgi.com Subject: Re: ICMP NDISC: fake message with non-255 Hop Limit received: 249 Message-ID: <20010820231009.A6358@ping.be> References: <20010707030227.A1676@ping.be> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0pre2i In-Reply-To: Sender: owner-netdev@oss.sgi.com Precedence: bulk On Sat, Jul 07, 2001 at 10:04:19AM +0300, Pekka Savola wrote: > On Sat, 7 Jul 2001, Kurt Roeckx wrote: > > Jul 5 19:05:51 thunderbird kernel: ICMP NDISC: fake message with > > non-255 Hop Limit received: 249 > > The specs require that all IPv6 neighbour discovery messages MUST be > originated in the same network. In your case, you're getting these > messages from over the Internet. It says that any node should silenty drop any with a hop different then 255. It seems Linux is the only that drops it, although not silently. > Still, I'd suggest getting tcpdump 3.6.2 (compiled with ipv6), and > capturing the traffic a bit if this happens again: > > # tcpdump -n -s 512 -vvv icmp6 > > If you do capture something, please also describe your network topology. It suddenly got very bad. I already have 44K of those packets in the log. 
They look like this: 12:15:29.332636 3ffe:8100:100:a::71d > 3ffe:80c0:220::b: icmp6: neighbor sol: who has 3ffe:80c0:220::b (len 24, hlim 251) This box I'm on only has 1 tunnel, and it's a /128. The user from this packet is a tunnel broker user, which also has a /128. All hosts between me and that users cisco router are running FreeBSD, afaik. >From what I understand, all hosts in between should have dropped that packet for 2 reasons: - The hop != 255 - It's not a multicast address. It should have send a packet to ff02::1:0:b Is that correct? I tried to contact the end users, but none of them replied yet. Do you have any question you would like me to ask them? Kurt PS: Please CC me, I'm not on the list. From owner-netdev@oss.sgi.com Mon Aug 20 14:36:50 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f7KLaol31762 for netdev-outgoing; Mon, 20 Aug 2001 14:36:50 -0700 Received: from netcore.fi (netcore.fi [193.94.160.1]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f7KLaj931758 for ; Mon, 20 Aug 2001 14:36:45 -0700 Received: from localhost (pekkas@localhost) by netcore.fi (8.11.1/8.11.1) with ESMTP id f7KLaWg10150; Tue, 21 Aug 2001 00:36:32 +0300 Date: Tue, 21 Aug 2001 00:36:31 +0300 (EEST) From: Pekka Savola To: Kurt Roeckx cc: Subject: Re: ICMP NDISC: fake message with non-255 Hop Limit received: 249 In-Reply-To: <20010820231009.A6358@ping.be> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk On Mon, 20 Aug 2001, Kurt Roeckx wrote: > On Sat, Jul 07, 2001 at 10:04:19AM +0300, Pekka Savola wrote: > > On Sat, 7 Jul 2001, Kurt Roeckx wrote: > > > Jul 5 19:05:51 thunderbird kernel: ICMP NDISC: fake message with > > > non-255 Hop Limit received: 249 > > > > The specs require that all IPv6 neighbour discovery messages MUST be > > originated in the same network. In your case, you're getting these > > messages from over the Internet. 
> > It says that any node should silenty drop any with a hop > different then 255. It seems Linux is the only that drops it, > although not silently. Dropping silently in RFC context means not sending any ICMP errors, or anything like that, about dropped packets. Logging is a policy issue. In the long term, a lot of these messages can be moved to just counters, but this way bugs in implementations (our own too :-) and erroneous setups are more easily detected. > It suddenly got very bad. I already have 44K of those packets in > the log. > > They look like this: > > 12:15:29.332636 3ffe:8100:100:a::71d > 3ffe:80c0:220::b: icmp6: > neighbor sol: who has 3ffe:80c0:220::b (len 24, hlim 251) > > This box I'm on only has 1 tunnel, and it's a /128. The user > from this packet is a tunnel broker user, which also has a /128. > All hosts between me and that users cisco router are running > FreeBSD, afaik. It appears to me that the system on '3ffe:8100:100:a::71d' has a very hosed setup/implementation, or is performing some tests or something. > From what I understand, all hosts in between should have dropped > that packet for 2 reasons: > > - The hop != 255 Yes. > - It's not a multicast address. It should have send a packet > to ff02::1:0:b Not necessarily. Address resolution, for example, uses multicast (the most common scenario), but e.g. neighbor unreachability detection (NUD) uses unicast. Still, NUD should only be performed with your physical neighbors, not across the Internet. See RFC2461 7.1.1 and 4.3. > I tried to contact the end users, but none of them replied yet. > Do you have any question you would like me to ask them? - which implementation they are using (appears to be clearly wrong; tried updating?)
- which implementation their ipv6 router is using - are they just testing something (intentional bad packets) or not - traceroute to your address - what's their network setup (It could be that if for some odd reason the implementation thought all addresses were on-link, as it MUST be the case if there are no IPv6 routers present, and the packets would still, using some mechanism, be relayed to the internet). HTH. -- Pekka Savola "Tell me of difficulties surmounted, Netcore Oy not those you stumble over and fall" Systems. Networks. Security. -- Robert Jordan: A Crown of Swords From owner-netdev@oss.sgi.com Mon Aug 20 14:54:19 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f7KLsJw32049 for netdev-outgoing; Mon, 20 Aug 2001 14:54:19 -0700 Received: from yoda.planetinternet.be (yoda.planetinternet.be [195.95.30.146]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f7KLsG932046 for ; Mon, 20 Aug 2001 14:54:16 -0700 Received: from dialup.planetinternet.be (postfix@u212-239-145-55.dialup.planetinternet.be [212.239.145.55]) by yoda.planetinternet.be (8.11.3/8.11.1) with ESMTP id f7KLsEu22221; Mon, 20 Aug 2001 23:54:14 +0200 Received: by dialup.planetinternet.be (Postfix, from userid 501) id BFA8A26132; Mon, 20 Aug 2001 23:54:10 +0200 (CEST) Date: Mon, 20 Aug 2001 23:54:10 +0200 From: Kurt Roeckx To: Pekka Savola Cc: netdev@oss.sgi.com Subject: Re: ICMP NDISC: fake message with non-255 Hop Limit received: 249 Message-ID: <20010820235410.A6561@ping.be> References: <20010820231009.A6358@ping.be> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0pre2i In-Reply-To: Sender: owner-netdev@oss.sgi.com Precedence: bulk On Tue, Aug 21, 2001 at 12:36:31AM +0300, Pekka Savola wrote: > On Mon, 20 Aug 2001, Kurt Roeckx wrote: > > > I tried to contact the end users, but none of them replied yet. > > Do you have any question you would like me to ask them? 
> > - which implementation they are using (appears to be clearly wrong; tried > updating?) > - which implementation their ipv6 router is using The admin of the cisco said it was using an old IOS ... But I know one of the other people with the same problem their router is a FreeBSD. > - are they just testing something (intentional bad packets) or not I doubt they are testing something. I'm running an IRC server, and it just appear to be users trying to connect to me. Atleast 2 of them just have a /128. > - traceroute to your address > - what's their network setup I'll try to get more info soon. Kurt From owner-netdev@oss.sgi.com Mon Aug 20 21:40:11 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f7L4eB407537 for netdev-outgoing; Mon, 20 Aug 2001 21:40:11 -0700 Received: from e31.bld.us.ibm.com (e31.co.us.ibm.com [32.97.110.129]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f7L4e8907534 for ; Mon, 20 Aug 2001 21:40:08 -0700 Received: from westrelay02.boulder.ibm.com (westrelay02.boulder.ibm.com [9.99.140.23]) by e31.bld.us.ibm.com (8.9.3/8.9.3) with ESMTP id AAA80306; Tue, 21 Aug 2001 00:37:58 -0400 Received: from d03nm104.boulder.ibm.com (d03nm104.boulder.ibm.com [9.99.140.96]) by westrelay02.boulder.ibm.com (8.11.1m3/NCO v4.97.1) with ESMTP id f7L4e7U62292; Mon, 20 Aug 2001 22:40:07 -0600 Importance: Normal Subject: Re: ip_input.c / time measurements To: guilhem_tardy@yahoo.com Cc: netdev@oss.sgi.com X-Mailer: Lotus Notes Release 5.0.3 (Intl) 21 March 2000 Message-ID: From: "Nivedita Singhvi" Date: Mon, 20 Aug 2001 21:26:23 -0700 X-MIMETrack: Serialize by Router on D03NM104/03/M/IBM(Release 5.0.8 |June 18, 2001) at 08/20/2001 10:40:07 PM MIME-Version: 1.0 Content-type: text/plain; charset=us-ascii Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1112 Lines: 33 Sorry, a couple of posts in the last couple of days have gone astray - reposting. 
> I am trying to make some measurement of the delays in the IP stack by adding > time values inside UDP packets received on a special port. I intend to hack > into the xmit function of a particular net driver, as in parts of the > ip_input.c file. What is in your opinion the first function of ip_input invoked > upon reception of a packet, and the last before passing to higher protocols? I > thought of ip_rcv(), where incidentally iph = skb->nh.iph; is performed twice > (unnecessarily?), and ip_local_deliver_finish(). Yep, ip_rcv() would be the first IP function, and you might even want to stick the measuring end at the start of tcp_v4_rcv(), rather than at the end of ip_local_deliver_finish(). You might want to take a look at Andrew Morton's time pegs code: http://www.uow.edu.au/~andrewm/linux/#timepegs which allows you to measure latencies between any 2 points of code.. There is a reason for the duplicate iph = skb->nh.iph; skb might have changed after the call to skb_share_check(). thanks, Nivedita From owner-netdev@oss.sgi.com Mon Aug 20 22:11:18 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f7L5BID08130 for netdev-outgoing; Mon, 20 Aug 2001 22:11:18 -0700 Received: from whatever.local ([65.10.228.207]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f7L5BH908127 for ; Mon, 20 Aug 2001 22:11:17 -0700 Received: (qmail 12176 invoked by uid 513); 20 Aug 2001 17:19:24 -0000 From: chuckw@ieee.org Date: Mon, 20 Aug 2001 13:19:24 -0400 To: netdev@oss.sgi.com Subject: Questions about the tcp code in 2.4.9 Message-ID: <20010820131924.A12155@ieee.org> Mail-Followup-To: netdev@oss.sgi.com Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 476 Lines: 12 Hello All, I have 2 questions. They may seem stupid, but the guys over at kernelnewbies said you guys would know the answers to these. 
1) on line 1054 of tcp.c it seems that the sock buffer is being added to the front of the write queue. If my observation is correct ? why : what is happening 2) when an iovec is written to the sock buffer, it looks like the whole iovec is written into the same sock buffer. Is that correct? Thanks in advance, Chuck From owner-netdev@oss.sgi.com Tue Aug 21 04:40:02 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f7LBe2o18074 for netdev-outgoing; Tue, 21 Aug 2001 04:40:02 -0700 Received: from colin.muc.de (root@colin.muc.de [193.149.48.1]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f7LBe0918071 for ; Tue, 21 Aug 2001 04:40:00 -0700 Received: by colin.muc.de id <140637-2>; Tue, 21 Aug 2001 13:39:49 +0200 Message-ID: <20010821133944.12622@colin.muc.de> Date: Tue, 21 Aug 2001 13:39:44 +0200 From: Andi Kleen To: chuckw@ieee.org Cc: netdev@oss.sgi.com Subject: Re: Questions about the tcp code in 2.4.9 References: <20010820131924.A12155@ieee.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 0.88e In-Reply-To: <20010820131924.A12155@ieee.org>; from chuckw@ieee.org on Mon, Aug 20, 2001 at 07:19:24PM +0200 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 676 Lines: 17 On Mon, Aug 20, 2001 at 07:19:24PM +0200, chuckw@ieee.org wrote: > Hello All, > I have 2 questions. They may seem stupid, but the guys over at kernelnewbies said > you guys would know the answers to these. > 1) on line 1054 of tcp.c it seems that the sock buffer is being added to the > front of the write queue. > If my observation is correct ? why : what is happening The TCP write queue is a ring, and the prev of the head is the back. > > 2) when an iovec is written to the sock buffer, it looks like the whole iovec > is written into the same sock buffer. Is that correct? Not necessarily. It can be split over multiple skbs when needed.
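Andi's point about the ring can be sketched in userspace C. This is a toy circular doubly linked list with a sentinel head, not the kernel's queue implementation, and all names in it are invented for illustration. Linking a node in just before the head makes it head->prev, which is the back of the queue:

```c
#include <assert.h>
#include <stddef.h>

struct node {
    struct node *next, *prev;
    int id;
};

/* An empty ring: the sentinel points at itself in both directions. */
void ring_init(struct node *head)
{
    assert(head != NULL);
    head->next = head;
    head->prev = head;
}

/* Insert n immediately before the sentinel, making it the last element. */
void ring_add_tail(struct node *head, struct node *n)
{
    assert(head != NULL && n != NULL);
    n->next = head;
    n->prev = head->prev;
    head->prev->next = n;
    head->prev = n;
}

/* First element in queue order, or NULL when the ring is empty. */
struct node *ring_first(struct node *head)
{
    return head->next == head ? NULL : head->next;
}

/* Last element in queue order, or NULL when the ring is empty. */
struct node *ring_last(struct node *head)
{
    return head->prev == head ? NULL : head->prev;
}
```

After two ring_add_tail() calls with nodes a and b, ring_first() returns a and ring_last() returns b, even though b sits "in front of" the head pointer when following prev links. Code that looks like an insertion at the front can therefore be an append.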
-Andi From owner-netdev@oss.sgi.com Tue Aug 21 09:05:58 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f7LG5wl24593 for netdev-outgoing; Tue, 21 Aug 2001 09:05:58 -0700 Received: from fork.powerdns.com (fork.powerdns.com [213.244.168.244]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f7LG5t924590 for ; Tue, 21 Aug 2001 09:05:56 -0700 Received: (qmail 21485 invoked by uid 1011); 21 Aug 2001 16:05:53 -0000 Date: Tue, 21 Aug 2001 18:05:53 +0200 From: bert hubert To: netdev@oss.sgi.com Subject: Simple Packet Signing Message-ID: <20010821180553.A21415@fork.powerdns.com> Mail-Followup-To: bert hubert , netdev@oss.sgi.com Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1119 Lines: 28 Hi everybody, I'm considering implementing something called Simple Packet Signing. The current plan is at http://ds9a.nl/sps/PLAN "Ok, I have an itch to scratch. I have a laptop which travels a lot and therefore has a very dynamic IP address. Even our home has a dynamic IP address, within a certain range. I currently grant broad access to my servers so that I am able to connect from all those IP addresses to ssh, to open up my access lists, so I can ssh to the rest of the network. Also, I am sometimes in a situation where I need to trust an IP address which can be forged by lots of untrustworthy people. Everybody in the chain from me to that server might be able to acquire my IP address, and thus gain access to my servers! * Sometimes I just wish that I would be able to simply sign my packets, and * have my access lists recognise the signature, and accept my traffic." For more rationale, see the URL. I would very much appreciate your input. Is this a wise idea? Are there better ways to achieve this, are people already working on this (besides IPSEC)? etc et. Thanks!
Regards, bert From owner-netdev@oss.sgi.com Tue Aug 21 09:20:29 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f7LGKTN24895 for netdev-outgoing; Tue, 21 Aug 2001 09:20:29 -0700 Received: from lust.cs.ohiou.edu (adsl-dynamic1-129.cleveland.oh.ameritech.net [64.108.88.129]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f7LGKP924892 for ; Tue, 21 Aug 2001 09:20:25 -0700 Received: (from elb@localhost) by lust.cs.ohiou.edu (8.11.2/8.11.2) id f7LGKMf20892; Tue, 21 Aug 2001 12:20:22 -0400 X-Authentication-Warning: localhost.localdomain: elb set sender to eblanton@cs.ohiou.edu using -f Date: Tue, 21 Aug 2001 12:20:22 -0400 From: Ethan Blanton To: bert hubert Cc: netdev@oss.sgi.com Subject: Re: Simple Packet Signing Message-ID: <20010821122022.A20737@localhost.localdomain> Mail-Followup-To: bert hubert , netdev@oss.sgi.com References: <20010821180553.A21415@fork.powerdns.com> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="ibTvN161/egqYuK8" Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <20010821180553.A21415@fork.powerdns.com>; from ahu@ds9a.nl on Tue, Aug 21, 2001 at 06:05:53PM +0200 X-Operating-System: Linux X-GnuPG-Fingerprint: A290 14A8 C682 5C88 AE51 4787 AFD9 00F4 883C 1C14 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1625 Lines: 55 --ibTvN161/egqYuK8 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable bert hubert spake unto us the following wisdom: > I'm considering implementing something called Simple Packet Signing. The > current plan is at http://ds9a.nl/sps/PLAN > For more rationale, see the URL. I would very much appreciate your input. Is > this a wise idea? Are there better ways to achieve this, are people already > working on this (besides IPSEC)? etc et. Sort of.
Check out: http://www.ietf.org/internet-drafts/draft-moskowitz-hip-04.txt http://www.ietf.org/internet-drafts/draft-moskowitz-hip-arch-02.txt http://www.ietf.org/internet-drafts/draft-moskowitz-hip-impl-01.txt It goes a bit further even than what you are proposing (allowing complete substitution of cryptographic ID for the host IP in most circumstances), but it is a *very* good idea. I'm not sure I agree with all the details at this stage, but the WG hasn't even been formed yet, so there is a long way to go. :-) The mailing list information and subscription form is at: http://mail.freeswan.org/mailman/listinfo/hipsec Ethan -- If I've told you once, I've told you once And once is all that you needed. -- The Refreshments, "Carefree" --ibTvN161/egqYuK8 Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.0.6 (GNU/Linux) Comment: For info see http://www.gnupg.org iD8DBQE7gopGr9kA9Ig8HBQRAhssAJ0edw89V8InpfmjYDFOnowhGNlVOwCeKa69 RuVriGx65WRgfRWj+dqfTBI= =/8cO -----END PGP SIGNATURE----- --ibTvN161/egqYuK8-- From owner-netdev@oss.sgi.com Tue Aug 21 10:42:27 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f7LHgR926165 for netdev-outgoing; Tue, 21 Aug 2001 10:42:27 -0700 Received: from e31.bld.us.ibm.com (e31.co.us.ibm.com [32.97.110.129]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f7LHgQ926162 for ; Tue, 21 Aug 2001 10:42:26 -0700 Received: from westrelay02.boulder.ibm.com (westrelay02.boulder.ibm.com [9.99.140.23]) by e31.bld.us.ibm.com (8.9.3/8.9.3) with ESMTP id NAA52526; Tue, 21 Aug 2001 13:40:11 -0400 Received: from d03nm104.boulder.ibm.com (d03nm104.boulder.ibm.com [9.99.140.96]) by westrelay02.boulder.ibm.com (8.11.1m3/NCO v4.97.1) with ESMTP id f7LHgFA203578; Tue, 21 Aug 2001 11:42:15 -0600 Importance: Normal Subject: Re: ip_input.c / time measurements To: guilhem_tardy@yahoo.com Cc: netdev@oss.sgi.com X-Mailer: Lotus Notes Release 5.0.3 (Intl) 21 March 2000 Message-ID:
From: "Nivedita Singhvi" Date: Tue, 21 Aug 2001 10:36:42 -0700 X-MIMETrack: Serialize by Router on D03NM104/03/M/IBM(Release 5.0.8 |June 18, 2001) at 08/21/2001 11:42:38 AM MIME-Version: 1.0 Content-type: text/plain; charset=us-ascii Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 353 Lines: 18 >>thought of ip_rcv(), where incidentally iph = skb->nh.iph; is performed twice >>(unnecessarily?), and ip_local_deliver_finish(). >There is a reason for the duplicate iph = skb->nh.iph; >skb might have changed after the call to skb_share_check(). Er, you're right, in that the initialization where it's declared isn't necessary. thanks, Nivedita From owner-netdev@oss.sgi.com Tue Aug 21 15:13:15 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f7LMDFU31854 for netdev-outgoing; Tue, 21 Aug 2001 15:13:15 -0700 Received: from fork.powerdns.com (fork.powerdns.com [213.244.168.244]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f7LMDD931851 for ; Tue, 21 Aug 2001 15:13:14 -0700 Received: (qmail 29728 invoked by uid 1011); 21 Aug 2001 22:13:12 -0000 Date: Wed, 22 Aug 2001 00:13:12 +0200 From: bert hubert To: Ethan Blanton Cc: netdev@oss.sgi.com Subject: Re: Simple Packet Signing Message-ID: <20010822001312.A29648@fork.powerdns.com> Mail-Followup-To: bert hubert , Ethan Blanton , netdev@oss.sgi.com References: <20010821180553.A21415@fork.powerdns.com> <20010821122022.A20737@localhost.localdomain> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <20010821122022.A20737@localhost.localdomain>; from eblanton@cs.ohiou.edu on Tue, Aug 21, 2001 at 12:20:22PM -0400 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 651 Lines: 18 On Tue, Aug 21, 2001 at 12:20:22PM -0400, Ethan Blanton wrote: > It goes a bit further even than what you are proposing (allowing > complete substitution of cryptographic ID for the host IP in most > circumstances), but it is a *very*
good idea. I'm not sure I agree > with all the details at this stage, but the WG hasn't even been formed > yet, so there is a long way to go. :-) Looks somewhat interesting, though still pretty complex. SPS is very much a 'now' thing, but perhaps the HIP people can learn from the mistakes we will obviously make. Thanks for pointing me to this - it has already been useful reading their ideas. Regards, bert From owner-netdev@oss.sgi.com Tue Aug 21 21:25:52 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f7M4Pq919771 for netdev-outgoing; Tue, 21 Aug 2001 21:25:52 -0700 Received: from almesberger.net (IDENT:root@lsb-catv-1-p021.vtxnet.ch [212.147.5.21]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f7M4Pn919768 for ; Tue, 21 Aug 2001 21:25:49 -0700 Received: (from almesber@localhost) by almesberger.net (8.9.3/8.9.3) id GAA13093; Wed, 22 Aug 2001 06:25:43 +0200 Date: Wed, 22 Aug 2001 06:25:43 +0200 From: Werner Almesberger To: bert hubert Cc: netdev@oss.sgi.com Subject: Re: Simple Packet Signing Message-ID: <20010822062543.E27708@almesberger.net> References: <20010821180553.A21415@fork.powerdns.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20010821180553.A21415@fork.powerdns.com>; from ahu@ds9a.nl on Tue, Aug 21, 2001 at 06:05:53PM +0200 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 979 Lines: 23 bert hubert wrote: > For more rationale, see the URL. I would very much appreciate your input. Is > this a wise idea? Are there better ways to achieve this, are people already > working on this (besides IPSEC)? etc et. You can set up SSH such that it only looks at a key, not at the IP address (well, it looks at it briefly, but look away if it doesn't like what it sees). 
You can either just copy the public host key of your dynamic systems to $HOME/.ssh/authorized_keys on your server (if you trust every user on those dynamic systems), or - better - generate new keys for all trusted users on those dynamic hosts with ssh-keygen and use it with ssh -i. If you want, you can then also run PPP over SSH to build your own little VPN. - Werner -- _________________________________________________________________________ / Werner Almesberger, Lausanne, CH wa@almesberger.net / /_http://icawww.epfl.ch/almesberger/_____________________________________/ From owner-netdev@oss.sgi.com Wed Aug 22 05:21:06 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f7MCL6A27689 for netdev-outgoing; Wed, 22 Aug 2001 05:21:06 -0700 Received: from whatever.local ([65.10.228.207]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f7MCL4927686 for ; Wed, 22 Aug 2001 05:21:05 -0700 Received: (qmail 1834 invoked by uid 513); 22 Aug 2001 00:29:08 -0000 From: chuckw@ieee.org Date: Tue, 21 Aug 2001 20:29:08 -0400 To: netdev@oss.sgi.com Subject: Re: Questions about the tcp code in 2.4.9 Message-ID: <20010821202908.A1367@ieee.org> Mail-Followup-To: netdev@oss.sgi.com References: <20010820131924.A12155@ieee.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <20010820131924.A12155@ieee.org>; from chuckw@ieee.org on Mon, Aug 20, 2001 at 01:19:24PM -0400 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 32 Lines: 3 Thanks for the comments. 
Chuck From owner-netdev@oss.sgi.com Wed Aug 22 14:44:51 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f7MLipF07039 for netdev-outgoing; Wed, 22 Aug 2001 14:44:51 -0700 Received: from sgi.com (sgi.SGI.COM [192.48.153.1]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f7MLiod07036 for ; Wed, 22 Aug 2001 14:44:50 -0700 Received: from e21.nc.us.ibm.com (e21.nc.us.ibm.com [32.97.136.227]) by sgi.com (980327.SGI.8.8.8-aspam/980304.SGI-aspam: SGI does not authorize the use of its proprietary systems or networks for unsolicited or bulk email from the Internet.) via ESMTP id OAA00527 for ; Wed, 22 Aug 2001 14:44:46 -0700 (PDT) mail_from (samudrala@us.ibm.com) Received: from southrelay03.raleigh.ibm.com (southrelay03.raleigh.ibm.com [9.37.3.210]) by e21.nc.us.ibm.com (8.9.3/8.9.3) with ESMTP id QAA36006 for ; Wed, 22 Aug 2001 16:37:28 -0500 Received: from w-sridhar2.des.beaverton.ibm.com (w-sridhar2.des.beaverton.ibm.com [9.47.18.20]) by southrelay03.raleigh.ibm.com (8.11.1m3/NCO v4.97) with ESMTP id f7MLdbY94236 for ; Wed, 22 Aug 2001 17:39:37 -0400 Date: Wed, 22 Aug 2001 14:39:39 -0700 (PDT) From: Sridhar Samudrala X-Sender: sridhar@w-sridhar2.des.sequent.com To: netdev@oss.sgi.com Subject: [PATCH] Simple bug fix in tcp_getsockopt(): TCP_DEFER_ACCEPT Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 614 Lines: 17 getsockopt(TCP_DEFER_ACCEPT) returns value in terms of timer ticks instead of seconds. --- tcp.c.orig Wed Aug 22 14:20:10 2001 +++ tcp.c Wed Aug 22 14:20:41 2001 @@ -2361,7 +2361,7 @@ val = (val ? : sysctl_tcp_fin_timeout)/HZ; break; case TCP_DEFER_ACCEPT: - val = tp->defer_accept == 0 ? 0 : (TCP_TIMEOUT_INIT<<(tp->defer_accept-1)); + val = tp->defer_accept == 0 ? 
0 : ((TCP_TIMEOUT_INIT/HZ)<<(tp->defer_accept-1)); break; case TCP_WINDOW_CLAMP: val = tp->window_clamp; Thanks Sridhar From owner-netdev@oss.sgi.com Thu Aug 23 20:36:24 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f7O3aO500437 for netdev-outgoing; Thu, 23 Aug 2001 20:36:24 -0700 Received: from whatever.local ([65.10.228.207]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f7O3aNd00434 for ; Thu, 23 Aug 2001 20:36:23 -0700 Received: (qmail 4459 invoked by uid 513); 23 Aug 2001 15:44:21 -0000 From: chuckw@ieee.org Date: Thu, 23 Aug 2001 11:44:21 -0400 To: netdev@oss.sgi.com Subject: sk_buff question Message-ID: <20010823114421.A4454@ieee.org> Mail-Followup-To: netdev@oss.sgi.com Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 123 Lines: 6 Hello again, I was wondering if anyone knew why in sk_buff the next and prev pointers _need_ to be first? 
Thanks, Chuck From owner-netdev@oss.sgi.com Thu Aug 23 20:52:55 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f7O3qtb01384 for netdev-outgoing; Thu, 23 Aug 2001 20:52:55 -0700 Received: from netbank.com.br (IDENT:postfix@garrincha.netbank.com.br [200.203.199.88]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f7O3qrd01380 for ; Thu, 23 Aug 2001 20:52:54 -0700 Received: from brinquedo.distro.conectiva (1-246.ctame701-2.telepar.net.br [200.181.138.246]) by netbank.com.br (Postfix) with ESMTP id CD18846812; Fri, 24 Aug 2001 00:50:49 -0300 (BRST) Received: by brinquedo.distro.conectiva (Postfix, from userid 501) id 9E6B5C44B; Fri, 24 Aug 2001 00:52:48 -0300 (BRT) Date: Fri, 24 Aug 2001 00:52:48 -0300 From: Arnaldo Carvalho de Melo To: chuckw@ieee.org Cc: netdev@oss.sgi.com Subject: Re: sk_buff question Message-ID: <20010824005248.A1022@conectiva.com.br> References: <20010823114421.A4454@ieee.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.3.17i In-Reply-To: <20010823114421.A4454@ieee.org>; from chuckw@ieee.org on Thu, Aug 23, 2001 at 11:44:21AM -0400 X-Url: http://advogato.org/person/acme Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 298 Lines: 9 Em Thu, Aug 23, 2001 at 11:44:21AM -0400, chuckw@ieee.org escreveu: > Hello again, > I was wondering if anyone knew why in sk_buff the next and prev > pointers _need_ to be first? 
look at sk_buff_head definition and at functions using sk_buff_head, the casts, etc, and you'll know 8) - Arnaldo From owner-netdev@oss.sgi.com Fri Aug 24 08:44:27 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f7OFiRk16944 for netdev-outgoing; Fri, 24 Aug 2001 08:44:27 -0700 Received: from e21.nc.us.ibm.com (e21.nc.us.ibm.com [32.97.136.227]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f7OFi9d16926 for ; Fri, 24 Aug 2001 08:44:09 -0700 Received: from southrelay02.raleigh.ibm.com (southrelay02.raleigh.ibm.com [9.37.3.209]) by e21.nc.us.ibm.com (8.9.3/8.9.3) with ESMTP id KAA125838; Fri, 24 Aug 2001 10:41:08 -0500 Received: from d04nm106.raleigh.ibm.com (d04nm106.raleigh.ibm.com [9.67.228.133]) by southrelay02.raleigh.ibm.com (8.11.1m3/NCO v4.97) with ESMTP id f7OFhF8197082; Fri, 24 Aug 2001 11:43:15 -0400 Importance: Normal Subject: incorporating bonding code changes into the kernel To: netdev@oss.sgi.com Cc: Andi Kleen , bonding-devel@lists.sourceforge.net X-Mailer: Lotus Notes Release 5.0.5 September 22, 2000 Message-ID: From: "Janice Girouard" Date: Fri, 24 Aug 2001 10:43:14 -0500 X-MIMETrack: Serialize by Router on D04NM106/04/M/IBM(Release 5.0.6 |December 14, 2000) at 08/24/2001 11:43:15 AM MIME-Version: 1.0 Content-type: text/plain; charset=us-ascii Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 5736 Lines: 130 Yesterday I added a patch out at www.sourceforge.net/projects/bonding in release Bonding 2.4.6 2001-08-23. The code now uses the new MII ioctl calls suggested by Andi Kleen earlier. My question is, given this seemed to be the only significant issue resulting from my earlier post, is there something I need to do to get these changes into the standard kernel? I know the MAINTAINERS file lists this mailing list as the maintainer. The summary of changes at this site that have not been accepted into the kernel (to my knowledge) is included below. There are quite a few changes. 
Janice ________________________________________________________________________________ * * Changes: * Arnaldo Carvalho de Melo * - fix leaks on failure at bond_init * * 2000/09/30 - Willy Tarreau * - added trivial code to release a slave device. * - fixed security bug (CAP_NET_ADMIN not checked) * - implemented MII link monitoring to disable dead links : * All MII capable slaves are checked every milliseconds * (100 ms seems good). This value can be changed by passing it to * insmod. A value of zero disables the monitoring (default). * - fixed an infinite loop in bond_xmit_roundrobin() when there's no * good slave. * - made the code hopefully SMP safe * * 2000/10/03 - Willy Tarreau * - optimized slave lists based on relevant suggestions from Thomas Davis * - implemented active-backup method to obtain HA with two switches: * stay as long as possible on the same active interface, while we * also monitor the backup one (MII link status) because we want to know * if we are able to switch at any time. ( pass "mode=1" to insmod ) * - lots of stress testings because we need it to be more robust than the * wires ! :-> * * 2000/10/09 - Willy Tarreau * - added up and down delays after link state change. * - optimized the slaves chaining so that when we run forward, we never * repass through the bond itself, but we can find it by searching * backwards. Renders the deletion more difficult, but accelerates the * scan. * - smarter enslaving and releasing. * - finer and more robust SMP locking * * 2000/10/17 - Willy Tarreau * - fixed two potential SMP race conditions * * 2000/10/18 - Willy Tarreau * - small fixes to the monitoring FSM in case of zero delays * * 2000/11/01 - Willy Tarreau * - fixed first slave not automatically used in trunk mode. * * 2000/11/10 : spelling of "EtherChannel" corrected. * * 2000/11/13 : fixed a race condition in case of concurrent accesses to ioctl(). * * 2000/12/16 : fixed improper usage of rtnl_exlock_nowait(). * * 2001/1/3 - Chad N. 
Tindel * - The bonding driver now simulates MII status monitoring, just like * a normal network device. It will show that the link is down iff * every slave in the bond shows that their links are down. If at least * one slave is up, the bond's MII status will appear as up. * * 2001/2/7 - Chad N. Tindel * - Applications can now query the bond from user space to get * information which may be useful. They do this by calling * the BOND_INFO_QUERY ioctl. Once the app knows how many slaves * are in the bond, it can call the BOND_SLAVE_INFO_QUERY ioctl to * get slave specific information (# link failures, etc). See * for more details. The structs of interest * are ifbond and ifslave. * * 2001/4/5 - Chad N. Tindel * - Ported to 2.4 Kernel * * 2001/5/2 - Jeffrey E. Mast * - When a device is detached from a bond, the slave device is no longer * left thinking that is has a master. * * 2001/5/16 - Jeffrey E. Mast * - memset did not appropriately initialized the bond rw_locks. Used * rwlock_init to initialize to unlocked state to prevent deadlock when * first attempting a lock * - Called SET_MODULE_OWNER for bond device * * 2001/5/17 - Tim Anderson * - 2 paths for releasing for slave release; 1 through ioctl * and 2) through close. Both paths need to release the same way. * - the free slave in bond release is changing slave status before * the free. The netdev_set_master() is intended to change slave state * so it should not be done as part of the release process. * - Simple rule for slave state at release: only the active in A/B and * only one in the trunked case. * * 2001/6/01 - Tim Anderson * - Now call dev_close when releasing a slave so it doesn't screw up * out routing table. * * 2001/6/01 - Chad N. Tindel * - Added /proc support for getting bond and slave information. * Information is in /proc/net//info. * - Changed the locking when calling bond_close to prevent deadlock. 
* * 2001/8/05 - Janice Girouard * - correct problem where refcnt of slave is not incremented in bond_ioctl * so the system hangs when halting. * - correct locking problem when unable to malloc in bond_enslave. * - adding bond_xmit_xor logic. * * 2001/8/23 - Janice Girouard * - bzero netdev during initialization, correcting oops * - convert ioctl calls to the new SIOCGMIIPHY calls */ From owner-netdev@oss.sgi.com Mon Aug 27 15:54:05 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f7RMs5005343 for netdev-outgoing; Mon, 27 Aug 2001 15:54:05 -0700 Received: from whatever.local ([65.10.228.207]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f7RMs3d05339 for ; Mon, 27 Aug 2001 15:54:03 -0700 Received: (qmail 31786 invoked by uid 513); 27 Aug 2001 11:01:52 -0000 From: chuckw@ieee.org Date: Mon, 27 Aug 2001 07:01:52 -0400 To: netdev@oss.sgi.com Subject: Re: sk_buff question Message-ID: <20010827070152.B31757@ieee.org> Mail-Followup-To: netdev@oss.sgi.com References: <20010823114421.A4454@ieee.org> <20010824005248.A1022@conectiva.com.br> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <20010824005248.A1022@conectiva.com.br>; from acme@conectiva.com.br on Fri, Aug 24, 2001 at 12:52:48AM -0300 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 423 Lines: 16 I see ... Thanks much. Chuck On Fri, Aug 24, 2001 at 12:52:48AM -0300, Arnaldo Carvalho de Melo wrote: > Em Thu, Aug 23, 2001 at 11:44:21AM -0400, chuckw@ieee.org escreveu: > > Hello again, > > I was wondering if anyone knew why in sk_buff the next and prev > > pointers _need_ to be first? 
> > look at sk_buff_head definition and at functions using sk_buff_head, the > casts, etc, and you'll know 8) > > - Arnaldo From owner-netdev@oss.sgi.com Mon Aug 27 19:07:19 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f7S27Jc08558 for netdev-outgoing; Mon, 27 Aug 2001 19:07:19 -0700 Received: from e31.bld.us.ibm.com (e31.co.us.ibm.com [32.97.110.129]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f7S279d08555 for ; Mon, 27 Aug 2001 19:07:10 -0700 Received: from westrelay03.boulder.ibm.com (westrelay03.boulder.ibm.com [9.99.140.24]) by e31.bld.us.ibm.com (8.9.3/8.9.3) with ESMTP id WAA20550 for ; Mon, 27 Aug 2001 22:04:57 -0400 Received: from gateway.beaverton.ibm.com (gateway.beaverton.ibm.com [138.95.180.1]) by westrelay03.boulder.ibm.com (8.11.1m3/NCO v4.97.1) with ESMTP id f7S278c136928 for ; Mon, 27 Aug 2001 20:07:08 -0600 Received: from eng4.beaverton.ibm.com (eng4.beaverton.ibm.com [138.95.7.64]) by gateway.beaverton.ibm.com (8.10.0.Beta10/8.8.5) with ESMTP id f7S277K01319 for ; Mon, 27 Aug 2001 19:07:07 -0700 (PDT) Received: (from nivedita@localhost) by eng4.beaverton.ibm.com (8.10.0.Beta10/8.8.5/token.aware-1.2) id f7S275J18467 for netdev@oss.sgi.com; Mon, 27 Aug 2001 19:07:05 -0700 (PDT) From: Nivedita Singhvi Message-Id: <200108280207.f7S275J18467@eng4.beaverton.ibm.com> Subject: [patch] 2.4.9 new socket option TCP_DELACK (RFC) To: netdev@oss.sgi.com Date: Mon, 27 Aug 2001 19:07:05 -0700 (PDT) X-Mailer: ELM [version 2.5 PL3] MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 3743 Lines: 110 This patch adds a new socket option TCP_DELACK, which disables the sending of delayed acks altogether on that socket connection. Built against 2.4.9. This was to test with and play with, more than anything else, but figured others might be interested. 
Possibly useful in improving performance in the small packet and highly interactive app environment. Arguments to include it in the source tree: - possibly better performance in certain environments - _very_ convenient for academic, benchmarking, and certain other environments to use - portability: it's available on some other OSes, and so some apps that make use of it might want to continue to use it on Linux when ported here. - makes the tcp engine more configurable based on what the user wants to do, and the easier it is to fiddle with things, the more learning, innovation that happens on the OS, and I'd like to encourage that happening on Linux ;) - somewhat the same rationale behind allowing the users to disable delayed sends (TCP_NODELAY) Cons: - feature bloat (but it's small, minimized diffs) - makes the tcp engine more configurable based on what the user wants to do, but the user needs to know what they're doing, given extra ack load (but ditto with other options, really) I'd be interested in hearing if others found it useful, or have any comments or feedback.. thanks, Nivedita ----- diff -urN linux-2.4.9/include/linux/tcp.h linux-2.4.9-da2/include/linux/tcp.h --- linux-2.4.9/include/linux/tcp.h Wed Aug 15 14:21:32 2001 +++ linux-2.4.9-da2/include/linux/tcp.h Sat Aug 25 10:44:01 2001 @@ -127,6 +127,7 @@ #define TCP_WINDOW_CLAMP 10 /* Bound advertised window */ #define TCP_INFO 11 /* Information about this connection.
*/ #define TCP_QUICKACK 12 /* Block/reenable quick acks */ +#define TCP_NODELACK 13 /* Disable delayed acks */ #define TCPI_OPT_TIMESTAMPS 1 #define TCPI_OPT_SACK 2 diff -urN linux-2.4.9/include/net/sock.h linux-2.4.9-da2/include/net/sock.h --- linux-2.4.9/include/net/sock.h Wed Aug 15 14:21:32 2001 +++ linux-2.4.9-da2/include/net/sock.h Sat Aug 25 10:44:01 2001 @@ -278,6 +278,7 @@ __u32 lrcvtime; /* timestamp of last received data packet*/ __u16 last_seg_size; /* Size of last incoming segment */ __u16 rcv_mss; /* MSS used for delayed ACK decisions */ + __u8 no_delack; /* no delayed acks at all */ } ack; /* Data for direct copy to user */ diff -urN linux-2.4.9/net/ipv4/tcp.c linux-2.4.9-da2/net/ipv4/tcp.c --- linux-2.4.9/net/ipv4/tcp.c Wed Aug 15 01:22:17 2001 +++ linux-2.4.9-da2/net/ipv4/tcp.c Sat Aug 25 11:11:48 2001 @@ -2305,6 +2305,10 @@ } break; + case TCP_NODELACK: + tp->ack.no_delack = val > 0; + break; + default: err = -ENOPROTOOPT; break; @@ -2431,6 +2435,10 @@ case TCP_QUICKACK: val = !tp->ack.pingpong; break; + case TCP_NODELACK: + val = tp->ack.no_delack; + break; + default: return -ENOPROTOOPT; }; diff -urN linux-2.4.9/net/ipv4/tcp_input.c linux-2.4.9-da2/net/ipv4/tcp_input.c --- linux-2.4.9/net/ipv4/tcp_input.c Wed Aug 15 01:22:17 2001 +++ linux-2.4.9-da2/net/ipv4/tcp_input.c Sat Aug 25 10:44:01 2001 @@ -3015,7 +3015,8 @@ struct tcp_opt *tp = &(sk->tp_pinfo.af_tcp); /* More than one full frame received... */ - if (((tp->rcv_nxt - tp->rcv_wup) > tp->ack.rcv_mss + if (tp->ack.no_delack || + ((tp->rcv_nxt - tp->rcv_wup) > tp->ack.rcv_mss /* ... and right edge of window advances far enough. * (tcp_recvmsg() will send ACK otherwise). Or... 
*/ @@ -3351,7 +3352,8 @@ } if (eaten) { - if (tcp_in_quickack_mode(tp)) { + if (tp->ack.no_delack || + tcp_in_quickack_mode(tp)) { tcp_send_ack(sk); } else { tcp_send_delayed_ack(sk); From owner-netdev@oss.sgi.com Tue Aug 28 06:16:15 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f7SDGF221857 for netdev-outgoing; Tue, 28 Aug 2001 06:16:15 -0700 Received: from titan.bieringer.de (mail.bieringer.de [195.226.187.51]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f7SDGCd21854 for ; Tue, 28 Aug 2001 06:16:13 -0700 Received: (qmail 4792 invoked by uid 89); 28 Aug 2001 13:16:05 -0000 Message-ID: <20010828131605.4791.qmail@titan.bieringer.de> From: "Peter Bieringer" To: users@ipv6.org Cc: linux-ipv6@list.f00f.org, netdev@oss.sgi.com, usagi-users@linux-ipv6.org Subject: Who is 2001:230:201:1:203:31ff:fe4b:4000, it's ping-reply flooding me Date: Tue, 28 Aug 2001 13:16:05 GMT Mime-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 478 Lines: 17 Hi, I got a ICMPv6 ping "echo reply" flood from that host to my tunnel: Who the hell is using an IPv6 address out of my space as source address? Looks like IPv6 gateways need anti spoofing filters! 
15:10:17.567312 128.176.191.66 > 195.226.187.50: 2001:230:201:1:203:31ff:fe4b:4000 > 3ffe:400:100:f101::40: icmp6: echo reply (encap) 15:10:17.567669 195.226.187.50 > 128.176.191.66: 2001:230:201:1:203:31ff:fe4b:4000 > 3ffe:400:100:f101::40: icmp6: echo reply (encap) Peter From owner-netdev@oss.sgi.com Tue Aug 28 06:59:08 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f7SDx8r23260 for netdev-outgoing; Tue, 28 Aug 2001 06:59:08 -0700 Received: from purgatory.unfix.org (postfix@purgatory.xs4all.nl [194.109.237.229]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f7SDx1d23256 for ; Tue, 28 Aug 2001 06:59:01 -0700 Received: from cyan (intranet.azr.nl [::ffff:156.83.254.8]) by purgatory.unfix.org (Postfix) with ESMTP id 504933129; Tue, 28 Aug 2001 15:58:55 +0200 (CEST) From: "Jeroen Massar" To: "'Peter Bieringer'" , Cc: , , Subject: RE: Who is 2001:230:201:1:203:31ff:fe4b:4000, it's ping-reply flooding me Date: Tue, 28 Aug 2001 15:58:45 +0200 Organization: Unfix Message-ID: <004001c12fc9$8ec01710$2a1410ac@kei.azr.nl> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook, Build 10.0.2616 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2479.0006 Importance: Normal In-Reply-To: <20010828131605.4791.qmail@titan.bieringer.de> Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 2764 Lines: 94 Peter Bieringer : > I got a ICMPv6 ping "echo reply" flood from that host to my tunnel: > > Who the hell is using an IPv6 address out of my space as source address? > Looks like IPv6 gateways need anti spoofing filters! 
Of course it needs it

> 15:10:17.567312 128.176.191.66 > 195.226.187.50:
> 2001:230:201:1:203:31ff:fe4b:4000 > 3ffe:400:100:f101::40: icmp6: echo
> reply (encap)

from inet -> you

> 15:10:17.567669 195.226.187.50 > 128.176.191.66:
> 2001:230:201:1:203:31ff:fe4b:4000 > 3ffe:400:100:f101::40: icmp6: echo
> reply (encap)

from you -> inet.... which would mean that the ::40 is on the outside of
your tunnel I presume... :)
And where are the echo requests? :)

traceroute6 to 2001:230:201:1:203:31ff:fe4b:4000 (2001:230:201:1:203:31ff:fe4b:4000) from 2001:6e0::250:4ff:fe4a:7708, 30 hops max, 16 byte packets
 1  Amsterdam.core.ipv6.intouch.net (2001:6e0::2)  1.157 ms  1.237 ms  0.875 ms
 2  2001:200:0:4402::2 (2001:200:0:4402::2)  79.461 ms  78.731 ms  79.332 ms
 3  3ffe:2e00:e:fffa::1 (3ffe:2e00:e:fffa::1)  529.963 ms  931.205 ms  858.571 ms
 4  2001:230:e:a::2 (2001:230:e:a::2)  663.898 ms  *  511.524 ms

hmmm

$ whois -h whois.6bone.net 3ffe:2e00:e:fffa::1
inet6num: 3FFE:2E00::/24
netname: ETRI
descr: pTLA delegation for the 6bone
country: KR
admin-c: MS3-6BONE
tech-c: MS3-6BONE
remarks: This object is automatically converted from the RIPE181 registry
mnt-by: MNT-ETRI
changed: mkshin@pec.etri.re.kr 19980723
changed: auto-dbm@whois.6bone.net 20010117
source: 6BONE

$ whois -h whois.apnic.net 2001:230:201:1:203:31ff:fe4b:4000
% Rights restricted by copyright. See http://www.apnic.net/db/dbcopyright.html
% (whois7.apnic.net)
inet6num: 2001:230:201::/48
netname: OPICOM-KRV6-ETRI-20000622
descr: OPICOM IPv6 Network
country: KR
admin-c: MS75-AP
tech-c: MS75-AP
status: NLA
notify: mkshin@pec.etri.re.kr
mnt-by: MAINT-KR-ETRI
changed: mkshin@pec.etri.re.kr 20000622
source: APNIC

person: Myung-Ki Shin
address: 161 Kajong-Dong, Yusong-Gu,
address: Taejon, 305-350, Korea
country: KR
phone: +82-42-860-4847
fax-no: +82-42-861-5404
e-mail: mkshin@pec.etri.re.kr
nic-hdl: MS75-AP
mnt-by: MAINT-KR-ETRI
changed: mkshin@pec.etri.re.kr 20000309
source: APNIC

Also found on http://www.krv6.net/whois.htm with google...
Hope this little extra info helps... Oh btw the other registries I always try are: whois.[apnic.net|arin.org|ripe.net] these cover the most space... and if it isn't in there check http://www.apnic.net/maps/tld-list.html for the tld's :) And don't forget to contact your upstreams if you want to stop it this instant... Greets, Jeroen From owner-netdev@oss.sgi.com Tue Aug 28 07:01:13 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f7SE1Du23486 for netdev-outgoing; Tue, 28 Aug 2001 07:01:13 -0700 Received: from mail.ndsoftware.net (ns207.ovh.net [213.186.34.70]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f7SE1Ad23483 for ; Tue, 28 Aug 2001 07:01:10 -0700 Received: (qmail 16827 invoked from network); 28 Aug 2001 14:10:24 -0000 Received: from nerim.fr.surfnetconnexion.net (HELO mail.localnet.ndsoftware.net) (62.4.18.114) by mail.ndsoftware.net with RC4-MD5 encrypted SMTP IPv6 ready; 28 Aug 2001 14:10:27 -0000 Received: from billy ([10.1.2.1]) by mail.localnet.ndsoftware.net with Microsoft SMTPSVC(5.0.2195.3779); Tue, 28 Aug 2001 16:01:16 +0200 From: "NDSoftware" To: "'Peter Bieringer'" , Cc: , , Subject: RE: Who is 2001:230:201:1:203:31ff:fe4b:4000, it's ping-reply flooding me Date: Tue, 28 Aug 2001 16:01:17 +0200 Message-ID: <000701c12fc9$e82ade20$0102010a@localnet.ndsoftware.net> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook, Build 10.0.2616 Importance: Normal X-MimeOLE: Produced By Microsoft MimeOLE V5.50.4807.1700 In-Reply-To: <20010828131605.4791.qmail@titan.bieringer.de> X-OriginalArrivalTime: 28 Aug 2001 14:01:16.0920 (UTC) FILETIME=[E7EEBF80:01C12FC9] Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1118 Lines: 42 Hi Peter, ETRI-KRNIC-KR-19991124 2001:230::/35 You use a firewall for log that ? What's firewall do you use (what's your OS) ? 
Thanks -----Original Message----- From: owner-users@ipv6.org [mailto:owner-users@ipv6.org] On Behalf Of Peter Bieringer Sent: Tuesday, August 28, 2001 3:16 PM To: users@ipv6.org Cc: linux-ipv6@list.f00f.org; netdev@oss.sgi.com; usagi-users@linux-ipv6.org Subject: Who is 2001:230:201:1:203:31ff:fe4b:4000, it's ping-reply flooding me Hi, I got a ICMPv6 ping "echo reply" flood from that host to my tunnel: Who the hell is using an IPv6 address out of my space as source address? Looks like IPv6 gateways need anti spoofing filters! 15:10:17.567312 128.176.191.66 > 195.226.187.50: 2001:230:201:1:203:31ff:fe4b:4000 > 3ffe:400:100:f101::40: icmp6: echo reply (encap) 15:10:17.567669 195.226.187.50 > 128.176.191.66: 2001:230:201:1:203:31ff:fe4b:4000 > 3ffe:400:100:f101::40: icmp6: echo reply (encap) Peter --------------------------------------------------------------------- The IPv6 Users Mailing List Unsubscribe by sending "unsubscribe users" to majordomo@ipv6.org From owner-netdev@oss.sgi.com Tue Aug 28 11:59:16 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f7SIxGi31507 for netdev-outgoing; Tue, 28 Aug 2001 11:59:16 -0700 Received: from devserv.devel.redhat.com (nat-pool-meridian.redhat.com [199.183.24.200]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f7SIxDd31504 for ; Tue, 28 Aug 2001 11:59:13 -0700 Received: from toomuch.toronto.redhat.com (IDENT:bcrl@toomuch.toronto.redhat.com [172.16.14.22]) by devserv.devel.redhat.com (8.11.0/8.11.0) with ESMTP id f7SIxCO14030 for ; Tue, 28 Aug 2001 14:59:13 -0400 Date: Tue, 28 Aug 2001 14:59:11 -0400 (EDT) From: Ben LaHaise X-X-Sender: To: Subject: bug: excess context switches on read() of tcp sockets Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1754 Lines: 40 Hello all, In testing TCP throughput with a pair of National Semiconductor based gige cards (using zerocopy send and receive) the kernel seems to be 
performing far too many context switches in the receiver, leading to only a 75MB/s transfer rate. sys.net.ipv4.tcp_[rw]mem on both machines is set to ~32MB right now. The transmit program inner loop is basically:

	while ((i = sendfile(... 16KB)) > 0 && (sent += i) < 2000000000);

which works out quite well. The receiver code looks like:

	while (read(s, buf, 2048*1024) > 0);

I tried various buffer sizes from 4K to 32MB, but the impact on performance was negligible. What is constant is that there are 2 context switches per interrupt -> odd. Here's some vmstat output during a run:

sender
 1 0 0 0 3831264 134320 10364 0 0 0 0 2575 26 1 24 75
 0 0 0 0 3831264 134320 10364 0 0 0 0 2554 28 0 35 65
 0 0 0 0 3831264 134320 10364 0 0 0 0 2552 27 0 73 27
 0 0 0 0 3831264 134320 10364 0 0 0 8 2545 29 0 17 83

receiver
 1 0 0 0 84660 1372 12628 0 0 0 0 2629 5015 0 85 15
 1 0 0 0 84660 1372 12628 0 0 0 0 2626 5015 2 54 44
 1 0 0 0 84660 1372 12628 0 0 0 0 2627 5015 0 96 4
 1 0 0 0 84660 1372 12628 0 0 0 0 2625 5010 0 87 13

vmstat shows that the transmitter only wakes up a couple of dozen times per second -- about what's expected given the size of the tcp window. The receiver is another story entirely. Does anyone have any idea as to what might be going on? This is with 2.4.9-ac2, but 2.4.8-ac6 shows the same behaviour. One of the 2.4.3 kernels I tried (ia64) seems to be much quicker.
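The "2 context switches per interrupt" observation can be checked directly against the vmstat rows. The sketch below is only an illustration of that arithmetic; it assumes the 12th and 13th vmstat columns are "in" (interrupts) and "cs" (context switches), as in the procps layout of that era, and the struct and function names are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* One vmstat sample, reduced to the two columns discussed above.
 * Assumption: column 12 is "in" and column 13 is "cs". */
struct vmstat_sample {
    unsigned long interrupts;        /* "in" column */
    unsigned long context_switches;  /* "cs" column */
};

/* Context switches per interrupt, scaled by 100 so the result stays
 * in integer arithmetic (190 means roughly 1.90 switches per interrupt). */
static unsigned long cs_per_interrupt_x100(const struct vmstat_sample *s)
{
    assert(s != NULL && s->interrupts > 0);
    return (s->context_switches * 100UL) / s->interrupts;
}
```

Fed with the first receiver row (2629 interrupts, 5015 context switches) this gives about 1.9 switches per interrupt, while the first sender row (2575, 26) gives about 0.01.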
-ben From owner-netdev@oss.sgi.com Tue Aug 28 12:06:59 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f7SJ6xo31764 for netdev-outgoing; Tue, 28 Aug 2001 12:06:59 -0700 Received: from colin.muc.de (root@colin.muc.de [193.149.48.1]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f7SJ6vd31761 for ; Tue, 28 Aug 2001 12:06:57 -0700 Received: by colin.muc.de id <140554-2>; Tue, 28 Aug 2001 21:07:24 +0200 Message-ID: <20010828210719.08488@colin.muc.de> Date: Tue, 28 Aug 2001 21:07:19 +0200 From: Andi Kleen To: Ben LaHaise Cc: netdev@oss.sgi.com Subject: Re: bug: excess context switches on read() of tcp sockets References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 0.88e In-Reply-To: ; from Ben LaHaise on Tue, Aug 28, 2001 at 08:59:11PM +0200 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 525 Lines: 14 On Tue, Aug 28, 2001 at 08:59:11PM +0200, Ben LaHaise wrote: > vmstat shows that the transmitter only wakes up a couple of dozen times > per second -- about what's expected given the size of the tcp window. The > receiver is another story entirely. Does anyone have any idea as to what > might be going on? This is with 2.4.9-ac2, but 2.4.8-ac6 shows the same > behaviour. One of the 2.4.3 kernels I tried (ia64) seems to be much > quicker. The likely suspect is ksoftirqd. Add counters to the ksoftirqd loop. 
-Andi From owner-netdev@oss.sgi.com Tue Aug 28 12:33:18 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f7SJXIX32482 for netdev-outgoing; Tue, 28 Aug 2001 12:33:18 -0700 Received: from devserv.devel.redhat.com (nat-pool-meridian.redhat.com [199.183.24.200]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f7SJXFd32478 for ; Tue, 28 Aug 2001 12:33:15 -0700 Received: from toomuch.toronto.redhat.com (IDENT:bcrl@toomuch.toronto.redhat.com [172.16.14.22]) by devserv.devel.redhat.com (8.11.0/8.11.0) with ESMTP id f7SJXDO18619; Tue, 28 Aug 2001 15:33:13 -0400 Date: Tue, 28 Aug 2001 15:33:12 -0400 (EDT) From: Ben LaHaise X-X-Sender: To: Andi Kleen cc: Subject: Re: bug: excess context switches on read() of tcp sockets In-Reply-To: <20010828210719.08488@colin.muc.de> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1108 Lines: 50 On Tue, 28 Aug 2001, Andi Kleen wrote: > The likely suspect is ksoftirqd. Add counters to the ksoftirqd loop. That was my original thinking, but I applied the patch below. It doesn't seem to be triggering the "oh, bother" message at all, so our friendly scapegoat is not to be. What's the next dead chicken I should offer up? 
-ben --- /md0/kernels/2.4/v2.4.9-ac2/kernel/softirq.c Mon Aug 13 15:12:09 2001 +++ kernel/softirq.c Tue Aug 28 15:24:48 2001 @@ -63,7 +63,7 @@ int cpu = smp_processor_id(); __u32 pending; long flags; - __u32 mask; + int i = 0; if (in_interrupt()) return; @@ -75,7 +75,6 @@ if (pending) { struct softirq_action *h; - mask = ~pending; local_bh_disable(); restart: /* Reset the pending bitmask before enabling irqs */ @@ -95,14 +94,15 @@ local_irq_disable(); pending = softirq_pending(cpu); - if (pending & mask) { - mask &= ~pending; + if (pending && (i++ < 16)) goto restart; - } + __local_bh_enable(); - if (pending) + if (pending) { + printk("oh bother\n"); wakeup_softirqd(cpu); + } } local_irq_restore(flags); From owner-netdev@oss.sgi.com Tue Aug 28 16:53:49 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f7SNrnU06648 for netdev-outgoing; Tue, 28 Aug 2001 16:53:49 -0700 Received: from e34.bld.us.ibm.com (e34.co.us.ibm.com [32.97.110.132]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f7SNrjd06645 for ; Tue, 28 Aug 2001 16:53:45 -0700 Received: from westrelay02.boulder.ibm.com (westrelay02.boulder.ibm.com [9.99.140.23]) by e34.bld.us.ibm.com (8.9.3/8.9.3) with ESMTP id TAA86276; Tue, 28 Aug 2001 19:51:26 -0400 Received: from d03nm104.boulder.ibm.com (d03nm104.boulder.ibm.com [9.99.140.96]) by westrelay02.boulder.ibm.com (8.11.1m3/NCO v4.97.1) with ESMTP id f7SNqCH153722; Tue, 28 Aug 2001 17:52:13 -0600 From: "David Stevens" Importance: Normal Subject: [Patch 1of2] IPv6 addrconf_forward_change() bug To: netdev@oss.sgi.com, usagi-users@linux-ipv6.org X-Mailer: Lotus Notes Release 5.0.4a July 24, 2000 Message-ID: Date: Tue, 28 Aug 2001 16:56:37 -0700 X-MIMETrack: Serialize by Router on D03NM104/03/M/IBM(Release 5.0.8 |June 18, 2001) at 08/28/2001 05:56:45 PM MIME-Version: 1.0 Content-type: text/plain; charset=us-ascii Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 2735 Lines: 61 File net/ipv6/addrconf.c has the following code: 
static int addrconf_sysctl_forward(ctl_table *ctl, int write, struct file * filp,
				   void *buffer, size_t *lenp)
{
	...
	if (valp != &ipv6_devconf.forwarding) {
		struct net_device *dev = dev_get_by_index(ctl->ctl_name);

The problem is, at this point, "ctl" points to the child sysctl entry (procname "forwarding", ctl_name is 1 (always)). The parent node has ifindex set, but not the child. On my system, the dev returned (no matter what device the sysctl was changing) is "lo", which has ifindex 1.

Below is a patch that saves a pointer to in6_dev in ctl->extra1 and uses that as the argument to addrconf_forward_change(). The existing code doesn't actually use the idev pointer for anything meaningful, and the case where it'd matter most ("all") doesn't do the look-up. This would be most broken on existing systems if an ifindex 1 device didn't exist. Then, it appears a change on any interface would affect all of them.

The new code passes a pointer, rather than an ifindex, in the unused "extra1" field. Note that the cnf data is embedded in the in6_dev structure, so no extra reference count is needed, and that avoids the call and search by index in dev_get_by_index(). Because no new reference is acquired, there is no in6_dev_put() on exit.

Another patch (to follow) actually needs a valid idev pointer here, which is how I came across it. Patch line numbers are relative to 2.4.9.
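The difference between the two look-ups can be shown with a small user-space model. This is only a sketch under stated assumptions: toy_idev and toy_ctl are hypothetical stand-ins, not the kernel's inet6_dev or ctl_table, and the two helper functions are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; not the real definitions. */
struct toy_idev {
    int ifindex;
    int forwarding;
};

struct toy_ctl {
    int   ctl_name;  /* for the child "forwarding" entry this is always 1 */
    void *extra1;    /* spare pointer slot, filled in at registration time */
};

/* Old approach: treat ctl_name as an ifindex and search the device list.
 * Because ctl_name is always 1 for the child entry, this always finds
 * whichever device happens to have ifindex 1. */
static struct toy_idev *lookup_by_ctl_name(struct toy_idev *devs, size_t n,
                                           const struct toy_ctl *ctl)
{
    assert(devs != NULL && ctl != NULL);
    for (size_t i = 0; i < n; i++)
        if (devs[i].ifindex == ctl->ctl_name)
            return &devs[i];
    return NULL;
}

/* Patched approach: the owning device was stored in extra1, so no search
 * (and no extra reference) is needed. */
static struct toy_idev *lookup_by_extra1(const struct toy_ctl *ctl)
{
    assert(ctl != NULL);
    return (struct toy_idev *)ctl->extra1;
}
```

With a device list of "lo" (ifindex 1) and a second interface (ifindex 2), a sysctl entry belonging to the second interface resolves to "lo" under the old look-up and to the correct device under the extra1 look-up.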
+-DLS

--- linux-2.4.9/net/ipv6/addrconf.c Tue Aug 7 08:30:50 2001
+++ linux-2.4.9NEW/net/ipv6/addrconf.c Tue Aug 28 14:49:48 2001
@@ -1836,11 +1836,7 @@
 		struct inet6_dev *idev = NULL;
 
 		if (valp != &ipv6_devconf.forwarding) {
-			struct net_device *dev = dev_get_by_index(ctl->ctl_name);
-			if (dev) {
-				idev = in6_dev_get(dev);
-				dev_put(dev);
-			}
+			idev = (struct net_device *)ctl->extra1;
 			if (idev == NULL)
 				return ret;
 		} else
@@ -1850,8 +1846,6 @@
 
 		if (*valp)
 			rt6_purge_dflt_routers(0);
-		if (idev)
-			in6_dev_put(idev);
 	}
 
 	return ret;
@@ -1928,6 +1922,7 @@
 	for (i=0; i<sizeof(t->addrconf_vars)/sizeof(t->addrconf_vars[0])-1; i++) {
 		t->addrconf_vars[i].data += (char*)p - (char*)&ipv6_devconf;
 		t->addrconf_vars[i].de = NULL;
+		t->addrconf_vars[i].extra1 = idev; /* embedded; no ref */
 	}
 	if (dev) {
 		t->addrconf_dev[0].procname = dev->name;

From owner-netdev@oss.sgi.com Tue Aug 28 16:55:30 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f7SNtUm06713 for netdev-outgoing; Tue, 28 Aug 2001 16:55:30 -0700 Received: from e33.bld.us.ibm.com (e33.co.us.ibm.com [32.97.110.131]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f7SNtQd06707 for ; Tue, 28 Aug 2001 16:55:27 -0700 Received: from westrelay02.boulder.ibm.com (westrelay02.boulder.ibm.com [9.99.140.23]) by e33.bld.us.ibm.com (8.9.3/8.9.3) with ESMTP id SAA132960; Tue, 28 Aug 2001 18:53:12 -0500 Received: from d03nm104.boulder.ibm.com (d03nm104.boulder.ibm.com [9.99.140.96]) by westrelay02.boulder.ibm.com (8.11.1m3/NCO v4.97.1) with ESMTP id f7SNs6H246274; Tue, 28 Aug 2001 17:54:06 -0600 From: "David Stevens" Importance: Normal Subject: [Patch 2of2] IPv6 routers don't join/leave the all routers group To: netdev@oss.sgi.com, usagi-users@linux-ipv6.org X-Mailer: Lotus Notes Release 5.0.4a July 24, 2000 Message-ID: Date: Tue, 28 Aug 2001 16:58:31 -0700 X-MIMETrack: Serialize by Router on D03NM104/03/M/IBM(Release 5.0.8 |June 18, 2001) at 08/28/2001 05:58:34 PM MIME-Version: 1.0 Content-type: text/plain; charset=us-ascii
Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1368 Lines: 55 This is a patch to join the "all routers" multicast group when forwarding is turned on for an interface, and to leave it when forwarding is turned off. It requires the addrconf_sysctl_forward() patch I posted previously, as well. Patch is relative to 2.4.9. +-DLS --- linux-2.4.9P1/net/ipv6/addrconf.c Tue Aug 28 15:05:54 2001 +++ linux-2.4.9NEW/net/ipv6/addrconf.c Tue Aug 28 15:45:45 2001 @@ -298,12 +298,33 @@ return idev; } +static void dev_forward_change(struct inet6_dev *idev) +{ + struct net_device *dev; + struct in6_addr addr; + + if (!idev) + return; + dev = idev->dev; + if (!dev || !(dev->flags & IFF_MULTICAST)) + return; + + ipv6_addr_all_routers(&addr); + + if (idev->cnf.forwarding) + ipv6_dev_mc_inc(dev, &addr); + else + ipv6_dev_mc_dec(dev, &addr); +} + static void addrconf_forward_change(struct inet6_dev *idev) { struct net_device *dev; - if (idev) + if (idev) { + dev_forward_change(idev); return; + } read_lock(&dev_base_lock); for (dev=dev_base; dev; dev=dev->next) { @@ -312,6 +333,7 @@ if (idev) idev->cnf.forwarding = ipv6_devconf.forwarding; read_unlock(&addrconf_lock); + dev_forward_change(idev); } read_unlock(&dev_base_lock); } From owner-netdev@oss.sgi.com Tue Aug 28 21:20:21 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f7T4KLU12455 for netdev-outgoing; Tue, 28 Aug 2001 21:20:21 -0700 Received: from yue.hongo.wide.ad.jp ([203.178.139.94]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f7T4KId12452 for ; Tue, 28 Aug 2001 21:20:18 -0700 Received: from localhost (localhost [127.0.0.1]) by yue.hongo.wide.ad.jp (8.9.3+3.2W/8.9.3/Debian 8.9.3-21) with ESMTP id NAA32439; Wed, 29 Aug 2001 13:23:10 +0900 To: usagi-users@linux-ipv6.org, dlstevens@us.ibm.com Cc: netdev@oss.sgi.com Subject: Re: [Patch 1of2] IPv6 addrconf_forward_change() bug In-Reply-To: References: X-Mailer: Mew version 1.94.2 on Emacs 20.7 / Mule 4.1 (AOI) X-URL: 
http://www.hongo.wide.ad.jp/%7Eyoshfuji/ X-Fingerprint: F7 31 65 99 5E B2 BB A7 15 15 13 23 18 06 A9 6F 57 00 6B 25 X-Pgp5-Key-Url: http://cerberus.nemoto.ecei.tohoku.ac.jp/%7Eyoshfuji/yoshfuji@ecei.tohoku.ac.jp.asc Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-Id: <20010829132310K.yoshfuji@linux-ipv6.org> Date: Wed, 29 Aug 2001 13:23:10 +0900 From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= X-Dispatcher: imput version 991025(IM133) Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1035 Lines: 26 In article (at Tue, 28 Aug 2001 16:56:37 -0700), "David Stevens" says: > Below is a patch that saves a pointer to in6_dev in ctl->extra1 and > uses that as the argument to addrconf_forward_change(). The existing code applied. thanks. > --- linux-2.4.9/net/ipv6/addrconf.c Tue Aug 7 08:30:50 2001 > +++ linux-2.4.9NEW/net/ipv6/addrconf.c Tue Aug 28 14:49:48 2001 > @@ -1836,11 +1836,7 @@ > struct inet6_dev *idev = NULL; > > if (valp != &ipv6_devconf.forwarding) { > - struct net_device *dev = dev_get_by_index(ctl->ctl_name); > - if (dev) { > - idev = in6_dev_get(dev); > - dev_put(dev); > - } > + idev = (struct net_device *)ctl->extra1; ~~~~~~~~~~~~ inet6_dev * > if (idev == NULL) > return ret; > } else --yoshfuji From owner-netdev@oss.sgi.com Tue Aug 28 21:26:13 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f7T4QDo12619 for netdev-outgoing; Tue, 28 Aug 2001 21:26:13 -0700 Received: from yue.hongo.wide.ad.jp ([203.178.139.94]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f7T4QCd12614 for ; Tue, 28 Aug 2001 21:26:12 -0700 Received: from localhost (localhost [127.0.0.1]) by yue.hongo.wide.ad.jp (8.9.3+3.2W/8.9.3/Debian 8.9.3-21) with ESMTP id NAA32463; Wed, 29 Aug 2001 13:29:05 +0900 To: usagi-users@linux-ipv6.org, dlstevens@us.ibm.com Cc: netdev@oss.sgi.com Subject: Re: (usagi-users 00721) [Patch 2of2] IPv6 routers don't join/leave the all routers group In-Reply-To: 
References: X-Mailer: Mew version 1.94.2 on Emacs 20.7 / Mule 4.1 (AOI) X-URL: http://www.hongo.wide.ad.jp/%7Eyoshfuji/ X-Fingerprint: F7 31 65 99 5E B2 BB A7 15 15 13 23 18 06 A9 6F 57 00 6B 25 X-Pgp5-Key-Url: http://cerberus.nemoto.ecei.tohoku.ac.jp/%7Eyoshfuji/yoshfuji@ecei.tohoku.ac.jp.asc Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-Id: <20010829132905G.yoshfuji@linux-ipv6.org> Date: Wed, 29 Aug 2001 13:29:05 +0900 From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= X-Dispatcher: imput version 991025(IM133) Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 503 Lines: 14 In article (at Tue, 28 Aug 2001 16:58:31 -0700), "David Stevens" says: > This is a patch to join the "all routers" multicast group when forwarding >is turned on for an interface, and to leave it when forwarding is turned >off. being applied. thanks. BTW, ip6_forward() in net/ipv6/ip6_output.c checks ipv6_devconf.forwarding only. It seems we should check idev_of_input_device->cnf.forwarding, too... --yoshfuji From owner-netdev@oss.sgi.com Tue Aug 28 22:05:49 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f7T55nr13376 for netdev-outgoing; Tue, 28 Aug 2001 22:05:49 -0700 Received: from www.fortuitous.com (cs666824-51.austin.rr.com [66.68.24.51]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f7T55ld13371 for ; Tue, 28 Aug 2001 22:05:47 -0700 Received: by www.fortuitous.com (Postfix, from userid 500) id 5364A862; Wed, 29 Aug 2001 00:06:29 -0500 (CDT) Date: Wed, 29 Aug 2001 00:06:29 -0500 From: pac@fortuitous.com To: netdev@oss.sgi.com Subject: tcpdump file.. encrypt? 
Message-ID: <20010829000629.A10526@bistro.marx> Reply-To: pac@fortuitous.com Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.3.17i Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 419 Lines: 10 Anyone have a public key on netdev that I can use to encrypt and sign a tcpdump file? Should I even bother? -pac -- .--------------------------------------------------------. | Dr. Philip A. Carinhas | pac@fortuitous.com | | Fortuitous Technologies Inc. | http://fortuitous.com | | Linux Consulting & Training | Tel : 1-512-467-2154 | `--------------------------------------------------------' From owner-netdev@oss.sgi.com Tue Aug 28 23:03:45 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f7T63j314406 for netdev-outgoing; Tue, 28 Aug 2001 23:03:45 -0700 Received: from titan.bieringer.de (mail.bieringer.de [195.226.187.51]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f7T63gd14403 for ; Tue, 28 Aug 2001 23:03:42 -0700 Received: (qmail 18978 invoked from network); 29 Aug 2001 06:03:36 -0000 Received: from pd950f3f4.dip.t-dialin.net (HELO worker.muc.bieringer.de) (217.80.243.244) by mail.bieringer.de with SMTP; 29 Aug 2001 06:03:36 -0000 Date: Wed, 29 Aug 2001 08:04:46 +0200 From: Peter Bieringer To: usagi-users@linux-ipv6.org, dlstevens@us.ibm.com cc: netdev@oss.sgi.com, Pekka Savola Subject: Re: (usagi-users 00725) Re: [Patch 2of2] IPv6 routers don't join/leave the all routers group Message-ID: <15990000.999065086@localhost> In-Reply-To: <20010829132905G.yoshfuji@linux-ipv6.org> References: <20010829132905G.yoshfuji@linux-ipv6.org> X-Mailer: Mulberry/2.1.0 (Linux/x86) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Content-Disposition: inline Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 903 Lines: 29 --On Wednesday, August 29, 2001 01:29:05 PM +0900 "YOSHIFUJI Hideaki / ?$B5HF#1QL@?(B" wrote: > In article > (at 
Tue, > 28 Aug 2001 16:58:31 -0700), "David Stevens" > says: > >> This is a patch to join the "all routers" multicast group when >> forwarding is turned on for an interface, and to leave it when >> forwarding is turned off. > > being applied. thanks. > > BTW, ip6_forward() in net/ipv6/ip6_output.c checks > ipv6_devconf.forwarding only. > It seems we should check idev_of_input_device->cnf.forwarding, > too... Attention: afaik there are different meanings of this switch "per device" has different meaning than "per IPv6" on setting. Forwarding switching per device is currently not implemented, control has another meaning (sets isRouter on advertisements). Peter From owner-netdev@oss.sgi.com Tue Aug 28 23:23:01 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f7T6N1i14973 for netdev-outgoing; Tue, 28 Aug 2001 23:23:01 -0700 Received: from yue.hongo.wide.ad.jp ([203.178.139.94]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f7T6Mxd14970 for ; Tue, 28 Aug 2001 23:22:59 -0700 Received: from localhost (localhost [127.0.0.1]) by yue.hongo.wide.ad.jp (8.9.3+3.2W/8.9.3/Debian 8.9.3-21) with ESMTP id PAA32693; Wed, 29 Aug 2001 15:25:42 +0900 To: usagi-users@linux-ipv6.org, pb@bieringer.de Cc: dlstevens@us.ibm.com, netdev@oss.sgi.com, pekkas@netcore.fi Subject: Re: (usagi-users 00728) Re: [Patch 2of2] IPv6 routers don't join/leave the all routers group In-Reply-To: <15990000.999065086@localhost> References: <20010829132905G.yoshfuji@linux-ipv6.org> <15990000.999065086@localhost> X-Mailer: Mew version 1.94.2 on Emacs 20.7 / Mule 4.1 (AOI) X-URL: http://www.hongo.wide.ad.jp/%7Eyoshfuji/ X-Fingerprint: F7 31 65 99 5E B2 BB A7 15 15 13 23 18 06 A9 6F 57 00 6B 25 X-Pgp5-Key-Url: http://cerberus.nemoto.ecei.tohoku.ac.jp/%7Eyoshfuji/yoshfuji@ecei.tohoku.ac.jp.asc Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-Id: <20010829152542K.yoshfuji@linux-ipv6.org> Date: Wed, 29 Aug 2001 15:25:42 +0900 From: YOSHIFUJI 
Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= X-Dispatcher: imput version 991025(IM133) Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 645 Lines: 19 In article <15990000.999065086@localhost> (at Wed, 29 Aug 2001 08:04:46 +0200), Peter Bieringer says: > > BTW, ip6_forward() in net/ipv6/ip6_output.c checks > > ipv6_devconf.forwarding only. > > It seems we should check idev_of_input_device->cnf.forwarding, > > too... > > Attention: afaik there are different meanings of this switch > "per device" has different meaning than "per IPv6" on setting. Yes, but > Forwarding switching per device is currently not implemented, control > has another meaning (sets isRouter on advertisements). If a node forwards, it should announce NA with is_router set, IMHO. --yoshfuji From owner-netdev@oss.sgi.com Tue Aug 28 23:36:16 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f7T6aGt15220 for netdev-outgoing; Tue, 28 Aug 2001 23:36:16 -0700 Received: from titan.bieringer.de (mail.bieringer.de [195.226.187.51]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f7T6aCd15215 for ; Tue, 28 Aug 2001 23:36:12 -0700 Received: (qmail 19348 invoked from network); 29 Aug 2001 06:36:06 -0000 Received: from pd950f3f4.dip.t-dialin.net (HELO worker.muc.bieringer.de) (217.80.243.244) by mail.bieringer.de with SMTP; 29 Aug 2001 06:36:06 -0000 Date: Wed, 29 Aug 2001 08:37:15 +0200 From: Peter Bieringer To: usagi-users@linux-ipv6.org cc: dlstevens@us.ibm.com, netdev@oss.sgi.com, pekkas@netcore.fi Subject: Re: (usagi-users 00729) Re: [Patch 2of2] IPv6 routers don't join/leave the all routers group Message-ID: <33830000.999067035@localhost> In-Reply-To: <20010829152542K.yoshfuji@linux-ipv6.org> References: <20010829132905G.yoshfuji@linux-ipv6.org><15990000.999065086@localhost> <20010829152542K.yoshfuji@linux-ipv6.org> X-Mailer: Mulberry/2.1.0 (Linux/x86) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Content-Disposition: 
inline Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1800 Lines: 57

--On Wednesday, August 29, 2001 03:25:42 PM +0900 "YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?=" wrote:

> In article <15990000.999065086@localhost> (at Wed, 29 Aug 2001
> 08:04:46 +0200), Peter Bieringer says:
>
>> > BTW, ip6_forward() in net/ipv6/ip6_output.c checks
>> > ipv6_devconf.forwarding only.
>> > It seems we should check idev_of_input_device->cnf.forwarding,
>> > too...
>>
>> Attention: afaik there are different meanings of this switch
>> "per device" has different meaning than "per IPv6" on setting.
>
> Yes, but
>
>
>> Forwarding switching per device is currently not implemented,
>> control has another meaning (sets isRouter on advertisements).
>
> If a node forwards, it should announce NA with is_router set, IMHO.

I had a discussion with Pekka some time ago in which he found out what the settings really do.

Controlling the isRouter flag can be needed if a router has more than 2 interfaces and one of them is a stub network for which the router should not announce that it is a router.

Behavior is like KAME at the moment.

Thread around May 02, 2001
-- 8<-- (itojun on usagi-users)
in KAME stack, the only legal combination is:
	accept_rtadv=0, forwarding=1	router
	accept_rtadv=1, forwarding=0	autoconfigured host
	accept_rtadv=0, forwarding=0	manually configured host
1/1 combination is not prohibited, just for experimental purposes. we are not trying to promote configuration like 1/1. (netbsd /etc/rc.d/network prohibits 1/1).
-- >8--

BTW: afair in IPv4 (where the forwarding-per-device switch has a different meaning) this switch is checked on packet input on that device, not on output.
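The accept_rtadv/forwarding combinations quoted from itojun above can be written down as a small classifier. This is a sketch of the table only, not KAME or Linux code; the enum and function names are invented for this example.

```c
#include <assert.h>

/* Roles named in the quoted KAME table; identifiers are hypothetical. */
enum kame_role {
    KAME_ROUTER,            /* accept_rtadv=0, forwarding=1 */
    KAME_AUTOCONF_HOST,     /* accept_rtadv=1, forwarding=0 */
    KAME_MANUAL_HOST,       /* accept_rtadv=0, forwarding=0 */
    KAME_EXPERIMENTAL       /* accept_rtadv=1, forwarding=1: not prohibited,
                               but for experimental purposes only */
};

/* Map the two boolean settings to the role described in the quote. */
static enum kame_role kame_role_of(int accept_rtadv, int forwarding)
{
    assert(accept_rtadv == 0 || accept_rtadv == 1);
    assert(forwarding == 0 || forwarding == 1);

    if (!accept_rtadv && forwarding)
        return KAME_ROUTER;
    if (accept_rtadv && !forwarding)
        return KAME_AUTOCONF_HOST;
    if (!accept_rtadv && !forwarding)
        return KAME_MANUAL_HOST;
    return KAME_EXPERIMENTAL;
}
```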
Peter

From owner-netdev@oss.sgi.com Tue Aug 28 23:40:05 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f7T6e5i15353 for netdev-outgoing; Tue, 28 Aug 2001 23:40:05 -0700 Received: from netcore.fi (netcore.fi [193.94.160.1]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f7T6e1d15349 for ; Tue, 28 Aug 2001 23:40:02 -0700 Received: from localhost (pekkas@localhost) by netcore.fi (8.11.6/8.11.6) with ESMTP id f7T6dbh23017; Wed, 29 Aug 2001 09:39:46 +0300 Date: Wed, 29 Aug 2001 09:39:37 +0300 (EEST) From: Pekka Savola To: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= cc: , , , Subject: Re: (usagi-users 00728) Re: [Patch 2of2] IPv6 routers don't join/leave the all routers group In-Reply-To: <20010829152542K.yoshfuji@linux-ipv6.org> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=ISO-8859-1 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1428 Lines: 44

On Wed, 29 Aug 2001, YOSHIFUJI Hideaki / [iso-2022-jp] $B5HF#1QL@(B wrote:
> > Forwarding switching per device is currently not implemented, control
> > has another meaning (sets isRouter on advertisements).
>
> If a node forwards, it should announce NA with is_router set, IMHO.

Consider a situation like:

        Internet
          eth0
           |
         router
         /    \
      eth1    eth2
       |        |
       |        |
    clients  NFS server ---> NFS server's primary network connection

It's legal to set general packet forwarding on every interface (you can't avoid that), but you may not want to enable IsRouter etc. on eth2 because the router acts as a "client" on that interface.

So this might not be as simple in "mixed host/router" environments. Most scenarios with this are a bit far-fetched, but at least in the future, I believe more of these will arrive..

The extra check suggested would appear to be safe in this kind of scenario though (as router - NFS server traffic should not go through ip6_forward anyway, and you don't want clients being able to add a static route towards NFS server's IPv6 address to point to eth1).
This would change the logical behaviour a bit though. -- Pekka Savola "Tell me of difficulties surmounted, Netcore Oy not those you stumble over and fall" Systems. Networks. Security. -- Robert Jordan: A Crown of Swords From owner-netdev@oss.sgi.com Wed Aug 29 00:02:13 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f7T72DB15920 for netdev-outgoing; Wed, 29 Aug 2001 00:02:13 -0700 Received: from netcore.fi (netcore.fi [193.94.160.1]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f7T729d15916 for ; Wed, 29 Aug 2001 00:02:10 -0700 Received: from localhost (pekkas@localhost) by netcore.fi (8.11.6/8.11.6) with ESMTP id f7T71vi23155; Wed, 29 Aug 2001 10:01:57 +0300 Date: Wed, 29 Aug 2001 10:01:57 +0300 (EEST) From: Pekka Savola To: Peter Bieringer cc: , , Subject: Re: (usagi-users 00729) Re: [Patch 2of2] IPv6 routers don't join/leave the all routers group In-Reply-To: <33830000.999067035@localhost> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1644 Lines: 46 On Wed, 29 Aug 2001, Peter Bieringer wrote: > --On Wednesday, August 29, 2001 03:25:42 PM +0900 "YOSHIFUJI Hideaki > / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?=" > wrote: > > If a node forwards, it should announce NA with is_router set, IMHO. > > I had a discussion with Pekka some time ago in which he find out, > what the settings are really do. > > Control the flag isRouter can be needed if a router has more than 2 > interfaces and one of them is a stub network for which the router > should not announce that he is a router. I understand Yoshifuji suggested a check like: if forwarding packet to interface X general packet forwarding must be enabled NEW: interface X must announce IsRouter (require symmetry, unless otherwise worked around) (note, if you want to forward _to_ an interface, but not _from_ it -- asymmetric routing, this check would bite you. 
I'm not sure if this is a scenario worth considering). not: if general packet forwarding is enabled all interfaces will have IsRouter flag enabled .. which would simplify scenarios greatly, but IMO might be an over-simplication at least in the long term. > BTW: afair in IPv4 (where the forwarding-per-device switch has a > different meaning) this switch is checked on packet input on that > device, not on output. Probably as I don't need any hint of output device control (except for forward netfilter hooks) in ipv4/ip_forward.c. -- Pekka Savola "Tell me of difficulties surmounted, Netcore Oy not those you stumble over and fall" Systems. Networks. Security. -- Robert Jordan: A Crown of Swords From owner-netdev@oss.sgi.com Wed Aug 29 00:04:43 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f7T74hS16087 for netdev-outgoing; Wed, 29 Aug 2001 00:04:43 -0700 Received: from e34.bld.us.ibm.com (e34.co.us.ibm.com [32.97.110.132]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f7T74dd16084 for ; Wed, 29 Aug 2001 00:04:39 -0700 Received: from westrelay02.boulder.ibm.com (westrelay02.boulder.ibm.com [9.99.140.23]) by e34.bld.us.ibm.com (8.9.3/8.9.3) with ESMTP id DAA108754; Wed, 29 Aug 2001 03:02:22 -0400 Received: from d03nm104.boulder.ibm.com (d03nm104.boulder.ibm.com [9.99.140.96]) by westrelay02.boulder.ibm.com (8.11.1m3/NCO v4.97.1) with ESMTP id f7T73HH101622; Wed, 29 Aug 2001 01:03:18 -0600 From: "David Stevens" Importance: Normal Subject: Re: (usagi-users 00728) Re: [Patch 2of2] IPv6 routers don't join/leave the all routers group To: Pekka Savola Cc: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= , , , X-Mailer: Lotus Notes Release 5.0.4a July 24, 2000 Message-ID: Date: Wed, 29 Aug 2001 01:07:42 -0600 X-MIMETrack: Serialize by Router on D03NM104/03/M/IBM(Release 5.0.8 |June 18, 2001) at 08/29/2001 01:07:44 AM MIME-Version: 1.0 Content-type: text/plain; charset=us-ascii Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 
1266 Lines: 32 I think a simple interpretation for the flag is best (for example, "will forward packets received on this interface" with the implication from that that you join the all-routers group, set isRouter on NA's, etc.) Forward flag not set should be "won't forward packets received, won't set isRouter, etc", but will still forward packets received on other intf's to it, IMHO. There will always be scenarios that aren't covered-- even overloading "input and output" breaks at some point and you split forwarding to "input" and "output" flags to prevent forwarding in cases like your example, you can also imagine 4 interfaces A, B, C, D where you might want A & B hosts to talk to each other and C & D hosts to talk to each other, but not A-C, or A-D, B-C or B-D. No input/output forwarding flags combination covers that. At some point, you need a filter with complex rules for the complex cases. I would chuck the global flags and just use interface forwarding for input, routing table, routing protocols, etc (and a filter, if needed) for output decisions, and leave "all" as a convenient shorthand for setting the per-interface flags. In that case, devconf shouldn't be checked, just the interface flag. 
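The four-interface case mentioned above can be checked by brute force. The sketch below is only an illustration in Python, not kernel code. It assumes a hypothetical model in which every interface has one "input forwarding" bit and one "output forwarding" bit, and a packet goes from X to Y only when X's input bit and Y's output bit are both set; the names `forwards` and `matching_assignments` are invented for the example.

```python
from itertools import product

IFACES = ["A", "B", "C", "D"]

# Desired policy from the example: A and B may talk, C and D may talk,
# and nothing may cross between the two pairs.
ALLOWED = {("A", "B"), ("B", "A"), ("C", "D"), ("D", "C")}


def forwards(in_flag, out_flag, src, dst):
    """Assumed model: forward iff the input bit on src and the output bit on dst are set."""
    return in_flag[src] and out_flag[dst]


def matching_assignments():
    """Return every (input bits, output bits) assignment that yields exactly ALLOWED."""
    found = []
    n = len(IFACES)
    for bits in product([False, True], repeat=2 * n):
        in_flag = dict(zip(IFACES, bits[:n]))
        out_flag = dict(zip(IFACES, bits[n:]))
        ok = all(
            forwards(in_flag, out_flag, s, d) == ((s, d) in ALLOWED)
            for s in IFACES
            for d in IFACES
            if s != d
        )
        if ok:
            found.append((in_flag, out_flag))
    return found


print(len(matching_assignments()))
```

Allowing A-B and C-D in both directions forces all eight bits on, which then also allows the cross pairs, so under this model the search comes back empty and a filter is needed for such a policy.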
+-DLS From owner-netdev@oss.sgi.com Wed Aug 29 01:37:15 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f7T8bFg18036 for netdev-outgoing; Wed, 29 Aug 2001 01:37:15 -0700 Received: from netcore.fi (netcore.fi [193.94.160.1]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f7T8bAd18033 for ; Wed, 29 Aug 2001 01:37:11 -0700 Received: from localhost (pekkas@localhost) by netcore.fi (8.11.6/8.11.6) with ESMTP id f7T8aub23729; Wed, 29 Aug 2001 11:36:56 +0300 Date: Wed, 29 Aug 2001 11:36:56 +0300 (EEST) From: Pekka Savola To: David Stevens cc: , , Subject: Re: (usagi-users 00728) Re: [Patch 2of2] IPv6 routers don't join/leave the all routers group In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 2614 Lines: 62 On Wed, 29 Aug 2001, David Stevens wrote: > I think a simple interpretation for the flag is best (for example, > "will forward packets received on this interface" with the implication > from that that you join > the all-routers group, set isRouter on NA's, etc.) Forward flag not set > should > be "won't forward packets received, won't set isRouter, etc", but will > still forward > packets received on other intf's to it, IMHO. Simple is good (, as long as it doesn't limit your options too much). > There will always be scenarios that aren't covered-- even overloading > "input and output" breaks at some point and you split forwarding to "input" > and > "output" flags to prevent forwarding in cases like your example, you can > also > imagine 4 interfaces A, B, C, D where you might want A & B hosts to talk > to each > other and C & D hosts to talk to each other, but not A-C, or A-D, B-C > or B-D. No input/output forwarding flags combination covers that. > At some point, you need a filter with complex rules for the complex > cases. 
I would chuck the global flags and just use interface forwarding for > input, > routing table, routing protocols, etc (and a filter, if needed) for output > decisions, The more complex scenarios are left to be handled with netfilter. However, netfilter cannot control what bits you put in ICMPv6 ND messages and the like though, only whether forwarding should be possible or not. IPv4 doesn't have stuff like IsRouter bit in _normal_ ND messages, which makes it more challenging to be able to control forwarding/ICMPv6 interaction of the interfaces, so direct comparison is not really applicable. These problems basically come up if you there would be only one flag "enable all/disable all". With interface-specific flag, or the current (more complex) approach, these problems can be avoided AFAICS. > and leave "all" as a convenient shorthand for setting the per-interface > flags. > In that case, devconf shouldn't be checked, just the interface flag. Based on IPv4, this would be the "intuitive" approach. I'm not sure whether the current approach is due to (older) specification that "either node (and all its interfaces) is a router or it is not", or whether there were any other technical reasons for it, e.g. address scoping. Do you remember this/care to comment on the ideas, Alexey? (I'm sure you would anyway, but still... :-) -- Pekka Savola "Tell me of difficulties surmounted, Netcore Oy not those you stumble over and fall" Systems. Networks. Security. 
-- Robert Jordan: A Crown of Swords From owner-netdev@oss.sgi.com Wed Aug 29 01:47:32 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f7T8lWX18252 for netdev-outgoing; Wed, 29 Aug 2001 01:47:32 -0700 Received: from netcore.fi (netcore.fi [193.94.160.1]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f7T8lUd18249 for ; Wed, 29 Aug 2001 01:47:30 -0700 Received: from localhost (pekkas@localhost) by netcore.fi (8.11.6/8.11.6) with ESMTP id f7T8lLN23790; Wed, 29 Aug 2001 11:47:21 +0300 Date: Wed, 29 Aug 2001 11:47:21 +0300 (EEST) From: Pekka Savola To: cc: Subject: Re: (usagi-users 00720) [Patch 1of2] IPv6 addrconf_forward_change() bug In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 892 Lines: 19 On Tue, 28 Aug 2001, David Stevens wrote: > The problem is, at this point, "ctl" points to the child sysctl entry (procname > "forwarding", ctl_name is 1 (always)). The parent node has ifindex set, but not > the child. On my system, the dev returned (no matter what device the sysctl > was changing) is "lo", which has ifindex 1. As a side note, the implementation gets in a pretty bad shape if you do 'ifconfig lo down'. For instance, it'll get into a loop trying to ICMPv6 ND its own IPv6 addresses, when you perform e.g. ping6 next, through ethernet interface. This is fixed by putting lo back up _and_ re-adding the addresses (link-locals too, so basically reload the nic driver). -- Pekka Savola "Tell me of difficulties surmounted, Netcore Oy not those you stumble over and fall" Systems. Networks. Security. 
-- Robert Jordan: A Crown of Swords From owner-netdev@oss.sgi.com Thu Aug 30 03:01:02 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f7UA12N16462 for netdev-outgoing; Thu, 30 Aug 2001 03:01:02 -0700 Received: from mail.zabbadoz.net (mail.zabbadoz.net [195.2.176.194]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f7UA0ud16452 for ; Thu, 30 Aug 2001 03:00:57 -0700 Received: from localhost (bz@localhost) by mail.zabbadoz.net (8.11.0/8.11.0) with ESMTP id f7UA0nr04366 for ; Thu, 30 Aug 2001 12:00:49 +0200 (CEST) Date: Thu, 30 Aug 2001 12:00:49 +0200 (CEST) From: "Bjoern A. Zeeb" X-Sender: To: Subject: setting ip address from within driver Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 2012 Lines: 55 Hi, I am currently having a look at 'cisco Serial Line Encapsulation'. They have a feature that enables one end of a pointopoint connection to come up without IP address and get it from the remote end: --- snipp --- ... The serial line model supported by SLARP assumes that each serial line is a separate IP subnet, and that one end of the line is host number 1, while the other end is host number 2. The SLARP address resolution protocol allows system A to request that system B tell system A system B's IP address, along with the IP netmask to be used on the network. It does this by sending a SLARP address resolution request packet, to which system B responds with a SLARP address resolution reply packet. System A then attempts to determine its own IP address based on the address of system B. If the host portion of system B's address is 1, system A will use 2 for the host portion of its own IP address. Conversely, if system B's IP host number is 2, system A will use IP host number 1. If system B replies with any IP host number other than 1 or 2, system A assumes that system B is unable to provide it with an address via SLARP. ... 
--- snipp ---

It is no problem to check all that but is it also possible to set ip address from within driver without problems ? Would setting ifa_list->* be some kind of right approach ?

--- some kind of this ---
...
	/* take primary(first) address of interface */
	struct in_ifaddr *ifa = in_dev->ifa_list;
	if (ifa != NULL) {
		/*
		 * according to
		 *   ip addr add 10.1.2.206 peer 10.1.2.205/30 dev ifname
		 * set
		 *   ifa->ifa_local     to local address (binary, net byte order)
		 *   ifa->ifa_address   to remote addr   ( -- "" -- )
		 *   ifa->ifa_mask      to 255.255.255.252
		 *   ifa->ifa_prefixlen to 30
		 *   ifa->ifa_label     to ifname
		 *
		 * how to explicitly state pointopoint ? needed ?
		 * do I also need to set scope ?
		 * what did I make oops now ?
		 */
	}
...
--- end ---

-- Bjoern A. Zeeb bzeeb at Zabbadoz dot NeT 56 69 73 69 74 http://www.zabbadoz.net/ From owner-netdev@oss.sgi.com Thu Aug 30 06:10:58 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f7UDAwo19863 for netdev-outgoing; Thu, 30 Aug 2001 06:10:58 -0700 Received: from colin.muc.de (root@colin.muc.de [193.149.48.1]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f7UDAvd19860 for ; Thu, 30 Aug 2001 06:10:57 -0700 Received: by colin.muc.de id <140619-3>; Thu, 30 Aug 2001 15:10:50 +0200 Message-ID: <20010830151046.65125@colin.muc.de> Date: Thu, 30 Aug 2001 15:10:46 +0200 From: Andi Kleen To: "Bjoern A. Zeeb" Cc: netdev@oss.sgi.com Subject: Re: setting ip address from within driver References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 0.88e In-Reply-To: ; from Bjoern A. Zeeb on Thu, Aug 30, 2001 at 12:00:49PM +0200 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 296 Lines: 9 > It is no problem to check all that but is it also possible to set ip > address from within driver without problems ? > > Would setting ifa_list->* be some kind of right approach ? 
The right approach would be a user mode daemon that sets it (kernel should supply mechanism, not policy) -Andi From owner-netdev@oss.sgi.com Thu Aug 30 07:10:57 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f7UEAv720751 for netdev-outgoing; Thu, 30 Aug 2001 07:10:57 -0700 Received: from mail.zabbadoz.net (mail.zabbadoz.net [195.2.176.194]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f7UEAsd20748 for ; Thu, 30 Aug 2001 07:10:54 -0700 Received: from localhost (bz@localhost) by mail.zabbadoz.net (8.11.0/8.11.0) with ESMTP id f7UEA2O11371; Thu, 30 Aug 2001 16:10:02 +0200 (CEST) Date: Thu, 30 Aug 2001 16:10:02 +0200 (CEST) From: "Bjoern A. Zeeb" X-Sender: To: Andi Kleen cc: Subject: Re: setting ip address from within driver In-Reply-To: <20010830151046.65125@colin.muc.de> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1516 Lines: 35 On Thu, 30 Aug 2001, Andi Kleen wrote: > > It is no problem to check all that but is it also possible to set ip > > address from within driver without problems ? > > > > Would setting ifa_list->* be some kind of right approach ? > > The right approach would be a user mode daemon that sets it > (kernel should supply mechanism, not policy) ok, also thought of this but "mechanism" for this will blow up kernel about the same size I think as doing it in ? Another point is that as long none of the two peers are rebooted or the line goes down (carrier prob) this only should happen once. running a full blown user space daemon just for the possibility that the line might go down isn't what I would like (even if it would sleep all the time). Last I would do it that way if one could easily get that data but p.ex. 
on isdn interface (where I am doing this stuff; it is also not in net/wan/hdlc.c yet) one cannot use for example libpcap to listen and wait for the packet (don't know if one even can capture it because this happens the same second the interface/leased line comes up). So one will need to buffer the values somewhere in kernel space. Well anyway can you provide me with some pointers/ideas for how to do this in userspace ? How to make the "mechanism" in a good way that would be acceptable (main question is directed to kernel/userspace interaction) ? Then I am going to do it this way. thanks in advance -- Bjoern A. Zeeb bzeeb at Zabbadoz dot NeT 56 69 73 69 74 http://www.zabbadoz.net/ From owner-netdev@oss.sgi.com Thu Aug 30 08:03:16 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f7UF3GX21774 for netdev-outgoing; Thu, 30 Aug 2001 08:03:16 -0700 Received: from sgi.com (sgi.SGI.COM [192.48.153.1]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f7UF30d21768 for ; Thu, 30 Aug 2001 08:03:00 -0700 Received: from storm (mailhost.vip-za.com [163.203.223.149]) by sgi.com (980327.SGI.8.8.8-aspam/980304.SGI-aspam: SGI does not authorize the use of its proprietary systems or networks for unsolicited or bulk email from the Internet.) via ESMTP id IAA06327 for ; Thu, 30 Aug 2001 08:02:36 -0700 (PDT) mail_from (paitan@bizweb.co.za) Received: from user3866.vip-za.com ([163.203.239.26] helo=paitan) by storm with smtp (Exim 3.22 #1) id 15cTCf-0003cs-00 for netdev@oss.sgi.com; Thu, 30 Aug 2001 16:53:58 +0200 Message-ID: <001401c13163$c3a9e820$0100a8c0@paitan> Reply-To: "gmarran" From: "gmarran" To: Subject: ICMP Destination Unreachable Message not conforming to standards? 
Date: Thu, 30 Aug 2001 16:55:07 +0200 MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="----=_NextPart_000_0011_01C13174.85B0AC00" X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 5.00.2314.1300 X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2314.1300 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 14123 Lines: 295 This is a multi-part message in MIME format. ------=_NextPart_000_0011_01C13174.85B0AC00 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable Hi, In the format of the ICMP Destination Unreachable Message as given in RFC 792 [http://www.ietf.org/rfc/rfc0792.txt] the data portion of the packet is meant to contain only the original IP header + 64 bits of the original data datagram. However, packet analysis of an ICMP destination unreachable message returned from a gateway running Redhat Linux 7.0 (kernel 2.2.16-22) gives a data portion of the packet containing the original IP header + 44*8 bits. This gives it as including the IP header, TCP header + TCP data portion + 18 bytes of garbage. If you cannot help but know someone who can, please forward this message to them, or send me their e-mail address. Following is the sniffed packet and a detailed analysis of it. 
Here is the sniffed packet:

ICMP DESTINATION UNREACHABLE: GATEWAY -> WEB SERVER

00 10 5A 2E 1C 02 00 00 E8 D6 0B 63 08 00 45 C0
00 5C 01 4F 00 00 FF 01 38 3E C0 A8 00 01 C0 A8
00 02 03 00 2B CA 00 00 00 00 45 00 00 2C 09 69
40 00 40 06 6E B7 C0 A8 00 02 01 01 01 01 00 50
6A 06 1D 65 1F 8C DA 2B F1 03 60 12 7F B8 E6 D6
00 00 02 04 02 18 00 00 01 00 01 00 00 00 88 00
00 00 06 00 00 00 04 00 00 00

Here is an analysis of the packet:

Ethernet Header
Destination Address            00:10:5A:2E:1C:02 (Server)
Source Address                 00:00:E8:D6:0B:63 (Gateway)
Packet Type                    08 00 (Internet Protocol)

Internet Protocol Header
Version                        4
Header Length                  5 words (20 bytes)
Type of Service                192 (Internet Control)
Total Length                   92
Identifier                     335
Fragment Offset                0
Fragmentation Flags            None
Time to Live                   255
Protocol                       1 (ICMP)
Header Checksum                38 3E
Source Address                 192.168.0.1 (Gateway)
Destination Address            192.168.0.2 (Server)

Internet Control Message Protocol Header
Type                           3 (Destination Unreachable)
Code                           0 (Network Unreachable)
Checksum                       2B CA

Original Internet Protocol Header
Version                        4
Header Length                  5 words (20 bytes)
Type of Service                0 (Routine)
Total Length                   44
Identifier                     2409
Fragment Offset                0
Fragmentation Flags            Don't Fragment
Time to Live                   64
Protocol                       6 (TCP)
Header Checksum                6E B7
Source Address                 192.168.0.2 (Server)
Destination Address            1.1.1.1 (Spoofed Address)

Original Transmission Control Protocol Header
Source Port                    80 (Web Traffic)
Destination Port               27142
Sequence Number                493166476
Acknowledgement Number         3660312835
Data Offset                    6 words (24 bytes)
Flags                          12 (SYN and ACK Flags set)
Window Size                    32696
Checksum                       E6 D6
Urgent Pointer                 0
Maximum Segment Size Option    536

Original TCP Payload Data      00 00
Garbage Data                   01 00 01 00 00 00 88 00 00 00 06 00 00 00 04 00 00 00

Thanks
Garth
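The lengths in the analysis above can be re-derived from the capture itself. The following Python sketch is only an illustration: it assumes the frame is plain Ethernet II with a 14-byte header, and the variable names are invented here. It reads the outer and quoted IP headers and compares the amount of quoted data with the 28 bytes (IP header plus 64 bits) that RFC 792 describes.

```python
import struct

# The frame exactly as posted in the message.
FRAME_HEX = """
00 10 5A 2E 1C 02 00 00 E8 D6 0B 63 08 00 45 C0
00 5C 01 4F 00 00 FF 01 38 3E C0 A8 00 01 C0 A8
00 02 03 00 2B CA 00 00 00 00 45 00 00 2C 09 69
40 00 40 06 6E B7 C0 A8 00 02 01 01 01 01 00 50
6A 06 1D 65 1F 8C DA 2B F1 03 60 12 7F B8 E6 D6
00 00 02 04 02 18 00 00 01 00 01 00 00 00 88 00
00 00 06 00 00 00 04 00 00 00
"""

frame = bytes.fromhex("".join(FRAME_HEX.split()))

ETH_LEN = 14   # assumed: Ethernet II header without VLAN tag
ICMP_LEN = 8   # type, code, checksum, 4 unused bytes

# Outer IP header: low nibble of byte 0 is the header length in 32-bit words.
outer_ihl = (frame[ETH_LEN] & 0x0F) * 4
outer_total = struct.unpack("!H", frame[ETH_LEN + 2:ETH_LEN + 4])[0]

icmp_off = ETH_LEN + outer_ihl
icmp_type, icmp_code = frame[icmp_off], frame[icmp_off + 1]

# Everything after the ICMP header is the quoted original datagram.
quoted = frame[icmp_off + ICMP_LEN:ETH_LEN + outer_total]
inner_ihl = (quoted[0] & 0x0F) * 4
inner_total = struct.unpack("!H", quoted[2:4])[0]

rfc792_quote = inner_ihl + 8               # IP header + 64 bits
beyond_original = len(quoted) - inner_total

print("ICMP type/code:", icmp_type, icmp_code)
print("quoted bytes:", len(quoted), "RFC 792 quote:", rfc792_quote)
print("original total length:", inner_total, "bytes beyond it:", beyond_original)
```

On this capture the quoted data is 64 bytes long while the original datagram's own Total Length field says 44, so 20 bytes follow the end of the original datagram; those are the 2 + 18 bytes listed above as payload and garbage.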
------=_NextPart_000_0011_01C13174.85B0AC00-- From owner-netdev@oss.sgi.com Thu Aug 30 09:06:38 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f7UG6cR23061 for netdev-outgoing; Thu, 30 Aug 2001 09:06:38 -0700 Received: from ms2.inr.ac.ru (minus.inr.ac.ru [193.233.7.97]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f7UG6Zd23058 for ; Thu, 30 Aug 2001 09:06:36 -0700 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id UAA04949; Thu, 30 Aug 2001 20:03:27 +0400 From: kuznet@ms2.inr.ac.ru Message-Id: <200108301603.UAA04949@ms2.inr.ac.ru> Subject: Re: (usagi-users 00728) Re: [Patch 2of2] IPv6 routers don't join/leave To: pekkas@netcore.fi (Pekka Savola) Date: Thu, 30 Aug 2001 20:03:27 +0400 (MSK DST) Cc: dlstevens@us.ibm.com, usagi-users@linux-ipv6.org, netdev@oss.sgi.com In-Reply-To: from "Pekka Savola" at Aug 29, 1 11:36:56 am X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 598 Lines: 19 Hello! > Do you remember this/care to comment on the ideas, Alexey? Seems, I have already commented on this... Probably, I understand current problem incorrectly. So, elaborate, please. We do not have any flag, which blocks forwarding of packets received on some interface. This is duty of firewall. We do have such flag for IPv4, because this functionality is obtained gratuitously and it is silly not to accept donations. :-) What's about all the rest, all these things may/should be controlled by different flags. Even combination "send RAs" + "but do not forward" is not illegal. 
Alexey From owner-netdev@oss.sgi.com Thu Aug 30 11:11:52 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f7UIBqo25487 for netdev-outgoing; Thu, 30 Aug 2001 11:11:52 -0700 Received: from e32.bld.us.ibm.com (e32.co.us.ibm.com [32.97.110.130]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f7UIBnd25481 for ; Thu, 30 Aug 2001 11:11:49 -0700 Received: from westrelay02.boulder.ibm.com (westrelay02.boulder.ibm.com [9.99.140.23]) by e32.bld.us.ibm.com (8.9.3/8.9.3) with ESMTP id OAA84966; Thu, 30 Aug 2001 14:09:25 -0400 Received: from d03nm104.boulder.ibm.com (d03nm104.boulder.ibm.com [9.99.140.96]) by westrelay02.boulder.ibm.com (8.11.1m3/NCO v4.97.1) with ESMTP id f7UIALD227380; Thu, 30 Aug 2001 12:10:21 -0600 From: "David Stevens" Importance: Normal Subject: Re: (usagi-users 00728) Re: [Patch 2of2] IPv6 routers don't join/leave To: kuznet@ms2.inr.ac.ru Cc: pekkas@netcore.fi (Pekka Savola), usagi-users@linux-ipv6.org, netdev@oss.sgi.com X-Mailer: Lotus Notes Release 5.0.4a July 24, 2000 Message-ID: Date: Thu, 30 Aug 2001 12:14:46 -0600 X-MIMETrack: Serialize by Router on D03NM104/03/M/IBM(Release 5.0.8 |June 18, 2001) at 08/30/2001 12:14:52 PM MIME-Version: 1.0 Content-type: text/plain; charset=us-ascii Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 2170 Lines: 53 > Probably, I understand current problem incorrectly. So, elaborate, please. > We do not have any flag, which blocks forwarding of packets > received on some interface. This is duty of firewall. I think the problem is that "being a router" or "being a host" should be per-interface, which means the global devconf6 forwarding flag should go away. At least the checks in packet processing that make use of it. And router vs host decisions are input packet decisions. Advertisements on an interface tell hosts on that link you're willing to receive and try to forward their packets. If you're not, you don't send the RA's, or otherwise behave as a router on that link. 
All of that to me means the forwarding flag should be used only per interface (with an "all" as a convenience to set or clear them all at once, but nothing global checked in the code for packet processing). Second, the per-interface forwarding flag should determine whether you forward or drop a packet received on that interface that isn't for you. That also determines whether you set isRouter in NA's, send RA's on that link, join the all-routers multicast group, etc. etc. On each link, behaving as a router on that link is determined by the forwarding flag for that link. On links where you don't want to accept packets for forwarding, you can still reach them for packet delivery. So, not forwarding to them on output makes no sense. Again, forwarding is logically an input decision. And unless all interfaces have the same value for "forwarding", no global flag for forwarding makes sense. The per-interface flags should decide whether you appear to be a host or router to other hosts on that link. And if you claim to be a router, you should try to route packets you receive on that link, not drop them based on flags associated with some other link or globally. For the case where you want to drop them based on other than the link forwarding flag, I'd use a firewall, because generally you *want* to deliver all packets if you have any way to get there. Dropping them is the special case. 
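The behaviour proposed above can be modelled roughly as follows. This is a small Python illustration of the idea only, not kernel code and not the patch under discussion; the class, the property names and the helper functions are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Interface:
    name: str
    forwarding: bool = False  # the single per-interface flag

    # In the proposal, all router-like behaviour on a link follows the one flag.
    @property
    def sets_is_router_in_na(self):
        return self.forwarding

    @property
    def sends_router_advertisements(self):
        return self.forwarding

    @property
    def joins_all_routers_group(self):
        return self.forwarding


def set_all(interfaces, value):
    """'all' as a shorthand: set every per-interface flag, keep no global state."""
    for iface in interfaces:
        iface.forwarding = value


def handle_received(ingress, dst_is_local):
    """Forwarding as an input decision: only the ingress interface's flag is checked."""
    if dst_is_local:
        return "deliver"
    return "forward" if ingress.forwarding else "drop"


eth0 = Interface("eth0", forwarding=True)
eth1 = Interface("eth1", forwarding=False)
print(handle_received(eth0, dst_is_local=False))
print(handle_received(eth1, dst_is_local=False))
```

Note that the egress interface does not appear in `handle_received` at all: a packet accepted on eth0 may still be sent out through eth1, and anything more selective is left to a firewall.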
my $.04 +-DLS From owner-netdev@oss.sgi.com Thu Aug 30 11:30:10 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f7UIUAB25860 for netdev-outgoing; Thu, 30 Aug 2001 11:30:10 -0700 Received: from ms2.inr.ac.ru (minus.inr.ac.ru [193.233.7.97]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f7UIU4d25856 for ; Thu, 30 Aug 2001 11:30:05 -0700 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id WAA06144; Thu, 30 Aug 2001 22:29:25 +0400 From: kuznet@ms2.inr.ac.ru Message-Id: <200108301829.WAA06144@ms2.inr.ac.ru> Subject: Re: (usagi-users 00728) Re: [Patch 2of2] IPv6 routers don't join/leave To: dlstevens@us.ibm.com (David Stevens) Date: Thu, 30 Aug 2001 22:29:25 +0400 (MSK DST) Cc: pekkas@netcore.fi, usagi-users@linux-ipv6.org, netdev@oss.sgi.com In-Reply-To: from "David Stevens" at Aug 30, 1 12:14:46 pm X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1121 Lines: 38 Hello! > > We do not have any flag, which blocks forwarding of packets > > received on some interface. This is duty of firewall. > > I think the problem is that "being a router" or "being a host" should be > per-interface, which means the global devconf6 forwarding flag should go > away. Let me to repeat: we do not have any flag which blocks forwarding per interface. :-) Killing global flag would mean that we do not have any way to disable forwarding at all. So, as soon as you forward at least on one interface, global flag must be ON. And packet filtering must be made with firewall. > join > the all-routers multicast group, etc. etc. WHAT? Kernel does not use this multicast group, hence it has no reasons to join it. If some module will start to use this group, it will join it. > you claim to be a router, you should try to route packets you receive on > that Particularly, dropping all of them. :-) > packets if you have any way to get there. Dropping them is the special > case. Exactly. 
Which exactly matches to the statement that all the functions are controlled by separate flags. :-) Alexey From owner-netdev@oss.sgi.com Thu Aug 30 12:23:54 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f7UJNsp27446 for netdev-outgoing; Thu, 30 Aug 2001 12:23:54 -0700 Received: from e33.bld.us.ibm.com (e33.co.us.ibm.com [32.97.110.131]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f7UJNnd27442 for ; Thu, 30 Aug 2001 12:23:49 -0700 Received: from westrelay02.boulder.ibm.com (westrelay02.boulder.ibm.com [9.99.140.23]) by e33.bld.us.ibm.com (8.9.3/8.9.3) with ESMTP id OAA25234; Thu, 30 Aug 2001 14:21:28 -0500 Received: from d03nm104.boulder.ibm.com (d03nm104.boulder.ibm.com [9.99.140.96]) by westrelay02.boulder.ibm.com (8.11.1m3/NCO v4.97.1) with ESMTP id f7UJMKD242580; Thu, 30 Aug 2001 13:22:24 -0600 From: "David Stevens" Importance: Normal Subject: Re: (usagi-users 00745) Re: [Patch 2of2] IPv6 routers don't join/leave To: usagi-users@linux-ipv6.org Cc: pekkas@netcore.fi, usagi-users@linux-ipv6.org, netdev@oss.sgi.com X-Mailer: Lotus Notes Release 5.0.4a July 24, 2000 Message-ID: Date: Thu, 30 Aug 2001 12:26:47 -0700 X-MIMETrack: Serialize by Router on D03NM104/03/M/IBM(Release 5.0.8 |June 18, 2001) at 08/30/2001 01:26:55 PM MIME-Version: 1.0 Content-type: text/plain; charset=us-ascii Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 3069 Lines: 80 >> > We do not have any flag, which blocks forwarding of packets >> > received on some interface. This is duty of firewall. >> >> I think the problem is that "being a router" or "being a host" should be >> per-interface, which means the global devconf6 forwarding flag should go >> away. > >Let me to repeat: we do not have any flag which blocks forwarding >per interface. :-) Killing global flag would mean that we do not have >any way to disable forwarding at all. I know, I'm saying that's the problem. 
:-) I'm saying a more useful and intuitive way to interpret the per-interface "forwarding" flag is as a "willing to forward received packets from this interface" flag. Then, that would be the *only* flag you check when receiving something not for a local address. If you want more complicated set-ups, then you use a firewall. That flag should also determine whether you behave as a router on that link with all other things, including isRouter flag, etc. I can do a patch prototype, if you want to see what I mean in detail. :-) >So, as soon as you forward at least on one interface, global >flag must be ON. And packet filtering must be made with firewall. Yes, I think that should be changed. I think being a router or a host should be per-interface on input. Once you have a packet, output is the same whether you generated it or someone else did (no global flag, no "output forwarding" flag). If you want more specialized behaviour, then you use a firewall. >> join >> the all-routers multicast group, etc. etc. > >WHAT? Kernel does not use this multicast group, hence it has >no reasons to join it. I know it doesn't, it's supposed to. That's what the patch I submitted that started this discussion fixes. Just like the "all-nodes" group, all routers should join the all-routers group. The problem (or, one problem) with having individual applications join it is they all have to check repeatedly the state of "forwarding" (because hosts are not supposed to be in that group, routers are). If a router becomes a host or vice versa after boot, the all-routers group membership should change. It also makes it easier for applications. They can just bind to INADDR_ANY and receive things sent to any valid address, including the all-routers group, but leave the management of that (again, must be based on forwarding state) to the kernel. > If some module will start to use this group, it will join it. There are things that use it that should be in the kernel. 
They just aren't implemented yet (in the kernel, anyway). But even "ping ff02::2" should work with conforming implementations. It's trivial, but still a handy way to find the routers. The point is: any application that can receive multicast packets should receive all routers group packets when forwarding is on and not when forwarding is off. They won't with the current code, and doing it right without the kernel joining the group would require the applications to check the state of the in-kernel forwarding. +-DLS From owner-netdev@oss.sgi.com Thu Aug 30 12:30:27 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f7UJURR27624 for netdev-outgoing; Thu, 30 Aug 2001 12:30:27 -0700 Received: from netcore.fi (netcore.fi [193.94.160.1]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f7UJUMd27619 for ; Thu, 30 Aug 2001 12:30:22 -0700 Received: from localhost (pekkas@localhost) by netcore.fi (8.11.6/8.11.6) with ESMTP id f7UJU1803526; Thu, 30 Aug 2001 22:30:05 +0300 Date: Thu, 30 Aug 2001 22:30:01 +0300 (EEST) From: Pekka Savola To: gmarran cc: Subject: Re: ICMP Destination Unreachable Message not conforming to standards? In-Reply-To: <001401c13163$c3a9e820$0100a8c0@paitan> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1202 Lines: 23 On Thu, 30 Aug 2001, gmarran wrote: > In the format of the ICMP Destination Unreachable Message as given in > RFC 792 [http://www.ietf.org/rfc/rfc0792.txt] the data portion of the > packet is meant to contain only the original IP header + 64 bits of the > original data datagram. However, packet analysis of an ICMP destination > unreachable message returned from a gateway running Redhat Linux 7.0 > (kernel 2.2.16-22) gives a data portion of the packet containing the > original IP header + 44*8 bits. This gives it as including the IP > header, TCP header + TCP data portion + 18 bytes of garbage. 
> If you cannot help but know someone who can, please forward this message > to them, or send me their e-mail address. Following is the sniffed > packet and a detailed analysis of it. This is intentional; 64 bits is nowhere enough to identify the offending packet properly. Therefore, with current link speeds, there's no harm in attaching "everything you know" to the ICMP message. -- Pekka Savola "Tell me of difficulties surmounted, Netcore Oy not those you stumble over and fall" Systems. Networks. Security. -- Robert Jordan: A Crown of Swords From owner-netdev@oss.sgi.com Thu Aug 30 12:31:28 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f7UJVSO27728 for netdev-outgoing; Thu, 30 Aug 2001 12:31:28 -0700 Received: from yue.hongo.wide.ad.jp ([203.178.139.94]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f7UJVPd27725 for ; Thu, 30 Aug 2001 12:31:25 -0700 Received: from localhost (localhost [127.0.0.1]) by yue.hongo.wide.ad.jp (8.9.3+3.2W/8.9.3/Debian 8.9.3-21) with ESMTP id EAA03810; Fri, 31 Aug 2001 04:34:12 +0900 To: usagi-users@linux-ipv6.org, kuznet@ms2.inr.ac.ru Cc: dlstevens@us.ibm.com, pekkas@netcore.fi, netdev@oss.sgi.com Subject: Re: (usagi-users 00745) Re: [Patch 2of2] IPv6 routers don't join/leave In-Reply-To: <200108301829.WAA06144@ms2.inr.ac.ru> References: <200108301829.WAA06144@ms2.inr.ac.ru> X-Mailer: Mew version 1.94.2 on Emacs 20.7 / Mule 4.1 (AOI) X-URL: http://www.hongo.wide.ad.jp/%7Eyoshfuji/ X-Fingerprint: 90 22 65 EB 1E CF 3A D1 0B DF 80 D8 48 07 F8 94 E0 62 0E EA X-PGP-Key-URL: http://cerberus.hongo.wide.ad.jp/%7Eyoshfuji/hideaki@yoshifuji.org.asc Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-Id: <20010831043412T.yoshfuji@linux-ipv6.org> Date: Fri, 31 Aug 2001 04:34:12 +0900 From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= X-Dispatcher: imput version 991025(IM133) Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1117 Lines: 35 In 
article <200108301829.WAA06144@ms2.inr.ac.ru> (at Thu, 30 Aug 2001 22:29:25 +0400 (MSK DST)), kuznet@ms2.inr.ac.ru says:

> Let me to repeat: we do not have any flag which blocks forwarding
> per interface. :-) Killing global flag would mean that we do not have
> any way to disable forwarding at all.
>
> So, as soon as you forward at least on one interface, global
> flag must be ON. And packet filtering must be made with firewall.

This description is true *about current implementation*.
But please note that, for each interface,

 - whether we forward a packet from it
 - whether we set is_router flag in NA to be sent on it
 - whether we join (some scope(s) of) all-routers multicast on it

SHOULD be the same.

The patch is not completed for our (at least David's and my) thought,
but it is a start of hacking....

> > join
> > the all-routers multicast group, etc. etc.
>
> WHAT? Kernel does not use this multicast group, hence it has
> no reasons to join it.
>
> If some module will start to use this group, it will join it.

ping6 ff02::2%eth0 ?
(How about ff05::2%site1 etc... sigh.)
--yoshfuji

From owner-netdev@oss.sgi.com Thu Aug 30 12:56:33 2001
Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f7UJuXc28440 for netdev-outgoing; Thu, 30 Aug 2001 12:56:33 -0700
Received: from ms2.inr.ac.ru (minus.inr.ac.ru [193.233.7.97]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f7UJuTd28437 for ; Thu, 30 Aug 2001 12:56:30 -0700
Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id XAA06457; Thu, 30 Aug 2001 23:43:16 +0400
From: kuznet@ms2.inr.ac.ru
Message-Id: <200108301943.XAA06457@ms2.inr.ac.ru>
Subject: Re: (usagi-users 00745) Re: [Patch 2of2] IPv6 routers don't
To: yoshfuji@linux-ipv6.org (YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?=)
Date: Thu, 30 Aug 2001 23:43:16 +0400 (MSK DST)
Cc: usagi-users@linux-ipv6.org, dlstevens@us.ibm.com, pekkas@netcore.fi, netdev@oss.sgi.com
In-Reply-To: <20010831043412T.yoshfuji@linux-ipv6.org> from "YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?=" at Aug 31, 1 04:34:12 am
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Content-Length: 553
Lines: 24

Hello!

> - whether we forward a packet from it
> - whether we set is_router flag in NA to be sent on it
> - whether we join (some scope(s) of) all-routers multicast on it
>
> SHOULD be the same.

Why? They should be different flags and the last is not flag at all.
Kernel is not going to police you.

What's about the first item, it is too complicated. Grabbing device
lock in data path is not an option.

> ping6 ff02::2%eth0 ?

Compare to ping4 224.0.0.2%eth0

Routers lose expensive entry in multicast entry not for fun,
but for work.
Alexey From owner-netdev@oss.sgi.com Thu Aug 30 15:41:49 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f7UMfne00805 for netdev-outgoing; Thu, 30 Aug 2001 15:41:49 -0700 Received: from e33.bld.us.ibm.com (e33.co.us.ibm.com [32.97.110.131]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f7UMfkd00802 for ; Thu, 30 Aug 2001 15:41:46 -0700 Received: from westrelay02.boulder.ibm.com (westrelay02.boulder.ibm.com [9.99.140.23]) by e33.bld.us.ibm.com (8.9.3/8.9.3) with ESMTP id RAA34192; Thu, 30 Aug 2001 17:39:29 -0500 Received: from d03nm104.boulder.ibm.com (d03nm104.boulder.ibm.com [9.99.140.96]) by westrelay02.boulder.ibm.com (8.11.1m3/NCO v4.97.1) with ESMTP id f7UMePD66438; Thu, 30 Aug 2001 16:40:25 -0600 From: "David Stevens" Importance: Normal Subject: Re: (usagi-users 00745) Re: [Patch 2of2] IPv6 routers don't To: kuznet@ms2.inr.ac.ru Cc: "yoshfuji" (YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?=), usagi-users@linux-ipv6.org, pekkas@netcore.fi, netdev@oss.sgi.com X-Mailer: Lotus Notes Release 5.0.4a July 24, 2000 Message-ID: Date: Thu, 30 Aug 2001 15:44:55 -0700 X-MIMETrack: Serialize by Router on D03NM104/03/M/IBM(Release 5.0.8 |June 18, 2001) at 08/30/2001 04:44:56 PM MIME-Version: 1.0 Content-type: text/plain; charset=us-ascii Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1460 Lines: 44 [sorry if this is duplicated; I'm having mailer problems ] > >> - whether we forward a packet from it >> - whether we set is_router flag in NA to be sent on it >> - whether we join (some scope(s) of) all-routers multicast on it >> >> SHOULD be the same. > >Why? They should be different flags and the last is not flag at all. >Kernel is not going to police you. > >What's about the first item, it is too complicated. Grabbing device >lock in data path is not an option. I'm no expert on the linux implementation, but I'd think this should just require a reference you already have. 
It doesn't really have to be atomic, since reading a stale value for a short time during a transition wouldn't be fatal.

>> ping6 ff02::2%eth0 ?
>
>Compare to ping4 224.0.0.2%eth0
>
>Routers lose expensive entry in multicast entry not for fun,
>but for work.

RFC 2373, section 2.8:

   A router is required to recognize all addresses that a host is
   required to recognize, plus the following addresses as identifying
   itself:
...
      o All-Routers Multicast Addresses

Even if it weren't a requirement, I think it's still a good idea. For
network debugging and for applications that simply can't (easily) join
and leave the group under the right circumstances, because appropriate
membership depends on the forwarding state of the interface, something
applications shouldn't have to check and recheck for correctness.

                              +-DLS

From owner-netdev@oss.sgi.com Fri Aug 31 11:58:54 2001
Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f7VIwsS26363 for netdev-outgoing; Fri, 31 Aug 2001 11:58:54 -0700
Received: from blackhole.kfki.hu (blackhole.kfki.hu [148.6.0.114]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f7VIwpd26360 for ; Fri, 31 Aug 2001 11:58:51 -0700
Received: by blackhole.kfki.hu (Postfix, from userid 311) id 9D6C51049B; Fri, 31 Aug 2001 13:40:16 +0200 (CEST)
Date: Fri, 31 Aug 2001 13:40:16 +0200 (CEST)
From: Jozsef Kadlecsik
To:
Subject: icmp bug in 2.4.5?
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Content-Length: 1214
Lines: 33

Hello,

After upgrading a firewall which is configured with connection
tracking from 2.4.2 to 2.4.5, the following strange thing happens on it:

traceroute targeted to the firewall completes successfully:
icmp_send(skb, ICMP_DEST_UNREACH, ICMP_PORT_UNREACH, 0) in udp.c
generates proper (large enough) response packets, which then can be
handled by the connection tracking code.
traceroute going through the firewall doesn't generate "proper" ICMP packets from the firewall: icmp_send(skb, ICMP_TIME_EXCEEDED, ICMP_EXC_TTL, 0) in ip_forward.c seems to generate too short packets, which cannot therefore be tracked: Aug 31 12:16:12 zzz kernel: denied: IN= OUT=eth1 SRC=zzz.zzz.zzz.zzz DST=a.b.c.d LEN=66 TOS=0x00 PREC=0xC0 TTL=255 ID=10383 PROTO=ICMP TYPE=11 CODE=0 [SRC=a.b.c.d DST=x.y.z.w LEN=38 TOS=0x00 PREC=0x00 TTL=1 ID=42915 PROTO=UDP INCOMPLETE [6 bytes] ] Nothing else's changed, only an upgrade happened. Is it a known bug? If yes, is it fixed in later releases? Regards, Jozsef - E-mail : kadlec@blackhole.kfki.hu, kadlec@sunserv.kfki.hu WWW-Home: http://www.kfki.hu/~kadlec Address : KFKI Research Institute for Particle and Nuclear Physics H-1525 Budapest 114, POB. 49, Hungary From owner-netdev@oss.sgi.com Fri Aug 31 16:10:59 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f7VNAx200536 for netdev-outgoing; Fri, 31 Aug 2001 16:10:59 -0700 Received: from e32.bld.us.ibm.com (e32.co.us.ibm.com [32.97.110.130]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f7VNAud00533 for ; Fri, 31 Aug 2001 16:10:56 -0700 Received: from westrelay02.boulder.ibm.com (westrelay02.boulder.ibm.com [9.99.140.23]) by e32.bld.us.ibm.com (8.9.3/8.9.3) with ESMTP id TAA22028 for ; Fri, 31 Aug 2001 19:08:43 -0400 Received: from d03nm104.boulder.ibm.com (d03nm104.boulder.ibm.com [9.99.140.96]) by westrelay02.boulder.ibm.com (8.11.1m3/NCO v4.97.1) with ESMTP id f7VN9dl46298 for ; Fri, 31 Aug 2001 17:09:39 -0600 From: "David Stevens" Importance: Normal Subject: source routing honored by hosts? 
To: netdev@oss.sgi.com X-Mailer: Lotus Notes Release 5.0.4a July 24, 2000 Message-ID: Date: Fri, 31 Aug 2001 16:14:11 -0700 X-MIMETrack: Serialize by Router on D03NM104/03/M/IBM(Release 5.0.8 |June 18, 2001) at 08/31/2001 05:14:12 PM MIME-Version: 1.0 Content-type: text/plain; charset=us-ascii Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1229 Lines: 34

ip6_forward() has the following two lines:

        if (ipv6_devconf.forwarding == 0 && opt->srcrt == 0)
                goto error;

Aside from the other issue of per-interface forwarding :-), this
appears to allow forwarding of source-routed packets even when the
node is a host, only. That seems to be a security hole to me.

Suppose you have a multihomed host, or in a different world, a router
with per-interface routing and forwarding turned off on the input
interface. If that host has routing to networks that some other machine
can't reach, a bad guy can simply source-route through the privileged
host where normal routing would fail. (it appears)

Is there something else that would prohibit this? Is it intentional?

I was thinking this should be just

        if (ipv6_devconf.forwarding == 0)
                goto error;

[only forward packets, source-routed or not, if forwarding is on]

or (better :-)) something like

        in_idev = in6_dev_get(skb->dev);
        if (!in_idev)
                goto error;
        forwarding = in_idev->cnf.forwarding;
        in6_dev_put(in_idev);
        if (!forwarding)
                goto error;

[only forward packets if the input device's forwarding flag is on]

                              +-DLS
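
[Editorial sketch, not part of the original message.] The three checks discussed above can be compared side by side in a small user-space model. This is only an illustration under stated assumptions: `struct iface_conf`, `struct pkt` and the `may_forward_*` functions are hypothetical stand-ins invented here, not the kernel's structures or API, and the model ignores reference counting (`in6_dev_get`/`in6_dev_put`) entirely.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for per-interface configuration (NOT a kernel type). */
struct iface_conf {
    int forwarding;                 /* per-interface "forwarding" flag */
};

/* Hypothetical stand-in for a received packet (NOT struct sk_buff). */
struct pkt {
    const struct iface_conf *in_if; /* input interface, NULL if unknown */
    int srcrt;                      /* nonzero if the packet is source-routed */
};

/*
 * Models the check quoted from ip6_forward(): the packet is rejected only
 * when global forwarding is off AND the packet is not source-routed, so a
 * source-routed packet passes even on a host.
 */
bool may_forward_current(int global_forwarding, const struct pkt *p)
{
    if (global_forwarding == 0 && p->srcrt == 0)
        return false;
    return true;
}

/* Models the first proposal: only the global flag decides. */
bool may_forward_global(int global_forwarding, const struct pkt *p)
{
    (void)p; /* source routing no longer matters */
    return global_forwarding != 0;
}

/* Models the second proposal: only the input interface's flag decides. */
bool may_forward_per_iface(const struct pkt *p)
{
    if (p->in_if == NULL)
        return false;
    return p->in_if->forwarding != 0;
}
```

With forwarding off everywhere, a source-routed packet is accepted by the first function and rejected by the other two, which is the behaviour the message describes as a possible hole.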