From owner-netdev@oss.sgi.com Sat Apr 1 03:17:44 2000
Received: by oss.sgi.com id ; Sat, 1 Apr 2000 03:17:25 -0800
Received: from md4690e42.utfors.se ([212.105.14.66]:33029 "EHLO boris.prodako.se") by oss.sgi.com with ESMTP id ; Sat, 1 Apr 2000 03:17:00 -0800
Received: from goteborg.utfors.se (IDENT:tori@igor.prodako.se [192.168.123.1]) by boris.prodako.se (8.9.3/8.9.3) with ESMTP id NAA25216 for ; Sat, 1 Apr 2000 13:16:49 +0200
Message-ID: <38E5DA9F.85048708@goteborg.utfors.se>
Date: Sat, 01 Apr 2000 13:16:47 +0200
From: Tobias =?iso-8859-1?Q?Ringstr=F6m?=
X-Mailer: Mozilla 4.7 [en] (X11; I; Linux 2.3.99-pre3 i686)
X-Accept-Language: en
MIME-Version: 1.0
To: netdev@oss.sgi.com
Subject: rtnetlink bug in 2.3
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing

Hi!

The following back-trace (by the excellent kgdb) illustrates a problem
with rtnetlink. The function rtmsg_ifinfo is called from interrupt
context when inserting a cardbus card. It tries to allocate an skb
with GFP_KERNEL, causing a kernel panic.

I changed the allocate flag to GFP_ATOMIC, and all seems fine now. I
have only had a brief look at the code, and the fix should be verified
by someone. The bug is real, though.

(Linux 2.3.99-pre3)

/Tobias

GNU gdb 4.18
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux"...
(gdb) rmt
0xc0110ee9 in breakpoint () at gdbstub.c:711
711             if (initialized) BREAKPOINT();
(gdb) break skbuff.c:140
Breakpoint 1 at 0xc01e2858: file skbuff.c, line 140.
(gdb) c
Continuing.
Breakpoint 1, alloc_skb (size=0xf60, gfp_mask=0x7) at skbuff.c:140
140             if (++count < 5) {
(gdb) bt
#0 alloc_skb (size=0xf60, gfp_mask=0x7) at skbuff.c:140
#1 0xc01e883c in rtmsg_ifinfo (type=0x10, dev=0xc4090800, change=0xffffffff) at rtnetlink.c:258
#2 0xc01e8cee in rtnetlink_event (this=0xc0290020, event=0x5, ptr=0xc4090800) at rtnetlink.c:505
#3 0xc01e576b in register_netdevice (dev=0xc4090800) at /home/tori/linux-2.3-2/include/linux/notifier.h:71
#4 0xc01a411b in init_netdev (dev=0x0, sizeof_priv=0x3b8, mask=0xc0249ef1 "eth%d", setup=0xc01a41b0 ) at net_init.c:138
#5 0xc01a4146 in init_etherdev (dev=0x0, sizeof_priv=0x3b8) at net_init.c:164
#6 0xc01a3239 in tulip_init_one (pdev=0xc401fc00, ent=0xc028bc98) at tulip_core.c:1003
#7 0xc01c4f59 in pci_announce_device (drv=0xc028bea0, dev=0xc401fc00) at pci.c:289
#8 0xc01c5079 in pci_insert_device (dev=0xc401fc00, bus=0xc40eb0a0) at pci.c:339
#9 0xc01d0431 in cb_alloc (s=0xc1132000) at cardbus.c:319
#10 0xc01c7c65 in unreset_socket (i=0x0) at cs.c:571
#11 0xc011fb6a in timer_bh () at timer.c:283
#12 0xc011cdb9 in bh_action (nr=0x0) at softirq.c:239
#13 0xc011cd08 in tasklet_hi_action (a=0xc02c3d00) at softirq.c:175
#14 0xc011cbda in do_softirq () at softirq.c:73
#15 0xc010bf74 in do_IRQ (regs={ebx = 0xc0108990, ecx = 0xc40f6000, edx = 0xc0294000, esi = 0xc0294000, edi = 0xc0108990, ebp = 0xc0295fd4, eax = 0x0, xds = 0xc0100018, xes = 0xc0290018, orig_eax = 0xffffff00, eip = 0xc01089b6, xcs = 0x10, eflags = 0x246, esp = 0xc0295fe8, xss = 0xc0108a02}) at irq.c:628
#16 0xc010ad44 in ret_from_intr () at usb-uhci.c:2819
#17 0xc0108a02 in cpu_idle () at process.c:104
#18 0xc0296b26 in start_kernel () at init/main.c:581
#19 0xc010018e in L6 () at usb-uhci.c:2819
Cannot access memory at address 0xa0.
(gdb)

--- rtnetlink.c.orig	Sat Apr 1 12:49:12 2000
+++ rtnetlink.c	Sat Apr 1 12:49:18 2000
@@ -255,7 +255,7 @@
 	struct sk_buff *skb;
 	int size = NLMSG_GOODSIZE;
 
-	skb = alloc_skb(size, GFP_KERNEL);
+	skb = alloc_skb(size, GFP_ATOMIC);
 	if (!skb)
 		return;
 
@@ -264,7 +264,7 @@
 		return;
 	}
 	NETLINK_CB(skb).dst_groups = RTMGRP_LINK;
-	netlink_broadcast(rtnl, skb, 0, RTMGRP_LINK, GFP_KERNEL);
+	netlink_broadcast(rtnl, skb, 0, RTMGRP_LINK, GFP_ATOMIC);
 }
 
 static int rtnetlink_done(struct netlink_callback *cb)

From owner-netdev@oss.sgi.com Sat Apr 1 03:30:04 2000
Received: by oss.sgi.com id ; Sat, 1 Apr 2000 03:29:54 -0800
Received: from smtp1.libero.it ([193.70.192.51]:17874 "EHLO smtp1.libero.it") by oss.sgi.com with ESMTP id ; Sat, 1 Apr 2000 03:29:41 -0800
Received: from armageddon.allanon.org (151.20.21.16) by smtp1.libero.it; 1 Apr 2000 13:29:33 +0200
Received: by armageddon.allanon.org (Postfix, from userid 0) id 559AF5F99; Sat, 1 Apr 2000 13:25:49 +0200 (CEST)
Date: Sat, 1 Apr 2000 13:25:49 +0200
From: Gigi Sullivan
To: netdev@oss.sgi.com
Subject: Re: Local Denial-of-Service attack against Linux
Message-ID: <20000401132548.A402@armageddon.libero.it>
Reply-To: sullivan@sikurezza.org
References: <20000323175509.A23709@clearway.com> <20000327215409.A467@armageddon.libero.it>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary=ReaqsoxgOBHFXBhH
X-Mailer: Mutt 0.95.5i
In-Reply-To: <20000327215409.A467@armageddon.libero.it>; from Gigi Sullivan on Mon, Mar 27, 2000 at 09:54:19PM +0200
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing

--ReaqsoxgOBHFXBhH
Content-Type: text/plain; charset=us-ascii

Aiee :)

Hello!

Some time ago, Jay Fenlason posted the following message to the bugtraq
list. I tried to do a little, easy patch to fix the problem. After I did
that, I figured out that it wasn't the right way to do it (although the
patch worked), so I did another very small, easy patch.
This is intended to be a temporary patch (maybe a stupid one, forgive me).
The patch is attached to this post (tested on kernel 2.2.14).
However, and again, I'm not sure whether this will break some current
behaviour, so if something is wrong, please let me know.

Thx a lot!

> From: Jay Fenlason
> Subject: Local Denial-of-Service attack against Linux
> X-To: bugtraq@securityfocus.com
> To: BUGTRAQ@SECURITYFOCUS.COM
> Status: RO
> X-Status: A
> Content-Length: 1308
> Lines: 42
>
> This amusing little program will hang Linux 2.2.12 (default Red Hat 6.1),
> 2.2.14 (latest stable kernel) and 2.3.99-pre2 (latest development kernel)
> on my 6x86 scratch machine and our various Pentium development machines.
> Note that this does not require any special privileges.
>
> The send system call immediately puts the kernel in a loop spewing
> kmalloc: Size (131076) too large
> forever (or until you hit the reset button).
>
> Apparently unix domain sockets are ignoring the /proc/sys/net/core/wmem_max
> parameter, despite the documentation to the contrary. The fix should be
> simple, but I haven't had time to chase it down, and I'm not (usually) a
> Linux kernel developer.
>
> -- JF
>
> --- BEGIN INCLUDED SOURCE FILE ---
>
> #include
> #include
> #include
>
> char buf[128 * 1024];
>
> int main ( int argc, char **argv )
> {
> 	struct sockaddr SyslogAddr;
> 	int LogFile;
> 	int bufsize = sizeof(buf)-5;
> 	int i;
>
> 	for ( i = 0; i < bufsize; i++ )
> 		buf[i] = ' '+(i%95);
> 	buf[i] = '\0';
>
> 	SyslogAddr.sa_family = AF_UNIX;
> 	strncpy ( SyslogAddr.sa_data, "/dev/log", sizeof(SyslogAddr.sa_data) );
> 	LogFile = socket ( AF_UNIX, SOCK_DGRAM, 0 );
> 	sendto ( LogFile, buf, bufsize, 0, &SyslogAddr, sizeof(SyslogAddr) );
> 	return 0;
> }
> --- END INCLUDED SOURCE FILE ---

bye bye

-- gg sullivan

-- 
Lorenzo Cavallaro `Gigi Sullivan'

Until I loved, life had no beauty;
I did not know I lived until I had loved.
(Theodor Korner)

--ReaqsoxgOBHFXBhH
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename=ldos_patch_last

--- sock.c.orig	Fri Mar 31 23:36:00 2000
+++ sock.c	Fri Mar 31 23:36:29 2000
@@ -79,10 +79,6 @@
  *		Jay Schulist	:	Added SO_ATTACH_FILTER and SO_DETACH_FILTER.
  *		Andi Kleen	:	Add sock_kmalloc()/sock_kfree_s()
  *		Andi Kleen	:	Fix write_space callback
- *		Lorenzo `Gigi Sullivan' Cavallaro: Temporary Fix to local DoS due to
- *					too big buffer (AF_UNIX SOCK_DGRAM).
- *					Maybe this will broke something else.
- *					I apologize.
  *
  * To Fix:
  *
@@ -570,18 +566,6 @@
 			skb->sk = sk;
 			return skb;
 		}
-
-		/*
-		 * kmalloc (mm/slab.c) checks the size to allocate through a
-		 * `cache size struct'.
-		 * If we try to allocate much more then the maximum, just report it
-		 * backwardly.
-		 * XXX Will this broke something, like sock_wait_for_wmem()
-		 * defined here (net/core/sock.c)?
-		 * Is this the right way ?
-		 */
-
-		sk->err = EMSGSIZE;
 	}
 	return NULL;
 }
--- af_unix.c.orig	Fri Mar 31 23:36:40 2000
+++ af_unix.c	Sat Apr 1 00:31:40 2000
@@ -43,6 +43,8 @@
  *					number of socks to 2*max_files and
  *					the number of skb queueable in the
  *					dgram receiver.
+ *	Lorenzo `Gigi Sullivan' Cavallaro : Fixed local DoS attack, due to
+ *					unchecked sysctl_wmem_max sysctl (I hope) :)
  *
  * Known differences from reference BSD that was tested:
  *
@@ -972,6 +974,16 @@
 	if (sock->passcred && !sk->protinfo.af_unix.addr)
 		unix_autobind(sock);
 
+	/*
+	 * This should FIX the local DoS attack about sending msgs > sk->sndbuf
+	 * Never had time to look the optimization code used for unix_stream,
+	 * so, if the buffer we are going to send is > sysctl_wmem_max, just
+	 * report an error (Drop the `packet').
+	 */
+
+	if (len > sk->sndbuf - 16)
+		return -EMSGSIZE;
+
 	skb = sock_alloc_send_skb(sk, len, 0, msg->msg_flags&MSG_DONTWAIT, &err);
 	if (skb==NULL)
 		goto out;

--ReaqsoxgOBHFXBhH--

From owner-netdev@oss.sgi.com Sat Apr 1 05:49:05 2000
Received: by oss.sgi.com id ; Sat, 1 Apr 2000 05:48:55 -0800
Received: from minus.inr.ac.ru ([193.233.7.97]:18702 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Sat, 1 Apr 2000 05:48:33 -0800
Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id RAA16851; Sat, 1 Apr 2000 17:48:23 +0400
From: kuznet@ms2.inr.ac.ru
Message-Id: <200004011348.RAA16851@ms2.inr.ac.ru>
Subject: Re: rtnetlink bug in 2.3
To: zajbot@goteborg.utfors.SE (Tobias =?iso-8859-1?Q?Ringstr=F6m?=)
Date: Sat, 1 Apr 2000 17:48:23 +0400 (MSK DST)
Cc: netdev@oss.sgi.com
In-Reply-To: <38E5DA9F.85048708@goteborg.utfors.se> from "Tobias =?iso-8859-1?Q?Ringstr=F6m?=" at Apr 1, 0 04:13:37 pm
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Content-Length: 390
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing

Hello!

> The following back-trace (by the excellent kgdb) illustrates a problem
> with rtnetlink. The function rtmsg_ifinfo is called from interrupt
> context when inserting a cardbus card.

This must not ever occur. This driver cannot be used.

> with GFP_KERNEL, causing a kernel panic.

Excellent. If it will not panic here, your kernel will be destroyed
secretly and fatally.
Alexey

From owner-netdev@oss.sgi.com Sat Apr 1 05:56:35 2000
Received: by oss.sgi.com id ; Sat, 1 Apr 2000 05:56:25 -0800
Received: from laurin.munich.netsurf.de ([194.64.166.1]:64761 "EHLO laurin.munich.netsurf.de") by oss.sgi.com with ESMTP id ; Sat, 1 Apr 2000 05:56:22 -0800
Received: from fred.muc.de (none@ns1201.munich.netsurf.de [195.180.235.201]) by laurin.munich.netsurf.de (8.9.3/8.9.3) with ESMTP id PAA01631; Sat, 1 Apr 2000 15:51:14 +0200 (MET DST)
Received: from andi by fred.muc.de with local (Exim 2.05 #1) id 12bNvi-0000OK-00; Sat, 1 Apr 2000 15:27:10 +0200
Date: Sat, 1 Apr 2000 15:27:10 +0200
From: Andi Kleen
To: Andrey Savochkin
Cc: Donald Becker , jamal , Andrew Morton , netdev@oss.sgi.com, kuznet@ms2.inr.ac.ru
Subject: Re: Queue and SMP locking discussion (was Re: 3c59x.c)
Message-ID: <20000401152710.A1499@fred.muc.de>
References: <20000401113010.A20780@saw.sw.com.sg>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 0.95.4us
In-Reply-To: <20000401113010.A20780@saw.sw.com.sg>; from Andrey Savochkin on Sat, Apr 01, 2000 at 05:32:20AM +0200
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing

On Sat, Apr 01, 2000 at 05:32:20AM +0200, Andrey Savochkin wrote:
> Hello,
>
> On Fri, Mar 31, 2000 at 11:16:56AM -0500, Donald Becker wrote:
> > On Fri, 31 Mar 2000, jamal wrote:
> > > Mitigation (which seems to be added to some of Donalds drivers by Jeff
> > > Garzik and Andrey Savochkin) will to a certain extent.
>
> At least for eepro100, the receive interrupt mitigation doesn't exist in the
> driver. There were some changes about TX completion interrupts, but I
> consider them as rather irrelevant.
>
> The other question is that it's possible to turn on hardware interrupt
> mitigation on Intel's chips by uploading a microcode.
> Intel's driver claims to do it.
Interestingly, Intel's driver also shows how to use RX hardware checksums
(although they use a very ugly way to implement it with CHECKSUM_NONE,
instead of using CHECKSUM_HW). Unfortunately my eepro100 is too old to
have it (82557 rev 1).

-Andi

-- 
This is like TV. I don't like TV.

From owner-netdev@oss.sgi.com Sat Apr 1 07:28:55 2000
Received: by oss.sgi.com id ; Sat, 1 Apr 2000 07:28:38 -0800
Received: from mail.cyberus.ca ([209.195.95.1]:56768 "EHLO cyberus.ca") by oss.sgi.com with ESMTP id ; Sat, 1 Apr 2000 07:28:27 -0800
Received: from shell.cyberus.ca (shell [209.195.95.7]) by cyberus.ca (8.9.3/8.9.3/Cyberus Online Inc.) with ESMTP id KAA01687; Sat, 1 Apr 2000 10:28:27 -0500 (EST)
Received: from localhost (hadi@localhost) by shell.cyberus.ca (8.9.1b+Sun/8.9.3) with ESMTP id KAA19356; Sat, 1 Apr 2000 10:28:25 -0500 (EST)
Date: Sat, 1 Apr 2000 10:28:25 -0500 (EST)
From: jamal
To: Michael Richardson
cc: netdev@oss.sgi.com
Subject: Re: Queue and SMP locking discussion (was Re: 3c59x.c)
In-Reply-To: <200003311908.OAA00894@solidum.com>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing

On Fri, 31 Mar 2000, Michael Richardson wrote:

> http://www.research.solidum.com/papers/ols1999/top.html
>

Michael,

I think we had this debate during your presentation ;->
Here are my thoughts:

Bus latency is not a problem as far as throughput is concerned. This
problem can be equated to *exactly* the high RTT-BW problem in TCP. You
just have to adjust your ring-buffering accordingly.

I don't think processing latency is an issue either; even with your broken
pcnet driver[1] you come up with a number of 4007 cycles to process a
packet. Get yourself a faster processor ;->

So your assertion that "the 33Mhz, 32 bit PCI bus itself can theoretically
handle up to one and a half million (1428571 to be exact) frames per
second, or 50 10 Mb/s adaptors" is misleading.
I realize you say it is theoretical; however, ask people who use Alexey's
fast forwarding driver and they'll tell you they definitely do more than
50Mbps.

BTW, current 2.3 kernels allow you to use APICs even on a single
processor.

cheers,
jamal

[1] A modified tulip driver at 100Mbps FD which does all the rx processing
(record stats etc) but drops the packet instead of passing the packet up
the stack easily handles 150Kpps. I have only tested with one
interface. The stats are derived by simply using ifconfig and comparing
with the hardware generator -- nothing fancy. I should retry it blasting
at two NICS and see whether they can both handle it. This was a while back
using a hardware traffic generator (very precise interpacket times of 0.96
microsecs) with some 2.2 kernel.

From owner-netdev@oss.sgi.com Sat Apr 1 08:46:06 2000
Received: by oss.sgi.com id ; Sat, 1 Apr 2000 08:43:37 -0800
Received: from md46920c9.utfors.se ([212.105.32.201]:35589 "EHLO boris.prodako.se") by oss.sgi.com with ESMTP id ; Sat, 1 Apr 2000 08:43:13 -0800
Received: from localhost (tori@localhost) by boris.prodako.se (8.9.3/8.9.3) with ESMTP id SAA25888; Sat, 1 Apr 2000 18:25:24 +0200
X-Authentication-Warning: boris.prodako.se: tori owned process doing -bs
Date: Sat, 1 Apr 2000 18:25:23 +0200 (CEST)
From: Tobias Ringstrom
X-Sender: tori@boris.prodako.se
To: kuznet@ms2.inr.ac.ru
cc: netdev@oss.sgi.com
Subject: Re: rtnetlink bug in 2.3
In-Reply-To: <200004011348.RAA16851@ms2.inr.ac.ru>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing

On Sat, 1 Apr 2000 kuznet@ms2.inr.ac.ru wrote:

> Hello!
>
> > The following back-trace (by the excellent kgdb) illustrates a problem
> > with rtnetlink. The function rtmsg_ifinfo is called from interrupt
> > context when inserting a cardbus card.
>
> This must not ever occur. This driver cannot be used.

What driver? rtnetlink? cardbus? tulip?
I understand that I presently cannot enable rtnetlink for a computer with
cardbus network devices, but that's hardly a long-term solution, is it?

> > with GFP_KERNEL, causing a kernel panic.
>
> Excellent. If it will not panic here, your kernel will be destroyed
> secretly and fatally.

This is why I wanted someone to have a look at it. I considered it likely
that my simple/stupid patch was not correct, but the important part of my
message was the trouble report, not the "solution". I could have been
clearer about that.

What is the correct solution, then? Who's at fault?

/Tobias

From owner-netdev@oss.sgi.com Sat Apr 1 09:15:16 2000
Received: by oss.sgi.com id ; Sat, 1 Apr 2000 09:12:56 -0800
Received: from minus.inr.ac.ru ([193.233.7.97]:49414 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Sat, 1 Apr 2000 09:12:54 -0800
Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id VAA24031; Sat, 1 Apr 2000 21:12:41 +0400
From: kuznet@ms2.inr.ac.ru
Message-Id: <200004011712.VAA24031@ms2.inr.ac.ru>
Subject: Re: rtnetlink bug in 2.3
To: zajbot@goteborg.utfors.se (Tobias Ringstrom)
Date: Sat, 1 Apr 2000 21:12:41 +0400 (MSK DST)
Cc: netdev@oss.sgi.com
In-Reply-To: from "Tobias Ringstrom" at Apr 1, 0 06:25:23 pm
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Content-Length: 916
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing

Hello!

> What driver?

That driver which tries to register a device from an interrupt. It is
impossible.

> I understand that I presently cannot enable rtnetlink for a computer with
> cardbus network devices, but that's hardly a long-term solution, is it?

rtnetlink has nothing to do with the problem. It is a symptom.

> > Excellent. If it will not panic here, your kernel will be destroyed
> > secretly and fatally.
>
> Why the sarcasm? I spent a noticeable amount of time to give a very good
> description of the problem, and I honestly think that I managed to do so,
> even if the patch I submitted was broken.
It is not sarcasm, at all. Using that patch, you will finish with corrupted
disk or with something like that after this patch. Correct patch looks
as addition line sort of:

	if (in_interrupt())
		panic("Attempt to register netdevice from interrupt\n");

added to function register_netdevice().

Alexey

From owner-netdev@oss.sgi.com Sat Apr 1 09:24:16 2000
Received: by oss.sgi.com id ; Sat, 1 Apr 2000 09:22:36 -0800
Received: from mail.cyberus.ca ([209.195.95.1]:43999 "EHLO cyberus.ca") by oss.sgi.com with ESMTP id ; Sat, 1 Apr 2000 09:22:27 -0800
Received: from shell.cyberus.ca (shell [209.195.95.7]) by cyberus.ca (8.9.3/8.9.3/Cyberus Online Inc.) with ESMTP id MAA01339; Sat, 1 Apr 2000 12:22:26 -0500 (EST)
Received: from localhost (hadi@localhost) by shell.cyberus.ca (8.9.1b+Sun/8.9.3) with ESMTP id MAA19511; Sat, 1 Apr 2000 12:22:26 -0500 (EST)
Date: Sat, 1 Apr 2000 12:22:26 -0500 (EST)
From: jamal
To: kuznet@ms2.inr.ac.ru
cc: Tobias Ringstrom , netdev@oss.sgi.com
Subject: Re: rtnetlink bug in 2.3
In-Reply-To: <200004011712.VAA24031@ms2.inr.ac.ru>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing

On Sat, 1 Apr 2000 kuznet@ms2.inr.ac.ru wrote:

> disk or with something like that after this patch. Correct patch looks
> as addition line sort of:
>
> 	if (in_interrupt())
> 		panic("Attempt to register netdevice from interrupt\n");
>
> added to function register_netdevice().

This is going to be a big problem BTW. These types of cards I
believe generate interrupts when inserted to allow dynamic allocation
of PCI type resources. Same applies to compact PCI.
The PCI subsystem is going to have to solve it, not the networking code.
cheers,
jamal

From owner-netdev@oss.sgi.com Sat Apr 1 09:29:46 2000
Received: by oss.sgi.com id ; Sat, 1 Apr 2000 09:27:17 -0800
Received: from adsl-151-196-244-176.bellatlantic.net ([151.196.244.176]:4597 "EHLO vaio.greennet") by oss.sgi.com with ESMTP id ; Sat, 1 Apr 2000 09:27:12 -0800
Received: from localhost (becker@localhost) by vaio.greennet (8.9.3/8.8.7) with ESMTP id MAA10669; Sat, 1 Apr 2000 12:30:48 -0500
Date: Sat, 1 Apr 2000 12:30:48 -0500 (EST)
From: Donald Becker
X-Sender: becker@vaio.greennet
To: jamal
cc: Michael Richardson , netdev@oss.sgi.com
Subject: Re: Queue and SMP locking discussion (was Re: 3c59x.c)
In-Reply-To:
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing

On Sat, 1 Apr 2000, jamal wrote:
> On Fri, 31 Mar 2000, Michael Richardson wrote:
>
> > http://www.research.solidum.com/papers/ols1999/top.html
...
> [1] A modified tulip driver at 100Mbps FD which does all the rx processing
> (record stats etc) but drops the packet instead of passing the packet up
> the stack easily handles 150Kpps.

This is a very important point: the Tulip hardware and driver in 2.0 and
2.2 can easily handle the worst a 100baseTx link can throw at it.

Just queuing the packet in netif_rx() *should* take minimal extra work, and
have minimal cache impact. Dropping the packet because the queue layer is
full should take even less work.

I think that some people have been making the assumption that it's the
driver itself that is slow..
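Becker's claim that queuing in netif_rx() and dropping on a full queue both cost very little can be illustrated with a userspace model of a bounded backlog. This is a minimal sketch, not the kernel's netif_rx(): `struct backlog`, `backlog_enqueue()`, `backlog_dequeue()` and the tiny `BACKLOG_MAX` are hypothetical names and values chosen for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fixed-size receive backlog, standing in for the queue
 * that netif_rx() feeds. Not kernel code; the size is illustrative. */
#define BACKLOG_MAX 4

struct pkt { int id; };

struct backlog {
	struct pkt *slot[BACKLOG_MAX];
	size_t head;            /* index of the next packet to dequeue */
	size_t count;           /* packets currently queued */
	unsigned long dropped;  /* packets refused because the queue was full */
};

/* O(1) enqueue: one compare and a couple of stores. When the queue is
 * full the packet is dropped immediately and only a counter changes. */
static bool backlog_enqueue(struct backlog *q, struct pkt *p)
{
	if (q->count == BACKLOG_MAX) {
		q->dropped++;
		return false;
	}
	q->slot[(q->head + q->count) % BACKLOG_MAX] = p;
	q->count++;
	return true;
}

/* Consumer side (the protocol layer in the real stack). */
static struct pkt *backlog_dequeue(struct backlog *q)
{
	struct pkt *p;

	if (q->count == 0)
		return NULL;
	p = q->slot[q->head];
	q->head = (q->head + 1) % BACKLOG_MAX;
	q->count--;
	return p;
}
```

Once the backlog is full, every further call takes the drop branch and only increments a counter, so dropping is even cheaper than queuing.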
Donald Becker
Scyld Computing Corporation, becker@scyld.com

From owner-netdev@oss.sgi.com Sat Apr 1 09:32:36 2000
Received: by oss.sgi.com id ; Sat, 1 Apr 2000 09:30:26 -0800
Received: from minus.inr.ac.ru ([193.233.7.97]:34567 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Sat, 1 Apr 2000 09:30:20 -0800
Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id VAA24223; Sat, 1 Apr 2000 21:29:52 +0400
From: kuznet@ms2.inr.ac.ru
Message-Id: <200004011729.VAA24223@ms2.inr.ac.ru>
Subject: Re: rtnetlink bug in 2.3
To: hadi@cyberus.ca (jamal)
Date: Sat, 1 Apr 2000 21:29:52 +0400 (MSK DST)
Cc: zajbot@goteborg.utfors.se, netdev@oss.sgi.com
In-Reply-To: from "jamal" at Apr 1, 0 12:22:26 pm
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Content-Length: 416
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing

Hello!

> This is going to be a big problem BTW. These types of cards I
> believe generate interrupts when inserted to allow dynamic allocation
> of PCI type resources. Same applies to compact PCI.

I see no problems; processes exist exactly to do this.

BTW, fiddling with PCI resources from interrupts is impossible, exactly
like registering any device, so the cardbus driver mentioned is
fundamentally broken.
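Alexey's point is that work which may sleep, such as registering a device or allocating with GFP_KERNEL, belongs in process context, and the interrupt path should only record that the work is pending. The userspace sketch below models that split under stated assumptions: `in_irq_sim` is a hypothetical flag standing in for the kernel's in_interrupt(), and `card_inserted_irq()`, `process_context_worker()` and `register_dev_sim()` are illustrative names, not kernel or cardbus APIs. The assert mirrors the guard Alexey proposes for register_netdevice().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for in_interrupt(): true while the "handler" runs. */
static bool in_irq_sim = false;

/* Hypothetical device record, linked into a pending list. */
struct fake_dev {
	const char *name;
	struct fake_dev *next;
};

static int registered_count = 0;
static struct fake_dev *pending = NULL;

/* May sleep in the real kernel, so it must never run from an interrupt. */
static void register_dev_sim(struct fake_dev *dev)
{
	assert(!in_irq_sim && "attempt to register netdevice from interrupt");
	(void)dev;
	registered_count++;
}

/* Interrupt path: do nothing that can sleep, only queue the request.
 * (Locking is omitted here; in the kernel this list would need
 * IRQ-safe protection against the process-context consumer.) */
static void card_inserted_irq(struct fake_dev *dev)
{
	in_irq_sim = true;
	dev->next = pending;
	pending = dev;
	in_irq_sim = false;
}

/* Process context (a kernel thread in a real system): drain the list
 * and perform the sleeping work. Returns the number of devices handled. */
static int process_context_worker(void)
{
	int n = 0;

	while (pending != NULL) {
		struct fake_dev *dev = pending;
		pending = dev->next;
		register_dev_sim(dev);
		n++;
	}
	return n;
}
```

Nothing is registered until the worker runs, which is the behaviour Alexey's in_interrupt() check would enforce.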
Alexey

From owner-netdev@oss.sgi.com Sun Apr 2 01:02:33 2000
Received: by oss.sgi.com id ; Sun, 2 Apr 2000 01:02:14 -0800
Received: from robur.slu.se ([130.238.98.12]:51972 "EHLO robur.slu.se") by oss.sgi.com with ESMTP id ; Sun, 2 Apr 2000 01:02:05 -0800
Received: (from robert@localhost) by robur.slu.se (8.8.7/8.8.7) id LAA11451; Sun, 2 Apr 2000 11:02:01 +0200
From: Robert Olsson
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Message-ID: <14567.3208.945950.713466@robur.slu.se>
Date: Sun, 2 Apr 2000 11:02:00 +0200 (CEST)
To: jamal
Cc: Michael Richardson , netdev@oss.sgi.com
Subject: Re: Queue and SMP locking discussion (was Re: 3c59x.c)
X-Mailer: VM 6.75 under Emacs 19.34.1
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing

jamal writes:
>
> [1] A modified tulip driver at 100Mbps FD which does all the rx processing
> (record stats etc) but drops the packet instead of passing the packet up
> the stack easily handles 150Kpps. I have only tested with one
> interface. The stats are derived by simply using ifconfig and comparing
> with the hardware generator -- nothing fancy. I should retry it blasting
> at two NICS and see whether they can both handle it. This was a while back
> using a hardware traffic generator (very precise interpacket times of 0.96
> microsecs) with some 2.2 kernel.

If you have a flexible packet generator and are testing the routing path,
you can mark the packets when sending them, and just count the marked
packets at the receiving end. Instead of passing them to netif_rx(), just
view them in /proc. This has the advantage that the receiver behaves
"normally" for ARP and other protocol stuff; it just eats/counts/views
your marked packets.

As you have already seen, the genuine tulip chips can receive 148 kpps,
and you get a cheap sink device. We used this when we experimented with
the "fast switching" path.
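The 148 kpps figure is the theoretical line rate for minimum-size frames on 100 Mb/s Ethernet. Besides its 64 bytes (including the FCS), every frame occupies an 8-byte preamble and start delimiter plus a 12-byte (96 bit-time) inter-frame gap on the wire, and at 100 Mb/s that gap lasts 0.96 µs, the interpacket time jamal quotes. A small C helper makes the arithmetic explicit; the function and macro names are illustrative.

```c
/* Per-frame wire overhead on Ethernet, in bytes. */
#define ETH_PREAMBLE_SFD 8   /* 7-byte preamble + 1-byte start-of-frame delimiter */
#define ETH_IFG          12  /* inter-frame gap: 96 bit times */

/* Maximum frames per second on a link of bits_per_sec for frames of
 * frame_bytes (header + payload + FCS). */
static double max_frame_rate(double bits_per_sec, unsigned frame_bytes)
{
	double wire_bits = (double)(frame_bytes + ETH_PREAMBLE_SFD + ETH_IFG) * 8.0;
	return bits_per_sec / wire_bits;
}

/* Duration of the inter-frame gap, in microseconds. */
static double ifg_usec(double bits_per_sec)
{
	return ETH_IFG * 8.0 / bits_per_sec * 1e6;
}
```

For 64-byte frames, `max_frame_rate(100e6, 64)` works out to 100,000,000 / 672, about 148,809 frames per second (and about 14,881 at 10 Mb/s), while `ifg_usec(100e6)` gives 0.96.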
I have a patch somewhere for the driver we use if you are interested.

--ro

From owner-netdev@oss.sgi.com Sun Apr 2 04:31:23 2000
Received: by oss.sgi.com id ; Sun, 2 Apr 2000 04:31:14 -0700
Received: from smtprich.nortel.com ([192.135.215.8]:51079 "EHLO smtprich.nortel.com") by oss.sgi.com with ESMTP id ; Sun, 2 Apr 2000 04:30:52 -0700
Received: from zrchb213.us.nortel.com (actually zrchb213) by smtprich.nortel.com; Sun, 2 Apr 2000 06:31:31 -0500
Received: from zctwb003.asiapac.nortel.com ([47.152.32.111]) by zrchb213.us.nortel.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21) id 2BF0YYBZ; Sun, 2 Apr 2000 06:30:30 -0500
Received: from pwold011.asiapac.nortel.com ([47.181.193.45]) by zctwb003.asiapac.nortel.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21) id HNN1JNZY; Sun, 2 Apr 2000 21:30:32 +1000
Received: from uow.edu.au (IDENT:akpm@[47.181.207.103]) by pwold011.asiapac.nortel.com (8.9.3/8.9.3) with ESMTP id VAA23353 for ; Sun, 2 Apr 2000 21:30:30 +1000
Message-ID: <38E7308B.F5DE5214@uow.edu.au>
Date: Sun, 02 Apr 2000 11:35:39 +0000
X-Sybari-Space: 00000000 00000000 00000000
From: Andrew Morton
X-Mailer: Mozilla 4.61 [en] (X11; I; Linux 2.2.13-7mdk i586)
X-Accept-Language: en
MIME-Version: 1.0
To: netdev
Subject: More questions...
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Orig:
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing

- dev_kfree_skb_irq can be called from non-IRQ context
  in 3c59x.c This looks safe. But should one
  wrap the call in local_irq_save()/restore() for
  future-safety?

- Should dev_kfree_skb_irq() be doing a local_irq_save()
  even though it's for IRQ context only?

- Are Alan and JeffG on this list?

- What does the final_version macro do?

- I have reviewed the "official" (pah) Linux 3c59x driver wrt Donald's.
  The differences are:

  - netif stuff

  - PCI resource allocation and PCI memory handling

  - Some new features in Donald's driver which, although important,
    are probably not appropriate to 2.3.99-pre..

  - One substantive change to vortex_interrupt:

	if (status & DownComplete) {
		unsigned int dirty_tx = vp->dirty_tx;

		while (vp->cur_tx - dirty_tx > 0) {
			int entry = dirty_tx % TX_RING_SIZE;
			if (inl(ioaddr + DownListPtr) ==
				virt_to_bus(&vp->tx_ring[entry]))
				break;			/* It still hasn't been processed. */
			if (vp->tx_skbuff[entry]) {
				DEV_FREE_SKB(vp->tx_skbuff[entry]);
				vp->tx_skbuff[entry] = 0;
			}
			/* vp->stats.tx_packets++;  Counted below. */
			dirty_tx++;
		}
		vp->dirty_tx = dirty_tx;
>>>>>		outw(AckIntr | DownComplete, ioaddr + EL3_CMD);
		if (vp->tx_full && (vp->cur_tx - dirty_tx <= TX_RING_SIZE - 1)) {
			vp->tx_full= 0;
			clear_bit(0, (void*)&dev->tbusy);
			mark_bh(NET_BH);
		}
	}

  The interrupt ack has been moved. Donald has it prior to the
  loop. Can you remember why?

Thanks.

-- 
-akpm-

From owner-netdev@oss.sgi.com Sun Apr 2 04:44:43 2000
Received: by oss.sgi.com id ; Sun, 2 Apr 2000 04:44:23 -0700
Received: from smtprtp1.ntcom.nortel.net ([137.118.22.14]:60877 "EHLO smtprtp1.ntcom.nortel.net") by oss.sgi.com with ESMTP id ; Sun, 2 Apr 2000 04:44:01 -0700
Received: from zsngd101.asiapac.nortel.com (actually znsgd101) by smtprtp1.ntcom.nortel.net; Sun, 2 Apr 2000 07:43:34 -0400
Received: from zctwb003.asiapac.nortel.com ([47.152.32.111]) by zsngd101.asiapac.nortel.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21) id 2BF0CDPW; Sun, 2 Apr 2000 19:43:30 +0800
Received: from pwold011.asiapac.nortel.com ([47.181.193.45]) by zctwb003.asiapac.nortel.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21) id HNN1JNZ6; Sun, 2 Apr 2000 21:43:33 +1000
Received: from uow.edu.au (IDENT:akpm@[47.181.207.103]) by pwold011.asiapac.nortel.com (8.9.3/8.9.3) with ESMTP id VAA23399 for ; Sun, 2 Apr 2000 21:43:30 +1000
Message-ID: <38E73398.E1727365@uow.edu.au>
Date: Sun, 02 Apr 2000 11:48:40 +0000
X-Sybari-Space: 00000000 00000000 00000000
From: Andrew Morton
X-Mailer: Mozilla 4.61 [en] (X11; I; Linux 2.2.13-7mdk i586)
X-Accept-Language: en
MIME-Version: 1.0
To: netdev
Subject: coverage
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing

How do you guys test things like:

- max_interrupt_work exceeded

- Tx stuck, invoke tx_timeout

- All the possible error conditions like rxearly, receiver overrun,
  etc, etc.

My approach to the first two was to put a 5 mSec delay in
boomerang_rx(), decrease max_interrupt_work to 5 and ping flood it.

This produces some pretty ugly results. Most of the time the packet
dumping code:

	if ((status & (0x7fe - (UpComplete | DownComplete))) == 0) {
		/* Just ack these and return. */
		outw(AckIntr | UpComplete | DownComplete, ioaddr + EL3_CMD);

works just fine. But just occasionally this test doesn't return true and
we fall into:

	printk(KERN_WARNING "%s: Too much work in interrupt, status "
		   "%4.4x. Temporarily disabling functions (%4.4x).\n",
		   dev->name, status, SetStatusEnb | ((~status) & 0x7FE));
	/* Disable all pending interrupts. */
	outw(SetStatusEnb | ((~status) & 0x7FE), ioaddr + EL3_CMD);
	outw(AckIntr | 0x7FF, ioaddr + EL3_CMD);
	/* The timer will reenable interrupts. */
	break;

and all hell breaks loose. Very occasionally the tx_timeout will do the
right thing, but most of the time the ISR is invoked (possibly for Tx - I
haven't checked). Within the ISR the following test:

	if (status & RxComplete)
		vortex_rx(dev);

returns true. We enter vortex_rx()! But this is with 3c905B hardware.
vortex_rx() doesn't talk to 3c905's and it loops infinitely.

I don't want to blow more time on this corner case if there is a better
way to exercise this code. Of course, my testing methods could be causing
the above problems to occur...

-- 
-akpm-
"Excellent. If it will not panic here, your kernel will be destroyed secretly and fatally" - Alexey K

From owner-netdev@oss.sgi.com Sun Apr 2 08:00:34 2000
Received: by oss.sgi.com id ; Sun, 2 Apr 2000 08:00:25 -0700
Received: from minus.inr.ac.ru ([193.233.7.97]:1801 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Sun, 2 Apr 2000 08:00:09 -0700
Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id SAA05591; Sun, 2 Apr 2000 18:57:23 +0400
From: kuznet@ms2.inr.ac.ru
Message-Id: <200004021457.SAA05591@ms2.inr.ac.ru>
Subject: Re: More questions...
To: andrewm@uow.EDU.AU (Andrew Morton)
Date: Sun, 2 Apr 2000 18:57:23 +0400 (MSK DST)
Cc: netdev@oss.sgi.com
In-Reply-To: <38E7308B.F5DE5214@uow.edu.au> from "Andrew Morton" at Apr 2, 0 04:13:14 pm
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Content-Length: 850
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing

Hello!

> - dev_kfree_skb_irq can be called from non-IRQ context
>   in 3c59x.c This looks safe. But should one
>   wrap the call in local_irq_save()/restore() for
>   future-safety?

No, you should not. See below.

> - Should dev_kfree_skb_irq() be doing a local_irq_save()
>   even though it's for IRQ context only?

There is no such thing as "IRQ context". Each IRQ bucket has its own
context and they can overlap arbitrarily. dev_kfree_skb_irq() simply
enqueues the skb to a per-cpu queue shared by all the IRQs, so it
needs protection against all the IRQs.

Essentially, all "IRQ-safe" functions (another example is netif_rx())
may be safely called from any other context. Does that answer the first
question?

Actually, it is even possible to design a lighter scheme to move skb
destruction out of IRQs without using local_irq_save().
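Alexey mentions that a lighter scheme than local_irq_save() is possible for moving skb destruction out of interrupt handlers. One well-known option, sketched here with C11 atomics rather than kernel primitives, is a push-only lock-free list: producers push with compare-and-swap, and the consumer detaches the entire list with a single atomic exchange. Because nothing ever pops individual nodes, the classic ABA problem does not arise. `struct deferred`, `defer_free()` and `take_all()` are hypothetical names; this is not how dev_kfree_skb_irq() is implemented.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical node awaiting destruction (a stand-in for an skb). */
struct deferred {
	struct deferred *next;
	int id;
};

/* List head shared by every producer. */
static _Atomic(struct deferred *) free_list = NULL;

/* Producer side: callable concurrently from any context, no lock and
 * no interrupt masking. Release ordering publishes d->next. */
static void defer_free(struct deferred *d)
{
	struct deferred *old = atomic_load_explicit(&free_list, memory_order_relaxed);

	do {
		d->next = old;
	} while (!atomic_compare_exchange_weak_explicit(&free_list, &old, d,
							memory_order_release,
							memory_order_relaxed));
}

/* Consumer side: detach the whole list at once; the caller then walks
 * it privately and frees each node. Nodes come back newest first. */
static struct deferred *take_all(void)
{
	return atomic_exchange_explicit(&free_list, (struct deferred *)NULL,
					memory_order_acquire);
}
```

The detached list is LIFO, so a consumer that cares about order would reverse it before freeing. For skb destruction, order normally does not matter.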
Alexey From owner-netdev@oss.sgi.com Sun Apr 2 09:19:55 2000 Received: by oss.sgi.com id ; Sun, 2 Apr 2000 09:19:45 -0700 Received: from minus.inr.ac.ru ([193.233.7.97]:32521 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Sun, 2 Apr 2000 09:19:33 -0700 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id UAA06079; Sun, 2 Apr 2000 20:18:27 +0400 From: kuznet@ms2.inr.ac.ru Message-Id: <200004021618.UAA06079@ms2.inr.ac.ru> Subject: Re: coverage To: andrewm@uow.EDU.AU (Andrew Morton) Date: Sun, 2 Apr 2000 20:18:27 +0400 (MSK DST) Cc: netdev@oss.sgi.com In-Reply-To: <38E73398.E1727365@uow.edu.au> from "Andrew Morton" at Apr 2, 0 04:13:15 pm X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Content-Length: 1011 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hello! > How do you guys test things like: > > - max_interrupt_work exceeded ... > My approach to the first two was to put a 5 mSec delay in > boomerang_rx(), decrease max_interrupt_work to 5 and ping flood it. It is easy. "ping -f" does not create any real load on the network, it is pure latency test, when there is always only one packet in flight. Do not use it. Just write simple program, sending small udp packets without gaps, or use tools sort of netperf. To create extremal load, it is possible to use pgen tool by Robert Olsson (ftp://robur.slu.se:/pub/Linux/tmp/), it is a bit out of date, but easy to update. > - Tx stuck, invoke tx_timeout I think any man working with network has in his table some broken hub, which generates only collisions. 8) Mmm... what will occur if to plug crossed pair to two hub slots? Will it simulate broken hub then? (I did not advise you to do this! 8)) Seriously, you can just to skip some TX ring refilling deliberately to trigger this condition. 
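Below is a minimal sketch of the "simple program, sending small udp packets without gaps" suggested above. It assumes a POSIX sockets environment. `udp_blast()` and its parameters are hypothetical and do not belong to netperf or pgen. The datagrams go out back to back with no pacing, so many packets can be in flight at once, unlike the one-packet-in-flight behaviour described for "ping -f". An 18-byte payload produces a minimum-size 64-byte Ethernet frame: 14 bytes of header, 20 of IP, 8 of UDP, 18 of payload and 4 of FCS.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Send `count` UDP datagrams of `payload` bytes to ip:port back to back.
 * Returns the number of datagrams the local kernel accepted,
 * or -1 on a setup error (bad address, oversized payload, no socket). */
long udp_blast(const char *ip, unsigned short port, size_t payload, long count)
{
    char buf[1472];                 /* largest UDP payload in a 1500-byte MTU */
    struct sockaddr_in dst;
    long sent = 0;
    int fd;

    if (payload > sizeof(buf))
        return -1;
    memset(buf, 0xA5, payload);

    memset(&dst, 0, sizeof(dst));
    dst.sin_family = AF_INET;
    dst.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &dst.sin_addr) != 1)
        return -1;

    fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    for (long i = 0; i < count; i++) {
        /* No sleep and no wait for replies between sends. */
        if (sendto(fd, buf, payload, 0, (struct sockaddr *)&dst,
                   sizeof(dst)) == (ssize_t)payload)
            sent++;
    }
    close(fd);
    return sent;
}
```

For example, `udp_blast("192.168.1.2", 9, 18, 1000000)` floods the discard port of a test box with minimum-size frames. A successful `sendto()` only means the local kernel accepted the datagram. The packet may still be dropped before it reaches the wire.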
Alexey From owner-netdev@oss.sgi.com Sun Apr 2 09:36:15 2000 Received: by oss.sgi.com id ; Sun, 2 Apr 2000 09:36:06 -0700 Received: from minus.inr.ac.ru ([193.233.7.97]:27914 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Sun, 2 Apr 2000 09:35:40 -0700 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id UAA06582; Sun, 2 Apr 2000 20:34:18 +0400 From: kuznet@ms2.inr.ac.ru Message-Id: <200004021634.UAA06582@ms2.inr.ac.ru> Subject: Re: Queue and SMP locking discussion (was Re: 3c59x.c) To: becker@scyld.COM (Donald Becker) Date: Sun, 2 Apr 2000 20:34:18 +0400 (MSK DST) Cc: netdev@oss.sgi.com In-Reply-To: from "Donald Becker" at Apr 1, 0 10:13:09 pm X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Content-Length: 913 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hello! > Just queuing the packet in netif_rx() *should* take minimal extra work, and > have minimal cache impact. Dropping the packet because the queue layer is > full should take even less work. It is true, it does not take any time. But we need not this. We need to do some work in the worst conditions, rather than serve as a perfect blackhole. Linux, as it is, is a perfect blackhole. We need the driver to leave us some time for packet processing. The OS does not exist to serve the driver; the driver exists to allow the OS to do some useful work. Look at HW_FLOWCONTROL, by the way. It is crappy, but it is a real move, rather than useless talk. And it really works, at least when process context is not involved (i.e. forwarding). > I think that some people have been making the assumption that it's the > driver itself that is slow.. And some people avoid assuming anything before profiling.
8) Alexey From owner-netdev@oss.sgi.com Sun Apr 2 23:30:00 2000 Received: by oss.sgi.com id ; Sun, 2 Apr 2000 23:29:41 -0700 Received: from ppp75.arobas.net ([205.205.36.145]:26372 "HELO dialin156.ottawa.globalserve.net") by oss.sgi.com with SMTP id ; Sun, 2 Apr 2000 23:29:21 -0700 Received: (qmail 4090 invoked by uid 1000); 3 Apr 2000 06:26:38 -0000 Date: Mon, 3 Apr 2000 02:26:38 -0400 From: Jerome Etienne To: netdev@oss.sgi.com Subject: minor mtu fix and beautifications Message-ID: <20000403022638.A669@long-haul.net> Reply-To: jetienne@arobas.net Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii User-Agent: Mutt/1.0i Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hello, Reading the IP tunnel code, I have seen that the max MTU was 0xfff8 instead of 0xffff (in ipip.c/ip_gre.c/sit.c). If it is a bug, the following patch fixes it. At the same time, I did some beautifications, replacing some occurrences of 68 with IP_MIN_MTU. --- linux-2.3.99-pre3/include/linux/ip.h Sat Mar 18 15:11:22 2000 +++ linux-2.3.99-pre3_patch/include/linux/ip.h Mon Apr 3 01:45:18 2000 @@ -67,6 +67,9 @@ #define MAXTTL 255 #define IPDEFTTL 64 +#define IP_MIN_MTU 68 /* rfc791.fragmentation */ +#define IP_MAX_MTU 0xFFFF /* rfc791.3.1.total_length */ + /* struct timestamp, struct route and MAX_ROUTES are removed. REASONS: it is clear that nobody used them because: --- linux-2.3.99-pre3/net/ipv4/ip_gre.c Fri Mar 17 13:56:20 2000 +++ linux-2.3.99-pre3_patch/net/ipv4/ip_gre.c Mon Apr 3 01:48:03 2000 @@ -91,7 +91,7 @@ that is ALL. :-) Well, it does not remove the problem completely, but exponential growth of network traffic is changed to linear (branches, that exceed pmtu are pruned) and tunnel mtu - fastly degrades to value <68, where looping stops. + fastly degrades to value <68 (IP_MIN_MTU), where looping stops. Yes, it is not good if there exists a router in the loop, which does not force DF, even when encapsulating packets have DF set.
But it is not our problem! Nobody could accuse us, we made @@ -450,7 +450,7 @@ case ICMP_FRAG_NEEDED: /* And it is the only really necesary thing :-) */ rel_info = ntohs(skb->h.icmph->un.frag.mtu); - if (rel_info < grehlen+68) + if (rel_info < grehlen+IP_MIN_MTU) return; rel_info -= grehlen; /* BSD 4.2 MORE DOES NOT EXIST IN NATURE. */ @@ -708,7 +708,7 @@ mtu = rt->u.dst.pmtu - tunnel->hlen; if (skb->protocol == __constant_htons(ETH_P_IP)) { - if (skb->dst && mtu < skb->dst->pmtu && mtu >= 68) + if (skb->dst && mtu < skb->dst->pmtu && mtu >= IP_MIN_MTU) skb->dst->pmtu = mtu; df |= (old_iph->frag_off&__constant_htons(IP_DF)); @@ -976,7 +976,7 @@ static int ipgre_tunnel_change_mtu(struct net_device *dev, int new_mtu) { struct ip_tunnel *tunnel = (struct ip_tunnel*)dev->priv; - if (new_mtu < 68 || new_mtu > 0xFFF8 - tunnel->hlen) + if (new_mtu < IP_MIN_MTU || new_mtu > IP_MAX_MTU - tunnel->hlen) return -EINVAL; dev->mtu = new_mtu; return 0; --- linux-2.3.99-pre3/net/ipv4/ipip.c Mon Apr 3 01:52:40 2000 +++ linux-2.3.99-pre3_patch/net/ipv4/ipip.c Mon Apr 3 01:46:59 2000 @@ -384,7 +384,7 @@ case ICMP_FRAG_NEEDED: /* And it is the only really necesary thing :-) */ rel_info = ntohs(skb->h.icmph->un.frag.mtu); - if (rel_info < hlen+68) + if (rel_info < hlen+IP_MIN_MTU) return; rel_info -= hlen; /* BSD 4.2 MORE DOES NOT EXIST IN NATURE. 
*/ @@ -553,7 +553,7 @@ } mtu = rt->u.dst.pmtu - sizeof(struct iphdr); - if (mtu < 68) { + if (mtu < IP_MIN_MTU) { tunnel->stat.collisions++; ip_rt_put(rt); goto tx_error; @@ -761,7 +761,7 @@ static int ipip_tunnel_change_mtu(struct net_device *dev, int new_mtu) { - if (new_mtu < 68 || new_mtu > 0xFFF8 - sizeof(struct iphdr)) + if (new_mtu < IP_MIN_MTU || new_mtu > IP_MAX_MTU - sizeof(struct iphdr)) return -EINVAL; dev->mtu = new_mtu; return 0; --- linux-2.3.99-pre3/net/ipv6/sit.c Mon Apr 3 01:52:40 2000 +++ linux-2.3.99-pre3_patch/net/ipv6/sit.c Mon Apr 3 01:49:02 2000 @@ -471,7 +471,7 @@ } mtu = rt->u.dst.pmtu - sizeof(struct iphdr); - if (mtu < 68) { + if (mtu < IP_MIN_MTU) { tunnel->stat.collisions++; ip_rt_put(rt); goto tx_error; @@ -689,7 +689,7 @@ static int ipip6_tunnel_change_mtu(struct net_device *dev, int new_mtu) { - if (new_mtu < IPV6_MIN_MTU || new_mtu > 0xFFF8 - sizeof(struct iphdr)) + if (new_mtu < IPV6_MIN_MTU || new_mtu > IP_MAX_MTU-sizeof(struct iphdr)) return -EINVAL; dev->mtu = new_mtu; return 0; From owner-netdev@oss.sgi.com Mon Apr 3 02:45:31 2000 Received: by oss.sgi.com id ; Mon, 3 Apr 2000 02:45:22 -0700 Received: from smtprch1.nortelnetworks.com ([192.135.215.14]:15301 "EHLO smtprch1.nortel.com") by oss.sgi.com with ESMTP id ; Mon, 3 Apr 2000 02:45:11 -0700 Received: from zrchb213.us.nortel.com (actually zrchb213) by smtprch1.nortel.com; Mon, 3 Apr 2000 04:45:28 -0500 Received: from zctwb003.asiapac.nortel.com ([47.152.32.111]) by zrchb213.us.nortel.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21) id 2BF0ZDDT; Mon, 3 Apr 2000 04:45:00 -0500 Received: from pwold011.asiapac.nortel.com ([47.181.193.45]) by zctwb003.asiapac.nortel.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21) id HNN1JPDR; Mon, 3 Apr 2000 19:45:02 +1000 Received: from uow.edu.au (IDENT:akpm@[47.181.207.103]) by pwold011.asiapac.nortel.com (8.9.3/8.9.3) with ESMTP id TAA01144; Mon, 3 Apr 2000 19:44:57 +1000 Message-ID: 
<38E8695B.E99D0D6A@uow.edu.au> Date: Mon, 03 Apr 2000 09:50:19 +0000 X-Sybari-Space: 00000000 00000000 00000000 From: Andrew Morton X-Mailer: Mozilla 4.61 [en] (X11; I; Linux 2.2.13-7mdk i586) X-Accept-Language: en MIME-Version: 1.0 To: kuznet@ms2.inr.ac.ru CC: netdev@oss.sgi.com Subject: Re: coverage References: <38E73398.E1727365@uow.edu.au> from "Andrew Morton" at Apr 2, 0 04:13:15 pm <200004021618.UAA06079@ms2.inr.ac.ru> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing kuznet@ms2.inr.ac.ru wrote: > > It is easy. "ping -f" does not create any real load on the network, > it is pure latency test, when there is always only one packet in flight. I use 'ping -l 100000' -l preload If preload is specified, ping sends that many packets as fast as possible before falling into its normal mode of behavior. Only the super-user may use this option. > Do not use it. Just write simple program, sending small udp packets > without gaps, or use tools sort of netperf. To create extremal load, > it is possible to use pgen tool by Robert Olsson > (ftp://robur.slu.se:/pub/Linux/tmp/), it is a bit out of date, > but easy to update. Thanks - I'll look into it. > > - Tx stuck, invoke tx_timeout > > I think any man working with network has in his table some broken hub, > which generates only collisions. 8) Mmm... what will occur if to plug > crossed pair to two hub slots? Will it simulate broken hub then? > (I did not advise you to do this! 8)) > > Seriously, you can just to skip some TX ring refilling deliberately > to trigger this condition. Thanks again. 
-- -akpm- From owner-netdev@oss.sgi.com Mon Apr 3 03:14:32 2000 Received: by oss.sgi.com id ; Mon, 3 Apr 2000 03:14:22 -0700 Received: from robur.slu.se ([130.238.98.12]:44293 "EHLO robur.slu.se") by oss.sgi.com with ESMTP id ; Mon, 3 Apr 2000 03:14:02 -0700 Received: (from robert@localhost) by robur.slu.se (8.8.7/8.8.7) id MAA21007; Mon, 3 Apr 2000 12:08:26 +0200 From: Robert Olsson MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <14568.28058.761608.988825@robur.slu.se> Date: Mon, 3 Apr 2000 12:08:26 +0200 (CEST) To: kuznet@ms2.inr.ac.ru Cc: andrewm@uow.EDU.AU (Andrew Morton), netdev@oss.sgi.com Subject: Re: coverage X-Mailer: VM 6.75 under Emacs 19.34.1 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing kuznet@ms2.inr.ac.ru writes: > It is easy. "ping -f" does not create any real load on the network, > it is pure latency test, when there is always only one packet in flight. > Do not use it. Just write simple program, sending small udp packets > without gaps, or use tools sort of netperf. To create extremal load, > it is possible to use pgen tool by Robert Olsson > (ftp://robur.slu.se:/pub/Linux/tmp/), it is a bit out of date, > but easy to update. Hello! It is currently removed; it was suggested that I should not have it accessible via anonymous ftp. I'll send a copy to people interested in PPS and router testing. It runs from the kernel and needs our hacked tulip driver plus a NIC with genuine tulip chips.
--ro From owner-netdev@oss.sgi.com Mon Apr 3 04:12:11 2000 Received: by oss.sgi.com id ; Mon, 3 Apr 2000 04:11:52 -0700 Received: from smtprich.nortel.com ([192.135.215.8]:21225 "EHLO smtprich.nortel.com") by oss.sgi.com with ESMTP id ; Mon, 3 Apr 2000 04:11:45 -0700 Received: from zrchb213.us.nortel.com (actually zrchb213) by smtprich.nortel.com; Mon, 3 Apr 2000 06:12:06 -0500 Received: from zctwb003.asiapac.nortel.com ([47.152.32.111]) by zrchb213.us.nortel.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21) id 2BF0ZDNQ; Mon, 3 Apr 2000 06:11:06 -0500 Received: from pwold011.asiapac.nortel.com ([47.181.193.45]) by zctwb003.asiapac.nortel.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21) id HNN1JP14; Mon, 3 Apr 2000 21:11:07 +1000 Received: from uow.edu.au (IDENT:akpm@[47.181.207.103]) by pwold011.asiapac.nortel.com (8.9.3/8.9.3) with ESMTP id VAA01544 for ; Mon, 3 Apr 2000 21:11:05 +1000 Message-ID: <38E87D8A.E86A9F5B@uow.edu.au> Date: Mon, 03 Apr 2000 11:16:26 +0000 X-Sybari-Space: 00000000 00000000 00000000 From: Andrew Morton X-Mailer: Mozilla 4.61 [en] (X11; I; Linux 2.2.13-7mdk i586) X-Accept-Language: en MIME-Version: 1.0 To: netdev Subject: Re: coverage References: <38E73398.E1727365@uow.edu.au> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-Orig: Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Andrew Morton wrote: > > My approach to the first two was to put a 5 mSec delay in > boomerang_rx(), decrease max_interrupt_work to 5 and ping flood it. > > This produces some pretty ugly results. OK, found it. It could be interpreted as a silicon (or specification) bug. But it can be fixed in the driver. vortex_interrupt() { ... if (status & RxComplete) vortex_rx(dev); ... } In the 3c905, rxComplete indicates that the rx FIFO has one or more full packets. 
But it is _internally_ acknowledged when the upload engine has transferred the packet to memory. So there is a small window during which that bit will be set. But the bit shouldn't be visible to the processor because it's being masked off in the Indication Enable register. "Indications disabled with this command do not cause an interrupt to the host, nor are they visible in the IntStatus register". However, I am very occasionally seeing this bit set when processing updateStats interrupts. Off we go to vortex_rx and the kernel is destroyed secretly and fatally! When describing this bit in the IntStatus reg the 905 spec says "This bit is automatically acked by the upload engine as it uploads packets. Drivers should disable this interrupt and mask this bit when reading IntStatus". But the initialisation code in vortex_open() only does half the job: vp->status_enable = SetStatusEnb | HostError|IntReq|StatsFull|TxComplete| (vp->full_bus_master_tx ? DownComplete : TxAvailable) | (vp->full_bus_master_rx ? UpComplete : RxComplete) | (vp->bus_master ? DMADone : 0); vp->intr_enable = SetIntrEnb | IntLatch | TxAvailable | RxComplete | StatsFull | HostError | TxComplete | IntReq | (vp->bus_master ? DMADone : 0) | UpComplete | DownComplete; Note that it is clearing RxComplete in the IndicationEnable reg but setting RxComplete in the InterruptEnable reg. This appears to be confusing the chip. When I cleared RxComplete in the InterruptEnable reg the problem went away. So there's the fix: vp->intr_enable = SetIntrEnb | IntLatch | TxAvailable | (vp->full_bus_master_rx ? 0 : RxComplete) | StatsFull | HostError | TxComplete | IntReq | (vp->bus_master ? DMADone : 0) | UpComplete | DownComplete; I think we should be treating TxAvailable in the same way in this statement. Don, it perturbs me that we are testing RxComplete and TxAvailable in the 905 ISR. And vice versa for 59x's, of course. I think it would be better from a cleanliness and performance (esp.
cache footprint) POV to break out a separate boomerang_interrupt. What do you think? -- -akpm- From owner-netdev@oss.sgi.com Mon Apr 3 04:12:51 2000 Received: by oss.sgi.com id ; Mon, 3 Apr 2000 04:12:42 -0700 Received: from mail.cyberus.ca ([209.195.95.1]:20203 "EHLO cyberus.ca") by oss.sgi.com with ESMTP id ; Mon, 3 Apr 2000 04:12:32 -0700 Received: from shell.cyberus.ca (shell [209.195.95.7]) by cyberus.ca (8.9.3/8.9.3/Cyberus Online Inc.) with ESMTP id HAA02258; Mon, 3 Apr 2000 07:12:31 -0400 (EDT) Received: from localhost (hadi@localhost) by shell.cyberus.ca (8.9.1b+Sun/8.9.3) with ESMTP id HAA21471; Mon, 3 Apr 2000 07:12:31 -0400 (EDT) Date: Mon, 3 Apr 2000 07:12:31 -0400 (EDT) From: jamal To: Robert Olsson cc: Michael Richardson , netdev@oss.sgi.com Subject: Re: Queue and SMP locking discussion (was Re: 3c59x.c) In-Reply-To: <14567.3208.945950.713466@robur.slu.se> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Sun, 2 Apr 2000, Robert Olsson wrote: > If you have a flexible packet generator and are testing the routing path > you can mark the packets when sending it - And just count the marked packets > in the receiver end. And instead of passing them to netif_rx() just view > them in the /proc. This has the advantage that the receiver behaves > "normal" for arps and other protocols stuff it just eats/counts/views > your marked packets. As you already seen the genune tulip chips can > receive 148 kpps and you got a cheap sink device. > Sorry it was 148 kpps and not 150Kpps (which is beyond the 100Mbps spec). I have a 21143 -- is that what you mean by Genuine? > We used this when we experimented with "fast switching" path. I have a patch > somewhere for the driver we use if you are interested. > Thanks. Alexey pointed me to your extension a few months back. What kind of results do you get ? Mind sharing them? 
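The 148 kpps figure follows from Ethernet framing overhead. Besides the frame itself, every frame on the wire carries an 8-byte preamble/start delimiter and is followed by a 12-byte minimum inter-frame gap. A 64-byte frame therefore costs 84 byte times, or 672 bits, and 100,000,000 / 672 is about 148,809 frames per second. That is why 150 kpps exceeds the 100 Mbps spec. The helper name `max_frames_per_sec` is hypothetical. This is a back-of-the-envelope sketch that ignores collisions and other real-world losses.

```c
#include <stdint.h>

/* Per-frame wire overhead that is not counted in the frame size. */
#define ETH_PREAMBLE_SFD 8   /* bytes: 7-byte preamble + 1-byte start delimiter */
#define ETH_IFG          12  /* bytes: minimum inter-frame gap */

/* Theoretical maximum frames per second for a link speed in bits/s and a
 * frame size in bytes (destination MAC through FCS). */
uint64_t max_frames_per_sec(uint64_t link_bps, uint64_t frame_bytes)
{
    uint64_t bits_per_frame = (frame_bytes + ETH_PREAMBLE_SFD + ETH_IFG) * 8;
    return link_bps / bits_per_frame;
}
```

`max_frames_per_sec(100000000, 64)` returns 148809. Robert's ~147 KPPS fast-switching figure later in the thread is therefore close to wire speed for minimum-size frames.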
cheers, jamal From owner-netdev@oss.sgi.com Mon Apr 3 04:22:22 2000 Received: by oss.sgi.com id ; Mon, 3 Apr 2000 04:22:12 -0700 Received: from mail.cyberus.ca ([209.195.95.1]:2541 "EHLO cyberus.ca") by oss.sgi.com with ESMTP id ; Mon, 3 Apr 2000 04:22:00 -0700 Received: from shell.cyberus.ca (shell [209.195.95.7]) by cyberus.ca (8.9.3/8.9.3/Cyberus Online Inc.) with ESMTP id HAA04155; Mon, 3 Apr 2000 07:21:59 -0400 (EDT) Received: from localhost (hadi@localhost) by shell.cyberus.ca (8.9.1b+Sun/8.9.3) with ESMTP id HAA21479; Mon, 3 Apr 2000 07:21:59 -0400 (EDT) Date: Mon, 3 Apr 2000 07:21:59 -0400 (EDT) From: jamal To: kuznet@ms2.inr.ac.ru cc: Donald Becker , netdev@oss.sgi.com Subject: Re: Queue and SMP locking discussion (was Re: 3c59x.c) In-Reply-To: <200004021634.UAA06582@ms2.inr.ac.ru> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Sun, 2 Apr 2000 kuznet@ms2.inr.ac.ru wrote: > Hello! > > > Just queuing the packet in netif_rx() *should* take minimal extra work, and > > have minimal cache impact. Dropping the packet because the queue layer is > > full should take even less work. > > It is true, it does not take any time. But we need not this. Looking at my notes: 2.3 just swallows as much as 2.2. Maybe this is chip-dependent, as Robert was implying? I have a 21143 chip. I should easily be able to check the exact revision numbers from the PCI messages if someone is interested. The h/ware is a 4-port card made by Znyx.
cheers, jamal From owner-netdev@oss.sgi.com Mon Apr 3 04:46:12 2000 Received: by oss.sgi.com id ; Mon, 3 Apr 2000 04:46:02 -0700 Received: from smtprch1.nortelnetworks.com ([192.135.215.14]:42455 "EHLO smtprch1.nortel.com") by oss.sgi.com with ESMTP id ; Mon, 3 Apr 2000 04:45:50 -0700 Received: from zrchb213.us.nortel.com (actually zrchb213) by smtprch1.nortel.com; Mon, 3 Apr 2000 06:45:59 -0500 Received: from zctwb003.asiapac.nortel.com ([47.152.32.111]) by zrchb213.us.nortel.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21) id 2BF0ZDWB; Mon, 3 Apr 2000 06:45:31 -0500 Received: from pwold011.asiapac.nortel.com ([47.181.193.45]) by zctwb003.asiapac.nortel.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21) id HNN1JPGB; Mon, 3 Apr 2000 21:45:33 +1000 Received: from uow.edu.au (IDENT:akpm@[47.181.207.103]) by pwold011.asiapac.nortel.com (8.9.3/8.9.3) with ESMTP id VAA01765; Mon, 3 Apr 2000 21:45:26 +1000 Message-ID: <38E88597.D6F4FF94@uow.edu.au> Date: Mon, 03 Apr 2000 11:50:47 +0000 X-Sybari-Space: 00000000 00000000 00000000 From: Andrew Morton X-Mailer: Mozilla 4.61 [en] (X11; I; Linux 2.2.13-7mdk i586) X-Accept-Language: en MIME-Version: 1.0 To: kuznet@ms2.inr.ac.ru CC: netdev@oss.sgi.com Subject: Re: More questions... References: <38E7308B.F5DE5214@uow.edu.au> from "Andrew Morton" at Apr 2, 0 04:13:14 pm <200004021457.SAA05591@ms2.inr.ac.ru> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing kuznet@ms2.inr.ac.ru wrote: > > Hello! > > > - dev_kfree_skb_irq can be called from non-IRQ context > > in 3c59x.c This looks safe. But should one > > wrap the call in local_irq_save()/restore() for > > future-safety? > > No, you should not. See below. > > > - Should dev_kfree_skb_irq() be doing a local_irq_save() > > even though it's for IRQ context only? > > There exist no such thing as "IRQ context". 
Each IRQ bucket > has its own context and they can overlap arbitrarily. You miss my point. /* Use this variant when it is known for sure that it * is executing from interrupt context. */ extern __inline__ void dev_kfree_skb_irq(struct sk_buff *skb) { if (atomic_dec_and_test(&skb->users)) { int cpu =smp_processor_id(); unsigned long flags; local_irq_save(flags); skb->next = softnet_data[cpu].completion_queue; softnet_data[cpu].completion_queue = skb; __cpu_raise_softirq(cpu, NET_TX_SOFTIRQ); local_irq_restore(flags); } } If "it is known for sure" then why does this function need the local_irq_save()? Is it because this CPU could take an interrupt for a different device and, within the nested ISR, find its completion queue in an inconsistent state? -- -akpm- From owner-netdev@oss.sgi.com Mon Apr 3 05:00:42 2000 Received: by oss.sgi.com id ; Mon, 3 Apr 2000 05:00:32 -0700 Received: from adsl-151-196-244-92.bellatlantic.net ([151.196.244.92]:20722 "EHLO vaio.greennet") by oss.sgi.com with ESMTP id ; Mon, 3 Apr 2000 05:00:09 -0700 Received: from localhost (becker@localhost) by vaio.greennet (8.9.3/8.8.7) with ESMTP id HAA19297; Mon, 3 Apr 2000 07:59:05 -0400 Date: Mon, 3 Apr 2000 07:59:05 -0400 (EDT) From: Donald Becker X-Sender: becker@vaio.greennet To: Andrew Morton cc: kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com Subject: Re: coverage In-Reply-To: <38E8695B.E99D0D6A@uow.edu.au> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Mon, 3 Apr 2000, Andrew Morton wrote: > kuznet@ms2.inr.ac.ru wrote: > > > > It is easy. "ping -f" does not create any real load on the network, > > it is pure latency test, when there is always only one packet in flight. > > I use 'ping -l 100000' > -l preload And, as you found out, most of the preloaded packets are discarded, at least with most versions of 'ping' and the kernel. 
The discard is done by the queue layer, before the device driver gets to see the packets. > > > - Tx stuck, invoke tx_timeout > > > > I think any man working with network has in his table some broken hub, > > which generates only collisions. 8) Mmm... what will occur if to plug > > crossed pair to two hub slots? Will it simulate broken hub then? > > (I did not advise you to do this! 8)) This won't accomplish what you expect. Most chips won't transmit on twisted pair without having link beat, producing a different type of error. Coax transceivers always get a collision when no cable is connected, resulting in a 16 collision error, not a Tx timeout. You can sometimes simulate the same effect on 10baseT with a mispaired cable, although not with 100baseTx. Donald Becker Scyld Computing Corporation, becker@scyld.com From owner-netdev@oss.sgi.com Mon Apr 3 05:22:01 2000 Received: by oss.sgi.com id ; Mon, 3 Apr 2000 05:21:51 -0700 Received: from smtprch1.nortelnetworks.com ([192.135.215.14]:40162 "EHLO smtprch1.nortel.com") by oss.sgi.com with ESMTP id ; Mon, 3 Apr 2000 05:21:28 -0700 Received: from zsngd101.asiapac.nortel.com (actually znsgd101) by smtprch1.nortel.com; Mon, 3 Apr 2000 07:12:21 -0500 Received: from zctwb003.asiapac.nortel.com ([47.152.32.111]) by zsngd101.asiapac.nortel.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21) id 2BF0C2SV; Mon, 3 Apr 2000 20:11:51 +0800 Received: from pwold011.asiapac.nortel.com ([47.181.193.45]) by zctwb003.asiapac.nortel.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21) id HNN1JPG7; Mon, 3 Apr 2000 22:11:54 +1000 Received: from uow.edu.au (IDENT:akpm@[47.181.207.103]) by pwold011.asiapac.nortel.com (8.9.3/8.9.3) with ESMTP id WAA01885 for ; Mon, 3 Apr 2000 22:11:50 +1000 Message-ID: <38E88BC7.437850F9@uow.edu.au> Date: Mon, 03 Apr 2000 12:17:11 +0000 X-Sybari-Space: 00000000 00000000 00000000 From: Andrew Morton X-Mailer: Mozilla 4.61 [en] (X11; I; Linux 2.2.13-7mdk i586) 
X-Accept-Language: en MIME-Version: 1.0 To: netdev Subject: 3c59x redux Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Guys, I cannot _believe_ the amount of email I'm generating over this. Still, it has been useful for me and, I hope, for the Linux 3com driver. Thanks for your patience and assistance. Here is my new list of proposed changes. Please review. Please note the very last point - I think it's a buglet in the existing driver. - For static compilation, make 'debug' a constant. - For MODULE compilation, make debug a variable, but put in a comment pointing out the code size and cache footprint advantages of making it a constant. - struct pci_id_info has a 'probe1' method. It is never used. Remove this. - use init_etherdev() to allocate dev->priv (and don't kfree it). Check that the address returned from init_etherdev() is 16-byte aligned. If it isn't, force an oops. This will keep everyone happy and honest :-) - make ram_split[] static (shorter code). - priv.in_interrupt is not used. - boomerang_start_xmit(): if (1) { - Inconsistent use of 'vortex_debug' and 'debug' - *** Clarify this #if ! defined(final_version). This seems to be superfluous - it's always compiled in (there is no final version!) - handle error return from init_etherdev() kmalloc() pci_alloc_consistent() request_region() - vortex_open(): /* Use the now-standard shared IRQ implementation. */ if (request_irq(dev->irq, &vortex_interrupt, SA_SHIRQ, dev->name, dev)) { return -EAGAIN; } This happens _after_ we've called init_timer()/add_timer, so vortex_timer() will end up getting called on a driver for which the open failed. Move the init_timer()/add_timer() code to after this return. - "For later cards which can handle "porky packets", emulate the HAVE_CHANGE_MTU stuff in hamachi.c. This allows the MTU to be changed via an ioctl." 
Put a FIXME comment in the code for this at this time. Locking issues ============== The philosophy at this time is "no locks". The justification is below. If things start going weird then the "big driver lock" should go in. (benchmarks?) - No spinlock is needed in vortex_start_xmit/boomerang_start_xmit because: 1: The Tx and Rx handling in the ISR are separate. The h/w is safe with this and an Rx interrupt during vortex_start_xmit() is OK. 2: netcore prevents hard_start_xmit from being reentered. - No spinlock needed in vortex_tx_timeout() for the same reasons. tx_timeout() is serialised wrt hard_start_xmit by the netcore layer. - vortex_get_stats() fiddles with the hardware a lot. It needs exclusive access. But there are no serialisation guarantees for get_stats() so we'll do a global cli() in here (it already has one) rather than clog the fast paths with a spinlock. - set_rx_mode() (aka set_multicast_list) has no protection, but it's a single outw(). No locking needed here at all. - vortex_interrupt() uses dev_kfree_skb_irq(), but vortex_interrupt() is called from elsewhere in non-IRQ context! Alexey says this is OK. - vortex_ioctl() fiddles with h/w. It calls mdio_write() which is very stateful. It sets the register window pointer which is also stateful. Use cli(). - in vortex_interrupt(): if (status & TxAvailable) { This looks scary, because it's a vortex-only thing. But the 3c905 guarantees that this bit is zero. Add a comment to clarify this. (Or split the ISR into vortex and boomerang). - Donald's concern about the new netif_wake_queue() in vortex_interrupt(). Not much we can do about this? I overestimated (I said ~100 insns). In fact netif_wake_queue is ~25 insns, ~5 memory hits worst case. - Don has extra cards in his pci_tbl. Pinch these.
- Nail 'dev=0' at end of vortex_scan() - Fix the MOD_INC/DEC_USE_COUNT as per Tim Waugh's report: drivers/net/3c59x.c:vortex_attach -- wrong drivers/net/3c59x.c:vortex_open -- calls request_irq further up Still to resolve ================ - cli() is "reliable death of serial interfaces". What did Alexey mean by this? Do we actually drop Rx chars on serial ports? Hope not. - I'm confused. If 'option=1', we set vp->bus_master, but this is only for vortex. A consequence of this is that the DMADone interrupt gets enabled. But for 905, this bit (bit 8 in the interrupt reg) is 'linkEvent'! So we should clear vp->bus_master if it's a 905, and arrange for this test: if (status & DMADone) { if (inw(ioaddr + Wn7_MasterStatus) & 0x1000) { to not occur at all for 905's. We need to split up the ISR - what does Don think? -- -akpm- From owner-netdev@oss.sgi.com Mon Apr 3 06:39:52 2000 Received: by oss.sgi.com id ; Mon, 3 Apr 2000 06:39:32 -0700 Received: from robur.slu.se ([130.238.98.12]:51973 "EHLO robur.slu.se") by oss.sgi.com with ESMTP id ; Mon, 3 Apr 2000 06:39:26 -0700 Received: (from robert@localhost) by robur.slu.se (8.8.7/8.8.7) id PAA22221; Mon, 3 Apr 2000 15:39:13 +0200 From: Robert Olsson MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <14568.40700.735283.656774@robur.slu.se> Date: Mon, 3 Apr 2000 15:39:08 +0200 (CEST) To: jamal Cc: Robert Olsson , Michael Richardson , netdev@oss.sgi.com Subject: Re: Queue and SMP locking discussion (was Re: 3c59x.c) X-Mailer: VM 6.75 under Emacs 19.34.1 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing jamal writes: > Thanks. > Alexey pointed me to your extension a few months back. What kind of > results do you get ? Mind sharing them? Not at all! We did a lot experimenting with this about 2 years ago. New ideas came every day most of them came from Russia. 
:-) From memory, I can give you some performance numbers with Linux routing between two tulip NICs and the smallest packets, 64 bytes. (PII ~400 MHz, ~100 MHz bus, genuine fast ethernet tulip chips) Normal path : ~40 KPPS Fast switching patch: ~147 KPPS In the fast switching path HW_FLOWCONTROL is active (shouldn't be too active here) and skb recycling is also used, which seems to have a good effect with fast switching. Expect multiport boards to have slightly less performance. PCI-bridge? As a comparison I can give a real-life example from last week, when some popular package was released. We have two Linux routers that connect one of the major archives via load sharing to our university ISP; they were filling 155 Mbps alone. These routers run the normal path plus some ipchains filters and a full BGP table (~75000 routes). CPU is PII 350 MHz and HW_FLOWCONTROL. The ISP Cisco said: Output queue 0/40, 0 drops; input queue 0/75, 682 drops 5 minute input rate 6268000 bits/sec, 9094 packets/sec 5 minute output rate 149098000 bits/sec, 16210 packets/sec ^^^^^^^^^ The two Linux routers' CPUs were loaded at: R1 ~20% This has 75 Mbps out R2 ~45% This has 75 Mbps out plus incoming traffic. You seem to have the equipment to verify the results above and do your own experiments. Still impressive, but with a load of ~45-50% I have to start to worry.
:-)

--ro

From owner-netdev@oss.sgi.com Mon Apr 3 08:13:31 2000 Received: by oss.sgi.com id ; Mon, 3 Apr 2000 08:13:21 -0700 Received: from minus.inr.ac.ru ([193.233.7.97]:60420 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Mon, 3 Apr 2000 08:12:56 -0700 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id TAA00873; Mon, 3 Apr 2000 19:12:29 +0400 From: kuznet@ms2.inr.ac.ru Message-Id: <200004031512.TAA00873@ms2.inr.ac.ru> Subject: Re: coverage To: andrewm@uow.edu.au (Andrew Morton) Date: Mon, 3 Apr 2000 19:12:29 +0400 (MSK DST) Cc: netdev@oss.sgi.com In-Reply-To: <38E8695B.E99D0D6A@uow.edu.au> from "Andrew Morton" at Apr 3, 0 09:50:19 am X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Content-Length: 293 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing

Hello!

> I use 'ping -l 100000'

This does not help. Only 100 packets can be preloaded; the rest simply disappear into a black hole. Actually, you can increase tx_queue_len on eth0 to a cosmic value, say 1500000; then it will be able to generate a burst of packets for ~10 seconds.

Alexey

From owner-netdev@oss.sgi.com Mon Apr 3 08:23:02 2000 Received: by oss.sgi.com id ; Mon, 3 Apr 2000 08:22:52 -0700 Received: from dialup-ad-13-116.camtech.net.au ([203.55.243.244]:9732 "EHLO halfway.linuxcare.com.au") by oss.sgi.com with ESMTP id ; Mon, 3 Apr 2000 08:22:36 -0700 Received: from linuxcare.com.au (really [127.0.0.1]) by linuxcare.com.au via in.smtpd with esmtp id (Debian Smail3.2.0.102) for ; Tue, 4 Apr 2000 00:51:17 +0930 (CST) Message-Id: From: Rusty Russell To: torvalds@transmeta.com cc: netdev@oss.sgi.com Subject: [PATCH] netfilter fixes v2.3.99-pre4-2 Date: Tue, 04 Apr 2000 00:51:06 +0930 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing

Linus, please apply.

Note to all: I am out this week (knee reconstruction).
If I post before Thursday, I'm probably on pain medication, so check my patches carefully 8). This fixes: 1) Module unload races. 2) NAT on fragment issues (GameSpy crash). 3) Partially-initialized NAT entries if protocol doesn't like packet. 4) `check' failure one-too-many-down() bug. Some paranoia issues: 1) Now doing an external lookup will never create a new connection. 2) Extra checks for locally-injected truncated IP packets (raw sockets). Enjoy! Rusty. diff -urN --minimal --exclude *.lds --exclude *.sgml --exclude classlist.h --exclude devlist.h --exclude autoconf.h --exclude compile.h --exclude version.h --exclude .* --exclude *.[oa] --exclude *.orig --exclude config --exclude asm --exclude modules --exclude *.[Ss] --exclude System.map --exclude consolemap_deftbl.c --exclude *~ --exclude TAGS --exclude tags --exclude modversions.h --exclude install-kernel linux-2.3.99-pre4-2/include/linux/netfilter_ipv4/ip_conntrack_protocol.h working/include/linux/netfilter_ipv4/ip_conntrack_protocol.h --- linux-2.3.99-pre4-2/include/linux/netfilter_ipv4/ip_conntrack_protocol.h Sat Apr 8 18:10:12 2000 +++ working/include/linux/netfilter_ipv4/ip_conntrack_protocol.h Mon Apr 3 14:47:59 2000 @@ -37,10 +37,10 @@ struct iphdr *iph, size_t len, enum ip_conntrack_info ctinfo); - /* Called when a new connection for this protocol found; returns - * TRUE if it's OK. If so, packet() called next. */ - int (*new)(struct ip_conntrack *conntrack, - struct iphdr *iph, size_t len); + /* Called when a new connection for this protocol found; + * returns timeout. If so, packet() called next. */ + unsigned long (*new)(struct ip_conntrack *conntrack, + struct iphdr *iph, size_t len); /* Module (if any) which this is connected to. 
*/ struct module *me; diff -urN --minimal --exclude *.lds --exclude *.sgml --exclude classlist.h --exclude devlist.h --exclude autoconf.h --exclude compile.h --exclude version.h --exclude .* --exclude *.[oa] --exclude *.orig --exclude config --exclude asm --exclude modules --exclude *.[Ss] --exclude System.map --exclude consolemap_deftbl.c --exclude *~ --exclude TAGS --exclude tags --exclude modversions.h --exclude install-kernel linux-2.3.99-pre4-2/include/linux/netfilter_ipv4/ipt_state.h working/include/linux/netfilter_ipv4/ipt_state.h --- linux-2.3.99-pre4-2/include/linux/netfilter_ipv4/ipt_state.h Sat Mar 18 05:26:20 2000 +++ working/include/linux/netfilter_ipv4/ipt_state.h Mon Apr 3 14:18:20 2000 @@ -1,8 +1,7 @@ #ifndef _IPT_STATE_H #define _IPT_STATE_H -#define _IPT_STATE_BIT(ctinfo) (1 << ((ctinfo)+1)) -#define IPT_STATE_BIT(ctinfo) ((ctinfo) >= IP_CT_IS_REPLY ? _IPT_STATE_BIT((ctinfo)-IP_CT_IS_REPLY) : _IPT_STATE_BIT(ctinfo)) +#define IPT_STATE_BIT(ctinfo) (1 << ((ctinfo)%IP_CT_IS_REPLY+1)) #define IPT_STATE_INVALID (1 << 0) struct ipt_state_info diff -urN --minimal --exclude *.lds --exclude *.sgml --exclude classlist.h --exclude devlist.h --exclude autoconf.h --exclude compile.h --exclude version.h --exclude .* --exclude *.[oa] --exclude *.orig --exclude config --exclude asm --exclude modules --exclude *.[Ss] --exclude System.map --exclude consolemap_deftbl.c --exclude *~ --exclude TAGS --exclude tags --exclude modversions.h --exclude install-kernel linux-2.3.99-pre4-2/net/ipv4/ip_fragment.c working/net/ipv4/ip_fragment.c --- linux-2.3.99-pre4-2/net/ipv4/ip_fragment.c Thu Feb 10 14:38:09 2000 +++ working/net/ipv4/ip_fragment.c Mon Apr 3 14:18:20 2000 @@ -387,8 +387,13 @@ */ skb->security = qp->fragments->skb->security; +#ifdef CONFIG_NETFILTER + /* Connection association is same as fragment (if any). 
*/ + skb->nfct = qp->fragments->skb->nfct; + nf_conntrack_get(skb->nfct); #ifdef CONFIG_NETFILTER_DEBUG skb->nf_debug = qp->fragments->skb->nf_debug; +#endif #endif /* Done with all fragments. Fixup the new IP header. */ diff -urN --minimal --exclude *.lds --exclude *.sgml --exclude classlist.h --exclude devlist.h --exclude autoconf.h --exclude compile.h --exclude version.h --exclude .* --exclude *.[oa] --exclude *.orig --exclude config --exclude asm --exclude modules --exclude *.[Ss] --exclude System.map --exclude consolemap_deftbl.c --exclude *~ --exclude TAGS --exclude tags --exclude modversions.h --exclude install-kernel linux-2.3.99-pre4-2/net/ipv4/netfilter/ip_conntrack_core.c working/net/ipv4/netfilter/ip_conntrack_core.c --- linux-2.3.99-pre4-2/net/ipv4/netfilter/ip_conntrack_core.c Sat Apr 1 19:03:03 2000 +++ working/net/ipv4/netfilter/ip_conntrack_core.c Mon Apr 3 14:18:20 2000 @@ -343,6 +343,7 @@ size_t hash, repl_hash; struct ip_conntrack_expect *expected; enum ip_conntrack_info ctinfo; + unsigned long extra_jiffies; int i; if (!invert_tuple(&repl_tuple, tuple, protocol)) { @@ -366,19 +367,24 @@ repl_hash = hash_conntrack(&repl_tuple); memset(conntrack, 0, sizeof(struct ip_conntrack)); - atomic_set(&conntrack->ct_general.use, 1); + atomic_set(&conntrack->ct_general.use, 2); conntrack->ct_general.destroy = destroy_conntrack; conntrack->tuplehash[IP_CT_DIR_ORIGINAL].tuple = *tuple; conntrack->tuplehash[IP_CT_DIR_ORIGINAL].ctrack = conntrack; conntrack->tuplehash[IP_CT_DIR_REPLY].tuple = repl_tuple; conntrack->tuplehash[IP_CT_DIR_REPLY].ctrack = conntrack; - for(i=0; i < IP_CT_NUMBER; i++) + for (i=0; i < IP_CT_NUMBER; i++) conntrack->infos[i].master = &conntrack->ct_general; - if (!protocol->new(conntrack, skb->nh.iph, skb->len)) { + extra_jiffies = protocol->new(conntrack, skb->nh.iph, skb->len); + if (!extra_jiffies) { kmem_cache_free(ip_conntrack_cachep, conntrack); return 1; } + conntrack->timeout.data = (unsigned long)conntrack; + 
conntrack->timeout.function = death_by_timeout; + conntrack->timeout.expires = jiffies + extra_jiffies; + add_timer(&conntrack->timeout); /* Sew in at head of hash list. */ WRITE_LOCK(&ip_conntrack_lock); @@ -421,7 +427,7 @@ } static void -resolve_normal_ct(struct sk_buff *skb) +resolve_normal_ct(struct sk_buff *skb, int create) { struct ip_conntrack_tuple tuple; struct ip_conntrack_tuple_hash *h; @@ -436,7 +442,7 @@ do { /* look for tuple match */ h = ip_conntrack_find_get(&tuple, NULL); - if (!h && init_conntrack(&tuple, proto, skb)) + if (!h && (!create || init_conntrack(&tuple, proto, skb))) return; } while (!h); @@ -464,13 +470,15 @@ } /* Return conntrack and conntrack_info a given skb */ -struct ip_conntrack * -ip_conntrack_get(struct sk_buff *skb, enum ip_conntrack_info *ctinfo) +static struct ip_conntrack * +__ip_conntrack_get(struct sk_buff *skb, + enum ip_conntrack_info *ctinfo, + int create) { if (!skb->nfct) { /* It may be an icmp error... */ if (!icmp_error_track(skb)) - resolve_normal_ct(skb); + resolve_normal_ct(skb, create); } if (skb->nfct) { @@ -485,6 +493,12 @@ return NULL; } +struct ip_conntrack * +ip_conntrack_get(struct sk_buff *skb, enum ip_conntrack_info *ctinfo) +{ + return __ip_conntrack_get(skb, ctinfo, 0); +} + /* Netfilter hook itself. */ unsigned int ip_conntrack_in(unsigned int hooknum, struct sk_buff **pskb, @@ -512,13 +526,13 @@ return NF_STOLEN; } - ct = ip_conntrack_get(*pskb, &ctinfo); - if (!ct) + ct = __ip_conntrack_get(*pskb, &ctinfo, 1); + if (!ct) { /* Not valid part of a connection */ return NF_ACCEPT; + } proto = find_proto((*pskb)->nh.iph->protocol); - /* If this is new, this is first time timer will be set */ ret = proto->packet(ct, (*pskb)->nh.iph, (*pskb)->len, ctinfo); if (ret == -1) { @@ -645,24 +659,16 @@ MOD_DEC_USE_COUNT; } -/* Refresh conntrack for this many jiffies: if noone calls this, - conntrack will vanish with current skb. */ +/* Refresh conntrack for this many jiffies. 
*/ void ip_ct_refresh(struct ip_conntrack *ct, unsigned long extra_jiffies) { + IP_NF_ASSERT(ct->timeout.data == (unsigned long)ct); + WRITE_LOCK(&ip_conntrack_lock); - /* If this hasn't had a timer before, it's still being set up */ - if (ct->timeout.data == 0) { - ct->timeout.data = (unsigned long)ct; - ct->timeout.function = death_by_timeout; + /* Need del_timer for race avoidance (may already be dying). */ + if (del_timer(&ct->timeout)) { ct->timeout.expires = jiffies + extra_jiffies; - atomic_inc(&ct->ct_general.use); add_timer(&ct->timeout); - } else { - /* Need del_timer for race avoidance (may already be dying). */ - if (del_timer(&ct->timeout)) { - ct->timeout.expires = jiffies + extra_jiffies; - add_timer(&ct->timeout); - } } WRITE_UNLOCK(&ip_conntrack_lock); } diff -urN --minimal --exclude *.lds --exclude *.sgml --exclude classlist.h --exclude devlist.h --exclude autoconf.h --exclude compile.h --exclude version.h --exclude .* --exclude *.[oa] --exclude *.orig --exclude config --exclude asm --exclude modules --exclude *.[Ss] --exclude System.map --exclude consolemap_deftbl.c --exclude *~ --exclude TAGS --exclude tags --exclude modversions.h --exclude install-kernel linux-2.3.99-pre4-2/net/ipv4/netfilter/ip_conntrack_proto_generic.c working/net/ipv4/netfilter/ip_conntrack_proto_generic.c --- linux-2.3.99-pre4-2/net/ipv4/netfilter/ip_conntrack_proto_generic.c Sat Mar 18 05:26:20 2000 +++ working/net/ipv4/netfilter/ip_conntrack_proto_generic.c Mon Apr 3 14:18:20 2000 @@ -48,9 +48,10 @@ } /* Called when a new connection for this protocol found. 
*/ -static int new(struct ip_conntrack *conntrack, struct iphdr *iph, size_t len) +static unsigned long +new(struct ip_conntrack *conntrack, struct iphdr *iph, size_t len) { - return 1; + return GENERIC_TIMEOUT; } struct ip_conntrack_protocol ip_conntrack_generic_protocol diff -urN --minimal --exclude *.lds --exclude *.sgml --exclude classlist.h --exclude devlist.h --exclude autoconf.h --exclude compile.h --exclude version.h --exclude .* --exclude *.[oa] --exclude *.orig --exclude config --exclude asm --exclude modules --exclude *.[Ss] --exclude System.map --exclude consolemap_deftbl.c --exclude *~ --exclude TAGS --exclude tags --exclude modversions.h --exclude install-kernel linux-2.3.99-pre4-2/net/ipv4/netfilter/ip_conntrack_proto_icmp.c working/net/ipv4/netfilter/ip_conntrack_proto_icmp.c --- linux-2.3.99-pre4-2/net/ipv4/netfilter/ip_conntrack_proto_icmp.c Sat Apr 1 19:03:03 2000 +++ working/net/ipv4/netfilter/ip_conntrack_proto_icmp.c Mon Apr 3 14:18:20 2000 @@ -86,8 +86,8 @@ } /* Called when a new connection for this protocol found. 
*/ -static int icmp_new(struct ip_conntrack *conntrack, - struct iphdr *iph, size_t len) +static unsigned long icmp_new(struct ip_conntrack *conntrack, + struct iphdr *iph, size_t len) { static u_int8_t valid_new[] = { [ICMP_ECHO] = 1, @@ -103,7 +103,7 @@ DUMP_TUPLE(&conntrack->tuplehash[0].tuple); return 0; } - return 1; + return ICMP_TIMEOUT; } struct ip_conntrack_protocol ip_conntrack_protocol_icmp diff -urN --minimal --exclude *.lds --exclude *.sgml --exclude classlist.h --exclude devlist.h --exclude autoconf.h --exclude compile.h --exclude version.h --exclude .* --exclude *.[oa] --exclude *.orig --exclude config --exclude asm --exclude modules --exclude *.[Ss] --exclude System.map --exclude consolemap_deftbl.c --exclude *~ --exclude TAGS --exclude tags --exclude modversions.h --exclude install-kernel linux-2.3.99-pre4-2/net/ipv4/netfilter/ip_conntrack_proto_tcp.c working/net/ipv4/netfilter/ip_conntrack_proto_tcp.c --- linux-2.3.99-pre4-2/net/ipv4/netfilter/ip_conntrack_proto_tcp.c Sat Apr 1 19:03:03 2000 +++ working/net/ipv4/netfilter/ip_conntrack_proto_tcp.c Mon Apr 3 14:18:20 2000 @@ -189,14 +189,13 @@ conntrack->proto.tcp_state = newconntrack; WRITE_UNLOCK(&tcp_lock); - /* Refresh: need write lock to write to conntrack. */ ip_ct_refresh(conntrack, tcp_timeouts[conntrack->proto.tcp_state]); return NF_ACCEPT; } /* Called when a new connection for this protocol found. 
*/ -static int tcp_new(struct ip_conntrack *conntrack, - struct iphdr *iph, size_t len) +static unsigned long tcp_new(struct ip_conntrack *conntrack, + struct iphdr *iph, size_t len) { enum tcp_conntrack newconntrack; struct tcphdr *tcph = (struct tcphdr *)((u_int32_t *)iph + iph->ihl); @@ -210,11 +209,10 @@ if (newconntrack == TCP_CONNTRACK_MAX) { DEBUGP("ip_conntrack_tcp: invalid new deleting.\n"); return 0; - } else { - conntrack->proto.tcp_state = newconntrack; - ip_ct_refresh(conntrack, tcp_timeouts[conntrack->proto.tcp_state]); } - return 1; + + conntrack->proto.tcp_state = newconntrack; + return tcp_timeouts[conntrack->proto.tcp_state]; } struct ip_conntrack_protocol ip_conntrack_protocol_tcp diff -urN --minimal --exclude *.lds --exclude *.sgml --exclude classlist.h --exclude devlist.h --exclude autoconf.h --exclude compile.h --exclude version.h --exclude .* --exclude *.[oa] --exclude *.orig --exclude config --exclude asm --exclude modules --exclude *.[Ss] --exclude System.map --exclude consolemap_deftbl.c --exclude *~ --exclude TAGS --exclude tags --exclude modversions.h --exclude install-kernel linux-2.3.99-pre4-2/net/ipv4/netfilter/ip_conntrack_proto_udp.c working/net/ipv4/netfilter/ip_conntrack_proto_udp.c --- linux-2.3.99-pre4-2/net/ipv4/netfilter/ip_conntrack_proto_udp.c Sat Apr 1 19:03:03 2000 +++ working/net/ipv4/netfilter/ip_conntrack_proto_udp.c Mon Apr 3 14:18:20 2000 @@ -54,10 +54,10 @@ } /* Called when a new connection for this protocol found. 
*/ -static int udp_new(struct ip_conntrack *conntrack, - struct iphdr *iph, size_t len) +static unsigned long udp_new(struct ip_conntrack *conntrack, + struct iphdr *iph, size_t len) { - return 1; + return UDP_TIMEOUT; } struct ip_conntrack_protocol ip_conntrack_protocol_udp diff -urN --minimal --exclude *.lds --exclude *.sgml --exclude classlist.h --exclude devlist.h --exclude autoconf.h --exclude compile.h --exclude version.h --exclude .* --exclude *.[oa] --exclude *.orig --exclude config --exclude asm --exclude modules --exclude *.[Ss] --exclude System.map --exclude consolemap_deftbl.c --exclude *~ --exclude TAGS --exclude tags --exclude modversions.h --exclude install-kernel linux-2.3.99-pre4-2/net/ipv4/netfilter/ip_conntrack_standalone.c working/net/ipv4/netfilter/ip_conntrack_standalone.c --- linux-2.3.99-pre4-2/net/ipv4/netfilter/ip_conntrack_standalone.c Sat Apr 1 19:03:03 2000 +++ working/net/ipv4/netfilter/ip_conntrack_standalone.c Mon Apr 3 14:18:20 2000 @@ -169,8 +169,6 @@ interface. We degfragment them at LOCAL_OUT, however, so we have to refragment them here. */ if ((*pskb)->len > rt->u.dst.pmtu) { - DEBUGP("ip_conntrack: refragm %p (size %u) to %u (okfn %p)\n", - *pskb, (*pskb)->len, rt->u.dst.pmtu, okfn); /* No hook can be after us, so this should be OK. */ ip_fragment(*pskb, okfn); return NF_STOLEN; @@ -178,13 +176,29 @@ return NF_ACCEPT; } +static unsigned int ip_conntrack_local(unsigned int hooknum, + struct sk_buff **pskb, + const struct net_device *in, + const struct net_device *out, + int (*okfn)(struct sk_buff *)) +{ + /* root is playing with raw sockets. */ + if ((*pskb)->len < sizeof(struct iphdr) + || (*pskb)->nh.iph->ihl * 4 < sizeof(struct iphdr)) { + if (net_ratelimit()) + printk("ipt_hook: happy cracking.\n"); + return NF_ACCEPT; + } + return ip_conntrack_in(hooknum, pskb, in, out, okfn); +} + /* Connection tracking may drop packets, but never alters them, so make it the first hook. 
*/ static struct nf_hook_ops ip_conntrack_in_ops = { { NULL, NULL }, ip_conntrack_in, PF_INET, NF_IP_PRE_ROUTING, NF_IP_PRI_CONNTRACK }; static struct nf_hook_ops ip_conntrack_local_out_ops -= { { NULL, NULL }, ip_conntrack_in, PF_INET, NF_IP_LOCAL_OUT, += { { NULL, NULL }, ip_conntrack_local, PF_INET, NF_IP_LOCAL_OUT, NF_IP_PRI_CONNTRACK }; /* Refragmenter; last chance. */ static struct nf_hook_ops ip_conntrack_out_ops diff -urN --minimal --exclude *.lds --exclude *.sgml --exclude classlist.h --exclude devlist.h --exclude autoconf.h --exclude compile.h --exclude version.h --exclude .* --exclude *.[oa] --exclude *.orig --exclude config --exclude asm --exclude modules --exclude *.[Ss] --exclude System.map --exclude consolemap_deftbl.c --exclude *~ --exclude TAGS --exclude tags --exclude modversions.h --exclude install-kernel linux-2.3.99-pre4-2/net/ipv4/netfilter/ip_nat_standalone.c working/net/ipv4/netfilter/ip_nat_standalone.c --- linux-2.3.99-pre4-2/net/ipv4/netfilter/ip_nat_standalone.c Sat Apr 1 19:03:03 2000 +++ working/net/ipv4/netfilter/ip_nat_standalone.c Mon Apr 3 14:18:20 2000 @@ -130,6 +130,11 @@ const struct net_device *out, int (*okfn)(struct sk_buff *)) { + /* root is playing with raw sockets. */ + if ((*pskb)->len < sizeof(struct iphdr) + || (*pskb)->nh.iph->ihl * 4 < sizeof(struct iphdr)) + return NF_ACCEPT; + /* We can hit fragment here; forwarded packets get defragmented by connection tracking coming in, then fragmented (grr) by the forward code. @@ -150,6 +155,21 @@ return ip_nat_fn(hooknum, pskb, in, out, okfn); } +static unsigned int +ip_nat_local_fn(unsigned int hooknum, + struct sk_buff **pskb, + const struct net_device *in, + const struct net_device *out, + int (*okfn)(struct sk_buff *)) +{ + /* root is playing with raw sockets. 
*/ + if ((*pskb)->len < sizeof(struct iphdr) + || (*pskb)->nh.iph->ihl * 4 < sizeof(struct iphdr)) + return NF_ACCEPT; + + return ip_nat_fn(hooknum, pskb, in, out, okfn); +} + /* We must be after connection tracking and before packet filtering. */ /* Before packet filtering, change destination */ @@ -160,7 +180,7 @@ = { { NULL, NULL }, ip_nat_out, PF_INET, NF_IP_POST_ROUTING, NF_IP_PRI_NAT_SRC}; /* Before packet filtering, change destination */ static struct nf_hook_ops ip_nat_local_out_ops -= { { NULL, NULL }, ip_nat_fn, PF_INET, NF_IP_LOCAL_OUT, NF_IP_PRI_NAT_DST }; += { { NULL, NULL }, ip_nat_local_fn, PF_INET, NF_IP_LOCAL_OUT, NF_IP_PRI_NAT_DST }; /* Protocol registration. */ int ip_nat_protocol_register(struct ip_nat_protocol *proto) diff -urN --minimal --exclude *.lds --exclude *.sgml --exclude classlist.h --exclude devlist.h --exclude autoconf.h --exclude compile.h --exclude version.h --exclude .* --exclude *.[oa] --exclude *.orig --exclude config --exclude asm --exclude modules --exclude *.[Ss] --exclude System.map --exclude consolemap_deftbl.c --exclude *~ --exclude TAGS --exclude tags --exclude modversions.h --exclude install-kernel linux-2.3.99-pre4-2/net/ipv4/netfilter/ip_tables.c working/net/ipv4/netfilter/ip_tables.c --- linux-2.3.99-pre4-2/net/ipv4/netfilter/ip_tables.c Sat Apr 1 19:03:03 2000 +++ working/net/ipv4/netfilter/ip_tables.c Mon Apr 3 14:52:21 2000 @@ -39,7 +39,7 @@ #define IP_NF_ASSERT(x) \ do { \ if (!(x)) \ - printk("IPT_ASSERT: %s:%s:%u\n", \ + printk("IP_NF_ASSERT: %s:%s:%u\n", \ __FUNCTION__, __FILE__, __LINE__); \ } while(0) #else @@ -683,7 +683,6 @@ target = find_target_lock(t->u.name, &ret, &ipt_mutex); if (!target) { duprintf("check_entry: `%s' not found\n", t->u.name); - up(&ipt_mutex); return ret; } if (target->me) @@ -1283,17 +1282,16 @@ { int ret; + MOD_INC_USE_COUNT; ret = down_interruptible(&ipt_mutex); if (ret != 0) return ret; - if (list_named_insert(&ipt_target, target)) { - MOD_INC_USE_COUNT; - ret = 0; - } else { + if 
(!list_named_insert(&ipt_target, target)) { duprintf("ipt_register_target: `%s' already in list!\n", target->name); ret = -EINVAL; + MOD_DEC_USE_COUNT; } up(&ipt_mutex); return ret; @@ -1313,16 +1311,18 @@ { int ret; + MOD_INC_USE_COUNT; ret = down_interruptible(&ipt_mutex); - if (ret != 0) + if (ret != 0) { + MOD_DEC_USE_COUNT; return ret; - + } if (list_named_insert(&ipt_match, match)) { - MOD_INC_USE_COUNT; ret = 0; } else { duprintf("ipt_register_match: `%s' already in list!\n", match->name); + MOD_DEC_USE_COUNT; ret = -EINVAL; } up(&ipt_mutex); @@ -1346,10 +1346,12 @@ static struct ipt_table_info bootstrap = { 0, 0, { 0 }, { 0 }, { }, { } }; + MOD_INC_USE_COUNT; newinfo = vmalloc(sizeof(struct ipt_table_info) + SMP_ALIGN(table->table->size) * smp_num_cpus); if (!newinfo) { ret = -ENOMEM; + MOD_DEC_USE_COUNT; return ret; } memcpy(newinfo->entries, table->table->entries, table->table->size); @@ -1361,12 +1363,14 @@ table->table->underflow); if (ret != 0) { vfree(newinfo); + MOD_DEC_USE_COUNT; return ret; } ret = down_interruptible(&ipt_mutex); if (ret != 0) { vfree(newinfo); + MOD_DEC_USE_COUNT; return ret; } @@ -1386,7 +1390,6 @@ table->lock = RW_LOCK_UNLOCKED; list_prepend(&ipt_tables, table); - MOD_INC_USE_COUNT; unlock: up(&ipt_mutex); @@ -1394,6 +1397,7 @@ free_unlock: vfree(newinfo); + MOD_DEC_USE_COUNT; goto unlock; } -- Hacking time. From owner-netdev@oss.sgi.com Mon Apr 3 08:42:52 2000 Received: by oss.sgi.com id ; Mon, 3 Apr 2000 08:42:42 -0700 Received: from minus.inr.ac.ru ([193.233.7.97]:30725 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Mon, 3 Apr 2000 08:42:32 -0700 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id TAA01268; Mon, 3 Apr 2000 19:42:22 +0400 From: kuznet@ms2.inr.ac.ru Message-Id: <200004031542.TAA01268@ms2.inr.ac.ru> Subject: Re: More questions... 
To: andrewm@uow.edu.au (Andrew Morton) Date: Mon, 3 Apr 2000 19:42:22 +0400 (MSK DST) Cc: netdev@oss.sgi.com In-Reply-To: <38E88597.D6F4FF94@uow.edu.au> from "Andrew Morton" at Apr 3, 0 11:50:47 am X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Content-Length: 1329 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing

Hello!

> You miss my point.
>
> /* Use this variant when it is known for sure that it
>  * is executing from interrupt context.
>  */
> extern __inline__ void dev_kfree_skb_irq(struct sk_buff *skb)
> {
>         if (atomic_dec_and_test(&skb->users)) {
>                 int cpu = smp_processor_id();
>                 unsigned long flags;
>
>                 local_irq_save(flags);
>                 skb->next = softnet_data[cpu].completion_queue;
>                 softnet_data[cpu].completion_queue = skb;
>                 __cpu_raise_softirq(cpu, NET_TX_SOFTIRQ);
>                 local_irq_restore(flags);
>         }
> }
>
>
> If "it is known for sure" then why does this function need the
> local_irq_save()?

No, I understood you. (I repeat:) There is no single serialized "interrupt context". When we say "interrupt context", we really mean "from the worst possible context, totally unserialized".

The "known for sure" recommendation is there because this function is pure overhead when called from BH or process context. Besides that, it can result in too long a delay before the packet is really destroyed, and sockets can starve of write skbs.

> Is it because this CPU could take an interrupt for a different device
> and, within the nested ISR, find its completion queue in an inconsistent
> state?

Yes, exactly.
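The race Alexey confirms can be illustrated with a small user-space simulation. This is only a sketch under stated assumptions: `fake_skb`, `push_unmasked()`, `push_masked()` and `queue_length()` are hypothetical names, a nested ISR from a second device is modelled as a function call made between the load and the store of the queue head, and local_irq_save()/local_irq_restore() are modelled by deferring that call until the head update is complete. It shows why the push onto the per-CPU completion queue needs interrupts masked even when the caller is already in interrupt context.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an sk_buff; only the link field matters here. */
struct fake_skb {
    struct fake_skb *next;
};

/* Stands in for softnet_data[cpu].completion_queue on a single CPU. */
struct fake_skb *completion_queue;

/* A simulated interrupt that arrived while interrupts were masked. */
static struct fake_skb *pending_irq_skb;

/* Push with interrupts enabled. If 'nested' is non-NULL, a second device's
 * ISR fires between the load of the old head and the store of the new head,
 * and pushes 'nested' onto the same queue. */
void push_unmasked(struct fake_skb *skb, struct fake_skb *nested)
{
    assert(skb != NULL);
    skb->next = completion_queue;       /* load the old head */
    if (nested)
        push_unmasked(nested, NULL);    /* nested ISR runs right here */
    completion_queue = skb;             /* store: overwrites the nested push */
}

/* The same push bracketed by simulated local_irq_save()/local_irq_restore():
 * the nested interrupt is held off until the queue is consistent again. */
void push_masked(struct fake_skb *skb, struct fake_skb *nested)
{
    assert(skb != NULL);
    /* local_irq_save(flags): interrupts on this CPU are now deferred */
    skb->next = completion_queue;
    if (nested)
        pending_irq_skb = nested;       /* interrupt is pending, not run */
    completion_queue = skb;
    /* local_irq_restore(flags): the pending interrupt is delivered now */
    if (pending_irq_skb) {
        struct fake_skb *p = pending_irq_skb;
        pending_irq_skb = NULL;
        push_masked(p, NULL);
    }
}

size_t queue_length(void)
{
    size_t n = 0;
    for (struct fake_skb *p = completion_queue; p != NULL; p = p->next)
        n++;
    return n;
}
```

Starting from an empty queue, push_unmasked(&a, &b) leaves a single entry: the nested ISR's buffer `b` is silently lost, which is the "inconsistent state" Andrew describes. push_masked(&a, &b) leaves both buffers queued.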
Alexey

From owner-netdev@oss.sgi.com Mon Apr 3 11:29:53 2000 Received: by oss.sgi.com id ; Mon, 3 Apr 2000 11:29:33 -0700 Received: from wirespeed.solidum.com ([216.13.130.242]:34718 "EHLO solidum.com") by oss.sgi.com with ESMTP id ; Mon, 3 Apr 2000 11:29:20 -0700 Received: from phobos.solidum.com (mcr@phobos.solidum.com [192.168.1.13]) by solidum.com (8.8.7/8.8.7) with ESMTP id OAA04715 for ; Mon, 3 Apr 2000 14:29:08 -0400 Message-Id: <200004031829.OAA04715@solidum.com> To: netdev@oss.sgi.com Subject: Re: Queue and SMP locking discussion (was Re: 3c59x.c) In-Reply-To: Your message of "Sat, 01 Apr 2000 10:28:25 EST." Mime-Version: 1.0 (generated by tm-edit 7.108) Content-Type: text/plain; charset=US-ASCII Date: Mon, 03 Apr 2000 14:29:08 -0400 From: Michael Richardson Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing

>>>>> "jamal" == jamal writes:

jamal> I think we had this debate during your presentation ;-> Here are
jamal> my thoughts:
jamal> Bus Latency is not a problem as far as throughput is
jamal> concerned. This problem can be equated to *exactly* the high
jamal> RTT-BW problem in TCP. You just have to adjust your ring-buffering
jamal> accordingly. I don't think processing latency is an issue either;
jamal> even with your broken pcnet driver[1] you come up with a number of
jamal> 4007 cycles to process a packet. Get yourself a faster processor
jamal> ;-> So your assertion that "the 33Mhz, 32 bit PCI bus itself can

Further investigation revealed a logic problem: that number actually included processing the packet up to the upper layers, i.e. actually queuing the packet to the BH. I need to update the paper. How much faster?

jamal> theoretically handle up to one and a half million (1428571 to be
jamal> exact) frames per second, or 50 10 Mb/s adaptors" is misleading.
jamal> I realize you say it is theoretical; however, ask people who use
jamal> Alexey's fast forwarding driver and they'll tell you they
jamal> definitely do more than 50Mbps.

I said 1,500,000 packets/s, i.e. Gigabit Ethernet. That isn't 50 Mb/s. I don't dispute 50 Mb/s; I agree with it. I also agree that you can drop packets at 150,000 packets/s. The stats say so; it is a question of optimization.

The major problem that the PAX.ware 100 has is that the receive ring must be kept in PCI memory on the card, which makes processing it very slow. We have considered using a DMA controller to copy it to system memory in the background, but haven't done that yet. (We observe very high penalties for accessing the receive ring resident on the card.) We were able to optimize things to get well below the 4007 cycles reported initially. A respin of the card will integrate the classification co-processor much more tightly with the MAC, so the receive ring will reside in system memory, where one doesn't have to pay the PCI tax to get it.

:!mcr!: | Solidum Systems Corporation, http://www.solidum.com Michael Richardson |For a better connected world,where data flows faster Personal: http://www.sandelman.ottawa.on.ca/People/Michael_Richardson/Bio.html mailto:mcr@sandelman.ottawa.on.ca mailto:mcr@solidum.com

From owner-netdev@oss.sgi.com Mon Apr 3 15:29:57 2000 Received: by oss.sgi.com id ; Mon, 3 Apr 2000 15:29:37 -0700 Received: from smtprich.nortel.com ([192.135.215.8]:29332 "EHLO smtprich.nortel.com") by oss.sgi.com with ESMTP id ; Mon, 3 Apr 2000 15:29:23 -0700 Received: from zrchb213.us.nortel.com (actually zrchb213) by smtprich.nortel.com; Mon, 3 Apr 2000 17:12:48 -0500 Received: from zctwb003.asiapac.nortel.com ([47.152.32.111]) by zrchb213.us.nortel.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21) id 2BF0Z5KS; Mon, 3 Apr 2000 17:11:33 -0500 Received: from pwold011.asiapac.nortel.com ([47.181.193.45]) by zctwb003.asiapac.nortel.com with SMTP (Microsoft Exchange Internet Mail Service
Version 5.5.2650.21) id HNN1JP3K; Tue, 4 Apr 2000 08:11:35 +1000 Received: from uow.edu.au (IDENT:akpm@[47.181.207.103]) by pwold011.asiapac.nortel.com (8.9.3/8.9.3) with ESMTP id IAA07007; Tue, 4 Apr 2000 08:11:30 +1000 Message-ID: <38E91853.1481C943@uow.edu.au> Date: Mon, 03 Apr 2000 22:16:51 +0000 X-Sybari-Space: 00000000 00000000 00000000 From: Andrew Morton X-Mailer: Mozilla 4.61 [en] (X11; I; Linux 2.2.13-7mdk i586) X-Accept-Language: en MIME-Version: 1.0 To: kuznet@ms2.inr.ac.ru CC: netdev@oss.sgi.com Subject: Re: coverage References: <38E8695B.E99D0D6A@uow.edu.au> from "Andrew Morton" at Apr 3, 0 09:50:19 am <200004031512.TAA00873@ms2.inr.ac.ru> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-Orig: Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing

kuznet@ms2.inr.ac.ru wrote:
>
> Hello!
>
> > I use 'ping -l 100000'
>
> This does not help. Only 100 packets can be preloaded;
> the rest simply disappear into a black hole. Actually, you can
> increase tx_queue_len on eth0 to a cosmic value, say
> 1500000; then it will be able to generate a burst of packets
> for ~10 seconds.
>
> Alexey

Well now that's odd. I tried:

    time ping -l10000 -c1 -s100 -q mnm

to send 10,000 100-byte packets. They were all received, according to 'ifconfig eth0'. The command took 4.47 seconds of wall-clock time. That's a miserable 1.8 Mbit/s.

The sender is an ISA ne2k on 10baseT. I would expect a lot of queue pressure on the sending side under these circumstances. The sender runs kernel 2.2.13.
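The 1.8 Mbit/s figure counts ICMP payload only. A sketch of the on-the-wire arithmetic, assuming standard header sizes (20-byte IPv4 header with no options, 8-byte ICMP header, 14-byte Ethernet header, 4-byte FCS) plus 8 bytes of preamble/SFD and a 12-byte inter-frame gap per frame; the helper names are illustrative. Under those assumptions each `ping -s100` packet occupies 166 bytes of line time, so a 10 Mb/s wire could carry the 10,000 packets in about 1.33 s. The measured 4.47 s (roughly 2,240 packets/s) therefore points at the sending host or the ISA ne2k rather than the wire as the limit.

```c
#include <assert.h>

/* Header and framing sizes in bytes (standard values, no IP options). */
#define ICMP_HDR       8
#define IPV4_HDR      20
#define ETH_HDR       14
#define ETH_FCS        4
#define ETH_PREAMBLE   8   /* preamble + start-of-frame delimiter */
#define ETH_IFG       12   /* inter-frame gap, in byte times */
#define ETH_MIN_FRAME 64

/* Bytes of line time consumed by one ping packet with 'payload' data bytes. */
unsigned wire_bytes_per_ping(unsigned payload)
{
    unsigned frame = ETH_HDR + IPV4_HDR + ICMP_HDR + payload + ETH_FCS;
    if (frame < ETH_MIN_FRAME)
        frame = ETH_MIN_FRAME;          /* short frames are padded */
    return frame + ETH_PREAMBLE + ETH_IFG;
}

/* Minimum seconds needed to put 'count' such packets on a link of link_bps. */
double min_seconds(unsigned count, unsigned payload, double link_bps)
{
    assert(link_bps > 0.0);
    return count * wire_bytes_per_ping(payload) * 8.0 / link_bps;
}

/* Payload-only throughput in bits/s, the way it is computed in the message. */
double payload_bps(unsigned count, unsigned payload, double seconds)
{
    assert(seconds > 0.0);
    return count * (double)payload * 8.0 / seconds;
}
```

With the numbers from the message, payload_bps(10000, 100, 4.47) is about 1.79 Mbit/s, while counting full frames and framing overhead gives about 3 Mbit/s of the available 10.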
--
-akpm-

From owner-netdev@oss.sgi.com Thu Apr 6 08:15:44 2000 Received: by oss.sgi.com id ; Thu, 6 Apr 2000 08:15:24 -0700 Received: from usat2-00222.usateleport.com ([208.248.183.222]:5622 "EHLO convert rfc822-to-8bit tinuviel.compendium.com.ar") by oss.sgi.com with ESMTP id ; Thu, 6 Apr 2000 08:15:04 -0700 Received: (from horape@localhost) by tinuviel.compendium.com.ar (8.9.3/8.9.3/Debian 8.9.3-6) id MAA14225; Thu, 6 Apr 2000 12:13:55 -0300 From: "Horacio J. Peña" Date: Thu, 6 Apr 2000 12:13:54 -0300 To: linux-net@vger.rutgers.edu, ipchains@rustcorp.com, netdev@oss.sgi.com, l-linux@calvo.teleco.ulpgc.es, balug-lst@balug.org.ar Subject: New version of policy routing micro-HOWTO. Message-ID: <20000406121354.A14175@tinuviel.compendium.com.ar> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: 8BIT User-Agent: Mutt/1.1.2i x-attribution: HoraPe Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing

Hi!

I've made an update to the policy routing micro-HOWTO. I've added an explanation of how to route by transport protocol headers (using FWMARK). It's at http://compendium.ar.uninet.edu/policy-routing.txt

HoraPe
---
Horacio J.
Peña horape@compendium.com.ar horape@uninet.edu bofh@puntoar.net.ar horape@hcdn.gov.ar

From owner-netdev@oss.sgi.com Sat Apr 8 02:37:15 2000 Received: by oss.sgi.com id ; Sat, 8 Apr 2000 02:37:06 -0700 Received: from corellia.franken.de ([193.174.2.129]:26385 "EHLO corellia.sunbeam.franken.de") by oss.sgi.com with ESMTP id ; Sat, 8 Apr 2000 02:36:40 -0700 Received: by corellia.sunbeam.franken.de via sendmail with stdio id for netdev@oss.sgi.com; Sat, 8 Apr 2000 11:36:36 +0200 (MEST) (Smail-3.2 1996-Jul-4 #2 built 1999-Nov-7) Date: Sat, 8 Apr 2000 11:36:36 +0200 From: Harald Welte To: linux-net@vger.rutgers.edu, netdev@oss.sgi.com Subject: Problems using DFE-570 Message-ID: <20000408113636.F6299@corellia.sunbeam.franken.de> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0.1i X-Operating-System: Linux corellia 2.2.13 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing

Hi!

I'm probably off-topic on this mailing list, but I thought this belongs here rather than on linux-kernel.

I'm using a D-Link DFE-570 four-port fast Ethernet card. It uses a PCI-to-PCI bridge and four DEC DC21142 (rev 65) chips (they're actually produced by Intel), and is therefore similar to other designs like the Adaptec four-port cards.

At first I tried the tulip driver. It does recognize all four chips and provides eth0 through eth3, but unfortunately it doesn't work at all: tcpdump shows no packets from the net on any interface. This is a dump of what happened at insmod tulip:

=====================
Mar 10 21:16:36 localhost kernel: tulip.c:v0.89H 5/23/98 becker@cesdis.gsfc.nasa.gov
Mar 10 21:16:36 localhost kernel: eth0: Digital DS21142/3 Tulip at 0xe000, 00 80 c8 57 c8 61, IRQ 10.
Mar 10 21:16:36 localhost kernel: eth0: EEPROM default media type Autosense.
Mar 10 21:16:36 localhost kernel: eth0: Index #0 - Media MII (#11) described by a 21142 MII PHY (3) block.
Mar 10 21:16:36 localhost kernel: eth0: MII transceiver found at MDIO address 1, config 3100 status 7849. Mar 10 21:16:36 localhost kernel: eth1: Digital DS21142/3 Tulip at 0xe400, 00 80 c8 57 c8 62, IRQ 12. Mar 10 21:16:36 localhost kernel: eth1: EEPROM default media type Autosense. Mar 10 21:16:36 localhost kernel: eth1: Index #0 - Media MII (#11) described by a 21142 MII PHY (3) block. Mar 10 21:16:36 localhost kernel: eth1: MII transceiver found at MDIO address 1, config 3100 status 7849. Mar 10 21:16:36 localhost kernel: eth2: Digital DS21142/3 Tulip at 0xe800, 00 80 c8 57 c8 63, IRQ 5. Mar 10 21:16:36 localhost kernel: eth2: EEPROM default media type Autosense. Mar 10 21:16:36 localhost kernel: eth2: Index #0 - Media MII (#11) described by a 21142 MII PHY (3) block. Mar 10 21:16:36 localhost kernel: eth2: MII transceiver found at MDIO address 1, config 3100 status 7849. Mar 10 21:16:36 localhost kernel: eth3: Digital DS21142/3 Tulip at 0xec00, 00 80 c8 57 c8 64, IRQ 11. Mar 10 21:16:36 localhost kernel: eth3: EEPROM default media type Autosense. Mar 10 21:16:36 localhost kernel: eth3: Index #0 - Media MII (#11) described by a 21142 MII PHY (3) block. Mar 10 21:16:36 localhost kernel: eth3: MII transceiver found at MDIO address 1, config 3100 status 7849. ======================= I suspect there's a problem initializing the transceivers, or something like that. With the de4x5 driver, everything works fine at 100Mbit, but I want to run two of the four ports at 10Mbit for a transition period. This is a dump from insmod de4x5: ========================= Mar 10 21:41:47 localhost kernel: de4x5.c:V0.544 1999/5/8 davies@maniac.ultranet.com Mar 10 21:41:47 localhost kernel: eth1: DC21143 at 0xe400 (PCI bus 2, device 5), h/w address 00:80:c8:57:c8:62, Mar 10 21:41:47 localhost kernel: and requires IRQ12 (provided by PCI BIOS).
Mar 10 21:41:47 localhost kernel: de4x5.c:V0.544 1999/5/8 davies@maniac.ultranet.com Mar 10 21:41:47 localhost kernel: eth2: DC21143 at 0xe800 (PCI bus 2, device 6), h/w address 00:80:c8:57:c8:63, Mar 10 21:41:47 localhost kernel: and requires IRQ5 (provided by PCI BIOS). Mar 10 21:41:47 localhost kernel: de4x5.c:V0.544 1999/5/8 davies@maniac.ultranet.com Mar 10 21:41:47 localhost kernel: eth3: DC21143 at 0xec00 (PCI bus 2, device 7), h/w address 00:80:c8:57:c8:64, Mar 10 21:41:47 localhost kernel: and requires IRQ11 (provided by PCI BIOS). Mar 10 21:41:47 localhost kernel: de4x5.c:V0.544 1999/5/8 davies@maniac.ultranet.com Mar 10 21:42:13 localhost kernel: eth0: media is 100Mb/s. Mar 10 21:47:32 localhost kernel: eth2: media is 100Mb/s. ========================= I read the documentation and have already tried to hardcode one Ethernet port to 10Mbit. However, as soon as a 10Mbit network is attached to that port, "ethX: media is 100Mb/s" appears, even though that network is definitely 10Mb/s. Does anyone have a guess what's going on? Thanks in advance. P.S. The machine runs Red Hat 6.1 with the Red Hat 2.2.12-20 kernel on a K6-III 400. -- Live long and prosper - Harald Welte / laforge@corellia.franken.de http://sunbeam.franken.de ============================================================================ GCS/E d- s-/ a--- C+++ UL++++$ P+++ L+++$ E+ W+++ N++ K- w--- O M-- PS+ PE++ Y-- PGP++ t+++ 5-- !X !R tv- b+++ DI?
D+ G+ e* h++ r++ y+(*) From owner-netdev@oss.sgi.com Sun Apr 9 08:31:41 2000 Received: by oss.sgi.com id ; Sun, 9 Apr 2000 08:31:32 -0700 Received: from rmx05.iname.net ([165.251.8.203]:669 "EHLO rmx05.globecomm.net") by oss.sgi.com with ESMTP id ; Sun, 9 Apr 2000 08:31:05 -0700 Received: from weba2.iname.net by rmx05.globecomm.net (8.9.1/8.8.0) with ESMTP id LAA09260 ; Sun, 9 Apr 2000 11:31:04 -0400 (EDT) From: timka@altavista.net Received: (from root@localhost) by weba2.iname.net (8.9.1a/8.9.2.Alpha2) id LAA07742; Sun, 9 Apr 2000 11:31:04 -0400 (EDT) MIME-Version: 1.0 Message-Id: <000409113104CF.02961@weba2.iname.net> Date: Sun, 9 Apr 2000 11:31:04 -0400 (EDT) Content-Type: Text/Plain Content-Transfer-Encoding: 7bit To: netdev@oss.sgi.com Subject: Kernel panic asosiated with tcp_v4_destroy_sock Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing [1.] One line summary of the problem: Kernel panic associated with tcp_v4_destroy_sock [2.] Full description of the problem/report: Sometimes (up to 5 times a day, though on some days not at all) my PC running RedHat (Alan's kernel 2.2.15pre16; the same problem occurred with 2.2.15pre4 and 2.2.15pre12, all the versions I've tried) shows: Kernel panic: Attempting to kill the idle task! In swapper task - not syncing. Before this, the log files show: kernel: Warning: kfree_skb passed an skb still on a list (from cXXXXXXXX). It often happens after an FTP session is closed. [3.] Keywords (i.e., modules, networking, kernel): tcp_v4_destroy_sock, kfree_skb, Kernel panic [4.] Kernel version (from /proc/version): Linux version 2.2.15pre16 [5.] Output of Oops..
message (if applicable) with symbolic information resolved (see Documentation/oops-tracing.txt) Unable to handle kernel NULL pointer dereference at virtual address 00000004 current->tss.cr3=00101000 %cr3=00101000 *pde=00000000 Oops: 0002 CPU: 0 EIP: 0010:[] EFLAGS: 00010206 eax: 00000000 ebx: c11dfd54 ecx: c1a84fe0 edx: c1c04000 esi: c11dfd00 edi: c11dfdb0 ebp: c01c7f4c esp: c01c7f08 ds: 0018 es: 0018 ss: 0018 >>EIP: c016112f [6.] A small shell script or example program which triggers the problem (if possible) I can't supply any script that triggers the problem. [7.] Environment [7.1.] Software (add the output of the ver_linux script here) -- Versions installed: (if some fields are empty or looks -- unusual then possibly you have very old versions) Linux pallada.lviv.net 2.2.15pre16 #1 Sun Apr 2 19:58:19 EEST 2000 i586 unknown Kernel modules 2.1.121 Gnu C egcs-2.91.66 Binutils 2.9.1.0.23 Linux C Library 2.1.1 Dynamic linker ldd (GNU libc) 2.1.1 Procps 2.0.2 Mount 2.9o Net-tools 1.52 Console-tools 1999.03.02 Sh-utils 1.16 Modules Loaded sbni tulip via-rhine [7.2.] Processor information (from /proc/cpuinfo): processor : 0 vendor_id : GenuineIntel cpu family : 5 model : 4 model name : Pentium MMX stepping : 3 cpu MHz : 166.588270 fdiv_bug : no hlt_bug : no sep_bug : no f00f_bug : yes coma_bug : no fpu : yes fpu_exception : yes cpuid level : 1 wp : yes flags : fpu vme de pse tsc msr mce cx8 mmx bogomips : 332.60 [7.3.] Module information (from /proc/modules): sbni 11536 2 (autoclean) tulip 31716 1 (autoclean) via-rhine 9192 1 (autoclean) [7.4.] SCSI information (from /proc/scsi/scsi) None [7.5.] Other information that might be relevant to the problem (please look in /proc and include all information that you think to be relevant): None [X.] Other notes, patches, fixes, workarounds: I've installed the latest sbni drivers (version 3.3.0) from http://www.granch.ru/filez/linuxlet.tgz, because the other versions have many problems building TCP/IP packets.
---------------------------------------------------------------- Get your free email from AltaVista at http://altavista.iname.com From owner-netdev@oss.sgi.com Sun Apr 9 15:04:15 2000 Received: by oss.sgi.com id ; Sun, 9 Apr 2000 15:03:56 -0700 Received: from styx.uwaterloo.ca ([129.97.40.10]:58117 "EHLO styx.uwaterloo.ca") by oss.sgi.com with ESMTP id ; Sun, 9 Apr 2000 15:03:36 -0700 Received: (from mostrows@localhost) by styx.uwaterloo.ca (8.9.3/8.9.3) id SAA26860; Sun, 9 Apr 2000 18:03:25 -0400 From: Michal Ostrowski MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <14576.65069.860551.814801@styx.uwaterloo.ca> Date: Sun, 9 Apr 2000 18:03:25 -0400 (EDT) To: netdev@oss.sgi.com Subject: PPPoE Patches X-Mailer: VM 6.72 under 21.1 (patch 4) "Arches" XEmacs Lucid Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Finally there is some progress to report on the PPPoE front. I've got two patches: http://www.math.uwaterloo.ca/~mostrows/ppp.090400.patch.gz - Patch against ppp CVS repository (April 9). - Includes the patch posted by Mitch a while back that added greater support for plugin hooks throughout pppd - Includes an adapted version of Jamal's PPPoE negotiation/discovery daemon which runs as plugin inside pppd - To use this patch, use the "plugin" option to load the pppoe.so plugin and follow that up with the device name (e.g.: "eth0") in the options file.
http://www.math.uwaterloo.ca/~mostrows/pppoe.2.3.99-pre4-5.gz - Patch against 2.3.99-pre3 (with pre4-5 patch) - Adds support for PF_PPPOX with a PPPoE protocol. Using this code we have been able to make and maintain PPPoE connections, BUT: 1. There appears to be a major bug somewhere that is triggered by calls to "ppp_unregister_channel". After a PPPoE socket is closed, the kernel oopses with "Scheduling in interrupt". I've made no progress in tracking this down any further (when the oops is generated, "schedule" has been called from entry.S). If anyone has any suggestions about this, I'd be happy to try pretty much anything right now. For now, the offending call to ppp_unregister_channel is disabled, which results in a memory leak. 2. The PPPoE plugin for pppd needs significant work (in particular, getting this code to use AF_PACKET instead of raw AF_INET sockets). This should be done in the next few days or weeks. These patches are not yet ready for general use, but Jamal and I felt it necessary to get people's comments on them and start heading towards integration. Michal Ostrowski mostrows@styx.uwaterloo.ca From owner-netdev@oss.sgi.com Mon Apr 10 03:35:30 2000 Received: by oss.sgi.com id ; Mon, 10 Apr 2000 03:35:21 -0700 Received: from mars.arts.u-szeged.hu ([160.114.28.163]:64014 "EHLO mars.arts.u-szeged.hu") by oss.sgi.com with ESMTP id ; Mon, 10 Apr 2000 03:35:01 -0700 Received: from localhost (sogor@localhost) by mars.arts.u-szeged.hu (8.9.3/8.9.3/Debian/GNU) with SMTP id MAA13411 for ; Mon, 10 Apr 2000 12:34:25 +0200 Date: Mon, 10 Apr 2000 12:34:25 +0200 (CEST) From: Sogor Laszlo To: netdev@oss.sgi.com Subject: ipv6 address types and routing Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hello! We want to implement SIIT (Stateless IP/ICMP Translator, RFC 2765) for Linux, so we need the IPv6 Translated address type.
(We need it to route a packet whose destination is an IPv6 Mapped address to an IPv6 Translated address.) The Linux kernel doesn't know about the IPv6 Translated address type. We added it to the kernel and it works, but is it correct? It would be better if it were in the official kernel. In net/ipv6/addrconf.c, at line 210:

      return (IPV6_ADDR_COMPATv4 | IPV6_ADDR_UNICAST);
  }
> if (addr->s6_addr32[2] == __constant_htonl(0xffff0000))
>     return (IPV6_ADDR_TRANSLATED|IPV6_ADDR_UNICAST);
  if (addr->s6_addr32[2] == __constant_htonl(0x0000ffff))

In include/net/ipv6.h, at line 65:

  #define IPV6_ADDR_SITELOCAL 0x0040U
> #define IPV6_ADDR_TRANSLATED 0x0100U
  #define IPV6_ADDR_COMPATv4 0x0080U

We also have a problem sending IPv6 packets, and we need some help with it. After we get the IPv4 packet, we store some information from it, drop the IPv4 header, calculate the IPv6 source and destination addresses, get a route, put the IPv6 header into the packet, fill it in, and send the packet. This crashes the machine, and we don't know why. Because we have no documentation, we don't know what we have to do to get this right; we need some description of ip6_route_output. We tried:

  ...
  struct flowi fl;
  struct in6_addr *saddr6;
  struct in6_addr *daddr6;
  __u8 protocol;
  ...
  fl.proto = protocol;
  fl.fl6_src = saddr6;
  fl.fl6_dst = daddr6;
  fl.fl6_flowlabel = __constant_htonl(0);
  fl.oif = 0;
  fl.uli_u.data = 0;

  dst_release(skb->dst); //drop the old dst_entry
  skb->dst = NULL;
  skb->dst = ip6_route_output(NULL, &fl);
  if (skb->dst->error != 0) {
      printk(KERN_INFO "siit: (4->6) Couldn't get route\n");
      goto tx_error_4;
  }

Then we fill in the IPv6 header, and finally:

  skb->protocol = __constant_htons(ETH_P_IPV6);
  skb->dst->output(skb);

The last line crashes the kernel completely.
It seems that ip6_route_output returns a correct dst_entry: if we put the following line into the module, printk(KERN_INFO "siit: (4->6) out device=%s\n",skb->dst->dev->name); we get the correct output device (eth0, in this case). shogy (Laszlo Sogor) From owner-netdev@oss.sgi.com Mon Apr 10 10:43:34 2000 Received: by oss.sgi.com id ; Mon, 10 Apr 2000 10:43:05 -0700 Received: from rmx06.iname.net ([165.251.8.205]:18586 "EHLO rmx06.iname.net") by oss.sgi.com with ESMTP id ; Mon, 10 Apr 2000 10:42:36 -0700 Received: from weba7.iname.net by rmx06.iname.net (8.9.1/8.8.0) with ESMTP id NAA08326 ; Mon, 10 Apr 2000 13:42:34 -0400 (EDT) From: timka@altavista.net Received: (from root@localhost) by weba7.iname.net (8.9.1a/8.9.2.Alpha2) id NAA02616; Mon, 10 Apr 2000 13:42:34 -0400 (EDT) MIME-Version: 1.0 Message-Id: <0004101342348K.16402@weba7.iname.net> Date: Mon, 10 Apr 2000 13:42:34 -0400 (EDT) Content-Type: Text/Plain Content-Transfer-Encoding: 7bit To: netdev@oss.sgi.com Subject: Kernel panic assosiated with via-rhine module Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing [1.] One line summary of the problem: Kernel panic associated with via-rhine module [2.] Full description of the problem/report: Sometimes (up to 5 times a day, though on some days not at all) my PC running RedHat (Alan's kernel 2.2.15pre16; the same problem occurred with 2.2.15pre12, all the versions I've tried) shows: Kernel panic: skput: over : cXXXXXXX:XXXX put : 1514 dev: eth0 In swapper task - not syncing. eth0 uses the via-rhine module. My card is in 100Mb mode, and I'm also using bridging. Another card is in 10Mb mode and uses the tulip module. Because I'm using modules, I can't trace cXXXXXXX:XXXX. [3.] Keywords (i.e., modules, networking, kernel): via-rhine, skput, Kernel panic [4.] Kernel version (from /proc/version): Linux version 2.2.15pre16 [5.] Output of Oops.. message (if applicable) with symbolic information resolved (see Documentation/oops-tracing.txt) No Oops message [6.]
A small shell script or example program which triggers the problem (if possible) I can't supply any script that triggers the problem. [7.] Environment [7.1.] Software (add the output of the ver_linux script here) -- Versions installed: (if some fields are empty or looks -- unusual then possibly you have very old versions) Linux pallada.lviv.net 2.2.15pre16 #1 Sun Apr 2 19:58:19 EEST 2000 i586 unknown Kernel modules 2.1.121 Gnu C egcs-2.91.66 Binutils 2.9.1.0.23 Linux C Library 2.1.1 Dynamic linker ldd (GNU libc) 2.1.1 Procps 2.0.2 Mount 2.9o Net-tools 1.52 Console-tools 1999.03.02 Sh-utils 1.16 Modules Loaded sbni tulip via-rhine [7.2.] Processor information (from /proc/cpuinfo): processor : 0 vendor_id : GenuineIntel cpu family : 5 model : 4 model name : Pentium MMX stepping : 3 cpu MHz : 166.588270 fdiv_bug : no hlt_bug : no sep_bug : no f00f_bug : yes coma_bug : no fpu : yes fpu_exception : yes cpuid level : 1 wp : yes flags : fpu vme de pse tsc msr mce cx8 mmx bogomips : 332.60 [7.3.] Module information (from /proc/modules): sbni 11536 2 (autoclean) tulip 31716 1 (autoclean) via-rhine 9192 1 (autoclean) [7.4.] SCSI information (from /proc/scsi/scsi) None [7.5.] Other information that might be relevant to the problem (please look in /proc and include all information that you think to be relevant): None [X.] Other notes, patches, fixes, workarounds: I'm bridging between a 100Mb and a 10Mb network.
I've installed the latest sbni drivers (version 3.3.0) from http://www.granch.ru/filez/linuxlet.tgz Timur Morozov ---------------------------------------------------------------- Get your free email from AltaVista at http://altavista.iname.com From owner-netdev@oss.sgi.com Tue Apr 11 10:07:44 2000 Received: by oss.sgi.com id ; Tue, 11 Apr 2000 10:07:25 -0700 Received: from rmx13.mail.com ([165.251.32.245]:60080 "EHLO rmx13.mail.com") by oss.sgi.com with ESMTP id ; Tue, 11 Apr 2000 10:07:04 -0700 Received: from weba2.iname.net (weba2.iname.net [165.251.4.12]) by rmx13.mail.com (8.9.1/8.9.3) with ESMTP id NAA00964 for ; Tue, 11 Apr 2000 13:07:03 -0400 (EDT) From: timka@altavista.net Received: (from root@localhost) by weba2.iname.net (8.9.1a/8.9.2.Alpha2) id NAA10216; Tue, 11 Apr 2000 13:07:03 -0400 (EDT) MIME-Version: 1.0 Message-Id: <000411130703DY.13733@weba2.iname.net> Date: Tue, 11 Apr 2000 13:07:03 -0400 (EDT) Content-Type: Text/Plain Content-Transfer-Encoding: 7bit To: netdev@oss.sgi.com Subject: Kernel panic with sbni driver. Linux 2.2.x Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hello! I'm using the sbni drivers from But it looks like they corrupt socket structures. I've got 'Kernel panic' messages associated with the kfree_skb and skput over keywords. My PC crashes up to 8 times a day (I run Linux 2.2.15pre12 and pre16). From the Oops I only get: So I've looked through kcore for any associated messages. This is what I found: <3> socki_lookup: socket file changed! <3> sock_release: fasync list not empty! <7> sock_close: NULL inode <7> sk_free: optmem leakage (%d bytes) detected <3> alloc_skb called nonatomically from interrupt %p <4> Warning: kfree_skb passed an skb still on a list (from %p) The developers of the sbni driver did not answer. Could you tell me where to find information on writing network drivers? (I think I should fix it myself.)
Timur Morozov ---------------------------------------------------------------- Get your free email from AltaVista at http://altavista.iname.com From owner-netdev@oss.sgi.com Tue Apr 11 18:39:57 2000 Received: by oss.sgi.com id ; Tue, 11 Apr 2000 18:39:36 -0700 Received: from mcaprov.mcanet.com.br ([200.194.224.1]:5907 "HELO convert rfc822-to-8bit mcaprov.mcanet.com.br") by oss.sgi.com with SMTP id ; Tue, 11 Apr 2000 18:39:24 -0700 Received: from kllklk (unverified [209.206.5.214]) by mcaprov.mcanet.com.br (EMWAC SMTPRS 0.83) with SMTP id ; Tue, 11 Apr 2000 21:34:30 +0000 Message-ID: From: "West" Subject: Make BIG $$$ from Home & get FreeTravel To: bigm54f@oss.sgi.com X-Mailer: Microsoft Outlook Express 4.72.1712.3 X-MimeOLE: Produced By Microsoft MimeOLE V(null).1712.3 Mime-Version: 1.0 Date: Tue, 11 Apr 2000 20:52:09 -0500 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 8BIT Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Make $2,000-$3,000 per week & get FREE Travel! If you could simply advertise an 800# and take orders from home could you spend 2 hours per day answering your phone? Each week our Directors are making $2-3,000 using our simple system to sell the nation's #1 vacation package & home-based business. You too can begin making $1000 on a $1295 sale and get free travel. Our reps are signing 3-5 new members weekly. Your prospects call you! Every one loves to travel and everyone could use some extra $$$$. So if you are serious about making $3000 per week starting in the next 30 days call us toll free. 
Call: 1 800 925 7135 ********************************************************************* ** If you receive this message and have never joined one of our email lists you can be removed by replying to: mailto:mik29@yyhmail.com?subject=remove ********************************************************************* ** From owner-netdev@oss.sgi.com Wed Apr 12 13:06:17 2000 Received: by oss.sgi.com id ; Wed, 12 Apr 2000 13:06:08 -0700 Received: from mail.informatik.uni-ulm.de ([134.60.68.63]:3967 "EHLO mail.informatik.uni-ulm.de") by oss.sgi.com with ESMTP id ; Wed, 12 Apr 2000 13:05:42 -0700 Received: from [134.60.8.63] (helo=ferret.extern.uni-ulm.de ident=user62721) by mail.informatik.uni-ulm.de with esmtp (Exim 3.00 #1) id 12fTOR-00078i-00 for netdev@oss.sgi.com; Wed, 12 Apr 2000 22:05:43 +0200 Received: from blackbird.extern.uni-ulm.de ([172.16.1.10] ident=root) by ferret.extern.uni-ulm.de with esmtp (Exim 3.02 #2) id 12fTKI-000094-00 for netdev@oss.sgi.com; Wed, 12 Apr 2000 22:01:26 +0200 Received: (from stefan@localhost) by blackbird.extern.uni-ulm.de (8.9.3/8.9.3/SuSE Linux 8.9.3-0.1) id WAA06560 for netdev@oss.sgi.com; Wed, 12 Apr 2000 22:01:47 +0200 Date: Wed, 12 Apr 2000 22:01:47 +0200 From: Stefan Schlott To: netdev@oss.sgi.com Subject: Unload ipv6 module? Message-ID: <20000412220147.A5730@blackbird.extern.uni-ulm.de> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.1.7i Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hi everyone, a (stupid?) question: Is it possible to remove the ipv6 module? After modprobing, rmmod always returns "device busy". I discovered a module parameter (unloadable=1) which overrides this check, but I fear it might be there for some good reason :-) On the other hand, rebooting the system just to test a new version of the module is really annoying... any suggestions? Stefan. -- *--- please cut here...
-------------------------------------- thanks! ---* |-> E-Mail: stefan.schlott@student.uni-ulm.de DH-PGP-Key: 0x2F36F4FE <-| | Win2k: "It's not so much that it's only 65,000 bugs, it's just that | | they stopped at 65,535 to prevent an overflow." | | -- Seen on Slashdot (21.03.2000) | *-------------------------------------------------------------------------* From owner-netdev@oss.sgi.com Wed Apr 12 13:21:09 2000 Received: by oss.sgi.com id ; Wed, 12 Apr 2000 13:20:58 -0700 Received: from mea.tmt.tele.fi ([194.252.70.162]:55051 "EHLO mea.tmt.tele.fi") by oss.sgi.com with ESMTP id ; Wed, 12 Apr 2000 13:20:54 -0700 Received: by mea.tmt.tele.fi id ; Wed, 12 Apr 2000 23:20:37 +0300 Date: Wed, 12 Apr 2000 23:20:37 +0300 From: Matti Aarnio To: Stefan Schlott Cc: netdev@oss.sgi.com Subject: Re: Unload ipv6 module? Message-ID: <20000412232037.Q13396@mea.tmt.tele.fi> References: <20000412220147.A5730@blackbird.extern.uni-ulm.de> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit In-Reply-To: <20000412220147.A5730@blackbird.extern.uni-ulm.de>; from Stefan Schlott on Wed, Apr 12, 2000 at 10:01:47PM +0200 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Wed, Apr 12, 2000 at 10:01:47PM +0200, Stefan Schlott wrote: > Hi everyone, > > a (stupid?) question: Is it possible to remove the ipv6 module? After mod- > probing, rmmod always returns "device busy". > I discovered a module parameter (unloadable=1) which overrides this check, > but I fear it might be there for some good reason :-) On the other hand, > rebooting the system just to test a new version of the module is really > annoying... any suggestions? Currently it is likely impossible to remove the ipv6 module, even if you are foolhardy enough to set that parameter. You will be able to "rmmod" it, but some jiffies later you will get an Oops. The main motivation was to make the ipv6 module loadable into the kernel after boot.
For a while Linus had disabled modularity entirely, but this 'unloadable' trick convinced him to say "right, it can stay there that way" and let hackers have a look at the problem set sometime. > Stefan. > -- > |-> E-Mail: stefan.schlott@student.uni-ulm.de DH-PGP-Key: 0x2F36F4FE <-| /Matti Aarnio From owner-netdev@oss.sgi.com Wed Apr 12 18:07:33 2000 Received: by oss.sgi.com id ; Wed, 12 Apr 2000 18:07:24 -0700 Received: from pizda.ninka.net ([216.101.162.242]:5760 "EHLO pizda.ninka.net") by oss.sgi.com with ESMTP id ; Wed, 12 Apr 2000 18:07:03 -0700 Received: (from davem@localhost) by pizda.ninka.net (8.9.3/8.9.3) id SAA00996; Wed, 12 Apr 2000 18:01:12 -0700 Date: Wed, 12 Apr 2000 18:01:12 -0700 Message-Id: <200004130101.SAA00996@pizda.ninka.net> X-Authentication-Warning: pizda.ninka.net: davem set sender to davem@redhat.com using -f From: "David S. Miller" To: rusty@linuxcare.com.au CC: torvalds@transmeta.com, netdev@oss.sgi.com In-reply-to: (message from Rusty Russell on Tue, 04 Apr 2000 00:51:06 +0930) Subject: Re: [PATCH] netfilter fixes v2.3.99-pre4-2 References: Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing From: Rusty Russell Date: Tue, 04 Apr 2000 00:51:06 +0930 Linus, please apply. Meanwhile, since it hasn't been applied yet, I've put this into my tree and will push it along to Linus under separate cover. Later, David S.
Miller davem@redhat.com From owner-netdev@oss.sgi.com Wed Apr 12 19:42:43 2000 Received: by oss.sgi.com id ; Wed, 12 Apr 2000 19:42:34 -0700 Received: from blackbird.intercode.com.au ([203.32.101.10]:7174 "EHLO blackbird.intercode.com.au") by oss.sgi.com with ESMTP id ; Wed, 12 Apr 2000 19:42:09 -0700 Received: from localhost (jmorris@localhost) by blackbird.intercode.com.au (8.9.3/8.9.3) with ESMTP id MAA22156 for ; Thu, 13 Apr 2000 12:41:56 +1000 X-Authentication-Warning: blackbird.intercode.com.au: jmorris owned process doing -bs Date: Thu, 13 Apr 2000 12:41:56 +1000 (EST) From: James Morris To: netdev@oss.sgi.com Subject: [PATCH] minor fix for sock_no_recvmsg, 2.3.99-pre-6-1 Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Please see below. - James -- James Morris
diff -urN linux-2.3.99-pre6-1/include/net/sock.h linux/include/net/sock.h
--- linux-2.3.99-pre6-1/include/net/sock.h	Thu Apr 13 11:54:16 2000
+++ linux/include/net/sock.h	Thu Apr 13 12:21:32 2000
@@ -817,7 +817,7 @@
 				struct msghdr *, int,
 				struct scm_cookie *);
 extern int sock_no_recvmsg(struct socket *,
-				struct msghdr *, int,
+				struct msghdr *, int, int,
 				struct scm_cookie *);
 extern int sock_no_mmap(struct file *file,
 				struct socket *sock,
diff -urN linux-2.3.99-pre6-1/net/core/sock.c linux/net/core/sock.c
--- linux-2.3.99-pre6-1/net/core/sock.c	Thu Apr 13 11:54:16 2000
+++ linux/net/core/sock.c	Thu Apr 13 12:20:48 2000
@@ -1068,7 +1068,7 @@
 	return -EOPNOTSUPP;
 }
 
-int sock_no_recvmsg(struct socket *sock, struct msghdr *m, int flags,
+int sock_no_recvmsg(struct socket *sock, struct msghdr *m, int len, int flags,
 		    struct scm_cookie *scm)
 {
 	return -EOPNOTSUPP;
From owner-netdev@oss.sgi.com Wed Apr 12 20:14:53 2000 Received: by oss.sgi.com id ; Wed, 12 Apr 2000 20:14:44 -0700 Received: from pizda.ninka.net ([216.101.162.242]:43136 "EHLO pizda.ninka.net") by oss.sgi.com with ESMTP id ; Wed,
12 Apr 2000 20:14:22 -0700 Received: (from davem@localhost) by pizda.ninka.net (8.9.3/8.9.3) id UAA13834; Wed, 12 Apr 2000 20:08:20 -0700 Date: Wed, 12 Apr 2000 20:08:20 -0700 Message-Id: <200004130308.UAA13834@pizda.ninka.net> X-Authentication-Warning: pizda.ninka.net: davem set sender to davem@redhat.com using -f From: "David S. Miller" To: jmorris@intercode.com.au CC: netdev@oss.sgi.com In-reply-to: (message from James Morris on Thu, 13 Apr 2000 12:41:56 +1000 (EST)) Subject: Re: [PATCH] minor fix for sock_no_recvmsg, 2.3.99-pre-6-1 References: Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Date: Thu, 13 Apr 2000 12:41:56 +1000 (EST) From: James Morris Please see below. Thanks, I've applied your patch. Later, David S. Miller davem@redhat.com From owner-netdev@oss.sgi.com Fri Apr 14 04:45:17 2000 Received: by oss.sgi.com id ; Fri, 14 Apr 2000 04:44:57 -0700 Received: from [195.117.30.142] ([195.117.30.142]:38926 "EHLO convert rfc822-to-8bit www.pressmedia.com.pl") by oss.sgi.com with ESMTP id ; Fri, 14 Apr 2000 04:44:38 -0700 Received: from kllklk (cranston-ip-1-47.dynamic.ziplink.net [209.206.4.47]) by www.pressmedia.com.pl with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21) id 26FHSR8S; Fri, 14 Apr 2000 13:43:53 +0200 From: "Charlie Brine" Subject: Real Time To: pro98j@oss.sgi.com X-Mailer: Mozilla 4.70 [en] (Win95; I) Mime-Version: 1.0 Date: Fri, 14 Apr 2000 06:48:26 -0500 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 8BIT Message-Id: <20000414114450Z305159-390+180@oss.sgi.com> Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing WE MAKE IT EASY & AFFORDABLE TO ACCEPT CREDIT CARDS FOR YOUR BUSINESS ! INTERNET (Auction Vendors & Online Mall Stores Too!) STOREFRONT OR MAIL ORDER MERCHANTS WE SPECIALIZE IN APPROVING YOU! APPLY TODAY AND START FOR JUST $9.95! FREE APPLICATION!! FREE PROGRAMMING!! DON'T LOSE ANOTHER SALE! 
APPLY TO ACCEPT CREDIT CARDS AND CALL (888) 264-9272 DON'T FORGET TO ASK ABOUT OUR WEB DESIGN AND HOSTING PACKAGE !!! ********************************************************************* ** If you receive this message and have never joined one of our email lists you can be removed by replying to: mailto:mhip@doramail.com?subject=remove ********************************************************************* ** From owner-netdev@oss.sgi.com Fri Apr 14 14:40:33 2000 Received: by oss.sgi.com id ; Fri, 14 Apr 2000 14:40:14 -0700 Received: from adsl-216-103-211-230.dsl.snfc21.pacbell.net ([216.103.211.230]:52230 "EHLO xidus.net") by oss.sgi.com with ESMTP id ; Fri, 14 Apr 2000 14:40:01 -0700 Received: from bork.weath (bork [192.129.100.107]) by xidus.net (8.9.3/8.9.3) with ESMTP id OAA13530 for ; Fri, 14 Apr 2000 14:39:56 -0700 Date: Fri, 14 Apr 2000 14:39:56 -0700 (PDT) From: Jeremy Weatherford X-Sender: xidus@bork.weath To: netdev@oss.sgi.com Subject: Re: Real Time In-Reply-To: <20000414114450Z305159-390+180@oss.sgi.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing What on earth is going on? This is the second piece of spam I've gotten that's sent through this list. Why isn't the list closed to subscribers only? 
Jeremy Weatherford xidus@xidus.net http://xidus.net From owner-netdev@oss.sgi.com Fri Apr 14 18:58:58 2000 Received: by oss.sgi.com id ; Fri, 14 Apr 2000 18:58:48 -0700 Received: from bart.inter.net.il ([192.116.202.15]:61644 "EHLO convert rfc822-to-8bit bart.inter.net.il") by oss.sgi.com with ESMTP id ; Fri, 14 Apr 2000 18:58:31 -0700 Received: from kllklk (cranston-ip-2-102.dynamic.ziplink.net [209.206.5.102]) by bart.inter.net.il (8.9.3/8.9.3) with ESMTP id EAA03251; Sat, 15 Apr 2000 04:56:34 +0300 (IDT) Message-Id: <200004150156.EAA03251@bart.inter.net.il> From: "Frank Inine" Subject: Your Choice To: now776g@bart.inter.net.il X-Mailer: Mozilla 4.70 [en] (Win95; I) Mime-Version: 1.0 Date: Fri, 14 Apr 2000 21:03:07 -0500 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 8BIT Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing FREE E-COMMERCE WHEN YOU HOST WITH US!! Tired of expensive e-commerce software, set up fees and leasing contracts? Here is the deal: You host your site with us and you get free E -Commerce, including a merchant account, real-time software and shopping cart. NO ONE CAN BEAT OUR PRICE! IF YOU FIND A BETTER DEAL FOR OUR PACKAGE ANYWHERE ELSE, WE WILL MATCH OUR COMPETITOR'S PRICE PLUS GIVE YOU THE FIRST MONTH FOR FREE. While others charge you hundreds of dollars to get a merchant account or put you on a 48 months non-cancelable lease agreement we charge you NOTHING for E-commerce when you sign up for our e-commerce hosting plan. If you wish to stay with your current hosting company you still can get the same deal. Check it out first and make an informed decision. 
You have never seen a package deal like this before: * Your own merchant account with one of the lowest rates in the industry * Real-Time software to accept VISA, MASTERCARD, AMEX, DISCOVER/NOVUS, DINERSCLUB/CARTE BLANCHE, JCB * Direct deposit within 48 hrs into your checking account * Shopping Cart store front software with an easy to use web based interface * Real-Time Credit Card Processing software * Virtual terminal for phone/fax/mail orders * Automated E-mail receipts to your clients * Recurring billing feature with batch uploads * Password generator for membership sites * Automatic batch closing * Address verification system (AVS) * Back office to 24/7 access account history * 75 MB (megabytes) of disk space * 30 GB (gigabyte) of data transfer per month * 25 POP3 E-mail accounts * Unlimited alias E-mail addresses * Live web site statistics * Unlimited FTP uploads * Anonymous FTP * CGI directory for your own scripts * Site control panel * Installation included * Tech support included All this and more when you sign up for our E-Commerce Hosting plan for ONLY $69.95 per month and a one time set up fee of $199.00. That 's right. NO ADDITIONAL SET UP FEES or application fees for your merchant account real-time software or shopping cart storefront. A one-stop E-Commerce solution. And the best is: NO LEASING, NO LONG TERM COMMITMENT. YOU CAN CANCEL ANYTIME. THIS PRICE APPLIES TO U.S. BASED COMPANIES OR INDIVIDUALS ONLY! BUT WE ALSO HAVE A SOLUTION FOR INTERNATIONAL MERCHANTS! Please reply to: mailto:bjor@iwon.com?subject=INFO-PLEASE to receive our FREE information package without obligations. 
********************************************************* Remove at mailto:no77p@safe-mail.net?subject=remove ********************************************************* From owner-netdev@oss.sgi.com Sun Apr 16 20:08:48 2000 Received: by oss.sgi.com id ; Sun, 16 Apr 2000 20:08:39 -0700 Received: from styx.uwaterloo.ca ([129.97.40.10]:16400 "EHLO styx.uwaterloo.ca") by oss.sgi.com with ESMTP id ; Sun, 16 Apr 2000 20:08:21 -0700 Received: (from mostrows@localhost) by styx.uwaterloo.ca (8.9.3/8.9.3) id XAA11588; Sun, 16 Apr 2000 23:08:06 -0400 From: Michal Ostrowski MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <14586.32790.611994.174092@styx.uwaterloo.ca> Date: Sun, 16 Apr 2000 23:08:06 -0400 (EDT) To: linux-kernel@vger.rutgers.edu, netdev@oss.sgi.com Subject: PPPoE Driver For 2.3.99-pre5 X-Mailer: VM 6.72 under 21.1 (patch 4) "Arches" XEmacs Lucid Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing I've just made available a patch against 2.3.99-pre5 that adds support for PPPoE to linux using the new PPP-channels architecture. Along with this patch is a patch for ppp-2.4.0b2. Both patches may be found at http://www.math.uwaterloo.ca/~mostrows along with basic installation instructions. These patches solve all of the major problems with the patches I made available last week and overall are stable enough for general use. Currently these patches support PPPoE clients only, though PPPoE server support will be incorporated shortly. Comments, feedback, patches and bug reports are welcome and encouraged. 
Michal Ostrowski mostrows@styx.uwaterloo.ca From owner-netdev@oss.sgi.com Mon Apr 17 14:26:37 2000 Received: by oss.sgi.com id ; Mon, 17 Apr 2000 14:26:28 -0700 Received: from atol.icm.edu.pl ([212.87.0.35]:14853 "EHLO atol.icm.edu.pl") by oss.sgi.com with ESMTP id ; Mon, 17 Apr 2000 14:26:10 -0700 Received: from burza.icm.edu.pl ([148.81.208.198]:56759 "EHLO burza.icm.edu.pl" ident: "IDENT-NONSENSE") by atol.icm.edu.pl with ESMTP id ; Mon, 17 Apr 2000 23:25:43 +0200 Received: (from rzm@localhost) by burza.icm.edu.pl (8.9.3/8.9.3/rzm-2.6/icm) id XAA07849; Mon, 17 Apr 2000 23:26:36 +0200 (MET DST) Date: Mon, 17 Apr 2000 23:26:36 +0200 From: Rafal Maszkowski To: netdev@oss.sgi.com Cc: 6bone-pl@sunsite.icm.edu.pl Subject: v6 mysteriously unreachable routes (2.2.15pre15) Message-ID: <20000417232635.A7074@burza.icm.edu.pl> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-2 Content-Transfer-Encoding: 8bit User-Agent: Mutt/1.1i Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing

3ffe:8010:22:1::3 is unreachable:

root@6bone-gw:~,0# /usr/inet6/bin/ping 3ffe:8010:22:1::3
PING 3ffe:8010:22:1::3 (3ffe:8010:22:1::3): 56 data bytes
ping: sendmsg: Network is unreachable
ping: wrote 3ffe:8010:22:1::3 64 chars, ret=-1
ping: sendmsg: Network is unreachable
ping: wrote 3ffe:8010:22:1::3 64 chars, ret=-1
--- 3ffe:8010:22:1::3 ping statistics ---
2 packets transmitted, 0 packets received, 100% packet loss

even though there is a route there:

root@6bone-gw:~,0# route -A inet6 | grep agar
3ffe:8010:22::2/128  3ffe:8010:22::2  UC  0     975  1  agaran
3ffe:8010:22::2/128  ::               U   1024  0    0  agaran
3ffe:8010:22::/126   ::               UA  256   99   0  agaran
3ffe:8010:22::/48    fe80::d519:a90a  UG  1024  0    0  agaran
fe80::/10            ::               UA  256   0    0  agaran
ff00::/8             ::               UA  256   0    0  agaran

Deleting the /48 route twice and adding it back helps. I am trying to investigate other prefixes with the same problem.
ip rou list table all shows e.g.:

local 3ffe:8010:28::1 via :: dev lo metric 0 mtu 3924 rtt 300
3ffe:8010:28::2 via 3ffe:8010:28::2 dev bnet metric 0 cache users 1 used 80 age 32sec mtu 1480 rtt 300
3ffe:8010:28::2 dev bnet metric 1024 mtu 1480 rtt 300
3ffe:8010:28::/126 via :: dev bnet proto kernel metric 256 mtu 1480 rtt 300
unreachable 3ffe:8010:28::/48 dev lo metric 1024 error -51 mtu 3924 rtt 300
3ffe:8010:28::/48 via fe80::d4a0:bc01 dev bnet metric 1024 mtu 1480 rtt 300

Where could this unreachable /48 be coming from? Is it my naughty mrtd, or what? Or could it have been cached as unreachable while the BGP session wasn't announcing it yet? But then why does it stay after we finally got it from BGP?

thanks
R.
--
But who would wash their hands after greeting the husband? - A. Fedorczyk

From owner-netdev@oss.sgi.com Tue Apr 18 04:41:25 2000 Received: by oss.sgi.com id ; Tue, 18 Apr 2000 04:41:14 -0700 Received: from mailhostnew.tbit.dk ([194.182.135.150]:60920 "EHLO mailhostnew.tbit.dk") by oss.sgi.com with ESMTP id ; Tue, 18 Apr 2000 04:40:48 -0700 Received: from ric.tbit.dk (ric.tbit.dk [194.182.135.53]) by mailhostnew.tbit.dk (8.9.3+Sun/8.9.3) with ESMTP id NAA24481 for ; Tue, 18 Apr 2000 13:40:46 +0200 (MET DST) Received: (from ric@localhost) by ric.tbit.dk (8.9.3/8.9.3) id NAA01363; Tue, 18 Apr 2000 13:40:46 +0200 To: netdev@oss.sgi.com Subject: Non-fragmented ICMPv6 packets with an IPv6 fragment header From: "Richard =?iso-8859-1?q?J=F8rgensen?=" Reply-To: ric@tbit.dk Date: 18 Apr 2000 13:40:46 +0200 Message-ID: Lines: 137 User-Agent: Gnus/5.070098 (Pterodactyl Gnus v0.98) Emacs/20.3 MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: 8bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing

The Linux IPv6 stack seems to have a problem receiving ICMP6 packets if the IPv6 packet contains a fragment header but is not fragmented (i.e. the entire packet is in _one_ fragment).
It seems to be a problem only with ICMP6 - having an "unused" fragment header in a TCP packet does not seem to cause any problems. I have tested this with kernels 2.2.14 and 2.3.99-pre5, using ICMP6 echo request and ICMP6 echo reply.

To illustrate:

+--------+
|IPv6 hdr|
+--------+
| ICMP6  |
+--------+

Fig 1: A "normal" echo request, which is accepted.

+-------------+
|  IPv6 hdr   |
+-------------+
|Fragment hdr |
+-------------+
|    ICMP6    |
+-------------+

Fig 2: A "one-fragment" echo request, which is *not* accepted.

Now, before you scream "why on earth would you put a fragment header on a non-fragmented packet?", I had better explain my background. I'm writing a NAT-PT translator (RFC 2766) for the Telebit router, and the Protocol Translation part (defined in RFC 2765) specifies the translation of IP/ICMP as follows:

 [...] IPv4 packets with DF not set will always result in a fragment header being added to the [IPv6] packet [...]

In other words, the value of the DF (Don't Fragment) bit in the IPv4 header is translated into the presence or absence of a fragment header in IPv6.

Now, when I send an echo request through the NAT-PT, the following happens on Linux (the full packets are included at the end of this mail):

1 0.000000 3ffe:110:0:1::c0a8:a842 -> 3ffe:110:0:1::c0a8:a835 ICMPv6 Echo request
2 6.456773 3ffe:110:0:1::c0a8:a835 -> 3ffe:110:0:1::c0a8:a842 ICMPv6 Time exceeded (Reassembly)

My guess is that the following happens in the Linux IPv6 stack:

* Linux receives the echo request.
* Linux notes the fragment header and calls a defragmentation routine.
* The defragmentation routine waits for more packets, without first checking whether all fragments have already been received.
* Defragmentation times out, and Linux sends an ICMP6 Time Exceeded.

But then I don't know why IPv6 TCP is unaffected by one-piece packets with a fragment header. I hope someone on this list knows the IPv6 networking code well enough to find an explanation, and hopefully a bugfix.
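Richard's guess can be made concrete. The sketch below is a hypothetical userspace illustration, not the kernel's reassembly code: `struct frag_info`, `parse_frag_hdr()` and `is_atomic_fragment()` are invented names. It decodes the 8-byte IPv6 Fragment header and detects the case he describes (offset 0, M flag clear), in which the packet is already complete and could be handed to the upper layer at once instead of waiting for the reassembly timer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decoded IPv6 Fragment header (RFC 2460, section 4.5), 8 bytes on the wire:
 *   byte 0     Next Header
 *   byte 1     reserved
 *   bytes 2-3  13-bit Fragment Offset (8-octet units), 2 reserved bits, M flag
 *   bytes 4-7  Identification
 * Hypothetical helper type for illustration only. */
struct frag_info {
    uint8_t  next_header;
    uint16_t offset_bytes;   /* Fragment Offset converted to bytes */
    int      more_fragments; /* M flag: 1 if more fragments follow */
    uint32_t ident;
};

/* Parse a Fragment header; returns 0 on success, -1 if the buffer is too short. */
static int parse_frag_hdr(const uint8_t *p, size_t len, struct frag_info *fi)
{
    if (len < 8)
        return -1;
    uint16_t off_flags = (uint16_t)((p[2] << 8) | p[3]);
    fi->next_header = p[0];
    /* Offset lives in the top 13 bits, so (off_flags >> 3) * 8 == off_flags & 0xfff8. */
    fi->offset_bytes   = (uint16_t)(off_flags & 0xfff8);
    fi->more_fragments = off_flags & 0x0001;
    fi->ident = ((uint32_t)p[4] << 24) | ((uint32_t)p[5] << 16) |
                ((uint32_t)p[6] << 8)  |  (uint32_t)p[7];
    return 0;
}

/* Offset 0 with M clear: this "fragment" is the whole datagram, so it can be
 * delivered immediately rather than queued until the reassembly timer fires. */
static int is_atomic_fragment(const struct frag_info *fi)
{
    return fi->offset_bytes == 0 && !fi->more_fragments;
}
```

For the fragment header in Frame 1 at the end of the mail (Next header 0x3a, offset 0, More fragments not set, Identification 0xea50), `is_atomic_fragment()` returns true. Whether a stack short-circuits such packets or runs them through reassembly (which should then complete immediately) is a design choice; either way the packet must not wait for fragments that will never arrive.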
/ric ************************************************************************** *** The following are the ICMP6 echo-request and Time exceeded packets *** ************************************************************************** Frame 1 (126 on wire, 126 captured) Arrival Time: Apr 18, 2000 10:27:19.3065 Time delta from previous packet: 0.000000 seconds Frame Number: 1 Packet Length: 126 bytes Capture Length: 126 bytes Ethernet II Destination: 00:10:4b:3d:d2:72 (Richard) Source: 00:c0:33:0c:00:16 (Telebit_0c:00:16) Type: IPv6 (0x86dd) Internet Protocol Version 6 Version: 6 Traffic class: 0x00 Flowlabel: 0x00000 Payload length: 72 Next header: IPv6 fragment (0x2c) Hop limit: 63 Source address: 3ffe:110:0:1::c0a8:a842 Destination address: 3ffe:110:0:1::c0a8:a835 IPv6 fragment Next header: ICMPv6 (0x3a) Fragment offset: 0 More fragments: Not set Identification: 0xea50 Internet Control Message Protocol v6 Type: 0x80 (Echo request) Checksum: 0xe989 ID: 0xb472 Sequence: 0x0000 Data (56 bytes) Frame 2 (174 on wire, 174 captured) Arrival Time: Apr 18, 2000 10:27:25.7633 Time delta from previous packet: 6.456773 seconds Frame Number: 2 Packet Length: 174 bytes Capture Length: 174 bytes Ethernet II Destination: 00:c0:33:0c:00:16 (Telebit_0c:00:16) Source: 00:10:4b:3d:d2:72 (Richard) Type: IPv6 (0x86dd) Internet Protocol Version 6 Version: 6 Traffic class: 0x00 Flowlabel: 0x00000 Payload length: 120 Next header: ICMPv6 (0x3a) Hop limit: 64 Source address: 3ffe:110:0:1::c0a8:a835 Destination address: 3ffe:110:0:1::c0a8:a842 Internet Control Message Protocol v6 Type: 0x03 (Time exceeded) Code: 0x01 (Reassembly) Checksum: 0xf792 Internet Protocol Version 6 Version: 6 Traffic class: 0x00 Flowlabel: 0x00000 Payload length: 72 Next header: IPv6 fragment (0x2c) Hop limit: 254 Source address: 3ffe:110:0:1::c0a8:a842 Destination address: 3ffe:110:0:1::c0a8:a835 Internet Control Message Protocol v6 Type: 0x81 (Echo reply) Checksum: 0xc3a0 ID: 0x9a03 Sequence: 0x0000 Data (56 bytes) -- 
Richard Jørgensen System Developer, M. Sc. Ericsson Telebit A/S Tel: +45 86 28 81 76 Fabrikvej 11 Fax: +45 86 28 81 86 DK-8260 Viby J, Denmark E-mail: ric@tbit.dk From owner-netdev@oss.sgi.com Tue Apr 18 12:40:18 2000 Received: by oss.sgi.com id ; Tue, 18 Apr 2000 12:39:58 -0700 Received: from sgigate.SGI.COM ([204.94.209.1]:26400 "EHLO gate-sgigate.sgi.com") by oss.sgi.com with ESMTP id ; Tue, 18 Apr 2000 12:39:54 -0700 Received: by lappi.waldorf-gmbh.de id ; Mon, 17 Apr 2000 22:49:23 -0700 Date: Mon, 17 Apr 2000 22:49:23 -0700 From: Ralf Baechle To: Jeremy Weatherford Cc: netdev@oss.sgi.com Subject: Re: Real Time Message-ID: <20000417224922.A709@uni-koblenz.de> References: <20000414114450Z305159-390+180@oss.sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0.1i In-Reply-To: ; from xidus@xidus.net on Fri, Apr 14, 2000 at 02:39:56PM -0700 X-Accept-Language: de,en,fr Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Fri, Apr 14, 2000 at 02:39:56PM -0700, Jeremy Weatherford wrote: > What on earth is going on? This is the second piece of spam I've gotten > that's sent through this list. Why isn't the list closed to subscribers > only? If you've got any issue with spam, then please deal with this offline from this list at netdev-owner@oss.sgi.com. (Second? 
Oh my god, the end is near ;-) Ralf From owner-netdev@oss.sgi.com Tue Apr 18 14:03:09 2000 Received: by oss.sgi.com id ; Tue, 18 Apr 2000 14:02:48 -0700 Received: from kogge.hanse.de ([192.76.134.17]:8196 "EHLO kogge.Hanse.DE") by oss.sgi.com with ESMTP id ; Tue, 18 Apr 2000 14:02:35 -0700 Received: (from uucp@localhost) by kogge.Hanse.DE (8.9.3/8.9.1) with UUCP id XAA33158 for netdev@oss.sgi.com; Tue, 18 Apr 2000 23:05:18 +0200 (CEST) (envelope-from eis@baty.hanse.de) Received: (from eis@localhost) by baty.hanse.de (8.9.3/8.9.3) id VAA30821; Tue, 18 Apr 2000 21:24:33 +0200 From: Henner Eisen Message-Id: <200004181924.VAA30821@baty.hanse.de> To: netdev@oss.sgi.com Subject: Re: PPPoE Driver For 2.3.99-pre5 References: <14586.32790.611994.174092@styx.uwaterloo.ca> Date: 17 Apr 2000 23:03:41 +0200 In-Reply-To: Michal Ostrowski's message of "Sun, 16 Apr 2000 23:08:06 -0400 (EDT)" Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing >>>>> "Michal" == Michal Ostrowski writes: Michal> Comments, feedback, patches and bug reports are welcome Michal> and encouraged. OK. I did not run it, I just looked at the code. Thus, I did not verify whether the problems I identified in theory can really occur, and I might be wrong with some comments. Anyway, the code looks clean to me and is fairly easy to follow. > >diff -r -N -u linux.orig/drivers/net/Config.in linux/drivers/net/Config.in >--- linux.orig/drivers/net/Config.in Sat Apr 15 15:58:21 2000 >+++ linux/drivers/net/Config.in Sat Apr 15 08:21:15 2000 > ^^^^^^^^^^^ As this is implemented as a new protocol family, I think it should be moved somewhere in the network protocol directory, e.g. linux/net/pppox/.
>diff -r -N -u linux.orig/drivers/net/ppp_generic.c linux/drivers/net/ppp_generic.c
>--- linux.orig/drivers/net/ppp_generic.c Sat Apr 15 15:58:21 2000
>+++ linux/drivers/net/ppp_generic.c Sun Apr 16 18:55:48 2000
>@@ -2382,14 +2382,16 @@
> if (ppp != 0) {
> /* remove it from the ppp unit's list */
> pch->ppp = NULL;
>- ppp_lock(ppp);
>+ ppp_lock(ppp); /* This disables bh twice */
> list_del(&pch->clist);
> --ppp->n_channels;
>- if (ppp->dev == 0 && ppp->n_channels == 0)
>+ if (ppp->dev == 0 && ppp->n_channels == 0){
> /* Last disconnect from a ppp unit
> that is already dead: free it. */
> kfree(ppp);
>- else
>+ local_bh_enable(); /* Must re-enable bh twice */
>+ local_bh_enable();
>+ }else
> ppp_unlock(ppp);
> err = 0;
> }

Well, this twice-enabling looks somewhat strange, but it is probably just a preliminary workaround for a ppp_generic problem you've encountered. I have not analyzed this further, but a possible solution might be to use an atomic_t usage counter for each ppp_channel, which is incremented whenever a channel or a device is attached, and to trigger the kfree with atomic_dec_and_test() ...

>diff -r -N -u linux.orig/drivers/net/pppoe.c linux/drivers/net/pppoe.c
>--- linux.orig/drivers/net/pppoe.c Wed Dec 31 19:00:00 1969
>+++ linux/drivers/net/pppoe.c Sun Apr 16 20:24:51 2000
+/************************************************************************
+ * Receive a PPPoE Session frame.
+ ***********************************************************************/
+static int pppoe_rcv(struct sk_buff *skb,
+ if ( sk->state & PPPOX_BOUND ) {
+ skb_pull(skb, sizeof(struct pppoe_hdr));
+
+ ppp_input(&po->chan, skb);
+
+ }else{
+ sock_queue_rcv_skb(sk, skb);

Be aware of a possible memory leak if queueing fails here ...

+ }
+ return 1;
>+
>+/***********************************************************************
>+ * Initialize a new struct sock.
>+ **********************************************************************/
>+static int pppoe_create(struct socket *sock)
>+{
>+ sk->type = SOCK_STREAM;

SOCK_STREAM is probably a bad choice here, because it does not fulfill the POSIX semantics of a reliable byte stream. I think SOCK_DGRAM would be more appropriate here.

>+
>+ /* Delete the protinfo when it is time to do so. */
>+ sk->protinfo.destruct_hook = sk->protinfo.pppox;

This should be redundant, because sk->protinfo is a union and destruct_hook is just a synonym for pppox.

>+int pppoe_release(struct socket *sock)
>+{
>+
>+ skb_queue_purge(&sk->receive_queue);
>+
>+ sock_put(sk);

There seems to be no handler for deferred socket destruction. This could cause problems, e.g. if tx skbs are still in the lower-layer device queues while the socket is destroyed, and those skbs still hold a reference to that socket. This is currently not a problem for the ppp tx frames, because you never account them to the socket. But it might hit you if you queue frames from user space and then suddenly close the socket. Maybe this explains the kernel crashes that you reported with earlier versions of the driver?

>+/************************************************************************
>+ *
>+ * xmit function called by generic PPP driver
>+ * sends PPP frame over PPPoE socket
>+ *
>+ ***********************************************************************/
>+int pppoe_xmit(struct ppp_channel *chan, struct sk_buff *skb)
>+{
>+
>+ dev_queue_xmit(skb);
>+
>+ /* Ready to xmit next packet --- or should the dev queue be checked
>+ * to make sure it isn't full?
>+ */
>+
>+ /* New ppp_generic code really doesn't like this...
>+ ppp_output_wakeup(chan);
>+ */

Just wondering: you completely bypass socket-based (wmem accounting) flow control here (you never set skb ownership for pppoe_xmit sk_buffs). Consequently, you cannot use it to trigger ppp_generic flow control, and your ppp channel is always willing to accept frames.
Thus, I think if you send too fast, packets will be lost. TCP will probably notice this and adjust its transfer rate, but how do other protocols behave? BTW: in 2.3.x, dev_queue_xmit() returns a meaningful int indicating whether queueing was successful. I guess you could easily use this to trigger ppp_generic flow control, with no need to check the dev queue explicitly. Maybe this is sufficient for flow control, and you really can (as you already do) avoid flow control based on socket wmem accounting (but then the harder task will be to wake up the channel again). Henner From owner-netdev@oss.sgi.com Tue Apr 18 14:03:18 2000 Received: by oss.sgi.com id ; Tue, 18 Apr 2000 14:03:09 -0700 Received: from kogge.hanse.de ([192.76.134.17]:9220 "EHLO kogge.Hanse.DE") by oss.sgi.com with ESMTP id ; Tue, 18 Apr 2000 14:02:52 -0700 Received: (from uucp@localhost) by kogge.Hanse.DE (8.9.3/8.9.1) with UUCP id XAA33161; Tue, 18 Apr 2000 23:05:32 +0200 (CEST) (envelope-from eis@baty.hanse.de) Received: (from eis@localhost) by baty.hanse.de (8.9.3/8.9.3) id WAA30962; Tue, 18 Apr 2000 22:50:10 +0200 Date: Tue, 18 Apr 2000 22:50:10 +0200 From: Henner Eisen Message-Id: <200004182050.WAA30962@baty.hanse.de> To: hadi@cyberus.ca CC: linux-kernel@vger.rutgers.edu, netdev@oss.sgi.com In-reply-to: (message from jamal on Mon, 17 Apr 2000 22:48:22 -0400 (EDT)) Subject: Re: PPPOE Was (Re: >=pre5 OOPS on boot failure to open /dev/console References: Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing >>>>> "jamal" == jamal writes: jamal> If i understand you correctly you are just tunneling data, jamal> it just happens to be ppp, right? What about jamal> call/connection setup, negotiation etc? If this is jamal> irrelevant then i agree with you that it doesnt matter what jamal> you use. Call/connection setup will be done by the protocol's standard mechanisms, e.g.
a user space process will do a connect() or accept() on a socket and then do a PPPIOCATTCH ioctl in order to attach the data path to a ppp channel (or something similar in order to attach it to a tunnel device). jamal> What about connection setup/teardown/general control? We jamal> already have pppd which suffices for the ppp jamal> negotiations. Most of this protocols have their own jamal> negotiations before they start ppp setup. Yes, the framework follows exactly that paradigm. It only provides the functionality needed to attach the connected socket's data path to a ppp channel. All other work is left to the existing ppp_generic module (with its co-worker pppd). I have not dived into the details of the new pppd plugin mechanism, but I hope it is feasible to write a plugin that does a connect() on a socket when pppd wants to open a ppp connection on top of the `carrier' protocol. >> Existing network protocol stacks differ in various areas >> (e.g. which parts of the protocol processing need process >> context, how can ppp_channel flow control be interfaced to the >> carrier protocol's flow control mechanism). jamal> Flow control/setup is the slow path of the whole jamal> transaction. Naturaly it makes a lot of sense to move this jamal> part out of the kernel because it tends to be rich, adds jamal> tons of code to the kernel and might be subject to frequent jamal> changes. The interfacing to the "carrier protocol's" flow jamal> control mechanism is done outside (in user space). The jamal> connect() and disconnect() pppd hooks for example tie to jamal> the "carrier protocol's" connection setup and teardown. I'm not sure what you mean by 'flow control', but it seems that we have different things in mind when talking about flow control. Of course, the end user's process, which has e.g.
an open tcp connection which just happens to be routed over the ppp connection, will be flow controlled by the kernel's standard mechanisms (the ppp / tunnel layer is not even aware of this). What I was thinking about was the low (device)-layer flow, which is controlled by netif_{start,stop,wake}_queue() for Linux network devices or ppp_output_wakeup() for generic ppp_channels. E.g. X.25 (the same holds for most connection-oriented sockets) uses a sliding window mechanism. If the send window is full, then we are not allowed to send further frames to the peer. Thus, we should do a netif_stop_queue() for a network device tunnel interface or return 'busy' from our ppp_channel's ppp_start_xmit() method. And likewise, we want to do a netif_wake_queue() or a ppp_output_wakeup() when there is space in the send window again. It's that kind of flow control which I want to support. Of course we could also just discard any tx packet while the send window is full, but this will likely result in worse performance. It's probably better to flow control the upper layer (net_device tunnel or ppp_generic channel), because those upper layers can be much smarter about what to do with a packet which we temporarily cannot accept for transmission. I don't see the framework as a competitor to the AF_PPPOX project. The latter is appropriate for implementing special ppp encapsulations in a very efficient, straightforward manner. My intended framework primarily focuses on existing, mainly connection-oriented, protocol families which are usually used directly by user space processes (accessed via a socket interface). If the protocol maintainer intends to use such a protocol as a carrier for ppp frames (or to tunnel ip directly over it), then the framework aims at making the implementation easier and at sharing some common code among different protocol families.
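The window-driven start/stop that Henner describes can be modelled in a few lines. This is a hypothetical userspace sketch: `struct win_chan`, `chan_start_xmit()` and `chan_ack()` are invented names, and the comments only mark the points where a real driver would call netif_stop_queue()/netif_wake_queue() or return 'busy' and later call ppp_output_wakeup().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a connection-oriented carrier with a sliding send
 * window (X.25-style) sitting below a ppp channel or tunnel device. */
struct win_chan {
    unsigned int window;     /* max frames sent but not yet acknowledged */
    unsigned int in_flight;  /* frames currently unacknowledged */
    bool upper_stopped;      /* upper layer has been told to hold packets */
    unsigned int wakeups;    /* number of times the upper layer was restarted */
};

/* Offer one frame. Returns true if accepted; false means "busy": the upper
 * layer keeps the frame and retries after the next wakeup. */
static bool chan_start_xmit(struct win_chan *c)
{
    if (c->in_flight >= c->window) {
        c->upper_stopped = true;      /* where netif_stop_queue() would go */
        return false;
    }
    c->in_flight++;
    if (c->in_flight == c->window)
        c->upper_stopped = true;      /* stop early: window is now full */
    return true;
}

/* The peer acknowledged n frames: the window opens again, so restart the
 * upper layer exactly once. */
static void chan_ack(struct win_chan *c, unsigned int n)
{
    c->in_flight = (n > c->in_flight) ? 0 : c->in_flight - n;
    if (c->upper_stopped && c->in_flight < c->window) {
        c->upper_stopped = false;
        c->wakeups++;                 /* where netif_wake_queue() or
                                         ppp_output_wakeup() would go */
    }
}
```

Stopping the upper layer as soon as the last window slot is used, rather than only after a frame is refused, means the "busy" return is a fallback; in the common case the upper layer simply holds its packets until the acknowledgement reopens the window, instead of the carrier discarding them.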
Henner From owner-netdev@oss.sgi.com Wed Apr 19 05:41:22 2000 Received: by oss.sgi.com id ; Wed, 19 Apr 2000 05:41:13 -0700 Received: from twilight.cs.hut.fi ([130.233.40.5]:36631 "EHLO twilight.cs.hut.fi") by oss.sgi.com with ESMTP id ; Wed, 19 Apr 2000 05:40:52 -0700 Received: (from localhost user: 'skivisaa' uid#25309 fake: STDIN (skivisaa@jolly.cs.hut.fi)) by mail.niksula.cs.hut.fi id ; Wed, 19 Apr 2000 15:40:29 +0300 Date: Wed, 19 Apr 2000 15:40:29 +0300 From: Sami Sakari Kivisaari To: netdev@oss.sgi.com Subject: Bugs in IPv6 code Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing

Hi,

We are writing this to report two bugs we have found in the Linux 2.3.99-pre5 IPv6 code. The first one relates to incorrect ICMP behaviour if extension headers are present (e.g. a fragment header or a destination options header). The second one concerns an incorrect way to free an skb. Patches are attached, and they work fine at least with our test cases.

Bug 1.
======
In raw.c (icmpv6_filter), the following expression assumes that the ICMP header immediately follows the IPv6 header, which is not always the case:

 icmph = (struct icmp6hdr *) (skb->nh.ipv6h + 1);

A more correct expression would be:

 icmph = (struct icmp6hdr *) skb->h.raw;

Bug 2.
======
In ip6_output.c (ip6_xmit), kfree is used to free an skb. The correct function is kfree_skb.
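To see why the fixed offset in Bug 1 goes wrong, the hypothetical helper below walks the IPv6 extension-header chain in a raw packet buffer to find where the ICMPv6 header really starts. It is a userspace illustration only, and `find_icmpv6()` is an invented name; the patch instead relies on skb->h.raw already pointing at the transport header when icmpv6_filter() runs. Only Hop-by-Hop (0), Routing (43), Fragment (44) and Destination Options (60) headers are handled, and non-first fragments are not treated specially.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IPV6_HDR_LEN 40
#define NH_HOPOPTS    0
#define NH_ROUTING   43
#define NH_FRAGMENT  44
#define NH_ICMPV6    58
#define NH_DSTOPTS   60

/* Return the byte offset of the ICMPv6 header in pkt (which starts with the
 * fixed IPv6 header), or -1 if it is absent, truncated, or hidden behind an
 * extension header this sketch does not understand. */
static long find_icmpv6(const uint8_t *pkt, size_t len)
{
    if (len < IPV6_HDR_LEN)
        return -1;
    uint8_t nh = pkt[6];              /* Next Header field of the fixed header */
    size_t off = IPV6_HDR_LEN;

    while (nh != NH_ICMPV6) {
        size_t ext_len;
        if (off + 8 > len)            /* every extension header is >= 8 bytes */
            return -1;
        switch (nh) {
        case NH_HOPOPTS:
        case NH_ROUTING:
        case NH_DSTOPTS:
            /* Hdr Ext Len counts 8-octet units, not including the first 8. */
            ext_len = ((size_t)pkt[off + 1] + 1) * 8;
            break;
        case NH_FRAGMENT:
            ext_len = 8;              /* fixed size */
            break;
        default:
            return -1;                /* TCP, UDP, unknown, ... */
        }
        nh = pkt[off];                /* first byte of each ext header: Next Header */
        off += ext_len;
    }
    /* type, code and checksum need at least 4 bytes */
    return (off + 4 <= len) ? (long)off : -1;
}
```

With a Fragment header present (Fig 2 in Richard's mail), the ICMPv6 header starts at offset 48, while `skb->nh.ipv6h + 1` always points at offset 40, i.e. at the fragment header itself, so the filter tests the wrong byte as the ICMP type.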
Sami Kivisaari

===
diff -urN v2.3.99-pre5/net/ipv6/ip6_output.c linux/net/ipv6/ip6_output.c
--- v2.3.99-pre5/net/ipv6/ip6_output.c Thu Mar 2 21:41:11 2000
+++ linux/net/ipv6/ip6_output.c Wed Apr 19 15:30:27 2000
@@ -200,7 +200,7 @@
 
 if (skb_headroom(skb) < head_room) {
 struct sk_buff *skb2 = skb_realloc_headroom(skb, head_room);
- kfree(skb);
+ kfree_skb(skb);
 skb = skb2;
 if (skb == NULL)
 return -ENOBUFS;
diff -urN v2.3.99-pre5/net/ipv6/raw.c linux/net/ipv6/raw.c
--- v2.3.99-pre5/net/ipv6/raw.c Mon Feb 28 04:45:10 2000
+++ linux/net/ipv6/raw.c Fri Apr 14 21:02:30 2000
@@ -115,7 +115,7 @@
 struct raw6_opt *opt;
 
 opt = &sk->tp_pinfo.tp_raw;
- icmph = (struct icmp6hdr *) (skb->nh.ipv6h + 1);
+ icmph = (struct icmp6hdr *) skb->h.raw;
 return test_bit(icmph->icmp6_type, &opt->filter);
 }
 

From owner-netdev@oss.sgi.com Wed Apr 19 07:42:33 2000 Received: by oss.sgi.com id ; Wed, 19 Apr 2000 07:42:24 -0700 Received: from ns1.research.bell-labs.com ([204.178.16.6]:54543 "HELO dirty.research.bell-labs.com") by oss.sgi.com with SMTP id ; Wed, 19 Apr 2000 07:42:07 -0700 Received: from scummy.research.bell-labs.com ([135.104.2.10]) by dirty; Wed Apr 19 10:41:04 EDT 2000 Received: from aura.research.bell-labs.com ([135.104.46.10]) by scummy; Wed Apr 19 10:41:03 EDT 2000 Received: from research.bell-labs.com (IDENT:root@1123pc5.research.bell-labs.com [135.104.46.218]) by aura.research.bell-labs.com (8.9.1/8.9.1) with ESMTP id KAA07184 for ; Wed, 19 Apr 2000 10:41:03 -0400 (EDT) Message-ID: <38FDC660.7342C14A@research.bell-labs.com> Date: Wed, 19 Apr 2000 10:44:48 -0400 From: Sumit Garg Organization: Lucent Bell Labs X-Mailer: Mozilla 4.51 [en] (X11; I; Linux 2.2.12 i686) X-Accept-Language: en MIME-Version: 1.0 To: netdev@oss.sgi.com Subject: sk_buff resizing Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hello, I am trying to modify the contents of a well formed
sk_buff in a function called from dev_queue_xmit() with a properly filled-in sk_buff. I may need to add, delete, or simply overwrite certain portions of the data in the sk_buff's buffer. If I just overwrite a chunk of data in the buffer and update the TCP and IP checksums, things work fine. However, if I try to ADD or DELETE data, the same sk_buff keeps being transmitted again, even though ACKs come back. By the way, I am trying to modify sk_buffs generated by a specific telnet session, and once I do the modifications that telnet session hangs (though I can still start new telnet connections). Any insight?

The code: diff is the amount by which I need to change the size. I need to replace the data at o_buf of length o_len in sk_buff skb with new data in a buffer pointed to by n_buf of length n_len.

 o_offset = o_buf - (char *) skb->data;
 skb_len = skb->len;
 if (skb_tailroom(skb) >= diff) {
     if (diff > 0) {   // grow
         skb_put(skb, diff);
         memmove(o_buf + n_len, o_buf + o_len, skb_len - (o_offset + o_len));
         memmove(o_buf, n_buf, n_len);
     } else {          // shrink
         memmove(o_buf + n_len, o_buf + o_len, skb->len - (o_offset + o_len));
         skb_trim(skb, skb->len + diff);
         memmove(o_buf, n_buf, n_len);
     }

Thanks
Sumit

From owner-netdev@oss.sgi.com Wed Apr 19 10:49:23 2000 Received: by oss.sgi.com id ; Wed, 19 Apr 2000 10:49:13 -0700 Received: from minus.inr.ac.ru ([193.233.7.97]:2575 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Wed, 19 Apr 2000 10:48:56 -0700 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id VAA06745; Wed, 19 Apr 2000 21:47:57 +0400 From: kuznet@ms2.inr.ac.ru Message-Id: <200004191747.VAA06745@ms2.inr.ac.ru> Subject: Re: Bugs in IPv6 code To: skivisaa@niksula.HUt.FI (Sami Sakari Kivisaari) Date: Wed, 19 Apr 2000 21:47:57 +0400 (MSK DST) Cc: netdev@oss.sgi.com In-Reply-To: from "Sami Sakari Kivisaari" at Apr 19, 0 05:13:18 pm X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Content-Length: 395 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt:
rfc822;netdev-outgoing Hello! > We are writing this to report two bugs we have found in the Linux > 2.3.99-pre5 IPv6 code. The first one relates to incorrect ICMP behaviour > if extension headers are present (e.g. fragment header or destination > options header). The second one concerns an incorrect way to free a skb. > Patches are attached and at least they work fine with our test cases. Thank you! Alexey From owner-netdev@oss.sgi.com Wed Apr 19 12:24:14 2000 Received: by oss.sgi.com id ; Wed, 19 Apr 2000 12:24:04 -0700 Received: from minus.inr.ac.ru ([193.233.7.97]:28433 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Wed, 19 Apr 2000 12:23:45 -0700 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id WAA07126; Wed, 19 Apr 2000 22:29:26 +0400 From: kuznet@ms2.inr.ac.ru Message-Id: <200004191829.WAA07126@ms2.inr.ac.ru> Subject: Re: Non-fragmented ICMPv6 packets with an IPv6 fragment header To: ric@tbit.DK Date: Wed, 19 Apr 2000 22:29:26 +0400 (MSK DST) Cc: netdev@oss.sgi.com In-Reply-To: from "Richard Jørgensen" at Apr 18, 0 04:13:36 pm X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Content-Length: 505 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hello! > Linux IPv6 stack seems to have a problem receiving ICMP6 packets, if > the IPv6 packet contains a fragment header, but is not fragmented > (i.e. the entire packet is in _one_ fragment). Yes... They are never "reassembled". Thank you, it will be fixed soon. > It seems only to be a problem with ICMP6 - having an "unused" fragment > header in a TCP-packet does not seem to give any problems. It is impossible. Please check this more carefully. Probably it indicates some other bug.
Alexey From owner-netdev@oss.sgi.com Wed Apr 19 18:26:59 2000 Received: by oss.sgi.com id ; Wed, 19 Apr 2000 18:26:39 -0700 Received: from mail.cyberus.ca ([209.195.95.1]:53738 "EHLO cyberus.ca") by oss.sgi.com with ESMTP id ; Wed, 19 Apr 2000 18:26:22 -0700 Received: from shell.cyberus.ca (shell [209.195.95.7]) by cyberus.ca (8.9.3/8.9.3/Cyberus Online Inc.) with ESMTP id VAA04623; Wed, 19 Apr 2000 21:25:56 -0400 (EDT) Received: from localhost (hadi@localhost) by shell.cyberus.ca (8.9.1b+Sun/8.9.3) with ESMTP id VAA10403; Wed, 19 Apr 2000 21:25:56 -0400 (EDT) Date: Wed, 19 Apr 2000 21:25:56 -0400 (EDT) From: jamal To: Henner Eisen cc: linux-kernel@vger.rutgers.edu, netdev@oss.sgi.com, kuznet@ms2.inr.ac.ru Subject: Re: PPPOE Was (Re: >=pre5 OOPS on boot failure to open /dev/console In-Reply-To: <200004182050.WAA30962@baty.hanse.de> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Tue, 18 Apr 2000, Henner Eisen wrote: > call/connection setup will be done by protocol's standard mechanisms. > e.g. a user space process will do a connect() or accept() on a socket > and then do a PPPIOCATTCH ioctl in order to attach the data path to > a ppp channel (or somthing similar in order to attach it to a tunnel device). > Ok. So same thing as in pppox. > I'm not sure what you mean by 'flow control', but it seems that we > have different things in mind when talking about flow control. Of course, Indeed we do. > the end user's process, which has e.g. an open tcp connection which just > happens to be routed over the ppp connection, will be flow controlled by means > of the standard kernels mechanisms (the ppp / tunnel layer is not even > aware of this). > > What I was thinking about was the low (device)-layer flow, which > is controlled by netif_{start,stop,wake}_queue() for linux > network devices or ppp_output_wakeup() for generic ppp_channels. 
OK, so now I understand you ;-> > E.g. X.25 (the same holds for most connection oriented sockets) uses > a sliding window mechanism. If the send window is full, then we are > not allowed to send further frames to the peer. Thus, we should do > a netif_stop_queue() for a network device tunnel interface or return 'busy' > from our ppp_channel's ppp_start_xmit() method. And likewise, we want > to do a netif_wake_queue() or a ppp_output_wakeup() when there is space > in the send window again. It's that kind of flow control which I want to > support. > > Of course we could also just discard any tx packet while the send window > is full. But this will likely result in worse performance. It's > probably better to flow control the upper layer (net_device tunnel or > ppp_generic channel), because those upper layers can be much smarter > about what to do with the packet which we temporarily cannot accept for > transmission. > I wonder if Alexey is reading this ;-> (ok there, he is cc'ed now ;-> ) So how are you sending the feedback all the way to the transport protocol? Say, TCP where it might be really useful to distinguish between local congestion vs "somewhere along the end2end path" congestion; I haven't looked at your code but I suspect you are using the NET_XMIT_* codes as the source of your information about the local congestion. And the big question is: so what to do when you get this information? And if you were to throttle, for how long? etc etc... I think it is a good idea, but you need to be bulletproof ...
cheers, jamal From owner-netdev@oss.sgi.com Thu Apr 20 14:01:01 2000 Received: by oss.sgi.com id ; Thu, 20 Apr 2000 14:00:41 -0700 Received: from kogge.hanse.de ([192.76.134.17]:2311 "EHLO kogge.Hanse.DE") by oss.sgi.com with ESMTP id ; Thu, 20 Apr 2000 14:00:37 -0700 Received: (from uucp@localhost) by kogge.Hanse.DE (8.9.3/8.9.1) with UUCP id XAA10776; Thu, 20 Apr 2000 23:03:08 +0200 (CEST) (envelope-from eis@baty.hanse.de) Received: (from eis@localhost) by baty.hanse.de (8.9.3/8.9.3) id UAA03950; Thu, 20 Apr 2000 20:59:53 +0200 Date: Thu, 20 Apr 2000 20:59:53 +0200 From: Henner Eisen Message-Id: <200004201859.UAA03950@baty.hanse.de> To: hadi@cyberus.ca CC: linux-kernel@vger.rutgers.edu, netdev@oss.sgi.com, kuznet@ms2.inr.ac.ru In-reply-to: (message from jamal on Wed, 19 Apr 2000 21:25:56 -0400 (EDT)) Subject: Re: PPPOE Was (Re: >=pre5 OOPS on boot failure to open /dev/console References: Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing >>>>> "jamal" == jamal writes: jamal> cc'ed now ;-> ) So how are you sending the feedback all the jamal> way to the transport protocol? Say, TCP where it might be The trick is, I don't :-). Well, of course I do, but not directly to TCP, only through the well-defined interface. The idea is to use the same measure for local `congestion' as is also used to flow control user-space socket applications: the socket's wmem account. The outline is as follows: ppp->start_xmit() or dev->hard_start_xmit() first checks the lower layer's socket with if (sock_wspace(sk) == 0). If there is no write space, then the frame is rejected. Clearing a busy condition is done by hooking into the sk->write_space() callback. This needs to perform the same check as the xmit methods above, and if successful, the busy condition is cleared by netif_wake_queue() or ppp_output_wakeup().
jamal> really useful to distinguish between local congestion vs jamal> "somewhere along the end2end path" congestion; I havent jamal> looked at your code but i suspect you are using the I hav'nt uploaded it yet because I have not done any testing yet. Hopefully, in a few days I can do it. Henner From owner-netdev@oss.sgi.com Fri Apr 21 08:55:52 2000 Received: by oss.sgi.com id ; Fri, 21 Apr 2000 08:55:32 -0700 Received: from mail.cyberus.ca ([209.195.95.1]:44468 "EHLO cyberus.ca") by oss.sgi.com with ESMTP id ; Fri, 21 Apr 2000 08:55:31 -0700 Received: from shell.cyberus.ca (shell [209.195.95.7]) by cyberus.ca (8.9.3/8.9.3/Cyberus Online Inc.) with ESMTP id LAA25558; Fri, 21 Apr 2000 11:55:06 -0400 (EDT) Received: from localhost (hadi@localhost) by shell.cyberus.ca (8.9.1b+Sun/8.9.3) with ESMTP id LAA13807; Fri, 21 Apr 2000 11:55:05 -0400 (EDT) Date: Fri, 21 Apr 2000 11:55:04 -0400 (EDT) From: jamal To: linux-kernel@vger.rutgers.edu cc: netdev@oss.sgi.com, tytso@MIT.EDU, hans@grumbeer.inka.de, linux-kernel@vger.rutgers.edu, fusion94@valinux.com, tytso@MIT.EDU, marc@mbsi.ca Subject: P-MTU discovery Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing You know, it would be really nice to have networking related issues being discussed on netdev instead ... [tytso's nice description of pmtu blackholes deleted] tytso>access to the network via your singleton PPP connection. It also tytso>comes up with those folks using DSL where the providers are using tytso>the bomination also known as PPP over Ethernet (PPPOE). I will have to disagree with you on this one, Ted. PPPOE is a great simple protocol. But i think this is out of context of this discussion. tytso> Could anyone please explain this? Is there a better solution than tytso> disabling MTU discovery? ... ... 
tytso>The final approach, which apparently some of the DSL providers tytso>use, is that on the cable modem box or on the DSLAM in the telco's tytso> central office, they are actively messing with the outgoing tytso> packets, by looking for TCP packets with the SYN bit set (which tytso> indicates the beginning of a TCP stream), and change the max MSS tytso> option in the TCP header to be smaller than max MTU caused by the tytso> PPP over Ethernet overhead. This is incredibly ugly, and violates tytso> the IP protocol's end-to-end argument. It also breaks in the tytso> presence of IPSEC, since the DSL provider won't be able to muck tytso> with the packet without breaking the cryptographic checksums. I don't know whether telcos are already doing this, but we certainly are in Linux. I point the finger to Marc Boucher. He did it! The reason is very simple: NAT, that good old friend of IPSEC. When you have lotsa boxes that you are masquerading for, it is hell to go around and start changing their MTU values or doing any sort of per-box changes. Disabling PMTU at the masquerading box also doesn't help because PPPOE adds an extra shim header to the packet. It will break IPSEC in most cases (maybe not in the case where your masquerading box is also your IPSEC gateway). From a philosophical angle: there is no panacea for these kinds of problems. I wonder how long you've been chasing them. You will continuously chase people to try and fix things for IPSEC's sake ;-> I wonder how you plan to deal with all those "content switching" startups (since that is the greatest thing since sliced bread these days). Is the end2end argument really a dead horse? (I am ducking ahead of time). Maybe what the IETF needs is to take all the chairs into some end2end non-breakage indoctrination and give them a qualifying test first. Having said that, there could be an alternative solution in Linux.
The PPPOE code could be made, after dropping the packet, to generate ICMP "too big" messages back to the masquareded boxes instead (when packet-size >PMTU-shim_header). Hopefully, the win* boxes know what to do with these messages. And this will work also for UDP. Marc? Now if the telcos are doing this (ouch!) how are you planning to dissuade them? cheers, jamal From owner-netdev@oss.sgi.com Fri Apr 21 12:37:22 2000 Received: by oss.sgi.com id ; Fri, 21 Apr 2000 12:37:03 -0700 Received: from TSX-PRIME.MIT.EDU ([18.86.0.76]:63374 "HELO tsx-prime.MIT.EDU") by oss.sgi.com with SMTP id ; Fri, 21 Apr 2000 12:36:34 -0700 Received: by tsx-prime.MIT.EDU with sendmail-SMI-8.6/1.2, id PAA28200; Fri, 21 Apr 2000 15:35:58 -0400 Date: Fri, 21 Apr 2000 15:35:58 -0400 Message-Id: <200004211935.PAA28200@tsx-prime.MIT.EDU> From: "Theodore Y. Ts'o" To: jamal CC: linux-kernel@vger.rutgers.edu, netdev@oss.sgi.com, hans@grumbeer.inka.de, linux-kernel@vger.rutgers.edu, fusion94@valinux.com, marc@mbsi.ca In-reply-to: jamal's message of Fri, 21 Apr 2000 11:55:04 -0400 (EDT), Subject: Re: P-MTU discovery Phone: (781) 391-3464 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Date: Fri, 21 Apr 2000 11:55:04 -0400 (EDT) From: jamal I dont know whether telcos are already doing this, but we certainly are in Linux. I point the finger to Marc Boucher. He did it! Ah, I hadn't realized someone had done it already. Is it in ipchains? The reason is very simple: NAT that good old friend of IPSEC. When you have lotsa boxes that you are masquareding for it is hell to go around and start changing their MTU values or doing any sort of per-box changes. Actually, the hack is useful even if you're not doing NAT; any time you have a configuration where you have a gateway box which is doing some kind of tunnelling (either PPPOE or IP-IP or something else), and you have lots of client machines behind the tunnel end-pointing, making lots of per-box changes a pain. 
If you're using dhcp, something you can do to avoid having to change all of the boxes one at a time is to set the interface-mtu using dhcp to 1400 or 1450. The disadvantage of doing this is that *all* packets get sent with the restricted MTU, not just ones going out through the tunnel/gateway. (You'd really like to be able to set a per-route MSS, but dhcp doesn't appear to have a way of doing that right now.) Disabling PMTU at the masquareding box also doesnt help because PPPOE adds an extra shim header to the packet. It will break IPSEC in most cases (maybe not in the case where your masquareding box is also your IPSEC gateway). Right; that that's the problem; PPPOE, because it adds a shim header, constricts the link MTU, and so you need to do PMTU discovery at the endpoints. And in either case, doing PMTU doesn't help if you have something in the path which is filtering the ICMP messages. From a philosophical angle: there is no panacea for these kind of problems. I wonder how long youve been chasing them. You will continously chase people to try and fix things for IPSEC's sake ;-> I wonder how you plan to deal with all those "content switching" startups (since that is the greatest thing since sliced bread these days). Is the end2end arguement really a dead horse? (I am ducking ahead of time). Maybe what the IETF needs is to take alls chairs into some end2end non-breakage indoctrination and give them a qualifying test first. Here's the problem. End2end is great design principle, but it fundamentally assumes that the intelligence is at the endpoints, and the middle of the network isn't supposed to do anything special/magical. But as the internet gets bigger and bigger, trying to change all of the endpoints to add security, or to handle paths with long latencies efficiently, gets harder and harder. And so, it gets easier to make changes in the middle of the network. 
And most of the (to use Rusty's phrase) "packet fucking" techniques come from this dilemma: NAT's (easier than IPV6), firewalls (easier than doing real end-point security), tcp ack spoofing (easier than upgrading Windows TCP stacks to make them work correctly over satellite links), etc. One could argue that by violating the IP architecture, they're engaging in hill-climbing optimizations that in the long-run will cause someone a lot of pain. Some things simply won't work if you play such games, and as long as you acknowledge that fact, use them in good health. So I've used NAT's before, even though I think that fundamentally they're evil, because it solved the limited problem I needed to solve at the time. But I didn't consider them first class objects, but treated them rather as kludges. So if things broke because of the NAT, I knew it was coming to me, and I would deal. One of the ways I dealt was to get myself a /27 at home, but I realize not everyone can get that. The problem is that more and more users are using things like NAT's and MSS adjusters, etc., and they don't understand that they're kludges. So when other protocols start breaking, they blame those other protocols instead of correctly placing the blame where it belongs. Having said that, there could be an alternative solution in Linux. The PPPOE code could be made, after dropping the packet, to generate ICMP "too big" messages back to the masquareded boxes instead (when packet-size >PMTU-shim_header). Hopefully, the win* boxes know what to do with these messages. And this will work also for UDP. Marc? That doesn't help. We're doing this today already; it's required by the RFC's, after all. The problem is that the sender of the big packet has to receive the ICMP, and if there's something filtering the ICMP message, you're stuck. 
- Ted From owner-netdev@oss.sgi.com Fri Apr 21 13:24:43 2000 Received: by oss.sgi.com id ; Fri, 21 Apr 2000 13:24:22 -0700 Received: from mail.cyberus.ca ([209.195.95.1]:57737 "EHLO cyberus.ca") by oss.sgi.com with ESMTP id ; Fri, 21 Apr 2000 13:24:09 -0700 Received: from shell.cyberus.ca (shell [209.195.95.7]) by cyberus.ca (8.9.3/8.9.3/Cyberus Online Inc.) with ESMTP id QAA22391; Fri, 21 Apr 2000 16:23:52 -0400 (EDT) Received: from localhost (hadi@localhost) by shell.cyberus.ca (8.9.1b+Sun/8.9.3) with ESMTP id QAA15140; Fri, 21 Apr 2000 16:23:53 -0400 (EDT) Date: Fri, 21 Apr 2000 16:23:53 -0400 (EDT) From: jamal To: "Theodore Y. Ts'o" cc: linux-kernel@vger.rutgers.edu, netdev@oss.sgi.com, hans@grumbeer.inka.de, linux-kernel@vger.rutgers.edu, fusion94@valinux.com, marc@mbsi.ca Subject: Re: P-MTU discovery In-Reply-To: <200004211935.PAA28200@tsx-prime.MIT.EDU> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Fri, 21 Apr 2000, Theodore Y. Ts'o wrote: > Date: Fri, 21 Apr 2000 11:55:04 -0400 (EDT) > From: jamal > > I dont know whether telcos are already doing this, but we certainly are in > Linux. I point the finger to Marc Boucher. He did it! > > Ah, I hadn't realized someone had done it already. Is it in ipchains? > Both ipchains and netfilter: http://www.davin.ottawa.on.ca/pppoe/ I know it is also a separate package in netwinder.org somewhere look for something with "mssfwclamp" on this pppoed packaging. > The reason is very simple: NAT that good old friend of IPSEC. > When you have lotsa boxes that you are masquareding for it is hell to go > around and start changing their MTU values or doing any sort of per-box > changes. 
> > Actually, the hack is useful even if you're not doing NAT; any time you > have a configuration where you have a gateway box which is doing some > kind of tunnelling (either PPPOE or IP-IP or something else), and you > have lots of client machines behind the tunnel end-pointing, making lots > of per-box changes a pain. > > Here's the problem. End2end is great design principle, but it > fundamentally assumes that the intelligence is at the endpoints, and the > middle of the network isn't supposed to do anything special/magical. > But as the internet gets bigger and bigger, trying to change all of the > endpoints to add security, or to handle paths with long latencies > efficiently, gets harder and harder. It gets easier when some big end-system boy does it (as in the re-birth of RSVP). Just a standard disclaimer: these are my own personal comments and have nothing whatsoever to do with my employer. > And so, it gets easier to make > changes in the middle of the network. And most of the (to use Rusty's > phrase) "packet fucking" techniques come from this dilemma: NAT's > (easier than IPV6), firewalls (easier than doing real end-point > security), tcp ack spoofing (easier than upgrading Windows TCP stacks to > make them work correctly over satellite links), etc. I think protocol layering violations will continue for a long time because of these kludges, which are the result of fixes for "immediate problems". The solutions can be deployed faster: put a box in front of all these end systems and they don't have to know anything about it. And these boxes stay forever, and then it becomes quite a simple rule: if it ain't broken, don't fix it. NAT might really delay IPV6, for example. So I think, instead of preaching end2end principles, it's best to preach to protocol authors to be on the lookout for these kinds of hacks. You are not gonna stop all those "content switching/application routing" startups by preaching religion. They are out there to make a lot of money.
> > Having said that, there could be an alternative solution in Linux. The > PPPOE code could be made, after dropping the packet, to generate ICMP "too > big" messages back to the masquareded boxes instead (when packet-size > >PMTU-shim_header). Hopefully, the win* boxes know what to do with these > messages. And this will work also for UDP. Marc? > > That doesn't help. We're doing this today already; it's required by the > RFC's, after all. The problem is that the sender of the big packet has > to receive the ICMP, and if there's something filtering the ICMP > message, you're stuck. True. I stand corrected. Forget what i said, Marc. cheers, jamal From owner-netdev@oss.sgi.com Fri Apr 21 22:05:04 2000 Received: by oss.sgi.com id ; Fri, 21 Apr 2000 22:04:54 -0700 Received: from nero.doit.wisc.edu ([128.104.17.130]:35854 "EHLO nero.doit.wisc.edu") by oss.sgi.com with ESMTP id ; Fri, 21 Apr 2000 22:04:35 -0700 Received: (from jleu@localhost) by nero.doit.wisc.edu (8.8.7/8.8.7) id BAA31354; Sat, 22 Apr 2000 01:06:49 -0500 Message-ID: <20000422010649.E31298@doit.wisc.edu> Date: Sat, 22 Apr 2000 01:06:49 -0500 From: "James R. Leu" To: Sumit Garg , netdev@oss.sgi.com Subject: Re: sk_buff resizing Reply-To: jleu@mindspring.com References: <38FDC660.7342C14A@research.bell-labs.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 0.93.2 In-Reply-To: <38FDC660.7342C14A@research.bell-labs.com>; from Sumit Garg on Wed, Apr 19, 2000 at 10:44:48AM -0400 Organization: none Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing I have to do a very similar process in my MPLS for Linux implementation. Here is a solution for 2.3.99-pre5. If anyone sees a better way to do this (or errors in my implementation) please let me know. This has been taken out of it's context so code has been added to help clarify. 
The goal is to add a 4 byte shim to head of an SKB (take note that I cannot take full credit/blame for this code :-) int mpls_opcode_push(struct sk_buff **skb, more unrelated parameters) { struct sk_buff *orig_skb = *skb; struct sk_buff *new_skb = NULL; /* * you'll notice that I have added a field to the skb_buf to keep track of * any space I create due to other MPLS related activities */ if(orig_skb->mpls_gap >= sizeof(u32)) { /* * if we have room between data and end of mac.raw * just shift the data,n.raw,nh.raw pointers and use the room * this would happen if we had a pop previous to this */ printk("%s: using gap\n",fn_name); skb_push(orig_skb,sizeof(u32)); orig_skb->h.raw -= sizeof(u32); orig_skb->nh.raw -= sizeof(u32); orig_skb->mpls_gap -= sizeof(u32); } else if((orig_skb->end - orig_skb->tail) > sizeof(u32)) { /* * if we have tailroom, just move tha data down enough room for * the shim */ printk("%s: using tailroom\n",fn_name); memmove(orig_skb->data+sizeof(u32),orig_skb->data,orig_skb->len); orig_skb->len += sizeof(u32); printk("%s: done using tailroom\n",fn_name); } else { /* * we have no room in the inn, go ahead and create a new sk_buff * with enough extra room for one shim */ printk("%s: creating larger packet\n",fn_name); new_skb = alloc_skb(orig_skb->truesize + sizeof(u32),GFP_ATOMIC); if(new_skb == NULL) return -ENOMEM; /* I hate hard-coding numbers like this: Maybe use mpls_gap Revist --JHS */ skb_reserve(new_skb,16); skb_put(new_skb,sizeof(u32)+orig_skb->len); memmove(new_skb->data+sizeof(u32), orig_skb->data, orig_skb->len); /* * this is what skb_grow does */ copy_skb_header(new_skb,orig_skb); kfree_skb(orig_skb); orig_skb = *skb = new_skb; } I hope this helps. Jim On Wed, Apr 19, 2000 at 10:44:48AM -0400, Sumit Garg wrote: > Hello, > I am trying to modify the contents of a well formed sk_buff in a > function called from > dev_queue_xmit() with properly filled in sk_buff. 
> I may need to add or delete or simply overwrite certain portions of the > data in skbuff buffer. > Now .. If I plainly overwrite a chunk of data in buffer , and update the > checksums of TCP, IP headers.. things work fine. > However if I try to ADD new data or DELETE... the same sk_buff keeps > transmitting.. though it gets acks > > By the way, I am trying to modify sk_buffs generated by a specific > telnet session. and once I do the modifications.. that telnet session > hangs up. (though I can start new telnet connections) > > Any insight? > > The code: diff the the amount by which I need to change the size. I > need to replace the data at > (o_buf) of length (o_len) in sk_buff (skb) by new data in a buffer > pointed to by n_buf of length n_len. > > o_offset=o_buf - (char*) skb->data; > skb_len=skb->len; > > if (skb_tailroom(skb)>=diff) { > if (diff>0) { // grow > skb_put(skb,diff); > memmove(o_buf + n_len, o_buf + o_len, > skb_len - (o_offset + o_len) ); > memmove(o_buf, n_buf, n_len); > } else { // shrink > memmove(o_buf+n_len, o_buf+o_len, skb->len - (o_offset + > o_len)); > skb_trim(skb,skb->len+diff); > memmove(o_buf,n_buf,n_len); > } > > > > > Thanks > Sumit > -- James R. 
Leu From owner-netdev@oss.sgi.com Mon Apr 24 06:09:11 2000 Received: by oss.sgi.com id ; Mon, 24 Apr 2000 06:09:01 -0700 Received: from smtprch1.nortelnetworks.com ([192.135.215.14]:62905 "EHLO smtprch1.nortel.com") by oss.sgi.com with ESMTP id ; Mon, 24 Apr 2000 06:08:34 -0700 Received: from zrchb213.us.nortel.com (actually zrchb213) by smtprch1.nortel.com; Mon, 24 Apr 2000 08:07:40 -0500 Received: from zctwb003.asiapac.nortel.com ([47.152.32.111]) by zrchb213.us.nortel.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21) id 269WVTT6; Mon, 24 Apr 2000 08:07:31 -0500 Received: from pwold011.asiapac.nortel.com ([47.181.193.45]) by zctwb003.asiapac.nortel.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21) id 2P4Y11F0; Mon, 24 Apr 2000 23:07:35 +1000 Received: from uow.edu.au (IDENT:akpm@[47.181.194.19]) by pwold011.asiapac.nortel.com (8.9.3/8.9.3) with ESMTP id XAA03270 for ; Mon, 24 Apr 2000 23:07:33 +1000 Message-ID: <39044711.B95BE0DE@uow.edu.au> Date: Mon, 24 Apr 2000 23:07:29 +1000 X-Sybari-Space: 00000000 00000000 00000000 From: Andrew Morton X-Mailer: Mozilla 4.7 [en] (X11; I; Linux 2.2.14-15mdk i586) X-Accept-Language: en MIME-Version: 1.0 To: "netdev@oss.sgi.com" Subject: Hardware IP checksums Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing I spotted this discussion on the main kernel list: -------- Original Message -------- Subject: lockless poll() (was Re: namei() query) Date: Mon, 24 Apr 2000 21:36:00 +0900 From: kumon@flab.fujitsu.co.jp Reply-To: kumon@flab.fujitsu.co.jp To: Linus Torvalds CC: Manfred Spraul , kumon@flab.fujitsu.co.jp,linux-kernel@vger.rutgers.edu,kumon@flab.fujitsu.co.jp References: ,<39007DC5.44C147C4@colorfullife.com> ... In the heavy duty case, csum_partial_copy_generic() becomes the new winner of the worst time consuming function with the poll() optimization. 
We are arranging the global figure now. Though csum_partial_copy_generic() is highly optimized with hand-crafted code, it eats lots of time. It may be inevitable, but may be reducible. We are now investigating why it does. Has much thought been given to using hardware checksums on transmit? If someone could sketch out how it should be architected I'll give it a shot. From owner-netdev@oss.sgi.com Mon Apr 24 08:45:42 2000 Received: by oss.sgi.com id ; Mon, 24 Apr 2000 08:45:22 -0700 Received: from minus.inr.ac.ru ([193.233.7.97]:24339 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Mon, 24 Apr 2000 08:45:09 -0700 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id TAA08437; Mon, 24 Apr 2000 19:44:44 +0400 From: kuznet@ms2.inr.ac.ru Message-Id: <200004241544.TAA08437@ms2.inr.ac.ru> Subject: Re: Hardware IP checksums To: andrewm@uow.EDU.AU (Andrew Morton) Date: Mon, 24 Apr 2000 19:44:44 +0400 (MSK DST) Cc: netdev@oss.sgi.com In-Reply-To: <39044711.B95BE0DE@uow.edu.au> from "Andrew Morton" at Apr 24, 0 06:13:11 pm X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Content-Length: 914 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hello! > Has much thought been given to using hardware checksums on transmit? It appeared pretty useless. Jes noticed this first time, I had to check this experimentally, and yes... No visible improvements. HW checksum on transmit looks really not useful without zero copy. > If someone could sketch out how it should be architected I'll give it a > shot. Device exports a flag, telling that it is able to checksum in hardware. If IP sees it, it checksums only headers (TCP, IP and MAC, if the last is added by IP) preparing partial checksum and mark packet with ip_summed: - CHECKSUM_UNNECESSARY --- protocol did full checksum. - CHECKSUM_HW --- protocol checksummed only headers. Well, that's all... Driver needs only to program itself depending on ip_summed. 
Only offset to put checksum should be passed to, probably via ip_summed too with negative values flagging completed software checksum. Alexey From owner-netdev@oss.sgi.com Mon Apr 24 10:15:12 2000 Received: by oss.sgi.com id ; Mon, 24 Apr 2000 10:15:03 -0700 Received: from pa139.warszawa.ppp.tpnet.pl ([212.160.52.139]:4032 "HELO geocities.com") by oss.sgi.com with SMTP id ; Mon, 24 Apr 2000 10:14:40 -0700 Received: (qmail 378 invoked from network); 24 Apr 2000 17:14:30 -0000 Received: from localhost (HELO olaf) (NONE-OF-YOUR-BUSINESS@127.0.0.1) by localhost with SMTP; 24 Apr 2000 17:14:30 -0000 Message-ID: <39048082.3EE153BF@geocities.com> Date: Mon, 24 Apr 2000 19:12:34 +0200 From: Artur Skawina X-Mailer: Mozilla 3.04 (X11; U; Linux 2.3.99-pre6pre5as-smp i686) MIME-Version: 1.0 To: Andrew Morton CC: kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com Subject: Re: Hardware IP checksums References: <200004241544.TAA08437@ms2.inr.ac.ru> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing > It appeared pretty useless. Jes noticed this first time, > I had to check this experimentally, and yes... No visible improvements. > HW checksum on transmit looks really not useful without zero copy. for example on p2 the extra checksumming cost (vs a plain copy) is ~7%, and that's the worst case, ie everything cached. Add cache misses and the difference won't be visible... 
From owner-netdev@oss.sgi.com Mon Apr 24 11:02:32 2000 Received: by oss.sgi.com id ; Mon, 24 Apr 2000 11:02:23 -0700 Received: from surya.crhc.uiuc.edu ([130.126.142.117]:28172 "EHLO localhost.localdomain") by oss.sgi.com with ESMTP id ; Mon, 24 Apr 2000 11:02:06 -0700 Received: from timely.crhc.uiuc.edu (IDENT:kwlee@localhost.localdomain [127.0.0.1]) by localhost.localdomain (8.9.3/8.9.3) with ESMTP id LAA10204 for ; Mon, 24 Apr 2000 11:22:46 GMT Message-ID: <39042E86.E9BAF76D@timely.crhc.uiuc.edu> Date: Mon, 24 Apr 2000 11:22:46 +0000 From: Kang-Won Lee X-Mailer: Mozilla 4.61 [en] (X11; I; Linux 2.2.12-20 i586) X-Accept-Language: ko, en MIME-Version: 1.0 To: netdev@oss.sgi.com Subject: mailing list archive? Content-Type: text/plain; charset=EUC-KR Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Is there an archive of this mailing list? Thanks, -- Kang-Won Lee 438 Coordinated Science Laboratory 1308 West Main Street Urbana, IL 61801-2307 Phone: (217) 244-1746 Email: kwlee@timely.crhc.uiuc.edu From owner-netdev@oss.sgi.com Mon Apr 24 14:44:17 2000 Received: by oss.sgi.com id ; Mon, 24 Apr 2000 14:44:07 -0700 Received: from laurin.munich.netsurf.de ([194.64.166.1]:41205 "EHLO laurin.munich.netsurf.de") by oss.sgi.com with ESMTP id ; Mon, 24 Apr 2000 14:43:44 -0700 Received: from fred.muc.de (none@ns1139.munich.netsurf.de [195.180.235.139]) by laurin.munich.netsurf.de (8.9.3/8.9.3) with ESMTP id XAA14897; Mon, 24 Apr 2000 23:43:22 +0200 (MET DST) Received: from andi by fred.muc.de with local (Exim 2.05 #1) id 12jqgR-0000NV-00; Mon, 24 Apr 2000 23:46:23 +0200 Date: Mon, 24 Apr 2000 23:46:23 +0200 From: Andi Kleen To: davem@redhat.com Cc: "A.N.Kuznetsov" , netdev@oss.sgi.com Subject: [PATCH] Move shaper control information into skb->cb Message-ID: <20000424234623.A1446@fred.muc.de> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 0.95.4us Sender: 
owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing

This moves the shaper local data into the skb->cb struct, removing
ugly ifdefs from the sk_buff. I also documented the purpose of
the control buffer better. For 2.3.x.

-Andi

--- linux/include/linux/skbuff.h-shaper	Thu Mar 16 23:55:08 2000
+++ linux/include/linux/skbuff.h	Sun Apr 23 00:40:05 2000
@@ -99,6 +99,12 @@
 	struct dst_entry *dst;
+	/*
+	 * This is the control buffer. It is free to use for every
+	 * layer. Please put your private variables there. If you
+	 * want to keep them across layers you have to do a skb_clone()
+	 * first. This is owned by whoever has the skb queued ATM.
+	 */
 	char cb[48];
 	unsigned int len;	/* Length of actual data */
@@ -132,13 +138,6 @@
 	unsigned int nf_debug;
 #endif
 #endif /*CONFIG_NETFILTER*/
-#if defined(CONFIG_SHAPER) || defined(CONFIG_SHAPER_MODULE)
-	__u32 shapelatency;	/* Latency on frame */
-	__u32 shapeclock;	/* Time it should go out */
-	__u32 shapelen;	/* Frame length in clocks */
-	__u32 shapestamp;	/* Stamp for shaper */
-	__u16 shapepend;	/* Pending */
-#endif
 #if defined(CONFIG_HIPPI)
 	union{
--- linux/drivers/net/shaper.c-shaper	Wed Mar 15 20:27:11 2000
+++ linux/drivers/net/shaper.c	Tue Apr 4 09:33:48 2000
@@ -64,8 +64,12 @@
  *	Device statistics (tx_pakets, tx_bytes,
  *	tx_drops: queue_over_time and collisions: max_queue_exceded)
  *	1999/06/18 Jordi Murgo
+ *
+ *	Use skb->cb for private data.
+ *	2000/03 Andi Kleen
  */
+#include
 #include
 #include
 #include
@@ -84,6 +88,15 @@
 #include
 #include
+struct shaper_cb {
+	__u32 shapelatency;	/* Latency on frame */
+	__u32 shapeclock;	/* Time it should go out */
+	__u32 shapelen;	/* Frame length in clocks */
+	__u32 shapestamp;	/* Stamp for shaper */
+	__u16 shapepend;	/* Pending */
+};
+#define SHAPERCB(skb) ((struct shaper_cb *) ((skb)->cb))
+
 int sh_debug;	/* Debug flag */
 #define SHAPER_BANNER "CymruNet Traffic Shaper BETA 0.04 for Linux 2.1\n"
@@ -148,7 +161,7 @@
 static int shaper_qframe(struct shaper *shaper, struct sk_buff *skb)
 {
 	struct sk_buff *ptr;
-
+
 	/*
 	 * Get ready to work on this shaper. Lock may fail if its
 	 * an interrupt and locked.
@@ -162,25 +175,25 @@
 	 * Set up our packet details
 	 */
-	skb->shapelatency=0;
-	skb->shapeclock=shaper->recovery;
-	if(time_before(skb->shapeclock, jiffies))
-		skb->shapeclock=jiffies;
+	SHAPERCB(skb)->shapelatency=0;
+	SHAPERCB(skb)->shapeclock=shaper->recovery;
+	if(time_before(SHAPERCB(skb)->shapeclock, jiffies))
+		SHAPERCB(skb)->shapeclock=jiffies;
 	skb->priority=0;	/* short term bug fix */
-	skb->shapestamp=jiffies;
+	SHAPERCB(skb)->shapestamp=jiffies;
 	/*
 	 * Time slots for this packet.
 	 */
-	skb->shapelen= shaper_clocks(shaper,skb);
+	SHAPERCB(skb)->shapelen= shaper_clocks(shaper,skb);
 #ifdef SHAPER_COMPLEX /* and broken.. */
 	while(ptr && ptr!=(struct sk_buff *)&shaper->sendq)
 	{
 		if(ptr->pripri
-			&& jiffies - ptr->shapeclock < SHAPER_MAXSLIP)
+			&& jiffies - SHAPERCB(ptr)->shapeclock < SHAPER_MAXSLIP)
 		{
 			struct sk_buff *tmp=ptr->prev;
@@ -189,14 +202,14 @@
 			 * of the new frame.
 			 */
-			ptr->shapeclock+=skb->shapelen;
-			ptr->shapelatency+=skb->shapelen;
+			SHAPERCB(ptr)->shapeclock+=SHAPERCB(skb)->shapelen;
+			SHAPERCB(ptr)->shapelatency+=SHAPERCB(skb)->shapelen;
 			/*
 			 * The packet may have slipped so far back it
 			 * fell off.
 			 */
-			if(ptr->shapelatency > SHAPER_LATENCY)
+			if(SHAPERCB(ptr)->shapelatency > SHAPER_LATENCY)
 			{
 				skb_unlink(ptr);
 				dev_kfree_skb(ptr);
@@ -217,7 +230,7 @@
 		 * this loop.
 		 */
 		for(tmp=skb_peek(&shaper->sendq); tmp!=NULL && tmp!=ptr; tmp=tmp->next)
-			skb->shapeclock+=tmp->shapelen;
+			SHAPERCB(skb)->shapeclock+=tmp->shapelen;
 		skb_append(ptr,skb);
 	}
 #else
@@ -229,11 +242,11 @@
 		 */
 		for(tmp=skb_peek(&shaper->sendq); tmp!=NULL && tmp!=(struct sk_buff *)&shaper->sendq; tmp=tmp->next)
-			skb->shapeclock+=tmp->shapelen;
+			SHAPERCB(skb)->shapeclock+=SHAPERCB(tmp)->shapelen;
 		/*
 		 * Queue over time. Spill packet.
 		 */
-		if(skb->shapeclock-jiffies > SHAPER_LATENCY) {
+		if(SHAPERCB(skb)->shapeclock-jiffies > SHAPER_LATENCY) {
 			dev_kfree_skb(skb);
 			shaper->stats.tx_dropped++;
 		} else
@@ -324,22 +337,23 @@
 		 */
 		if(sh_debug)
-			printk("Clock = %d, jiffies = %ld\n", skb->shapeclock, jiffies);
-		if(time_before_eq(skb->shapeclock - jiffies, SHAPER_BURST))
+			printk("Clock = %d, jiffies = %ld\n", SHAPERCB(skb)->shapeclock, jiffies);
+		if(time_before_eq(SHAPERCB(skb)->shapeclock - jiffies, SHAPER_BURST))
 		{
 			/*
 			 * Pull the frame and get interrupts back on.
 			 */
 			skb_unlink(skb);
-			if (shaper->recovery < skb->shapeclock + skb->shapelen)
-				shaper->recovery = skb->shapeclock + skb->shapelen;
+			if (shaper->recovery <
+				SHAPERCB(skb)->shapeclock + SHAPERCB(skb)->shapelen)
+				shaper->recovery = SHAPERCB(skb)->shapeclock + SHAPERCB(skb)->shapelen;
 			/*
 			 * Pass on to the physical target device via
 			 * our low level packet thrower.
 			 */
-			skb->shapepend=0;
+			SHAPERCB(skb)->shapepend=0;
 			shaper_queue_xmit(shaper, skb);	/* Fire */
 		}
 		else
@@ -351,7 +365,7 @@
 	 */
 	if(skb!=NULL)
-		mod_timer(&shaper->timer, skb->shapeclock);
+		mod_timer(&shaper->timer, SHAPERCB(skb)->shapeclock);
 	clear_bit(0, &shaper->locked);
 }

-- 
This is like TV. I don't like TV.
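The patch's core idiom is overlaying a private struct on the 48-byte scratch array through a cast macro, SHAPERCB(). Below is a minimal userspace sketch of the same idea. `struct fake_skb` is a hypothetical stand-in for `struct sk_buff` that keeps only `cb`, and the `_Alignas(8)` on it is an assumption of the sketch so that the cast is aligned. The C11 `_Static_assert` is also an addition, not part of Andi's patch. It turns a private struct that outgrows `cb[]` into a build error instead of a silent overrun into the following field.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct sk_buff, reduced to the scratch area. */
struct fake_skb {
    _Alignas(8) char cb[48];   /* control buffer: owned by whoever has the skb queued */
    unsigned int len;
};

/* Shaper-private per-packet state, mirroring struct shaper_cb in the patch
 * (uint32_t/uint16_t stand in for the kernel's __u32/__u16). */
struct shaper_cb {
    uint32_t shapelatency;   /* Latency on frame */
    uint32_t shapeclock;     /* Time it should go out */
    uint32_t shapelen;       /* Frame length in clocks */
    uint32_t shapestamp;     /* Stamp for shaper */
    uint16_t shapepend;      /* Pending */
};

#define SHAPERCB(skb) ((struct shaper_cb *)((skb)->cb))

/* Not in the patch: fail the build if the private state outgrows cb[]. */
_Static_assert(sizeof(struct shaper_cb) <= sizeof(((struct fake_skb *)0)->cb),
               "struct shaper_cb does not fit in skb->cb");

/* Fill in the private fields on entry, much as shaper_qframe() does,
 * without reading whatever an earlier layer left in cb. Like the kernel
 * (which is built with -fno-strict-aliasing), this type-puns the char array. */
static void shaper_init_cb(struct fake_skb *skb, uint32_t recovery,
                           uint32_t now, uint32_t clocks)
{
    SHAPERCB(skb)->shapelatency = 0;
    SHAPERCB(skb)->shapeclock = recovery;
    if ((int32_t)(recovery - now) < 0)   /* time_before(recovery, now) */
        SHAPERCB(skb)->shapeclock = now;
    SHAPERCB(skb)->shapestamp = now;
    SHAPERCB(skb)->shapelen = clocks;
    SHAPERCB(skb)->shapepend = 0;
}
```

A recovery time that already lies in the past is clamped to the current time, so such a packet is eligible to go out immediately.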
From owner-netdev@oss.sgi.com Mon Apr 24 16:13:19 2000 Received: by oss.sgi.com id ; Mon, 24 Apr 2000 16:13:09 -0700 Received: from pizda.ninka.net ([216.101.162.242]:43136 "EHLO pizda.ninka.net") by oss.sgi.com with ESMTP id ; Mon, 24 Apr 2000 16:12:49 -0700 Received: (from davem@localhost) by pizda.ninka.net (8.9.3/8.9.3) id QAA02602; Mon, 24 Apr 2000 16:06:21 -0700 Date: Mon, 24 Apr 2000 16:06:21 -0700 Message-Id: <200004242306.QAA02602@pizda.ninka.net> X-Authentication-Warning: pizda.ninka.net: davem set sender to davem@redhat.com using -f From: "David S. Miller" To: ak@muc.de CC: kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com In-reply-to: <20000424234623.A1446@fred.muc.de> (message from Andi Kleen on Mon, 24 Apr 2000 23:46:23 +0200) Subject: Re: [PATCH] Move shaper control information into skb->cb References: <20000424234623.A1446@fred.muc.de> Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Date: Mon, 24 Apr 2000 23:46:23 +0200 From: Andi Kleen This moves the shaper local data into the skb->cb struct, removing ugly ifdefs from the sk_buff. I also documented the purpose of the control buffer better. For 2.3.x. I really like this. Just one question, did you verify that this won't clobber CB data used by whoever generated the packet? Just say yes, and if I see no other objections I'll put this into the tree. I know intuitively that once, for example, tcp_transmit_skb has built the TCP header the control block can be clobbered by any further usage. We should really document this, at least in a comment above that function. Sounds like something which would be nice to audit in our tree. Later, David S. 
Miller davem@redhat.com From owner-netdev@oss.sgi.com Mon Apr 24 16:47:20 2000 Received: by oss.sgi.com id ; Mon, 24 Apr 2000 16:47:10 -0700 Received: from laurin.munich.netsurf.de ([194.64.166.1]:51643 "EHLO laurin.munich.netsurf.de") by oss.sgi.com with ESMTP id ; Mon, 24 Apr 2000 16:46:49 -0700 Received: from fred.muc.de (none@ns1058.munich.netsurf.de [195.180.235.58]) by laurin.munich.netsurf.de (8.9.3/8.9.3) with ESMTP id BAA24542; Tue, 25 Apr 2000 01:46:32 +0200 (MET DST) Received: from andi by fred.muc.de with local (Exim 2.05 #1) id 12jscC-0000Rj-00; Tue, 25 Apr 2000 01:50:08 +0200 Date: Tue, 25 Apr 2000 01:50:08 +0200 From: Andi Kleen To: "David S. Miller" Cc: ak@muc.de, kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com Subject: Re: [PATCH] Move shaper control information into skb->cb Message-ID: <20000425015008.A1689@fred.muc.de> References: <20000424234623.A1446@fred.muc.de> <200004242306.QAA02602@pizda.ninka.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 0.95.4us In-Reply-To: <200004242306.QAA02602@pizda.ninka.net>; from David S. Miller on Tue, Apr 25, 2000 at 01:12:40AM +0200 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Tue, Apr 25, 2000 at 01:12:40AM +0200, David S. Miller wrote: > Date: Mon, 24 Apr 2000 23:46:23 +0200 > From: Andi Kleen > > This moves the shaper local data into the skb->cb struct, removing > ugly ifdefs from the sk_buff. I also documented the purpose of > the control buffer better. For 2.3.x. > > I really like this. > > Just one question, did you verify that this won't clobber CB data > used by whoever generated the packet? Just say yes, and if I see > no other objections I'll put this into the tree. Actually I didn't (because I ``knew'' that only TCP/IP uses it), but of course i was wrong: econet and decnet use it. econet does not queue, and decnet seems to always call skb_clone() before submitting. Other than that I cannot find any other cb users. 
> > I know intuitively that once, for example, tcp_transmit_skb has > built the TCP header the control block can be clobbered by any > further usage. We should really document this, at least in a > comment above that function. I documented it in skbuff.h (``is owned by whoever has the skb queued'') BTW, the hippi private fields should be probably moved there too. > Sounds like something which would be nice to audit in our tree. Done (with grep, hopefully I didn't miss anything) -Andi -- This is like TV. I don't like TV. From owner-netdev@oss.sgi.com Mon Apr 24 16:54:20 2000 Received: by oss.sgi.com id ; Mon, 24 Apr 2000 16:54:10 -0700 Received: from pizda.ninka.net ([216.101.162.242]:19073 "EHLO pizda.ninka.net") by oss.sgi.com with ESMTP id ; Mon, 24 Apr 2000 16:54:00 -0700 Received: (from davem@localhost) by pizda.ninka.net (8.9.3/8.9.3) id QAA02669; Mon, 24 Apr 2000 16:47:32 -0700 Date: Mon, 24 Apr 2000 16:47:32 -0700 Message-Id: <200004242347.QAA02669@pizda.ninka.net> X-Authentication-Warning: pizda.ninka.net: davem set sender to davem@redhat.com using -f From: "David S. Miller" To: ak@muc.de CC: kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com In-reply-to: <20000425015008.A1689@fred.muc.de> (message from Andi Kleen on Tue, 25 Apr 2000 01:50:08 +0200) Subject: Re: [PATCH] Move shaper control information into skb->cb References: <20000424234623.A1446@fred.muc.de> <200004242306.QAA02602@pizda.ninka.net> <20000425015008.A1689@fred.muc.de> Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Date: Tue, 25 Apr 2000 01:50:08 +0200 From: Andi Kleen > I know intuitively that once, for example, tcp_transmit_skb has > built the TCP header the control block can be clobbered by any > further usage. We should really document this, at least in a > comment above that function. I documented it in skbuff.h (``is owned by whoever has the skb queued'') BTW, the hippi private fields should be probably moved there too. 
This brings up an important issue. What if, then, we'd like to shape packets over HIPPI? It sounds really stupid, I know, but the point is that once we start allowing software or hardware devices to use the CB for their private per-packet state, we can run into problems if one is a pseudo device in front of another. If the shaper mucks with its CB fields, and once it has sent the packet off to the real device it never references that skb header again, then at least in this case there is no problem. Is that what is happening here? _This_ is the issue now that really concerns me. Later, David S. Miller davem@redhat.com From owner-netdev@oss.sgi.com Mon Apr 24 16:58:39 2000 Received: by oss.sgi.com id ; Mon, 24 Apr 2000 16:58:19 -0700 Received: from laurin.munich.netsurf.de ([194.64.166.1]:21700 "EHLO laurin.munich.netsurf.de") by oss.sgi.com with ESMTP id ; Mon, 24 Apr 2000 16:58:06 -0700 Received: from fred.muc.de (none@ns1058.munich.netsurf.de [195.180.235.58]) by laurin.munich.netsurf.de (8.9.3/8.9.3) with ESMTP id BAA25341; Tue, 25 Apr 2000 01:57:59 +0200 (MET DST) Received: from andi by fred.muc.de with local (Exim 2.05 #1) id 12jsnH-0000Sd-00; Tue, 25 Apr 2000 02:01:35 +0200 Date: Tue, 25 Apr 2000 02:01:35 +0200 From: Andi Kleen To: "David S. Miller" Cc: ak@muc.de, kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com Subject: Re: [PATCH] Move shaper control information into skb->cb Message-ID: <20000425020135.A1762@fred.muc.de> References: <20000424234623.A1446@fred.muc.de> <200004242306.QAA02602@pizda.ninka.net> <20000425015008.A1689@fred.muc.de> <200004242347.QAA02669@pizda.ninka.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 0.95.4us In-Reply-To: <200004242347.QAA02669@pizda.ninka.net>; from David S. Miller on Tue, Apr 25, 2000 at 01:53:51AM +0200 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Tue, Apr 25, 2000 at 01:53:51AM +0200, David S.
Miller wrote: > Date: Tue, 25 Apr 2000 01:50:08 +0200 > From: Andi Kleen > > > I know intuitively that once, for example, tcp_transmit_skb has > > built the TCP header the control block can be clobbered by any > > further usage. We should really document this, at least in a > > comment above that function. > > I documented it in skbuff.h (``is owned by whoever has the skb queued'') > > BTW, the hippi private fields should be probably moved there too. > > This brings up an important issue. What if then, we'd like to shape > packets over HIPPI? It sounds really stupid, I know, but the point > is that once we start allowing software or hardware devices to use the > CB for their private per-packet state, we can run into problems if one > is a pseudo device in front of another. > > If shaper mucks with it's CB fields, and once it has sent the packet > off to the real device it never references that skb header again, then > at least in this case there is no problem. Is that what is happening > here? > Not a problem. Shaper calls skb_clone before submitting the data. -Andi -- This is like TV. I don't like TV. From owner-netdev@oss.sgi.com Mon Apr 24 18:46:01 2000 Received: by oss.sgi.com id ; Mon, 24 Apr 2000 18:45:51 -0700 Received: from lrcsun15.epfl.ch ([128.178.156.77]:48616 "EHLO lrcsun15.epfl.ch") by oss.sgi.com with ESMTP id ; Mon, 24 Apr 2000 18:45:25 -0700 Received: (from almesber@localhost) by lrcsun15.epfl.ch (8.8.X/EPFL-8.1a) id DAA11544; Tue, 25 Apr 2000 03:45:24 +0200 (MET DST) From: Werner Almesberger Message-Id: <200004250145.DAA11544@lrcsun15.epfl.ch> Subject: Re: [PATCH] Move shaper control information into skb->cb To: ak@muc.de (Andi Kleen) Date: Tue, 25 Apr 2000 03:45:24 +0200 (MET DST) Cc: davem@redhat.com (David S. 
Miller), ak@muc.de, kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com In-Reply-To: <20000425015008.A1689@fred.muc.de> from "Andi Kleen" at Apr 25, 2000 01:50:08 AM X-Mailer: ELM [version 2.5 PL2] MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Andi Kleen wrote: > Other than that I cannot find any other cb users. Some ATM drivers use it, e.g. see at the end of drivers/atm/eni.h - Werner -- _________________________________________________________________________ / Werner Almesberger, ICA, EPFL, CH werner.almesberger@ica.epfl.ch / /_IN_N_032__Tel_+41_21_693_6621__Fax_+41_21_693_6610_____________________/ From owner-netdev@oss.sgi.com Mon Apr 24 19:23:31 2000 Received: by oss.sgi.com id ; Mon, 24 Apr 2000 19:23:11 -0700 Received: from smtprich.nortel.com ([192.135.215.8]:47305 "EHLO smtprich.nortel.com") by oss.sgi.com with ESMTP id ; Mon, 24 Apr 2000 19:22:55 -0700 Received: from zrchb213.us.nortel.com (actually zrchb213) by smtprich.nortel.com; Mon, 24 Apr 2000 21:23:27 -0500 Received: from zctwb003.asiapac.nortel.com ([47.152.32.111]) by zrchb213.us.nortel.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21) id JSPANY33; Mon, 24 Apr 2000 21:22:17 -0500 Received: from pwold011.asiapac.nortel.com ([47.181.193.45]) by zctwb003.asiapac.nortel.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21) id 2P4Y11M4; Tue, 25 Apr 2000 12:22:21 +1000 Received: from uow.edu.au (IDENT:akpm@[47.181.194.19]) by pwold011.asiapac.nortel.com (8.9.3/8.9.3) with ESMTP id MAA10338; Tue, 25 Apr 2000 12:22:14 +1000 Message-ID: <39050154.B0C85278@uow.edu.au> Date: Tue, 25 Apr 2000 12:22:12 +1000 X-Sybari-Space: 00000000 00000000 00000000 From: Andrew Morton X-Mailer: Mozilla 4.7 [en] (X11; I; Linux 2.2.14-15mdk i586) X-Accept-Language: en MIME-Version: 1.0 To: Artur Skawina CC: kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com 
Subject: Re: Hardware IP checksums References: <200004241544.TAA08437@ms2.inr.ac.ru> <39048082.3EE153BF@geocities.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-Orig: Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Artur Skawina wrote: > > > It appeared pretty useless. Jes noticed this first time, > > I had to check this experimentally, and yes... No visible improvements. > > HW checksum on transmit looks really not useful without zero copy. > > for example on p2 the extra checksumming cost (vs a plain copy) is ~7%, > and that's the worst case, ie everything cached. Add cache misses and > the difference won't be visible... Ah. That's pretty convincing. But it would be a huge win for zero-copy Tx, as Alexey points out. I need to pay a bit more attention to what's happening on that front. Thanks. -- -akpm- From owner-netdev@oss.sgi.com Tue Apr 25 02:22:16 2000 Received: by oss.sgi.com id ; Tue, 25 Apr 2000 02:21:46 -0700 Received: from laurin.munich.netsurf.de ([194.64.166.1]:48595 "EHLO laurin.munich.netsurf.de") by oss.sgi.com with ESMTP id ; Tue, 25 Apr 2000 02:21:27 -0700 Received: from fred.muc.de (none@ns1167.munich.netsurf.de [195.180.235.167]) by laurin.munich.netsurf.de (8.9.3/8.9.3) with ESMTP id LAA07220; Tue, 25 Apr 2000 11:21:09 +0200 (MET DST) Received: from andi by fred.muc.de with local (Exim 2.05 #1) id 12k1Z9-0000F1-00; Tue, 25 Apr 2000 11:23:35 +0200 Date: Tue, 25 Apr 2000 11:23:35 +0200 From: Andi Kleen To: Werner Almesberger Cc: Andi Kleen , "David S. 
Miller" , kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com Subject: Re: [PATCH] Move shaper control information into skb->cb Message-ID: <20000425112335.A911@fred.muc.de> References: <20000425015008.A1689@fred.muc.de> <200004250145.DAA11544@lrcsun15.epfl.ch> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 0.95.4us In-Reply-To: <200004250145.DAA11544@lrcsun15.epfl.ch>; from Werner Almesberger on Tue, Apr 25, 2000 at 03:45:17AM +0200 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Tue, Apr 25, 2000 at 03:45:17AM +0200, Werner Almesberger wrote: > Andi Kleen wrote: > > Other than that I cannot find any other cb users. > > Some ATM drivers use it, e.g. see at the end of drivers/atm/eni.h Unless they play games with skb->users and try to push them to other layers without skb_clone while having them in some internal list, that is ok. As far as I can see they don't. David, can you install the patch if you haven't already ? -Andi From owner-netdev@oss.sgi.com Tue Apr 25 02:30:35 2000 Received: by oss.sgi.com id ; Tue, 25 Apr 2000 02:30:16 -0700 Received: from pizda.ninka.net ([216.101.162.242]:15488 "EHLO pizda.ninka.net") by oss.sgi.com with ESMTP id ; Tue, 25 Apr 2000 02:30:12 -0700 Received: (from davem@localhost) by pizda.ninka.net (8.9.3/8.9.3) id CAA04529; Tue, 25 Apr 2000 02:23:47 -0700 Date: Tue, 25 Apr 2000 02:23:47 -0700 Message-Id: <200004250923.CAA04529@pizda.ninka.net> X-Authentication-Warning: pizda.ninka.net: davem set sender to davem@redhat.com using -f From: "David S. 
Miller" To: ak@muc.de CC: almesber@lrc.epfl.ch, ak@muc.de, kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com In-reply-to: <20000425112335.A911@fred.muc.de> (message from Andi Kleen on Tue, 25 Apr 2000 11:23:35 +0200) Subject: Re: [PATCH] Move shaper control information into skb->cb References: <20000425015008.A1689@fred.muc.de> <200004250145.DAA11544@lrcsun15.epfl.ch> <20000425112335.A911@fred.muc.de> Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Date: Tue, 25 Apr 2000 11:23:35 +0200 From: Andi Kleen David, can you install the patch if you haven't already ? I'll do this before going to sleep. Later, David S. Miller davem@redhat.com From owner-netdev@oss.sgi.com Tue Apr 25 05:55:26 2000 Received: by oss.sgi.com id ; Tue, 25 Apr 2000 05:55:06 -0700 Received: from pd159.warszawa.ppp.tpnet.pl ([212.160.55.159]:5312 "HELO geocities.com") by oss.sgi.com with SMTP id ; Tue, 25 Apr 2000 05:54:47 -0700 Received: (qmail 917 invoked from network); 25 Apr 2000 12:54:11 -0000 Received: from localhost (HELO olaf) (NONE-OF-YOUR-BUSINESS@127.0.0.1) by localhost with SMTP; 25 Apr 2000 12:54:11 -0000 Message-ID: <39058C76.2CB366DB@geocities.com> Date: Tue, 25 Apr 2000 14:15:50 +0200 From: Artur Skawina X-Mailer: Mozilla 3.04 (X11; U; Linux 2.3.99-pre6pre5as-smp i686) MIME-Version: 1.0 To: Andrew Morton CC: kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com Subject: Re: Hardware IP checksums References: <200004241544.TAA08437@ms2.inr.ac.ru> <39048082.3EE153BF@geocities.com> <39050154.B0C85278@uow.edu.au> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Andrew Morton wrote: > > Artur Skawina wrote: > > > > for example on p2 the extra checksumming cost (vs a plain copy) is ~7%, > > and that's the worst case, ie everything cached. Add cache misses and > > the difference won't be visible... 
i didn't have the numbers in front of me and managed to remember the wrong figure :( Sorry. It really looks more like this:

TIME-N+S  TIME32  TIME33  TIME1480  TIMEXXXX  FUNCTION
   22109    9978   13303     18693     25684  csum_partial_copy_generic_686as2
   17609   12194   12748     10207     22893  kernel_memcpy686as2

ie it depends on the size of frame, and for eth sized chunks overhead is ~83%. > Ah. That's pretty convincing. But it would be a huge win for zero-copy hmm, the quick check i just did seems to suggest the difference might not disappear with a cold cache either. Did anyone try replacing the checksum-copy with a plain copy, made the kernel ignore checksum errors and benchmarked against a normal one? That would give an idea of the true sw checksum impact (ignoring the extra hw support overhead). From owner-netdev@oss.sgi.com Tue Apr 25 05:57:26 2000 Received: by oss.sgi.com id ; Tue, 25 Apr 2000 05:57:16 -0700 Received: from pizda.ninka.net ([216.101.162.242]:20096 "EHLO pizda.ninka.net") by oss.sgi.com with ESMTP id ; Tue, 25 Apr 2000 05:57:08 -0700 Received: (from davem@localhost) by pizda.ninka.net (8.9.3/8.9.3) id FAA04747; Tue, 25 Apr 2000 05:50:37 -0700 Date: Tue, 25 Apr 2000 05:50:37 -0700 Message-Id: <200004251250.FAA04747@pizda.ninka.net> X-Authentication-Warning: pizda.ninka.net: davem set sender to davem@redhat.com using -f From: "David S. Miller" To: ak@muc.de CC: kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com In-reply-to: <20000424234623.A1446@fred.muc.de> (message from Andi Kleen on Mon, 24 Apr 2000 23:46:23 +0200) Subject: Re: [PATCH] Move shaper control information into skb->cb References: <20000424234623.A1446@fred.muc.de> Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Date: Mon, 24 Apr 2000 23:46:23 +0200 From: Andi Kleen This moves the shaper local data into the skb->cb struct, removing ugly ifdefs from the sk_buff. I also documented the purpose of the control buffer better. For 2.3.x. Patch applied, thanks. Later, David S.
Miller davem@redhat.com From owner-netdev@oss.sgi.com Tue Apr 25 07:49:16 2000 Received: by oss.sgi.com id ; Tue, 25 Apr 2000 07:49:07 -0700 Received: from minus.inr.ac.ru ([193.233.7.97]:29198 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Tue, 25 Apr 2000 07:48:59 -0700 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id SAA18239; Tue, 25 Apr 2000 18:48:13 +0400 From: kuznet@ms2.inr.ac.ru Message-Id: <200004251448.SAA18239@ms2.inr.ac.ru> Subject: Re: [PATCH] Move shaper control information into skb->cb To: almesber@lrc.epfl.ch (Werner Almesberger) Date: Tue, 25 Apr 2000 18:48:13 +0400 (MSK DST) Cc: ak@muc.de, davem@redhat.com, netdev@oss.sgi.com In-Reply-To: <200004250145.DAA11544@lrcsun15.epfl.ch> from "Werner Almesberger" at Apr 25, 0 03:45:24 am X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Content-Length: 654 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hello! > Some ATM drivers use it, e.g. see at the end of drivers/atm/eni.h They will not conflict. In fact, cb can be considered invalid once the packet crosses dev->hard_start_xmit(). So both the shaper and any other device may use it as they desire, provided they do not assume anything about its contents on entry. BTW, Werner, this is exactly why I objected to putting skb->tc_index into cb. BTW#2, qdiscs like sch_atm could use cb to pass some tags, but in this case they are supposed to have a private entry point into the corresponding device, and should guarantee that the tag is never accidentally used on a packet that did not pass through sch_atm.
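Alexey's rule, together with Andi's remark that the shaper calls skb_clone() before submitting, can be illustrated with two stacked transmit functions. The upper (pseudo) device stores its state in cb, clones the skb, and hands the clone down. The lower device initializes its own cb fields on entry and never reads what the upper layer left there. This is a userspace sketch with hypothetical names (`fake_skb`, `fake_skb_clone`, `upper_xmit`, `lower_xmit`). As with skb_clone(), the clone gets its own copy of cb while sharing the payload, but reference counting is not modelled.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, heavily reduced stand-in for struct sk_buff. */
struct fake_skb {
    _Alignas(8) char cb[48];      /* per-owner scratch area */
    const unsigned char *data;    /* payload, shared between clones */
    size_t len;
};

struct upper_cb { uint32_t clock; };                     /* pseudo device, e.g. a shaper */
struct lower_cb { uint32_t ring_slot; uint32_t flags; }; /* real device driver */

#define UPPER_CB(skb) ((struct upper_cb *)((skb)->cb))
#define LOWER_CB(skb) ((struct lower_cb *)((skb)->cb))

/* Sketch of skb_clone(): a new header with its own copy of cb that
 * points at the same payload. */
static struct fake_skb *fake_skb_clone(const struct fake_skb *skb)
{
    struct fake_skb *n = malloc(sizeof *n);

    if (n != NULL)
        memcpy(n, skb, sizeof *n);
    return n;
}

/* Lower device entry point: sets up its own cb fields and assumes
 * nothing about what an upper layer may have stored there. */
static void lower_xmit(struct fake_skb *skb, uint32_t slot)
{
    LOWER_CB(skb)->ring_slot = slot;
    LOWER_CB(skb)->flags = 0;
}

/* Upper (pseudo) device: keeps its state in skb->cb while it holds the
 * packet and passes a clone down, so the lower layer may overwrite the
 * clone's cb freely. Returns the clone (or NULL) for the caller to free. */
static struct fake_skb *upper_xmit(struct fake_skb *skb, uint32_t clock,
                                   uint32_t slot)
{
    struct fake_skb *clone;

    UPPER_CB(skb)->clock = clock;
    clone = fake_skb_clone(skb);
    if (clone != NULL)
        lower_xmit(clone, slot);
    return clone;
}
```

After the hand-off, the upper layer reads its state only from the skb it kept, never from the clone, so the two devices never see each other's cb contents.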
Alexey From owner-netdev@oss.sgi.com Tue Apr 25 08:33:08 2000 Received: by oss.sgi.com id ; Tue, 25 Apr 2000 08:32:58 -0700 Received: from smtprtp1.ntcom.nortel.net ([137.118.22.14]:14790 "EHLO smtprtp1.ntcom.nortel.net") by oss.sgi.com with ESMTP id ; Tue, 25 Apr 2000 08:32:39 -0700 Received: from zsngd101.asiapac.nortel.com (actually znsgd101) by smtprtp1.ntcom.nortel.net; Tue, 25 Apr 2000 11:27:19 -0400 Received: from zctwb003.asiapac.nortel.com ([47.152.32.111]) by zsngd101.asiapac.nortel.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21) id 2BF0F75J; Tue, 25 Apr 2000 23:27:14 +0800 Received: from pwold011.asiapac.nortel.com ([47.181.193.45]) by zctwb003.asiapac.nortel.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21) id 2P4Y11R0; Wed, 26 Apr 2000 01:27:17 +1000 Received: from uow.edu.au (IDENT:akpm@[47.181.194.1]) by pwold011.asiapac.nortel.com (8.9.3/8.9.3) with ESMTP id BAA13711; Wed, 26 Apr 2000 01:27:09 +1000 Message-ID: <3905B94B.E5E95141@uow.edu.au> Date: Wed, 26 Apr 2000 01:27:07 +1000 X-Sybari-Space: 00000000 00000000 00000000 From: Andrew Morton X-Mailer: Mozilla 4.7 [en] (X11; I; Linux 2.2.14-15mdk i586) X-Accept-Language: en MIME-Version: 1.0 To: Kang-Won Lee CC: netdev@oss.sgi.com Subject: Re: mailing list archive? References: <39042E86.E9BAF76D@timely.crhc.uiuc.edu> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Kang-Won Lee wrote: > > Is there an archive of this mailing list? 
Thanks, http://www.wcug.wwu.edu/lists/netdev/ From owner-netdev@oss.sgi.com Tue Apr 25 09:26:11 2000 Received: by oss.sgi.com id ; Tue, 25 Apr 2000 09:25:51 -0700 Received: from minus.inr.ac.ru ([193.233.7.97]:53775 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Tue, 25 Apr 2000 09:25:30 -0700 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id UAA19166; Tue, 25 Apr 2000 20:24:51 +0400 From: kuznet@ms2.inr.ac.ru Message-Id: <200004251624.UAA19166@ms2.inr.ac.ru> Subject: Re: Hardware IP checksums To: skawina@geocities.com (Artur Skawina) Date: Tue, 25 Apr 2000 20:24:51 +0400 (MSK DST) Cc: andrewm@uow.edu.au, netdev@oss.sgi.com In-Reply-To: <39058C76.2CB366DB@geocities.com> from "Artur Skawina" at Apr 25, 0 02:15:50 pm X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Content-Length: 966 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hello! > ie it depends on the size of frame, and for eth sized chunks overhead is ~83%. > > > Ah. That's pretty convincing. But it would be a huge win for zero-copy > > hmm, the quick check i just did seems to suggest the difference might not > disappear with a cold cache either. Did anyone try replacing the > checksum-copy with a plain copy, made the kernel ignore checksum errors > and benchmarked against a normal one? That would give an idea of the > true sw checksum impact (ignoring the extra hw support overhead). As I said, I tried. On loopback the improvement is invisible, within statistical bounds. On 100Mbit Ethernet the difference does not affect throughput (the link is saturated in any case), so I looked at CPU consumption, which is very noisy. Again, the difference is within statistical error. Things could change drastically on Gigabit Ethernet, but that is almost impossible to measure because we are bottlenecked on the receiver side. Voila.
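Artur's ~83% figure comes from the TIME1480 column of his table: csum_partial_copy_generic_686as2 took 18693 against 10207 for kernel_memcpy686as2, and 18693 / 10207 ≈ 1.83. The small helper below (a hypothetical name) computes that relative overhead for any column. The inputs are his numbers as posted; their units are not stated in the mail.

```c
#include <assert.h>

/* Extra cost of checksum-and-copy relative to a plain copy, in percent. */
static double csum_copy_overhead_pct(double csum_copy, double plain_copy)
{
    return (csum_copy - plain_copy) * 100.0 / plain_copy;
}

/* Artur's timings, column order TIME-N+S, TIME32, TIME33, TIME1480, TIMEXXXX. */
static const double csum_copy_times[5] = { 22109, 9978, 13303, 18693, 25684 };
static const double memcpy_times[5]    = { 17609, 12194, 12748, 10207, 22893 };
```

Across the five columns this gives roughly +25.6%, -18.2%, +4.4%, +83.1% and +12.2%, so the overhead varies strongly with chunk size and is even negative for the 32-byte case.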
Alexey From owner-netdev@oss.sgi.com Tue Apr 25 13:04:53 2000 Received: by oss.sgi.com id ; Tue, 25 Apr 2000 13:04:44 -0700 Received: from nero.doit.wisc.edu ([128.104.17.130]:47631 "EHLO nero.doit.wisc.edu") by oss.sgi.com with ESMTP id ; Tue, 25 Apr 2000 13:04:31 -0700 Received: (from jleu@localhost) by nero.doit.wisc.edu (8.8.7/8.8.7) id QAA00585 for netdev@oss.sgi.com; Tue, 25 Apr 2000 16:06:24 -0500 Message-ID: <20000425160624.B571@doit.wisc.edu> Date: Tue, 25 Apr 2000 16:06:24 -0500 From: "James R. Leu" To: netdev@oss.sgi.com Subject: Re: [PATCH] Move shaper control information into skb->cb Reply-To: jleu@mindspring.com References: <20000424234623.A1446@fred.muc.de> <200004242306.QAA02602@pizda.ninka.net> <20000425015008.A1689@fred.muc.de> <200004242347.QAA02669@pizda.ninka.net> <20000425020135.A1762@fred.muc.de> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 0.93.2 In-Reply-To: <20000425020135.A1762@fred.muc.de>; from Andi Kleen on Tue, Apr 25, 2000 at 02:01:35AM +0200 Organization: none Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing With all this talk of the CB field, what is the correct way to use the CB field when implementing a new protocol? Thanks, Jim On Tue, Apr 25, 2000 at 02:01:35AM +0200, Andi Kleen wrote: > On Tue, Apr 25, 2000 at 01:53:51AM +0200, David S. Miller wrote: > > Date: Tue, 25 Apr 2000 01:50:08 +0200 > > From: Andi Kleen > > > > > I know intuitively that once, for example, tcp_transmit_skb has > > > built the TCP header the control block can be clobbered by any > > > further usage. We should really document this, at least in a > > > comment above that function. > > > > I documented it in skbuff.h (``is owned by whoever has the skb queued'') > > > > BTW, the hippi private fields should be probably moved there too. > > > > This brings up an important issue. What if then, we'd like to shape > > packets over HIPPI?
It sounds really stupid, I know, but the point > > is that once we start allowing software or hardware devices to use the > > CB for their private per-packet state, we can run into problems if one > > is a pseudo device in front of another. > > > > If shaper mucks with it's CB fields, and once it has sent the packet > > off to the real device it never references that skb header again, then > > at least in this case there is no problem. Is that what is happening > > here? > > > Not a problem. Shaper calls skb_clone before submitting the data. > > > -Andi > > -- > This is like TV. I don't like TV. -- James R. Leu From owner-netdev@oss.sgi.com Tue Apr 25 13:19:53 2000 Received: by oss.sgi.com id ; Tue, 25 Apr 2000 13:19:44 -0700 Received: from laurin.munich.netsurf.de ([194.64.166.1]:9952 "EHLO laurin.munich.netsurf.de") by oss.sgi.com with ESMTP id ; Tue, 25 Apr 2000 13:19:28 -0700 Received: from fred.muc.de (none@ns1231.munich.netsurf.de [195.180.235.231]) by laurin.munich.netsurf.de (8.9.3/8.9.3) with ESMTP id WAA07584; Tue, 25 Apr 2000 22:19:24 +0200 (MET DST) Received: from andi by fred.muc.de with local (Exim 2.05 #1) id 12kBrJ-0000I5-00; Tue, 25 Apr 2000 22:23:01 +0200 Date: Tue, 25 Apr 2000 22:23:01 +0200 From: Andi Kleen To: "James R. Leu" Cc: netdev@oss.sgi.com Subject: Re: [PATCH] Move shaper control information into skb->cb Message-ID: <20000425222301.A1114@fred.muc.de> References: <20000424234623.A1446@fred.muc.de> <200004242306.QAA02602@pizda.ninka.net> <20000425015008.A1689@fred.muc.de> <200004242347.QAA02669@pizda.ninka.net> <20000425020135.A1762@fred.muc.de> <20000425160624.B571@doit.wisc.edu> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 0.95.4us In-Reply-To: <20000425160624.B571@doit.wisc.edu>; from James R. Leu on Tue, Apr 25, 2000 at 10:05:59PM +0200 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Tue, Apr 25, 2000 at 10:05:59PM +0200, James R. 
Leu wrote: > With all this talk of the CB field. What is the correct way to use the CB > field when implementing a new protocol? Like the new comment says: /* * This is the control buffer. It is free to use for every * layer. Please put your private variables there. If you * want to keep them across layers you have to do a skb_clone() * first. This is owned by whoever has the skb queued ATM. */ As long as you have it queued skb->cb is yours. If you want to queue packets over dev_queue_xmit you have to do a skb_clone() anyways [playing games with skb->users is not recommended anymore], which does a new scratch copy of skb->cb -Andi From owner-netdev@oss.sgi.com Wed Apr 26 02:03:01 2000 Received: by oss.sgi.com id ; Wed, 26 Apr 2000 02:02:42 -0700 Received: from mailhostnew.tbit.dk ([194.182.135.150]:16099 "EHLO mailhostnew.tbit.dk") by oss.sgi.com with ESMTP id ; Wed, 26 Apr 2000 02:02:11 -0700 Received: from ric.tbit.dk (ric.tbit.dk [194.182.135.53]) by mailhostnew.tbit.dk (8.9.3+Sun/8.9.3) with ESMTP id LAA06543 for ; Wed, 26 Apr 2000 11:02:09 +0200 (MET DST) Received: (from ric@localhost) by ric.tbit.dk (8.9.3/8.9.3) id LAA02890; Wed, 26 Apr 2000 11:02:08 +0200 To: netdev@oss.sgi.com Subject: Re: Non-fragmented ICMPv6 packets with an IPv6 fragment header References: <200004191829.WAA07126@ms2.inr.ac.ru> From: "Richard =?iso-8859-1?q?J=F8rgensen?=" Reply-To: ric@tbit.dk Date: 26 Apr 2000 11:02:08 +0200 In-Reply-To: kuznet@ms2.inr.ac.ru's message of "Wed, 19 Apr 2000 22:29:26 +0400 (MSK DST)" Message-ID: Lines: 21 User-Agent: Gnus/5.070098 (Pterodactyl Gnus v0.98) Emacs/20.3 MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing > > Linux IPv6 stack seems to have a problem receiving ICMP6 packets, if > > the IPv6 packet contain a fragment header, but is not fragmented > > (i.e. the entire packet is in _one_ fragment. > > Yes... They are never "reassembled".
Thank you, it will be fixed soon. Great! > > It seems only to be a problem with ICMP6 - having an "unused" fragment > > header in a TCP-packet does not seem to give any problems. > > It is impossible. Please, check this more carefully. > Probably, it indicates some another bug. Sorry, that was me. TCP-packets also never get "reassembled". I used telnet(1) to generate the TCP packets and didn't notice that telnet set the "don't fragment" bit - in which case the fragment header is not added by the NAT-PT translation. /ric From owner-netdev@oss.sgi.com Thu Apr 27 03:01:19 2000 Received: by oss.sgi.com id ; Thu, 27 Apr 2000 03:00:51 -0700 Received: from dialup-ad-12-77.camtech.net.au ([203.55.242.77]:3332 "EHLO halfway.linuxcare.com.au") by oss.sgi.com with ESMTP id ; Thu, 27 Apr 2000 03:00:32 -0700 Received: from linuxcare.com.au (really [127.0.0.1]) by linuxcare.com.au via in.smtpd with esmtp id (Debian Smail3.2.0.102) for ; Thu, 27 Apr 2000 19:30:32 +0930 (CST) Message-Id: From: Rusty Russell To: torvalds@transmeta.com cc: netdev@oss.sgi.com, netfilter@lists.samba.org Subject: [PATCH] Increased DoS protection. Date: Thu, 27 Apr 2000 19:30:32 +0930 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Linus, please apply v2.3.99-pre6. We used to simply hit a wall when the firewalling code was tracking too many connections: this patch applies a number of strategies to mitigate that: 1) Don't keep track of connections when packets dropped. 2) Forget connections which have only seen a RST reply. 3) Do randomish/LRU drop on unreplied connections when we're under stress. We still have an issue with being able to chew up serious amounts of CPU even over 10baseT, and there are more tricks we can do when we have TCP window tracking. Also includes some cleanups in the fast `connection already established' path. Rusty.
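Strategy 3 above can be pictured with a small user-space model. This is a sketch under stated assumptions, not the conntrack code: the fixed-size table, `struct conn`, `struct table`, `track_new()` and the `STATUS_*` flags are hypothetical stand-ins (the flag values mirror `IPS_SEEN_REPLY` and `IPS_CONFIRMED` from the patch below). When the table is full, the model tries to evict one confirmed-but-unreplied entry, first from a rotating chain and then from the chain the new entry hashes to, and refuses the new connection only if neither chain yields a victim.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of "early drop"; not the netfilter API. */
#define STATUS_SEEN_REPLY 0x02  /* same value as IPS_SEEN_REPLY */
#define STATUS_CONFIRMED  0x04  /* same value as IPS_CONFIRMED */

#define TABLE_SIZE  8  /* number of hash chains (illustrative) */
#define CHAIN_DEPTH 4  /* slots per chain in this fixed-size model */

struct conn {
    bool in_use;
    unsigned int status;
};

struct table {
    struct conn chains[TABLE_SIZE][CHAIN_DEPTH];
    unsigned int count;      /* live entries */
    unsigned int max;        /* limit, in the spirit of ip_conntrack_max */
    unsigned int drop_next;  /* rotating chain index for eviction */
};

/* Confirmed (seen leaving the box) but never answered: cheap to drop.
   Unconfirmed entries are left alone, as they are still in transit. */
static bool unreplied(const struct conn *c)
{
    return c->in_use
        && (c->status & STATUS_CONFIRMED)
        && !(c->status & STATUS_SEEN_REPLY);
}

/* Evict the first unreplied entry on one chain; true if one was dropped. */
static bool early_drop(struct table *t, unsigned int chain)
{
    for (size_t i = 0; i < CHAIN_DEPTH; i++) {
        struct conn *c = &t->chains[chain][i];
        if (unreplied(c)) {
            c->in_use = false;
            t->count--;
            return true;
        }
    }
    return false;
}

/* Record a new, unconfirmed connection on chain `hash`.
   Returns NULL if the table is full and nothing could be evicted. */
static struct conn *track_new(struct table *t, unsigned int hash)
{
    assert(t->max <= TABLE_SIZE * CHAIN_DEPTH);
    hash %= TABLE_SIZE;

    if (t->count >= t->max) {
        /* Try a rotating chain first, then the target chain (in case
           someone is flooding one bucket). The % keeps the rotating
           index inside the table however many times it is bumped. */
        unsigned int next = t->drop_next++ % TABLE_SIZE;
        if (!early_drop(t, next) && !early_drop(t, hash))
            return NULL;
    }

    for (size_t i = 0; i < CHAIN_DEPTH; i++) {
        struct conn *c = &t->chains[hash][i];
        if (!c->in_use) {
            c->in_use = true;
            c->status = 0;  /* unconfirmed until seen leaving the box */
            t->count++;
            return c;
        }
    }
    return NULL;  /* target chain full: a limit of this model only */
}
```

Entries that have seen a reply are never evicted, so established two-way traffic survives the pressure. Unconfirmed entries are skipped as well, matching the patch's `unreplied()` comment that they are "either really fresh or transitory anyway". The model simply takes the first match on a chain rather than attempting a true LRU order.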
diff -urN --minimal --exclude *.lds --exclude *.ps --exclude *.pdf --exclude *.sgml --exclude *.tex --exclude *.aux --exclude *.log --exclude classlist.h --exclude devlist.h --exclude autoconf.h --exclude compile.h --exclude version.h --exclude .* --exclude *.[oa] --exclude *.orig --exclude config --exclude asm --exclude modules --exclude *.[Ss] --exclude System.map --exclude consolemap_deftbl.c --exclude *~ --exclude TAGS --exclude tags --exclude modversions.h --exclude install-kernel linux-2.3.99-pre-6-2-rusty/Documentation/Configure.help working/Documentation/Configure.help --- linux-2.3.99-pre-6-2-rusty/Documentation/Configure.help Fri Apr 14 17:33:30 2000 +++ working/Documentation/Configure.help Mon Apr 24 13:00:30 2000 @@ -1771,7 +1771,7 @@ CONFIG_IP_NF_MATCH_LIMIT limit matching allows you to control the rate at which a rule can be matched: mainly useful in combination with the LOG target ("LOG - target support", below). + target support", below) and to avoid some Denial of Service attacks. If you want to compile it as a module, say M here and read Documentation/modules.txt. If unsure, say `N'. diff -urN --minimal --exclude *.lds --exclude *.ps --exclude *.pdf --exclude *.sgml --exclude *.tex --exclude *.aux --exclude *.log --exclude classlist.h --exclude devlist.h --exclude autoconf.h --exclude compile.h --exclude version.h --exclude .* --exclude *.[oa] --exclude *.orig --exclude config --exclude asm --exclude modules --exclude *.[Ss] --exclude System.map --exclude consolemap_deftbl.c --exclude *~ --exclude TAGS --exclude tags --exclude modversions.h --exclude install-kernel linux-2.3.99-pre-6-2-rusty/include/linux/netfilter_ipv4/ip_conntrack.h working/include/linux/netfilter_ipv4/ip_conntrack.h --- linux-2.3.99-pre-6-2-rusty/include/linux/netfilter_ipv4/ip_conntrack.h Mon Apr 17 16:25:08 2000 +++ working/include/linux/netfilter_ipv4/ip_conntrack.h Sun Apr 23 22:41:38 2000 @@ -51,7 +51,10 @@ IPS_EXPECTED = 0x01, /* We've seen packets both ways: bit 1 set. 
Can be set, not unset. */ - IPS_SEEN_REPLY = 0x02 + IPS_SEEN_REPLY = 0x02, + + /* Packet seen leaving box: bit 2 set. Can be set, not unset. */ + IPS_CONFIRMED = 0x04 }; struct ip_conntrack_expect @@ -88,7 +91,7 @@ struct ip_conntrack_tuple_hash tuplehash[IP_CT_DIR_MAX]; /* Have we seen traffic both ways yet? (bitset) */ - unsigned int status; + volatile unsigned int status; /* Timer function; drops refcnt when it goes off. */ struct timer_list timeout; diff -urN --minimal --exclude *.lds --exclude *.ps --exclude *.pdf --exclude *.sgml --exclude *.tex --exclude *.aux --exclude *.log --exclude classlist.h --exclude devlist.h --exclude autoconf.h --exclude compile.h --exclude version.h --exclude .* --exclude *.[oa] --exclude *.orig --exclude config --exclude asm --exclude modules --exclude *.[Ss] --exclude System.map --exclude consolemap_deftbl.c --exclude *~ --exclude TAGS --exclude tags --exclude modversions.h --exclude install-kernel linux-2.3.99-pre-6-2-rusty/include/linux/netfilter_ipv4/ip_conntrack_core.h working/include/linux/netfilter_ipv4/ip_conntrack_core.h --- linux-2.3.99-pre-6-2-rusty/include/linux/netfilter_ipv4/ip_conntrack_core.h Mon Apr 17 16:25:08 2000 +++ working/include/linux/netfilter_ipv4/ip_conntrack_core.h Thu Apr 20 13:00:38 2000 @@ -20,8 +20,9 @@ extern struct ip_conntrack_protocol *__find_proto(u_int8_t protocol); extern struct list_head protocol_list; -/* Returns TRUE if it dealt with ICMP, and filled in skb->nfct */ -int icmp_error_track(struct sk_buff *skb); +/* Returns conntrack if it dealt with ICMP, and filled in skb->nfct */ +extern struct ip_conntrack *icmp_error_track(struct sk_buff *skb, + enum ip_conntrack_info *ctinfo); extern int get_tuple(const struct iphdr *iph, size_t len, struct ip_conntrack_tuple *tuple, struct ip_conntrack_protocol *protocol); @@ -30,6 +31,9 @@ struct ip_conntrack_tuple_hash * ip_conntrack_find_get(const struct ip_conntrack_tuple *tuple, const struct ip_conntrack *ignored_conntrack); + +/* Confirm a 
connection */ +void ip_conntrack_confirm(struct ip_conntrack *ct); extern unsigned int ip_conntrack_htable_size; extern struct list_head *ip_conntrack_hash; diff -urN --minimal --exclude *.lds --exclude *.ps --exclude *.pdf --exclude *.sgml --exclude *.tex --exclude *.aux --exclude *.log --exclude classlist.h --exclude devlist.h --exclude autoconf.h --exclude compile.h --exclude version.h --exclude .* --exclude *.[oa] --exclude *.orig --exclude config --exclude asm --exclude modules --exclude *.[Ss] --exclude System.map --exclude consolemap_deftbl.c --exclude *~ --exclude TAGS --exclude tags --exclude modversions.h --exclude install-kernel linux-2.3.99-pre-6-2-rusty/net/ipv4/netfilter/ip_conntrack_core.c working/net/ipv4/netfilter/ip_conntrack_core.c --- linux-2.3.99-pre-6-2-rusty/net/ipv4/netfilter/ip_conntrack_core.c Fri Apr 14 17:41:01 2000 +++ working/net/ipv4/netfilter/ip_conntrack_core.c Sun Apr 23 22:59:02 2000 @@ -157,10 +157,47 @@ } static void +clean_from_lists(struct ip_conntrack *ct) +{ + MUST_BE_WRITE_LOCKED(&ip_conntrack_lock); + /* Remove from both hash lists */ + LIST_DELETE(&ip_conntrack_hash + [hash_conntrack(&ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple)], + &ct->tuplehash[IP_CT_DIR_ORIGINAL]); + LIST_DELETE(&ip_conntrack_hash + [hash_conntrack(&ct->tuplehash[IP_CT_DIR_REPLY].tuple)], + &ct->tuplehash[IP_CT_DIR_REPLY]); + /* If our expected is in the list, take it out. */ + if (ct->expected.expectant) { + IP_NF_ASSERT(list_inlist(&expect_list, &ct->expected)); + IP_NF_ASSERT(ct->expected.expectant == ct); + LIST_DELETE(&expect_list, &ct->expected); + } +} + +static void destroy_conntrack(struct nf_conntrack *nfct) { struct ip_conntrack *ct = (struct ip_conntrack *)nfct; + /* Unconfirmed connections haven't been cleaned up by the + timer: hence they cannot be simply deleted here. */ + if (!(ct->status & IPS_CONFIRMED)) { + WRITE_LOCK(&ip_conntrack_lock); + /* Race check: they can't get a reference if noone has + one and we have the write lock. 
*/ + if (atomic_read(&ct->ct_general.use) == 0) { + clean_from_lists(ct); + WRITE_UNLOCK(&ip_conntrack_lock); + } else { + /* Either a last-minute confirmation (ie. ct + now has timer attached), or a last-minute + new skb has reference (still unconfirmed). */ + WRITE_UNLOCK(&ip_conntrack_lock); + return; + } + } + IP_NF_ASSERT(atomic_read(&nfct->use) == 0); IP_NF_ASSERT(!timer_pending(&ct->timeout)); @@ -178,19 +215,7 @@ struct ip_conntrack *ct = (void *)ul_conntrack; WRITE_LOCK(&ip_conntrack_lock); - /* Remove from both hash lists */ - LIST_DELETE(&ip_conntrack_hash - [hash_conntrack(&ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple)], - &ct->tuplehash[IP_CT_DIR_ORIGINAL]); - LIST_DELETE(&ip_conntrack_hash - [hash_conntrack(&ct->tuplehash[IP_CT_DIR_REPLY].tuple)], - &ct->tuplehash[IP_CT_DIR_REPLY]); - /* If our expected is in the list, take it out. */ - if (ct->expected.expectant) { - IP_NF_ASSERT(list_inlist(&expect_list, &ct->expected)); - IP_NF_ASSERT(ct->expected.expectant == ct); - LIST_DELETE(&expect_list, &ct->expected); - } + clean_from_lists(ct); WRITE_UNLOCK(&ip_conntrack_lock); ip_conntrack_put(ct); } @@ -235,6 +260,26 @@ return h; } +/* Confirm a connection */ +void +ip_conntrack_confirm(struct ip_conntrack *ct) +{ + DEBUGP("Confirming conntrack %p\n", ct); + WRITE_LOCK(&ip_conntrack_lock); + /* Race check */ + if (!(ct->status & IPS_CONFIRMED)) { + IP_NF_ASSERT(!timer_pending(&ct->timeout)); + ct->status |= IPS_CONFIRMED; + /* Timer relative to confirmation time, not original + setting time, otherwise we'd get timer wrap in + wierd delay cases. */ + ct->timeout.expires += jiffies; + add_timer(&ct->timeout); + atomic_inc(&ct->ct_general.use); + } + WRITE_UNLOCK(&ip_conntrack_lock); +} + /* Returns true if a connection correspondings to the tuple (required for NAT). 
*/ int @@ -250,24 +295,28 @@ return h != NULL; } -/* Returns TRUE if it dealt with ICMP, and filled in skb fields */ -int icmp_error_track(struct sk_buff *skb) +/* Returns conntrack if it dealt with ICMP, and filled in skb fields */ +struct ip_conntrack * +icmp_error_track(struct sk_buff *skb, enum ip_conntrack_info *ctinfo) { - const struct iphdr *iph = skb->nh.iph; - struct icmphdr *hdr = (struct icmphdr *)((u_int32_t *)iph + iph->ihl); + const struct iphdr *iph; + struct icmphdr *hdr; struct ip_conntrack_tuple innertuple, origtuple; - struct iphdr *inner = (struct iphdr *)(hdr + 1); - size_t datalen = skb->len - iph->ihl*4 - sizeof(*hdr); + struct iphdr *inner; + size_t datalen; struct ip_conntrack_protocol *innerproto; struct ip_conntrack_tuple_hash *h; - enum ip_conntrack_info ctinfo; - if (iph->protocol != IPPROTO_ICMP) - return 0; + IP_NF_ASSERT(iph->protocol == IPPROTO_ICMP); + + iph = skb->nh.iph; + hdr = (struct icmphdr *)((u_int32_t *)iph + iph->ihl); + inner = (struct iphdr *)(hdr + 1); + datalen = skb->len - iph->ihl*4 - sizeof(*hdr); if (skb->len < iph->ihl * 4 + sizeof(struct icmphdr)) { DEBUGP("icmp_error_track: too short\n"); - return 1; + return NULL; } if (hdr->type != ICMP_DEST_UNREACH @@ -275,12 +324,12 @@ && hdr->type != ICMP_TIME_EXCEEDED && hdr->type != ICMP_PARAMETERPROB && hdr->type != ICMP_REDIRECT) - return 0; + return NULL; /* Ignore it if the checksum's bogus. */ if (ip_compute_csum((unsigned char *)hdr, sizeof(*hdr) + datalen)) { DEBUGP("icmp_error_track: bad csum\n"); - return 1; + return NULL; } innerproto = find_proto(inner->protocol); @@ -290,28 +339,68 @@ DEBUGP("icmp_error: ! get_tuple p=%u (%u*4+%u dlen=%u)\n", inner->protocol, inner->ihl, 8, datalen); - return 1; + return NULL; } /* Ordinarily, we'd expect the inverted tupleproto, but it's been preserved inside the ICMP. 
*/ if (!invert_tuple(&innertuple, &origtuple, innerproto)) { DEBUGP("icmp_error_track: Can't invert tuple\n"); - return 1; + return NULL; } h = ip_conntrack_find_get(&innertuple, NULL); if (!h) { DEBUGP("icmp_error_track: no match\n"); - return 1; + return NULL; + } + if (!(h->ctrack->status & IPS_CONFIRMED)) { + DEBUGP("icmp_error_track: unconfirmed\n"); + ip_conntrack_put(h->ctrack); + return NULL; } - ctinfo = IP_CT_RELATED; + *ctinfo = IP_CT_RELATED; if (DIRECTION(h) == IP_CT_DIR_REPLY) - ctinfo += IP_CT_IS_REPLY; + *ctinfo += IP_CT_IS_REPLY; /* Update skb to refer to this connection */ - skb->nfct = &h->ctrack->infos[ctinfo]; - return 1; + skb->nfct = &h->ctrack->infos[*ctinfo]; + return h->ctrack; +} + +/* There's a small race here where we may free a just-replied to + connection. Too bad: we're in trouble anyway. */ +static inline int unreplied(const struct ip_conntrack_tuple_hash *i) +{ + /* Unconfirmed connections either really fresh or transitory + anyway */ + if (!(i->ctrack->status & IPS_SEEN_REPLY) + && (i->ctrack->status & IPS_CONFIRMED)) + return 1; + return 0; +} + +static int early_drop(struct list_head *chain) +{ + /* Traverse backwards: gives us oldest, which is roughly LRU */ + struct ip_conntrack_tuple_hash *h; + int dropped = 0; + + READ_LOCK(&ip_conntrack_lock); + h = LIST_FIND(chain, unreplied, struct ip_conntrack_tuple_hash *); + if (h) + atomic_inc(&h->ctrack->ct_general.use); + READ_UNLOCK(&ip_conntrack_lock); + + if (!h) + return dropped; + + if (del_timer(&h->ctrack->timeout)) { + death_by_timeout((unsigned long)h->ctrack); + dropped = 1; + } + ip_conntrack_put(h->ctrack); + return dropped; } static inline int helper_cmp(const struct ip_conntrack_helper *i, @@ -345,29 +434,38 @@ enum ip_conntrack_info ctinfo; unsigned long extra_jiffies; int i; + static unsigned int drop_next = 0; - if (!invert_tuple(&repl_tuple, tuple, protocol)) { - DEBUGP("Can't invert tuple.\n"); - return 1; - } + hash = hash_conntrack(tuple); - if(ip_conntrack_max 
&& - (atomic_read(&ip_conntrack_count) >= ip_conntrack_max)) { + if (ip_conntrack_max && + atomic_read(&ip_conntrack_count) >= ip_conntrack_max) { if (net_ratelimit()) - printk(KERN_WARNING "ip_conntrack: maximum limit of %d entries exceeded\n", ip_conntrack_max); + printk(KERN_WARNING "ip_conntrack: maximum limit of" + " %d entries exceeded\n", ip_conntrack_max); + + /* Try dropping from random chain, or else from the + chain about to put into (in case they're trying to + bomb one hash chain). */ + if (!early_drop(&ip_conntrack_hash[drop_next++]) + && !early_drop(&ip_conntrack_hash[hash])) + return 1; + } + + if (!invert_tuple(&repl_tuple, tuple, protocol)) { + DEBUGP("Can't invert tuple.\n"); return 1; } + repl_hash = hash_conntrack(&repl_tuple); conntrack = kmem_cache_alloc(ip_conntrack_cachep, GFP_ATOMIC); if (!conntrack) { DEBUGP("Can't allocate conntrack.\n"); return 1; } - hash = hash_conntrack(tuple); - repl_hash = hash_conntrack(&repl_tuple); memset(conntrack, 0, sizeof(struct ip_conntrack)); - atomic_set(&conntrack->ct_general.use, 2); + atomic_set(&conntrack->ct_general.use, 1); conntrack->ct_general.destroy = destroy_conntrack; conntrack->tuplehash[IP_CT_DIR_ORIGINAL].tuple = *tuple; conntrack->tuplehash[IP_CT_DIR_ORIGINAL].ctrack = conntrack; @@ -381,17 +479,17 @@ kmem_cache_free(ip_conntrack_cachep, conntrack); return 1; } + /* Don't set timer yet: wait for confirmation */ + init_timer(&conntrack->timeout); conntrack->timeout.data = (unsigned long)conntrack; conntrack->timeout.function = death_by_timeout; - conntrack->timeout.expires = jiffies + extra_jiffies; - add_timer(&conntrack->timeout); + conntrack->timeout.expires = extra_jiffies; /* Sew in at head of hash list. */ WRITE_LOCK(&ip_conntrack_lock); /* Check noone else beat us in the race... 
*/ if (__ip_conntrack_find(tuple, NULL)) { WRITE_UNLOCK(&ip_conntrack_lock); - printk("ip_conntrack: Wow someone raced us!\n"); kmem_cache_free(ip_conntrack_cachep, conntrack); return 0; } @@ -417,70 +515,70 @@ &conntrack->tuplehash[IP_CT_DIR_ORIGINAL]); list_prepend(&ip_conntrack_hash[repl_hash], &conntrack->tuplehash[IP_CT_DIR_REPLY]); + atomic_inc(&ip_conntrack_count); WRITE_UNLOCK(&ip_conntrack_lock); /* Update skb to refer to this connection */ skb->nfct = &conntrack->infos[ctinfo]; - atomic_inc(&ip_conntrack_count); return 1; } -static void -resolve_normal_ct(struct sk_buff *skb, int create) +/* On success, returns conntrack ptr, sets skb->nfct and ctinfo */ +static inline struct ip_conntrack * +resolve_normal_ct(struct sk_buff *skb, + struct ip_conntrack_protocol *proto, + enum ip_conntrack_info *ctinfo) { struct ip_conntrack_tuple tuple; struct ip_conntrack_tuple_hash *h; - struct ip_conntrack_protocol *proto; - enum ip_conntrack_info ctinfo; - proto = find_proto(skb->nh.iph->protocol); if (!get_tuple(skb->nh.iph, skb->len, &tuple, proto)) - return; + return NULL; /* Loop around search/insert race */ do { /* look for tuple match */ h = ip_conntrack_find_get(&tuple, NULL); - if (!h && (!create || init_conntrack(&tuple, proto, skb))) - return; + if (!h && init_conntrack(&tuple, proto, skb)) + return NULL; } while (!h); /* It exists; we have (non-exclusive) reference. */ if (DIRECTION(h) == IP_CT_DIR_REPLY) { - ctinfo = IP_CT_ESTABLISHED + IP_CT_IS_REPLY; + /* Reply on unconfirmed connection => unclassifiable */ + if (!(h->ctrack->status & IPS_CONFIRMED)) { + DEBUGP("Reply on unconfirmed connection\n"); + ip_conntrack_put(h->ctrack); + return NULL; + } + + *ctinfo = IP_CT_ESTABLISHED + IP_CT_IS_REPLY; h->ctrack->status |= IPS_SEEN_REPLY; } else { /* Once we've had two way comms, always ESTABLISHED. 
*/ if (h->ctrack->status & IPS_SEEN_REPLY) { DEBUGP("ip_conntrack_in: normal packet for %p\n", h->ctrack); - ctinfo = IP_CT_ESTABLISHED; + *ctinfo = IP_CT_ESTABLISHED; } else if (h->ctrack->status & IPS_EXPECTED) { DEBUGP("ip_conntrack_in: related packet for %p\n", h->ctrack); - ctinfo = IP_CT_RELATED; + *ctinfo = IP_CT_RELATED; } else { DEBUGP("ip_conntrack_in: new packet for %p\n", h->ctrack); - ctinfo = IP_CT_NEW; + *ctinfo = IP_CT_NEW; } } - skb->nfct = &h->ctrack->infos[ctinfo]; + skb->nfct = &h->ctrack->infos[*ctinfo]; + return h->ctrack; } /* Return conntrack and conntrack_info a given skb */ -static struct ip_conntrack * -__ip_conntrack_get(struct sk_buff *skb, - enum ip_conntrack_info *ctinfo, - int create) +inline struct ip_conntrack * +ip_conntrack_get(struct sk_buff *skb, enum ip_conntrack_info *ctinfo) { - if (!skb->nfct) { - /* It may be an icmp error... */ - if (!icmp_error_track(skb)) - resolve_normal_ct(skb, create); - } - if (skb->nfct) { struct ip_conntrack *ct = (struct ip_conntrack *)skb->nfct->master; @@ -493,11 +591,6 @@ return NULL; } -struct ip_conntrack * -ip_conntrack_get(struct sk_buff *skb, enum ip_conntrack_info *ctinfo) -{ - return __ip_conntrack_get(skb, ctinfo, 0); -} /* Netfilter hook itself. */ unsigned int ip_conntrack_in(unsigned int hooknum, @@ -526,15 +619,19 @@ return NF_STOLEN; } - ct = __ip_conntrack_get(*pskb, &ctinfo, 1); - if (!ct) { - /* Not valid part of a connection */ - return NF_ACCEPT; + proto = find_proto((*pskb)->nh.iph->protocol); + + /* It may be an icmp error... 
*/ + if ((*pskb)->nh.iph->protocol != IPPROTO_ICMP + || !(ct = icmp_error_track(*pskb, &ctinfo))) { + if (!(ct = resolve_normal_ct(*pskb, proto, &ctinfo))) { + /* Not valid part of a connection */ + return NF_ACCEPT; + } } + IP_NF_ASSERT((*pskb)->nfct); - proto = find_proto((*pskb)->nh.iph->protocol); ret = proto->packet(ct, (*pskb)->nh.iph, (*pskb)->len, ctinfo); - if (ret == -1) { /* Invalid */ nf_conntrack_put((*pskb)->nfct); @@ -665,10 +762,15 @@ IP_NF_ASSERT(ct->timeout.data == (unsigned long)ct); WRITE_LOCK(&ip_conntrack_lock); - /* Need del_timer for race avoidance (may already be dying). */ - if (del_timer(&ct->timeout)) { - ct->timeout.expires = jiffies + extra_jiffies; - add_timer(&ct->timeout); + /* Timer may not be active yet */ + if (!(ct->status & IPS_CONFIRMED)) + ct->timeout.expires = extra_jiffies; + else { + /* Need del_timer for race avoidance (may already be dying). */ + if (del_timer(&ct->timeout)) { + ct->timeout.expires = jiffies + extra_jiffies; + add_timer(&ct->timeout); + } } WRITE_UNLOCK(&ip_conntrack_lock); } @@ -740,6 +842,17 @@ /* Time to push up daises... */ if (del_timer(&h->ctrack->timeout)) death_by_timeout((unsigned long)h->ctrack); + else if (!(h->ctrack->status & IPS_CONFIRMED)) { + /* Unconfirmed connection. Clean from lists, + mark confirmed so it gets cleaned as soon + as packet comes back. */ + WRITE_LOCK(&ip_conntrack_lock); + if (!(h->ctrack->status & IPS_CONFIRMED)) { + clean_from_lists(h->ctrack); + h->ctrack->status |= IPS_CONFIRMED; + } + WRITE_UNLOCK(&ip_conntrack_lock); + } /* ... else the timer will get him soon. 
*/ ip_conntrack_put(h->ctrack); diff -urN --minimal --exclude *.lds --exclude *.ps --exclude *.pdf --exclude *.sgml --exclude *.tex --exclude *.aux --exclude *.log --exclude classlist.h --exclude devlist.h --exclude autoconf.h --exclude compile.h --exclude version.h --exclude .* --exclude *.[oa] --exclude *.orig --exclude config --exclude asm --exclude modules --exclude *.[Ss] --exclude System.map --exclude consolemap_deftbl.c --exclude *~ --exclude TAGS --exclude tags --exclude modversions.h --exclude install-kernel linux-2.3.99-pre-6-2-rusty/net/ipv4/netfilter/ip_conntrack_proto_tcp.c working/net/ipv4/netfilter/ip_conntrack_proto_tcp.c --- linux-2.3.99-pre-6-2-rusty/net/ipv4/netfilter/ip_conntrack_proto_tcp.c Fri Apr 14 17:41:01 2000 +++ working/net/ipv4/netfilter/ip_conntrack_proto_tcp.c Sat Apr 22 16:38:50 2000 @@ -23,6 +23,10 @@ /* FIXME: Examine ipfilter's timeouts and conntrack transitions more closely. They're more complex. --RR */ +/* We steal a bit to indicate no reply yet (can't use status, because + it's set before we get into packet handling). */ +#define TCP_REPLY_BIT 0x1000 + /* Actually, I believe that neither ipmasq (where this code is stolen from) nor ipfilter do it exactly right. 
A new conntrack machine taking into account packet loss (which creates uncertainty as to exactly @@ -141,7 +145,7 @@ enum tcp_conntrack state; READ_LOCK(&tcp_lock); - state = conntrack->proto.tcp_state; + state = (conntrack->proto.tcp_state & ~TCP_REPLY_BIT); READ_UNLOCK(&tcp_lock); return sprintf(buffer, "%s ", tcp_conntrack_names[state]); @@ -161,7 +165,7 @@ struct iphdr *iph, size_t len, enum ip_conntrack_info ctinfo) { - enum tcp_conntrack newconntrack; + enum tcp_conntrack newconntrack, oldtcpstate; struct tcphdr *tcph = (struct tcphdr *)((u_int32_t *)iph + iph->ihl); /* We're guaranteed to have the base header, but maybe not the @@ -172,10 +176,11 @@ } WRITE_LOCK(&tcp_lock); + oldtcpstate = conntrack->proto.tcp_state; newconntrack = tcp_conntracks [CTINFO2DIR(ctinfo)] - [get_conntrack_index(tcph)][conntrack->proto.tcp_state]; + [get_conntrack_index(tcph)][oldtcpstate & ~TCP_REPLY_BIT]; /* Invalid */ if (newconntrack == TCP_CONNTRACK_MAX) { @@ -187,9 +192,22 @@ } conntrack->proto.tcp_state = newconntrack; + if ((oldtcpstate & TCP_REPLY_BIT) + || ctinfo >= IP_CT_IS_REPLY) + conntrack->proto.tcp_state |= TCP_REPLY_BIT; + WRITE_UNLOCK(&tcp_lock); - ip_ct_refresh(conntrack, tcp_timeouts[conntrack->proto.tcp_state]); + /* If only reply is a RST, we can consider ourselves not to + have an established connection: this is a fairly common + problem case, so we can delete the conntrack + immediately. 
--RR */ + if (!(oldtcpstate & TCP_REPLY_BIT) && tcph->rst) { + if (del_timer(&conntrack->timeout)) + conntrack->timeout.function((unsigned long)conntrack); + } else + ip_ct_refresh(conntrack, tcp_timeouts[newconntrack]); + return NF_ACCEPT; } diff -urN --minimal --exclude *.lds --exclude *.ps --exclude *.pdf --exclude *.sgml --exclude *.tex --exclude *.aux --exclude *.log --exclude classlist.h --exclude devlist.h --exclude autoconf.h --exclude compile.h --exclude version.h --exclude .* --exclude *.[oa] --exclude *.orig --exclude config --exclude asm --exclude modules --exclude *.[Ss] --exclude System.map --exclude consolemap_deftbl.c --exclude *~ --exclude TAGS --exclude tags --exclude modversions.h --exclude install-kernel linux-2.3.99-pre-6-2-rusty/net/ipv4/netfilter/ip_conntrack_standalone.c working/net/ipv4/netfilter/ip_conntrack_standalone.c --- linux-2.3.99-pre-6-2-rusty/net/ipv4/netfilter/ip_conntrack_standalone.c Fri Apr 14 17:41:01 2000 +++ working/net/ipv4/netfilter/ip_conntrack_standalone.c Sat Apr 22 16:18:19 2000 @@ -86,6 +86,12 @@ len += print_tuple(buffer + len, &conntrack->tuplehash[IP_CT_DIR_REPLY].tuple, proto); +#if 0 + if (!(conntrack->status & IPS_CONFIRMED)) + len += sprintf(buffer + len, "[UNCONFIRMED] "); + len += sprintf(buffer + len, "use=%u ", + atomic_read(&conntrack->ct_general.use)); +#endif len += sprintf(buffer + len, "\n"); return len; @@ -157,6 +163,22 @@ return len; } +static unsigned int ip_confirm(unsigned int hooknum, + struct sk_buff **pskb, + const struct net_device *in, + const struct net_device *out, + int (*okfn)(struct sk_buff *)) +{ + /* We've seen it coming out the other side: confirm */ + if ((*pskb)->nfct) { + struct ip_conntrack *ct + = (struct ip_conntrack *)(*pskb)->nfct->master; + if (!(ct->status & IPS_CONFIRMED)) + ip_conntrack_confirm(ct); + } + return NF_ACCEPT; +} + static unsigned int ip_refrag(unsigned int hooknum, struct sk_buff **pskb, const struct net_device *in, @@ -165,6 +187,14 @@ { struct rtable *rt 
= (struct rtable *)(*pskb)->dst; + /* We've seen it coming out the other side: confirm */ + if ((*pskb)->nfct) { + struct ip_conntrack *ct + = (struct ip_conntrack *)(*pskb)->nfct->master; + if (!(ct->status & IPS_CONFIRMED)) + ip_conntrack_confirm(ct); + } + /* Local packets are never produced too large for their interface. We degfragment them at LOCAL_OUT, however, so we have to refragment them here. */ @@ -203,6 +233,8 @@ /* Refragmenter; last chance. */ static struct nf_hook_ops ip_conntrack_out_ops = { { NULL, NULL }, ip_refrag, PF_INET, NF_IP_POST_ROUTING, NF_IP_PRI_LAST }; +static struct nf_hook_ops ip_conntrack_local_in_ops += { { NULL, NULL }, ip_confirm, PF_INET, NF_IP_LOCAL_IN, NF_IP_PRI_LAST-1 }; static int init_or_cleanup(int init) { @@ -230,10 +262,17 @@ printk("ip_conntrack: can't register post-routing hook.\n"); goto cleanup_inandlocalops; } + ret = nf_register_hook(&ip_conntrack_local_in_ops); + if (ret < 0) { + printk("ip_conntrack: can't register local in hook.\n"); + goto cleanup_inoutandlocalops; + } return ret; cleanup: + nf_unregister_hook(&ip_conntrack_local_in_ops); + cleanup_inoutandlocalops: nf_unregister_hook(&ip_conntrack_out_ops); cleanup_inandlocalops: nf_unregister_hook(&ip_conntrack_local_out_ops); diff -urN --minimal --exclude *.lds --exclude *.ps --exclude *.pdf --exclude *.sgml --exclude *.tex --exclude *.aux --exclude *.log --exclude classlist.h --exclude devlist.h --exclude autoconf.h --exclude compile.h --exclude version.h --exclude .* --exclude *.[oa] --exclude *.orig --exclude config --exclude asm --exclude modules --exclude *.[Ss] --exclude System.map --exclude consolemap_deftbl.c --exclude *~ --exclude TAGS --exclude tags --exclude modversions.h --exclude install-kernel linux-2.3.99-pre-6-2-rusty/net/ipv4/netfilter/ip_fw_compat.c working/net/ipv4/netfilter/ip_fw_compat.c --- linux-2.3.99-pre-6-2-rusty/net/ipv4/netfilter/ip_fw_compat.c Wed Apr 12 17:13:07 2000 +++ working/net/ipv4/netfilter/ip_fw_compat.c Mon Apr 24 
00:20:54 2000 @@ -13,6 +13,7 @@ #include #include #include +#include static struct firewall_ops *fwops; @@ -60,6 +61,18 @@ return 0; } +static inline void +confirm_connection(struct sk_buff *skb) +{ + if (skb->nfct) { + struct ip_conntrack *ct + = (struct ip_conntrack *)skb->nfct->master; + + if (!(ct->status & IPS_CONFIRMED)) + ip_conntrack_confirm(ct); + } +} + static unsigned int fw_in(unsigned int hooknum, struct sk_buff **pskb, @@ -105,10 +118,14 @@ ret = fwops->fw_output(fwops, PF_INET, (struct net_device *)out, (*pskb)->nh.raw, &redirpt, pskb); - if (fwops->fw_acct_out && (ret == FW_ACCEPT || ret == FW_SKIP)) - fwops->fw_acct_out(fwops, PF_INET, - (struct net_device *)in, - (*pskb)->nh.raw, &redirpt, pskb); + if (ret == FW_ACCEPT || ret == FW_SKIP) { + if (fwops->fw_acct_out) + fwops->fw_acct_out(fwops, PF_INET, + (struct net_device *)in, + (*pskb)->nh.raw, &redirpt, + pskb); + confirm_connection(*pskb); + } break; } @@ -155,6 +172,16 @@ } } +static unsigned int fw_confirm(unsigned int hooknum, + struct sk_buff **pskb, + const struct net_device *in, + const struct net_device *out, + int (*okfn)(struct sk_buff *)) +{ + confirm_connection(*pskb); + return NF_ACCEPT; +} + extern int ip_fw_ctl(int optval, void *user, unsigned int len); static int sock_fn(struct sock *sk, int optval, void *user, unsigned int len) @@ -174,6 +201,9 @@ static struct nf_hook_ops forward_ops = { { NULL, NULL }, fw_in, PF_INET, NF_IP_FORWARD, NF_IP_PRI_FILTER }; +static struct nf_hook_ops local_in_ops += { { NULL, NULL }, fw_confirm, PF_INET, NF_IP_LOCAL_IN, NF_IP_PRI_LAST - 1 }; + static struct nf_sockopt_ops sock_ops = { { NULL, NULL }, PF_INET, 64, 64 + 1024 + 1, &sock_fn, 0, 0, NULL, 0, NULL }; @@ -202,6 +232,7 @@ nf_register_hook(&preroute_ops); nf_register_hook(&postroute_ops); nf_register_hook(&forward_ops); + nf_register_hook(&local_in_ops); return ret; @@ -209,6 +240,7 @@ nf_unregister_hook(&preroute_ops); nf_unregister_hook(&postroute_ops); nf_unregister_hook(&forward_ops); + 
nf_unregister_hook(&local_in_ops); masq_cleanup(); diff -urN --minimal --exclude *.lds --exclude *.ps --exclude *.pdf --exclude *.sgml --exclude *.tex --exclude *.aux --exclude *.log --exclude classlist.h --exclude devlist.h --exclude autoconf.h --exclude compile.h --exclude version.h --exclude .* --exclude *.[oa] --exclude *.orig --exclude config --exclude asm --exclude modules --exclude *.[Ss] --exclude System.map --exclude consolemap_deftbl.c --exclude *~ --exclude TAGS --exclude tags --exclude modversions.h --exclude install-kernel linux-2.3.99-pre-6-2-rusty/net/ipv4/netfilter/ip_fw_compat_masq.c working/net/ipv4/netfilter/ip_fw_compat_masq.c --- linux-2.3.99-pre-6-2-rusty/net/ipv4/netfilter/ip_fw_compat_masq.c Wed Apr 12 17:13:07 2000 +++ working/net/ipv4/netfilter/ip_fw_compat_masq.c Thu Apr 20 13:04:24 2000 @@ -103,6 +103,7 @@ struct ip_conntrack_protocol *protocol; struct ip_conntrack_tuple_hash *h; enum ip_conntrack_info ctinfo; + struct ip_conntrack *ct; int ret; protocol = find_proto(iph->protocol); @@ -113,31 +114,18 @@ switch (iph->protocol) { case IPPROTO_ICMP: /* ICMP errors. */ - if (icmp_error_track(*pskb)) { - /* If it is valid, tranlsate it */ - if ((*pskb)->nfct) { - struct ip_conntrack *ct - = (struct ip_conntrack *) - (*pskb)->nfct->master; - enum ip_conntrack_dir dir; - - if ((*pskb)->nfct-ct->infos >= IP_CT_IS_REPLY) - dir = IP_CT_DIR_REPLY; - else - dir = IP_CT_DIR_ORIGINAL; - - icmp_reply_translation(*pskb, - ct, - NF_IP_PRE_ROUTING, - dir); - } + if ((ct = icmp_error_track(*pskb, &ctinfo))) { + icmp_reply_translation(*pskb, ct, + NF_IP_PRE_ROUTING, + CTINFO2DIR(ctinfo)); return NF_ACCEPT; } /* Fall thru... 
*/ case IPPROTO_TCP: case IPPROTO_UDP: if (!get_tuple(iph, (*pskb)->len, &tuple, protocol)) { - printk("ip_fw_compat_masq: Couldn't get tuple\n"); + if (net_ratelimit()) + printk("ip_fw_compat_masq: Can't get tuple\n"); return NF_ACCEPT; } break; @@ -166,8 +154,9 @@ NF_IP_PRE_ROUTING, pskb); } else - printk("ip_fw_compat_masq: conntrack" - " didn't like\n"); + if (net_ratelimit()) + printk("ip_fw_compat_masq: conntrack" + " didn't like\n"); } } else { if (h) -- Hacking time. From owner-netdev@oss.sgi.com Thu Apr 27 03:01:23 2000 Received: by oss.sgi.com id ; Thu, 27 Apr 2000 03:00:59 -0700 Received: from dialup-ad-12-77.camtech.net.au ([203.55.242.77]:7172 "EHLO halfway.linuxcare.com.au") by oss.sgi.com with ESMTP id ; Thu, 27 Apr 2000 03:00:43 -0700 Received: from linuxcare.com.au (really [127.0.0.1]) by linuxcare.com.au via in.smtpd with esmtp id (Debian Smail3.2.0.102) for ; Thu, 27 Apr 2000 19:30:41 +0930 (CST) Message-Id: From: Rusty Russell To: torvalds@transmeta.com cc: netdev@oss.sgi.com, netfilter@lists.samba.org Subject: [PATCH] Destructor patch for iptables Date: Thu, 27 Apr 2000 19:30:35 +0930 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Linus, please apply v2.3.99-pre6 Some people are writing funky iptables extensions which require destructors on rules. I didn't need them before, so didn't implement them before. Rusty. 
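A minimal user-space sketch of the optional destructor hook that the following patch adds to `struct ipt_match` and `struct ipt_target`. The struct and function names (`match_type`, `match_entry`, `cleanup_match_entry`, `funky_destroy`) are hypothetical simplifications, not the real iptables layout. The sketch only shows the calling convention the patch introduces: if `destroy` is non-NULL, the cleanup path calls it with the per-rule data and its size, and only then drops the module reference.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for struct ipt_match: only the fields
 * needed to show the new optional destroy hook are modelled. */
struct match_type {
	const char *name;
	/* Called when a rule using this match is deleted; may be NULL. */
	void (*destroy)(void *matchinfo, unsigned int matchinfosize);
	int use_count;          /* stands in for the module use count */
};

/* Hypothetical per-rule instance: the match type plus its private data. */
struct match_entry {
	struct match_type *type;
	void *data;
	unsigned int data_size;
};

/* Same order as cleanup_match() in the patch: run the destructor (if any)
 * first, then drop the reference that keeps the type's code around. */
static void cleanup_match_entry(struct match_entry *m)
{
	if (m->type->destroy)
		m->type->destroy(m->data, m->data_size);
	assert(m->type->use_count > 0);
	m->type->use_count--;
}

/* Example "funky" extension whose per-rule data owns a heap buffer,
 * so it needs a destructor to avoid leaking when the rule goes away. */
static int destroy_calls;

static void funky_destroy(void *matchinfo, unsigned int matchinfosize)
{
	void **owned = matchinfo;

	assert(matchinfosize == sizeof(void *));
	free(*owned);
	*owned = NULL;
	destroy_calls++;
}
```

Because the existing registrations use positional initializers such as `{ { NULL, NULL }, "MARK", target, checkentry, THIS_MODULE }`, adding a field before `me` means every in-tree match and target has to gain a `NULL` in the new slot, which is why the patch touches every `ipt_*.c` module.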
diff -urN --minimal --exclude *.lds --exclude *.ps --exclude *.pdf --exclude *.sgml --exclude *.tex --exclude *.aux --exclude *.log --exclude classlist.h --exclude devlist.h --exclude autoconf.h --exclude compile.h --exclude version.h --exclude .* --exclude *.[oa] --exclude *.orig --exclude config --exclude asm --exclude modules --exclude *.[Ss] --exclude System.map --exclude consolemap_deftbl.c --exclude *~ --exclude TAGS --exclude tags --exclude modversions.h --exclude install-kernel linux-2.3.99-pre-6-2-rusty/include/linux/netfilter_ipv4/ip_tables.h working/include/linux/netfilter_ipv4/ip_tables.h --- linux-2.3.99-pre-6-2-rusty/include/linux/netfilter_ipv4/ip_tables.h Mon Apr 17 21:59:34 2000 +++ working/include/linux/netfilter_ipv4/ip_tables.h Wed Apr 19 15:28:49 2000 @@ -346,6 +346,9 @@ unsigned int matchinfosize, unsigned int hook_mask); + /* Called when entry of this type deleted. */ + void (*destroy)(void *matchinfo, unsigned int matchinfosize); + /* Set this to THIS_MODULE if you are a module, otherwise NULL */ struct module *me; }; @@ -374,6 +377,9 @@ void *targinfo, unsigned int targinfosize, unsigned int hook_mask); + + /* Called when entry of this type deleted. 
*/ + void (*destroy)(void *targinfo, unsigned int targinfosize); /* Set this to THIS_MODULE if you are a module, otherwise NULL */ struct module *me; diff -urN --minimal --exclude *.lds --exclude *.ps --exclude *.pdf --exclude *.sgml --exclude *.tex --exclude *.aux --exclude *.log --exclude classlist.h --exclude devlist.h --exclude autoconf.h --exclude compile.h --exclude version.h --exclude .* --exclude *.[oa] --exclude *.orig --exclude config --exclude asm --exclude modules --exclude *.[Ss] --exclude System.map --exclude consolemap_deftbl.c --exclude *~ --exclude TAGS --exclude tags --exclude modversions.h --exclude install-kernel linux-2.3.99-pre-6-2-rusty/net/ipv4/netfilter/ip_tables.c working/net/ipv4/netfilter/ip_tables.c --- linux-2.3.99-pre-6-2-rusty/net/ipv4/netfilter/ip_tables.c Fri Apr 14 17:41:01 2000 +++ working/net/ipv4/netfilter/ip_tables.c Wed Apr 19 15:35:04 2000 @@ -589,6 +589,9 @@ if (i && (*i)-- == 0) return 1; + if (m->u.match->destroy) + m->u.match->destroy(m->data, m->match_size - sizeof(*m)); + if (m->u.match->me) __MOD_DEC_USE_COUNT(m->u.match->me); @@ -769,6 +772,8 @@ /* Cleanup all matches */ IPT_MATCH_ITERATE(e, cleanup_match, NULL); t = ipt_get_target(e); + if (t->u.target->destroy) + t->u.target->destroy(t->data, t->target_size - sizeof(*t)); if (t->u.target->me) __MOD_DEC_USE_COUNT(t->u.target->me); diff -urN --minimal --exclude *.lds --exclude *.ps --exclude *.pdf --exclude *.sgml --exclude *.tex --exclude *.aux --exclude *.log --exclude classlist.h --exclude devlist.h --exclude autoconf.h --exclude compile.h --exclude version.h --exclude .* --exclude *.[oa] --exclude *.orig --exclude config --exclude asm --exclude modules --exclude *.[Ss] --exclude System.map --exclude consolemap_deftbl.c --exclude *~ --exclude TAGS --exclude tags --exclude modversions.h --exclude install-kernel linux-2.3.99-pre-6-2-rusty/net/ipv4/netfilter/ipt_LOG.c working/net/ipv4/netfilter/ipt_LOG.c --- linux-2.3.99-pre-6-2-rusty/net/ipv4/netfilter/ipt_LOG.c Wed 
Apr 12 17:13:07 2000 +++ working/net/ipv4/netfilter/ipt_LOG.c Wed Apr 19 19:35:25 2000 @@ -345,7 +345,8 @@ } static struct ipt_target ipt_log_reg -= { { NULL, NULL }, "LOG", ipt_log_target, ipt_log_checkentry, THIS_MODULE }; += { { NULL, NULL }, "LOG", ipt_log_target, ipt_log_checkentry, NULL, + THIS_MODULE }; static int __init init(void) { diff -urN --minimal --exclude *.lds --exclude *.ps --exclude *.pdf --exclude *.sgml --exclude *.tex --exclude *.aux --exclude *.log --exclude classlist.h --exclude devlist.h --exclude autoconf.h --exclude compile.h --exclude version.h --exclude .* --exclude *.[oa] --exclude *.orig --exclude config --exclude asm --exclude modules --exclude *.[Ss] --exclude System.map --exclude consolemap_deftbl.c --exclude *~ --exclude TAGS --exclude tags --exclude modversions.h --exclude install-kernel linux-2.3.99-pre-6-2-rusty/net/ipv4/netfilter/ipt_MARK.c working/net/ipv4/netfilter/ipt_MARK.c --- linux-2.3.99-pre-6-2-rusty/net/ipv4/netfilter/ipt_MARK.c Wed Apr 12 17:13:07 2000 +++ working/net/ipv4/netfilter/ipt_MARK.c Wed Apr 19 19:35:28 2000 @@ -47,7 +47,7 @@ } static struct ipt_target ipt_mark_reg -= { { NULL, NULL }, "MARK", target, checkentry, THIS_MODULE }; += { { NULL, NULL }, "MARK", target, checkentry, NULL, THIS_MODULE }; static int __init init(void) { diff -urN --minimal --exclude *.lds --exclude *.ps --exclude *.pdf --exclude *.sgml --exclude *.tex --exclude *.aux --exclude *.log --exclude classlist.h --exclude devlist.h --exclude autoconf.h --exclude compile.h --exclude version.h --exclude .* --exclude *.[oa] --exclude *.orig --exclude config --exclude asm --exclude modules --exclude *.[Ss] --exclude System.map --exclude consolemap_deftbl.c --exclude *~ --exclude TAGS --exclude tags --exclude modversions.h --exclude install-kernel linux-2.3.99-pre-6-2-rusty/net/ipv4/netfilter/ipt_MASQUERADE.c working/net/ipv4/netfilter/ipt_MASQUERADE.c --- linux-2.3.99-pre-6-2-rusty/net/ipv4/netfilter/ipt_MASQUERADE.c Wed Apr 12 17:13:07 2000 +++ 
working/net/ipv4/netfilter/ipt_MASQUERADE.c Sun Apr 23 23:38:40 2000 @@ -142,7 +142,7 @@ }; static struct ipt_target masquerade -= { { NULL, NULL }, "MASQUERADE", masquerade_target, masquerade_check, += { { NULL, NULL }, "MASQUERADE", masquerade_target, masquerade_check, NULL, THIS_MODULE }; static int __init init(void) diff -urN --minimal --exclude *.lds --exclude *.ps --exclude *.pdf --exclude *.sgml --exclude *.tex --exclude *.aux --exclude *.log --exclude classlist.h --exclude devlist.h --exclude autoconf.h --exclude compile.h --exclude version.h --exclude .* --exclude *.[oa] --exclude *.orig --exclude config --exclude asm --exclude modules --exclude *.[Ss] --exclude System.map --exclude consolemap_deftbl.c --exclude *~ --exclude TAGS --exclude tags --exclude modversions.h --exclude install-kernel linux-2.3.99-pre-6-2-rusty/net/ipv4/netfilter/ipt_MIRROR.c working/net/ipv4/netfilter/ipt_MIRROR.c --- linux-2.3.99-pre-6-2-rusty/net/ipv4/netfilter/ipt_MIRROR.c Wed Apr 12 17:13:07 2000 +++ working/net/ipv4/netfilter/ipt_MIRROR.c Wed Apr 19 19:36:27 2000 @@ -113,7 +113,7 @@ } static struct ipt_target ipt_mirror_reg -= { { NULL, NULL }, "MIRROR", ipt_mirror_target, ipt_mirror_checkentry, += { { NULL, NULL }, "MIRROR", ipt_mirror_target, ipt_mirror_checkentry, NULL, THIS_MODULE }; static int __init init(void) diff -urN --minimal --exclude *.lds --exclude *.ps --exclude *.pdf --exclude *.sgml --exclude *.tex --exclude *.aux --exclude *.log --exclude classlist.h --exclude devlist.h --exclude autoconf.h --exclude compile.h --exclude version.h --exclude .* --exclude *.[oa] --exclude *.orig --exclude config --exclude asm --exclude modules --exclude *.[Ss] --exclude System.map --exclude consolemap_deftbl.c --exclude *~ --exclude TAGS --exclude tags --exclude modversions.h --exclude install-kernel linux-2.3.99-pre-6-2-rusty/net/ipv4/netfilter/ipt_REDIRECT.c working/net/ipv4/netfilter/ipt_REDIRECT.c --- linux-2.3.99-pre-6-2-rusty/net/ipv4/netfilter/ipt_REDIRECT.c Wed Apr 12 
17:13:07 2000 +++ working/net/ipv4/netfilter/ipt_REDIRECT.c Wed Apr 19 19:35:44 2000 @@ -86,7 +86,8 @@ } static struct ipt_target redirect_reg -= { { NULL, NULL }, "REDIRECT", redirect_target, redirect_check, THIS_MODULE }; += { { NULL, NULL }, "REDIRECT", redirect_target, redirect_check, NULL, + THIS_MODULE }; static int __init init(void) { diff -urN --minimal --exclude *.lds --exclude *.ps --exclude *.pdf --exclude *.sgml --exclude *.tex --exclude *.aux --exclude *.log --exclude classlist.h --exclude devlist.h --exclude autoconf.h --exclude compile.h --exclude version.h --exclude .* --exclude *.[oa] --exclude *.orig --exclude config --exclude asm --exclude modules --exclude *.[Ss] --exclude System.map --exclude consolemap_deftbl.c --exclude *~ --exclude TAGS --exclude tags --exclude modversions.h --exclude install-kernel linux-2.3.99-pre-6-2-rusty/net/ipv4/netfilter/ipt_REJECT.c working/net/ipv4/netfilter/ipt_REJECT.c --- linux-2.3.99-pre-6-2-rusty/net/ipv4/netfilter/ipt_REJECT.c Wed Apr 12 17:13:07 2000 +++ working/net/ipv4/netfilter/ipt_REJECT.c Wed Apr 19 19:35:49 2000 @@ -120,7 +120,7 @@ } static struct ipt_target ipt_reject_reg -= { { NULL, NULL }, "REJECT", reject, check, THIS_MODULE }; += { { NULL, NULL }, "REJECT", reject, check, NULL, THIS_MODULE }; static int __init init(void) { diff -urN --minimal --exclude *.lds --exclude *.ps --exclude *.pdf --exclude *.sgml --exclude *.tex --exclude *.aux --exclude *.log --exclude classlist.h --exclude devlist.h --exclude autoconf.h --exclude compile.h --exclude version.h --exclude .* --exclude *.[oa] --exclude *.orig --exclude config --exclude asm --exclude modules --exclude *.[Ss] --exclude System.map --exclude consolemap_deftbl.c --exclude *~ --exclude TAGS --exclude tags --exclude modversions.h --exclude install-kernel linux-2.3.99-pre-6-2-rusty/net/ipv4/netfilter/ipt_TOS.c working/net/ipv4/netfilter/ipt_TOS.c --- linux-2.3.99-pre-6-2-rusty/net/ipv4/netfilter/ipt_TOS.c Wed Apr 12 17:13:07 2000 +++ 
working/net/ipv4/netfilter/ipt_TOS.c Wed Apr 19 19:35:52 2000 @@ -66,7 +66,7 @@ } static struct ipt_target ipt_tos_reg -= { { NULL, NULL }, "TOS", target, checkentry, THIS_MODULE }; += { { NULL, NULL }, "TOS", target, checkentry, NULL, THIS_MODULE }; static int __init init(void) { diff -urN --minimal --exclude *.lds --exclude *.ps --exclude *.pdf --exclude *.sgml --exclude *.tex --exclude *.aux --exclude *.log --exclude classlist.h --exclude devlist.h --exclude autoconf.h --exclude compile.h --exclude version.h --exclude .* --exclude *.[oa] --exclude *.orig --exclude config --exclude asm --exclude modules --exclude *.[Ss] --exclude System.map --exclude consolemap_deftbl.c --exclude *~ --exclude TAGS --exclude tags --exclude modversions.h --exclude install-kernel linux-2.3.99-pre-6-2-rusty/net/ipv4/netfilter/ipt_limit.c working/net/ipv4/netfilter/ipt_limit.c --- linux-2.3.99-pre-6-2-rusty/net/ipv4/netfilter/ipt_limit.c Wed Apr 12 17:13:07 2000 +++ working/net/ipv4/netfilter/ipt_limit.c Wed Apr 19 18:13:04 2000 @@ -124,7 +124,7 @@ } static struct ipt_match ipt_limit_reg -= { { NULL, NULL }, "limit", ipt_limit_match, ipt_limit_checkentry, += { { NULL, NULL }, "limit", ipt_limit_match, ipt_limit_checkentry, NULL, THIS_MODULE }; static int __init init(void) diff -urN --minimal --exclude *.lds --exclude *.ps --exclude *.pdf --exclude *.sgml --exclude *.tex --exclude *.aux --exclude *.log --exclude classlist.h --exclude devlist.h --exclude autoconf.h --exclude compile.h --exclude version.h --exclude .* --exclude *.[oa] --exclude *.orig --exclude config --exclude asm --exclude modules --exclude *.[Ss] --exclude System.map --exclude consolemap_deftbl.c --exclude *~ --exclude TAGS --exclude tags --exclude modversions.h --exclude install-kernel linux-2.3.99-pre-6-2-rusty/net/ipv4/netfilter/ipt_mac.c working/net/ipv4/netfilter/ipt_mac.c --- linux-2.3.99-pre-6-2-rusty/net/ipv4/netfilter/ipt_mac.c Wed Apr 12 17:13:07 2000 +++ working/net/ipv4/netfilter/ipt_mac.c Wed Apr 19 
19:35:56 2000 @@ -46,7 +46,7 @@ } static struct ipt_match mac_match -= { { NULL, NULL }, "mac", &match, &ipt_mac_checkentry, THIS_MODULE }; += { { NULL, NULL }, "mac", &match, &ipt_mac_checkentry, NULL, THIS_MODULE }; static int __init init(void) { diff -urN --minimal --exclude *.lds --exclude *.ps --exclude *.pdf --exclude *.sgml --exclude *.tex --exclude *.aux --exclude *.log --exclude classlist.h --exclude devlist.h --exclude autoconf.h --exclude compile.h --exclude version.h --exclude .* --exclude *.[oa] --exclude *.orig --exclude config --exclude asm --exclude modules --exclude *.[Ss] --exclude System.map --exclude consolemap_deftbl.c --exclude *~ --exclude TAGS --exclude tags --exclude modversions.h --exclude install-kernel linux-2.3.99-pre-6-2-rusty/net/ipv4/netfilter/ipt_mark.c working/net/ipv4/netfilter/ipt_mark.c --- linux-2.3.99-pre-6-2-rusty/net/ipv4/netfilter/ipt_mark.c Wed Apr 12 17:13:07 2000 +++ working/net/ipv4/netfilter/ipt_mark.c Wed Apr 19 19:35:59 2000 @@ -34,7 +34,7 @@ } static struct ipt_match mark_match -= { { NULL, NULL }, "mark", &match, &checkentry, THIS_MODULE }; += { { NULL, NULL }, "mark", &match, &checkentry, NULL, THIS_MODULE }; static int __init init(void) { diff -urN --minimal --exclude *.lds --exclude *.ps --exclude *.pdf --exclude *.sgml --exclude *.tex --exclude *.aux --exclude *.log --exclude classlist.h --exclude devlist.h --exclude autoconf.h --exclude compile.h --exclude version.h --exclude .* --exclude *.[oa] --exclude *.orig --exclude config --exclude asm --exclude modules --exclude *.[Ss] --exclude System.map --exclude consolemap_deftbl.c --exclude *~ --exclude TAGS --exclude tags --exclude modversions.h --exclude install-kernel linux-2.3.99-pre-6-2-rusty/net/ipv4/netfilter/ipt_multiport.c working/net/ipv4/netfilter/ipt_multiport.c --- linux-2.3.99-pre-6-2-rusty/net/ipv4/netfilter/ipt_multiport.c Wed Apr 12 17:13:07 2000 +++ working/net/ipv4/netfilter/ipt_multiport.c Wed Apr 19 19:36:02 2000 @@ -84,7 +84,7 @@ } static 
struct ipt_match multiport_match -= { { NULL, NULL }, "multiport", &match, &checkentry, THIS_MODULE }; += { { NULL, NULL }, "multiport", &match, &checkentry, NULL, THIS_MODULE }; static int __init init(void) { diff -urN --minimal --exclude *.lds --exclude *.ps --exclude *.pdf --exclude *.sgml --exclude *.tex --exclude *.aux --exclude *.log --exclude classlist.h --exclude devlist.h --exclude autoconf.h --exclude compile.h --exclude version.h --exclude .* --exclude *.[oa] --exclude *.orig --exclude config --exclude asm --exclude modules --exclude *.[Ss] --exclude System.map --exclude consolemap_deftbl.c --exclude *~ --exclude TAGS --exclude tags --exclude modversions.h --exclude install-kernel linux-2.3.99-pre-6-2-rusty/net/ipv4/netfilter/ipt_owner.c working/net/ipv4/netfilter/ipt_owner.c --- linux-2.3.99-pre-6-2-rusty/net/ipv4/netfilter/ipt_owner.c Wed Apr 12 17:13:07 2000 +++ working/net/ipv4/netfilter/ipt_owner.c Wed Apr 19 19:36:05 2000 @@ -118,7 +118,7 @@ } static struct ipt_match owner_match -= { { NULL, NULL }, "owner", &match, &checkentry, THIS_MODULE }; += { { NULL, NULL }, "owner", &match, &checkentry, NULL, THIS_MODULE }; static int __init init(void) { diff -urN --minimal --exclude *.lds --exclude *.ps --exclude *.pdf --exclude *.sgml --exclude *.tex --exclude *.aux --exclude *.log --exclude classlist.h --exclude devlist.h --exclude autoconf.h --exclude compile.h --exclude version.h --exclude .* --exclude *.[oa] --exclude *.orig --exclude config --exclude asm --exclude modules --exclude *.[Ss] --exclude System.map --exclude consolemap_deftbl.c --exclude *~ --exclude TAGS --exclude tags --exclude modversions.h --exclude install-kernel linux-2.3.99-pre-6-2-rusty/net/ipv4/netfilter/ipt_state.c working/net/ipv4/netfilter/ipt_state.c --- linux-2.3.99-pre-6-2-rusty/net/ipv4/netfilter/ipt_state.c Wed Apr 12 17:13:07 2000 +++ working/net/ipv4/netfilter/ipt_state.c Wed Apr 19 19:36:08 2000 @@ -42,7 +42,7 @@ } static struct ipt_match state_match -= { { NULL, NULL }, 
"state", &match, &check, THIS_MODULE }; += { { NULL, NULL }, "state", &match, &check, NULL, THIS_MODULE }; static int __init init(void) { diff -urN --minimal --exclude *.lds --exclude *.ps --exclude *.pdf --exclude *.sgml --exclude *.tex --exclude *.aux --exclude *.log --exclude classlist.h --exclude devlist.h --exclude autoconf.h --exclude compile.h --exclude version.h --exclude .* --exclude *.[oa] --exclude *.orig --exclude config --exclude asm --exclude modules --exclude *.[Ss] --exclude System.map --exclude consolemap_deftbl.c --exclude *~ --exclude TAGS --exclude tags --exclude modversions.h --exclude install-kernel linux-2.3.99-pre-6-2-rusty/net/ipv4/netfilter/ipt_tos.c working/net/ipv4/netfilter/ipt_tos.c --- linux-2.3.99-pre-6-2-rusty/net/ipv4/netfilter/ipt_tos.c Wed Apr 12 17:13:07 2000 +++ working/net/ipv4/netfilter/ipt_tos.c Wed Apr 19 19:36:10 2000 @@ -35,7 +35,7 @@ } static struct ipt_match tos_match -= { { NULL, NULL }, "tos", &match, &checkentry, THIS_MODULE }; += { { NULL, NULL }, "tos", &match, &checkentry, NULL, THIS_MODULE }; static int __init init(void) { diff -urN --minimal --exclude *.lds --exclude *.ps --exclude *.pdf --exclude *.sgml --exclude *.tex --exclude *.aux --exclude *.log --exclude classlist.h --exclude devlist.h --exclude autoconf.h --exclude compile.h --exclude version.h --exclude .* --exclude *.[oa] --exclude *.orig --exclude config --exclude asm --exclude modules --exclude *.[Ss] --exclude System.map --exclude consolemap_deftbl.c --exclude *~ --exclude TAGS --exclude tags --exclude modversions.h --exclude install-kernel linux-2.3.99-pre-6-2-rusty/net/ipv4/netfilter/ipt_unclean.c working/net/ipv4/netfilter/ipt_unclean.c --- linux-2.3.99-pre-6-2-rusty/net/ipv4/netfilter/ipt_unclean.c Wed Apr 12 17:13:07 2000 +++ working/net/ipv4/netfilter/ipt_unclean.c Wed Apr 19 19:36:14 2000 @@ -558,7 +558,7 @@ } static struct ipt_match unclean_match -= { { NULL, NULL }, "unclean", &match, &checkentry, THIS_MODULE }; += { { NULL, NULL }, 
"unclean", &match, &checkentry, NULL, THIS_MODULE }; static int __init init(void) { -- Hacking time. From owner-netdev@oss.sgi.com Thu Apr 27 03:01:24 2000 Received: by oss.sgi.com id ; Thu, 27 Apr 2000 03:00:59 -0700 Received: from dialup-ad-12-77.camtech.net.au ([203.55.242.77]:5892 "EHLO halfway.linuxcare.com.au") by oss.sgi.com with ESMTP id ; Thu, 27 Apr 2000 03:00:38 -0700 Received: from linuxcare.com.au (really [127.0.0.1]) by linuxcare.com.au via in.smtpd with esmtp id (Debian Smail3.2.0.102) for ; Thu, 27 Apr 2000 19:30:32 +0930 (CST) Message-Id: From: Rusty Russell To: torvalds@transmeta.com cc: netdev@oss.sgi.com, havanna_moon@gmx.net, netfilter@lists.samba.org Subject: [PATCH] Memory leak in iptables Date: Thu, 27 Apr 2000 19:30:28 +0930 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Linus, please apply v2.3.99-pre6. Kudos to Yon Uriarte for finding it. Rusty. diff -urN --minimal --exclude *.lds --exclude *.ps --exclude *.pdf --exclude *.sgml --exclude *.tex --exclude *.aux --exclude *.log --exclude classlist.h --exclude devlist.h --exclude autoconf.h --exclude compile.h --exclude version.h --exclude .* --exclude *.[oa] --exclude *.orig --exclude config --exclude asm --exclude modules --exclude *.[Ss] --exclude System.map --exclude consolemap_deftbl.c --exclude *~ --exclude TAGS --exclude tags --exclude modversions.h --exclude install-kernel linux-2.3.99-pre-6-2-rusty/net/ipv4/netfilter/ip_tables.c working/net/ipv4/netfilter/ip_tables.c --- linux-2.3.99-pre-6-2-rusty/net/ipv4/netfilter/ip_tables.c Fri Apr 14 17:41:01 2000 +++ working/net/ipv4/netfilter/ip_tables.c Wed Apr 19 15:35:04 2000 @@ -1094,7 +1099,7 @@ /* Silent error: too late now. */ copy_to_user(tmp.counters, counters, sizeof(struct ipt_counters) * tmp.num_counters); - + vfree(counters); up(&ipt_mutex); return 0; -- Hacking time. 
From owner-netdev@oss.sgi.com Thu Apr 27 03:01:39 2000 Received: by oss.sgi.com id ; Thu, 27 Apr 2000 03:01:11 -0700 Received: from dialup-ad-12-77.camtech.net.au ([203.55.242.77]:8708 "EHLO halfway.linuxcare.com.au") by oss.sgi.com with ESMTP id ; Thu, 27 Apr 2000 03:00:52 -0700 Received: from linuxcare.com.au (really [127.0.0.1]) by linuxcare.com.au via in.smtpd with esmtp id (Debian Smail3.2.0.102) for ; Thu, 27 Apr 2000 19:30:41 +0930 (CST) Message-Id: From: Rusty Russell To: torvalds@transmeta.com cc: netdev@oss.sgi.com, netfilter@lists.samba.org Subject: [PATCH] ip_conntrack.o module removal fix. Date: Thu, 27 Apr 2000 19:30:37 +0930 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Linus, please apply v2.3.99-pre6. When a packet is queued for userspace with a reference to an existing ip_conntrack, and someone tries to remove the module, we have to wait for the skb to be cleaned. We don't want to use module counts here to prevent removal of the module, as that would put control of module removal in the hands of the network traffic, not the box administrator. Rusty.
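A hedged, single-threaded sketch of the shape of the retry loop that the following patch adds to module cleanup: free everything that can be freed, and if entries are still referenced by queued packets, give up the CPU and try again until the count reaches zero. Everything here is a hypothetical model. `conntrack_count` stands in for the atomic `ip_conntrack_count`, `cleanup_unpinned()` for `ip_ct_selective_cleanup(kill_all, NULL)`, and `yield_cpu()` for `schedule()`. In the kernel the pinned references are dropped asynchronously when the queued skb is freed; here that progress is faked by releasing one reference per yield.

```c
#include <assert.h>

/* Hypothetical single-threaded model: these stand in for the atomic
 * ip_conntrack_count and for entries pinned by packets queued to userspace. */
static int conntrack_count;
static int pinned;
static int cleanup_passes;

/* Stand-in for ip_ct_selective_cleanup(kill_all, NULL): every entry that is
 * not still referenced by a queued packet can be freed immediately. */
static void cleanup_unpinned(void)
{
	assert(pinned <= conntrack_count);
	conntrack_count = pinned;
	cleanup_passes++;
}

/* Stand-in for schedule(): lets other work run. Here we simply pretend the
 * userspace queue hands back one packet (dropping its reference) per yield. */
static void yield_cpu(void)
{
	if (pinned > 0)
		pinned--;
}

/* Clean up; if references remain, yield and retry until none are left. */
static void wait_for_all_conntracks(void)
{
	for (;;) {
		cleanup_unpinned();
		if (conntrack_count == 0)
			break;
		yield_cpu();
	}
}
```

The design choice matches the reasoning in the message: the wait is driven by the administrator's removal request, rather than by per-packet module use counts that network traffic could keep elevated indefinitely.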
diff -urN --minimal --exclude *.lds --exclude *.ps --exclude *.pdf --exclude *.sgml --exclude *.tex --exclude *.aux --exclude *.log --exclude classlist.h --exclude devlist.h --exclude autoconf.h --exclude compile.h --exclude version.h --exclude .* --exclude *.[oa] --exclude *.orig --exclude config --exclude asm --exclude modules --exclude *.[Ss] --exclude System.map --exclude consolemap_deftbl.c --exclude *~ --exclude TAGS --exclude tags --exclude modversions.h --exclude install-kernel linux-2.3.99-pre-6-2-rusty/net/ipv4/netfilter/ip_conntrack_core.c working/net/ipv4/netfilter/ip_conntrack_core.c --- linux-2.3.99-pre-6-2-rusty/net/ipv4/netfilter/ip_conntrack_core.c Fri Apr 14 17:41:01 2000 +++ working/net/ipv4/netfilter/ip_conntrack_core.c Sun Apr 23 22:59:02 2000 @@ -836,7 +950,14 @@ #ifdef CONFIG_SYSCTL unregister_sysctl_table(ip_conntrack_sysctl_header); #endif + + i_see_dead_people: ip_ct_selective_cleanup(kill_all, NULL); + if (atomic_read(&ip_conntrack_count) != 0) { + schedule(); + goto i_see_dead_people; + } + kmem_cache_destroy(ip_conntrack_cachep); vfree(ip_conntrack_hash); nf_unregister_sockopt(&so_getorigdst); -- Hacking time. From owner-netdev@oss.sgi.com Thu Apr 27 03:01:39 2000 Received: by oss.sgi.com id ; Thu, 27 Apr 2000 03:01:10 -0700 Received: from dialup-ad-12-77.camtech.net.au ([203.55.242.77]:9732 "EHLO halfway.linuxcare.com.au") by oss.sgi.com with ESMTP id ; Thu, 27 Apr 2000 03:00:56 -0700 Received: from linuxcare.com.au (really [127.0.0.1]) by linuxcare.com.au via in.smtpd with esmtp id (Debian Smail3.2.0.102) for ; Thu, 27 Apr 2000 19:30:41 +0930 (CST) Message-Id: From: Rusty Russell To: torvalds@transmeta.com cc: netdev@oss.sgi.com, netfilter@lists.samba.org Subject: [PATCH] REDIRECT and MASQUERADE port-range fix Date: Thu, 27 Apr 2000 19:30:40 +0930 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Linus, please apply v2.3.99-pre6. Thinko + cut&paste + incomplete testsuite == stupid bug. 
Rusty. diff -urN --minimal --exclude *.lds --exclude *.ps --exclude *.pdf --exclude *.sgml --exclude *.tex --exclude *.aux --exclude *.log --exclude classlist.h --exclude devlist.h --exclude autoconf.h --exclude compile.h --exclude version.h --exclude .* --exclude *.[oa] --exclude *.orig --exclude config --exclude asm --exclude modules --exclude *.[Ss] --exclude System.map --exclude consolemap_deftbl.c --exclude *~ --exclude TAGS --exclude tags --exclude modversions.h --exclude install-kernel linux-2.3.99-pre-6-2-rusty/net/ipv4/netfilter/ipt_MASQUERADE.c working/net/ipv4/netfilter/ipt_MASQUERADE.c --- linux-2.3.99-pre-6-2-rusty/net/ipv4/netfilter/ipt_MASQUERADE.c Wed Apr 12 17:13:07 2000 +++ working/net/ipv4/netfilter/ipt_MASQUERADE.c Sun Apr 23 23:38:40 2000 @@ -60,7 +60,7 @@ { struct ip_conntrack *ct; enum ip_conntrack_info ctinfo; - const struct ip_nat_range *r; + const struct ip_nat_multi_range *mr; struct ip_nat_multi_range newrange; u_int32_t newsrc; struct rtable *rt; @@ -76,7 +76,7 @@ IP_NF_ASSERT(ct && (ctinfo == IP_CT_NEW || ctinfo == IP_CT_RELATED)); - r = targinfo; + mr = targinfo; if (ip_route_output(&rt, (*pskb)->nh.iph->daddr, 0, @@ -97,9 +97,9 @@ /* Transfer from original range. */ newrange = ((struct ip_nat_multi_range) - { 1, { { r->flags | IP_NAT_RANGE_MAP_IPS, + { 1, { { mr->range[0].flags | IP_NAT_RANGE_MAP_IPS, newsrc, newsrc, - r->min, r->max } } }); + mr->range[0].min, mr->range[0].max } } }); /* Hand modified range to generic setup. 
*/ return ip_nat_setup_info(ct, &newrange, hooknum); diff -urN --minimal --exclude *.lds --exclude *.ps --exclude *.pdf --exclude *.sgml --exclude *.tex --exclude *.aux --exclude *.log --exclude classlist.h --exclude devlist.h --exclude autoconf.h --exclude compile.h --exclude version.h --exclude .* --exclude *.[oa] --exclude *.orig --exclude config --exclude asm --exclude modules --exclude *.[Ss] --exclude System.map --exclude consolemap_deftbl.c --exclude *~ --exclude TAGS --exclude tags --exclude modversions.h --exclude install-kernel linux-2.3.99-pre-6-2-rusty/net/ipv4/netfilter/ipt_REDIRECT.c working/net/ipv4/netfilter/ipt_REDIRECT.c --- linux-2.3.99-pre-6-2-rusty/net/ipv4/netfilter/ipt_REDIRECT.c Wed Apr 12 17:13:07 2000 +++ working/net/ipv4/netfilter/ipt_REDIRECT.c Wed Apr 19 19:35:44 2000 @@ -58,7 +58,7 @@ struct ip_conntrack *ct; enum ip_conntrack_info ctinfo; u_int32_t newdst; - const struct ip_nat_range *r = targinfo; + const struct ip_nat_multi_range *mr = targinfo; struct ip_nat_multi_range newrange; IP_NF_ASSERT(hooknum == NF_IP_PRE_ROUTING @@ -77,9 +77,9 @@ /* Transfer from original range. */ newrange = ((struct ip_nat_multi_range) - { 1, { { r->flags | IP_NAT_RANGE_MAP_IPS, + { 1, { { mr->range[0].flags | IP_NAT_RANGE_MAP_IPS, newdst, newdst, - r->min, r->max } } }); + mr->range[0].min, mr->range[0].max } } }); /* Hand modified range to generic setup. */ return ip_nat_setup_info(ct, &newrange, hooknum); -- Hacking time. 
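The REDIRECT/MASQUERADE bug above comes down to struct layout: `targinfo` really points at an `ip_nat_multi_range`, so the first range sits after the `rangesize` field rather than at offset 0. A minimal sketch, with simplified hypothetical layouts (`nat_range`, `nat_multi_range`) modelled on the initializer used in the patch, `{ rangesize, { { flags, min_ip, max_ip, min, max } } }`. The ports are plain `uint16_t` here, whereas the real `min`/`max` are protocol-specific unions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified layouts modelled on the initializer in the patch.
 * Ports are plain uint16_t here; the real min/max are protocol unions. */
struct nat_range {
	unsigned int flags;
	uint32_t min_ip, max_ip;
	uint16_t min_port, max_port;
};

struct nat_multi_range {
	unsigned int rangesize;         /* number of valid entries in range[] */
	struct nat_range range[1];
};

/* What the fixed targets do: targinfo points at a multi-range,
 * so the first range lives at mr->range[0], not at offset 0. */
static const struct nat_range *first_range(const void *targinfo)
{
	const struct nat_multi_range *mr = targinfo;

	assert(mr->rangesize >= 1);
	return &mr->range[0];
}

/* The buggy cast assumed the range started at offset 0; it actually
 * starts after rangesize (plus any padding). */
static size_t range_offset(void)
{
	return offsetof(struct nat_multi_range, range);
}
```

Casting `targinfo` straight to a single-range pointer therefore reads `rangesize` as `flags` and shifts every following field, which is why the fix switches both targets to `mr->range[0].flags`, `mr->range[0].min` and `mr->range[0].max`.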
From owner-netdev@oss.sgi.com Thu Apr 27 03:01:39 2000 Received: by oss.sgi.com id ; Thu, 27 Apr 2000 03:01:13 -0700 Received: from dialup-ad-12-77.camtech.net.au ([203.55.242.77]:10500 "EHLO halfway.linuxcare.com.au") by oss.sgi.com with ESMTP id ; Thu, 27 Apr 2000 03:01:02 -0700 Received: from linuxcare.com.au (really [127.0.0.1]) by linuxcare.com.au via in.smtpd with esmtp id (Debian Smail3.2.0.102) for ; Thu, 27 Apr 2000 19:30:55 +0930 (CST) Message-Id: From: Rusty Russell To: torvalds@transmeta.com cc: netdev@oss.sgi.com, netfilter@lists.samba.org Subject: [PATCH] iptables filter: FORWARD default change! Date: Thu, 27 Apr 2000 19:30:49 +0930 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Linus, please apply v2.3.99-pre6. This finally alters the FORWARD policy to ACCEPT (you can override it if iptable_filter is a module with `forward=0'). People have /proc/sys/net/ipv4/ip_forward to control forwarding, and this extra trickiness just frustrated and confused people. Rusty. diff -urN --minimal --exclude *.lds --exclude *.ps --exclude *.pdf --exclude *.sgml --exclude *.tex --exclude *.aux --exclude *.log --exclude classlist.h --exclude devlist.h --exclude autoconf.h --exclude compile.h --exclude version.h --exclude .* --exclude *.[oa] --exclude *.orig --exclude config --exclude asm --exclude modules --exclude *.[Ss] --exclude System.map --exclude consolemap_deftbl.c --exclude *~ --exclude TAGS --exclude tags --exclude modversions.h --exclude install-kernel linux-2.3.99-pre-6-2-rusty/net/ipv4/netfilter/iptable_filter.c working/net/ipv4/netfilter/iptable_filter.c --- linux-2.3.99-pre-6-2-rusty/net/ipv4/netfilter/iptable_filter.c Wed Apr 5 18:44:00 2000 +++ working/net/ipv4/netfilter/iptable_filter.c Thu Apr 27 11:48:47 2000 @@ -121,8 +122,8 @@ NF_IP_PRI_FILTER } }; -/* Default to no forward for security reasons. */ -static int forward = NF_DROP; +/* Default to forward because I got too much mail already. 
*/ +static int forward = NF_ACCEPT; MODULE_PARM(forward, "i"); static int __init init(void) -- Hacking time. From owner-netdev@oss.sgi.com Thu Apr 27 06:45:39 2000 Received: by oss.sgi.com id ; Thu, 27 Apr 2000 06:45:29 -0700 Received: from mail.cyberus.ca ([209.195.95.1]:10670 "EHLO cyberus.ca") by oss.sgi.com with ESMTP id ; Thu, 27 Apr 2000 06:45:09 -0700 Received: from shell.cyberus.ca (shell [209.195.95.7]) by cyberus.ca (8.9.3/8.9.3/Cyberus Online Inc.) with ESMTP id JAA13682; Thu, 27 Apr 2000 09:44:45 -0400 (EDT) Received: from localhost (hadi@localhost) by shell.cyberus.ca (8.9.1b+Sun/8.9.3) with ESMTP id JAA27063; Thu, 27 Apr 2000 09:44:42 -0400 (EDT) Date: Thu, 27 Apr 2000 09:44:42 -0400 (EDT) From: jamal To: Rusty Russell cc: torvalds@transmeta.com, netdev@oss.sgi.com, netfilter@lists.samba.org Subject: Re: [PATCH] Increased DoS protection. In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Thu, 27 Apr 2000, Rusty Russell wrote: > 3) Do randomish/LRU drop on unreplied connections when we're > under stress. > My 2 cents CDN: I just gleaned at the code and must have missed the "randomness" in the drops. In fact I have never had the chance to look at netfilter (one of these days), so pardon my ignorance: Would you please explain your algorithm (in english)? Could unreplied connections also be in the (TCP) established state as well? In which case I think LRU is wrong unless your aging timer is somehow associated with the connection's RTT (I suspect it is, but just in case). Think of high latency links (like most wireless or satellite, or even modems). Latency gets worse under duress. You need to favor already established connections more.
cheers, jamal From owner-netdev@oss.sgi.com Thu Apr 27 19:09:56 2000 Received: by oss.sgi.com id ; Thu, 27 Apr 2000 19:09:36 -0700 Received: from dialup-ad-16-15.camtech.net.au ([203.55.241.15]:40452 "EHLO halfway.linuxcare.com.au") by oss.sgi.com with ESMTP id ; Thu, 27 Apr 2000 19:09:08 -0700 Received: from linuxcare.com.au (really [127.0.0.1]) by linuxcare.com.au via in.smtpd with esmtp id (Debian Smail3.2.0.102) for ; Fri, 28 Apr 2000 11:38:35 +0930 (CST) Message-Id: From: Rusty Russell To: jamal cc: netdev@oss.sgi.com, netfilter@lists.samba.org Subject: Re: [PATCH] Increased DoS protection. In-reply-to: Your message of "Thu, 27 Apr 2000 09:44:42 -0400." Date: Fri, 28 Apr 2000 11:38:28 +0930 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing In message you write: > > > On Thu, 27 Apr 2000, Rusty Russell wrote: > > > 3) Do randomish/LRU drop on unreplied connections when we're > > under stress. > > > > My 2 cents CDN: > I just gleaned at the code and must have missed the "randomness" > in the drops. ... > Would you please explain your algorithm (in english)? This is why I said `randomish': I sweep through the hash table. If that hash chain doesn't have a likely candidate, I try the hash chain the new entry is trying to go into (in case they're trying to bomb one hash chain). I traverse the hash chain backwards (meaning oldest-creation first), looking for connections which have only had one-way traffic (this means no established TCP connections). This approximates LRU (only approximate because retransmits do not reorder the hash chain). I never touch connections with traffic both ways (TCP RST packets don't count: handled specially in the tcp protocol tracking).
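The early-drop heuristic described above can be sketched as follows. This is a hypothetical, simplified model, not the ip_conntrack code. The names `struct conn`, `seen_reply`, `early_drop_chain` and `early_drop` are invented for illustration. The sketch also assumes new entries are pushed at the head of their chain, which is what makes "backwards" mean "oldest-creation first". On a singly linked list, walking forward and remembering the *last* unreplied entry selects the same victim as walking backwards and stopping at the first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified connection entry. New entries are assumed to be
 * pushed at the head of their chain, so the tail holds the oldest ones. */
struct conn {
	struct conn *next;
	bool seen_reply;        /* traffic has been seen in both directions */
	int id;
};

/* Drop the oldest-created entry in one chain that has only seen one-way
 * traffic. Replied entries are never dropped.
 * Returns true if an entry was freed. */
static bool early_drop_chain(struct conn **head)
{
	struct conn **victim = NULL;
	struct conn *dead;

	for (struct conn **pp = head; *pp; pp = &(*pp)->next)
		if (!(*pp)->seen_reply)
			victim = pp;    /* remember the last (oldest) candidate */
	if (!victim)
		return false;
	dead = *victim;
	*victim = dead->next;          /* unlink */
	free(dead);
	return true;
}

/* Try the next chain in a round-robin sweep first; if it has no candidate,
 * fall back to the chain the new entry is about to join. */
static bool early_drop(struct conn **table, size_t nbuckets,
		       size_t *sweep, size_t new_bucket)
{
	size_t b;

	assert(nbuckets > 0 && new_bucket < nbuckets);
	b = *sweep;
	*sweep = (b + 1) % nbuckets;
	if (early_drop_chain(&table[b]))
		return true;
	return early_drop_chain(&table[new_bucket]);
}
```

Because only one-way entries are eligible, a connection that has seen a reply is never sacrificed, which is the property jamal asked about. The fallback to the new entry's own chain covers an attacker who concentrates entries in a single bucket.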
This leaves it vulnerable to SYN floods (as is the old masquerading code, so we didn't get *worse* here): long term I will implement window tracking as per ipfilter, and then I can be more confident that a real three-way handshake has occurred, and set a high-confidence bit for that connection. Hope that helps, Rusty. -- Hacking time. From owner-netdev@oss.sgi.com Fri Apr 28 00:50:27 2000 Received: by oss.sgi.com id ; Fri, 28 Apr 2000 00:50:17 -0700 Received: from laurin.munich.netsurf.de ([194.64.166.1]:5793 "EHLO laurin.munich.netsurf.de") by oss.sgi.com with ESMTP id ; Fri, 28 Apr 2000 00:49:55 -0700 Received: from fred.muc.de (none@ns1018.munich.netsurf.de [195.180.235.18]) by laurin.munich.netsurf.de (8.9.3/8.9.3) with ESMTP id JAA06196; Fri, 28 Apr 2000 09:49:41 +0200 (MET DST) Received: from andi by fred.muc.de with local (Exim 2.05 #1) id 12l5a2-0000EF-00; Fri, 28 Apr 2000 09:52:54 +0200 Date: Fri, 28 Apr 2000 09:52:54 +0200 From: Andi Kleen To: Rusty Russell Cc: jamal , netdev@oss.sgi.com, netfilter@lists.samba.org Subject: Re: [PATCH] Increased DoS protection. Message-ID: <20000428095254.A875@fred.muc.de> References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 0.95.4us In-Reply-To: ; from Rusty Russell on Fri, Apr 28, 2000 at 04:10:55AM +0200 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Fri, Apr 28, 2000 at 04:10:55AM +0200, Rusty Russell wrote: > > This leaves it vulnerable to SYN floods (as is the old masquerading > code, so we didn't get *worse* here): long term I will implement > window tracking as per ipfilter, and then I can be more confident that > a real three-way handshake has occurred, and set a high-confidence bit > for that connection. It is still hard when you consider reboots. The 3way handshake is long gone. 
Simply checking for an ACK from inside is not enough, because TCP generally ACKs all out-of-window packets (so an attacker who guesses ports could easily fool it). On other connections you'll only see legitimate ACKs from one end, so checking for more than just an ACK doesn't work either. How do you plan to handle that problem? Forget connections on reboot ? -Andi From owner-netdev@oss.sgi.com Fri Apr 28 05:45:30 2000 Received: by oss.sgi.com id ; Fri, 28 Apr 2000 05:45:10 -0700 Received: from dialup-ad-10-33.camtech.net.au ([203.28.1.161]:32517 "EHLO halfway.linuxcare.com.au") by oss.sgi.com with ESMTP id ; Fri, 28 Apr 2000 05:44:57 -0700 Received: from linuxcare.com.au (really [127.0.0.1]) by linuxcare.com.au via in.smtpd with esmtp id (Debian Smail3.2.0.102) for ; Fri, 28 Apr 2000 22:14:45 +0930 (CST) Message-Id: From: Rusty Russell To: Andi Kleen Cc: netdev@oss.sgi.com, netfilter@lists.samba.org Subject: Re: [PATCH] Increased DoS protection. In-reply-to: Your message of "Fri, 28 Apr 2000 09:52:54 +0200." <20000428095254.A875@fred.muc.de> Date: Fri, 28 Apr 2000 22:14:43 +0930 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing In message <20000428095254.A875@fred.muc.de> you write: > On Fri, Apr 28, 2000 at 04:10:55AM +0200, Rusty Russell wrote: > > window tracking as per ipfilter, and then I can be more confident that > > a real three-way handshake has occurred, and set a high-confidence bit > > for that connection. > > It is still hard when you consider reboots. The 3way handshake is long gone. > Simply checking for an ACK from inside is not enough, because TCP generally > acks all out of window packets (so it would be easy to fool from an attacker > who guesses ports) On other connections you'll only see legitimate ACKs > from one end, so checking for more than just an ack doesn't work neither. > How do you plan to handle that problem? I could stop reading mail from you, so I remain ignorant?
8) You *could* figure out retroactively that the prior packet was out-of-window (handwave). But it's probably easier to live with the fact that connections tracked across reboots won't have the `DONT_KILL_ME_IM_A_GENUINE_CONNECTION' bit set, meaning they'll be the first up against the wall if we're under stress. No connection tracking will be perfect. No NAPT will be perfect, either. Both are protocol perversions. Rusty. -- Hacking time. From owner-netdev@oss.sgi.com Fri Apr 28 06:27:43 2000 Received: by oss.sgi.com id ; Fri, 28 Apr 2000 06:27:34 -0700 Received: from widukind.bi.teuto.net ([212.8.197.28]:19719 "EHLO widukind.bi.teuto.net") by oss.sgi.com with ESMTP id ; Fri, 28 Apr 2000 06:27:22 -0700 Received: from hermes.marowsky-bree.de (1.0.0.224.in-addr.de [212.8.197.178]) by widukind.bi.teuto.net (8.9.3/8.9.3) with ESMTP id PAA25254; Fri, 28 Apr 2000 15:26:43 +0200 Received: by hermes.marowsky-bree.de (Postfix, from userid 500) id 1FFE24D037; Fri, 28 Apr 2000 17:29:26 +0200 (CEST) Date: Fri, 28 Apr 2000 17:29:26 +0200 From: Lars Marowsky-Bree To: Rusty Russell Cc: Andi Kleen , netdev@oss.sgi.com, netfilter@lists.samba.org Subject: Re: [PATCH] Increased DoS protection. Message-ID: <20000428172926.J3734@marowsky-bree.de> References: <20000428095254.A875@fred.muc.de> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: 8bit X-Mailer: Mutt 1.0i In-Reply-To: ; from "Rusty Russell" on 2000-04-28T22:14:43 X-Ctuhulu: HASTUR Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On 2000-04-28T22:14:43, Rusty Russell said: > You *could* figure out retroactively that the prior packet was > out-of-window (handwave). But it's probably easier to live with the > fact that connections tracked across reboots won't have the > `DONT_KILL_ME_IM_A_GENUINE_CONNECTION' bit set, meaning they'll be > the first up against the wall if we're under stress. 
It appears perfectly reasonable to me that stateful connection tracking may lose connections over a reboot. Yes, this is inflicting pain on the user, but on the other hand, it is supposed to be a firewall which is blocking what isn't allowed... If you don't want that, don't use stateful filtering. Sincerely, Lars Marowsky-Brée Development HA -- Perfection is our goal, excellence will be tolerated. -- J. Yahl From owner-netdev@oss.sgi.com Fri Apr 28 15:19:18 2000 Received: by oss.sgi.com id ; Fri, 28 Apr 2000 15:19:08 -0700 Received: from ren.mcnc.org ([152.45.4.110]:30983 "EHLO ren.mcnc.org") by oss.sgi.com with ESMTP id ; Fri, 28 Apr 2000 15:18:53 -0700 Received: from eos.ncsu.edu (localhost.localdomain [127.0.0.1]) by ren.mcnc.org (8.9.3/8.9.3) with ESMTP id SAA19270 for ; Fri, 28 Apr 2000 18:18:51 -0400 Message-ID: <390A0E4A.236E4FBB@eos.ncsu.edu> Date: Fri, 28 Apr 2000 18:18:50 -0400 From: Liang Han X-Mailer: Mozilla 4.61 [en] (X11; I; Linux 2.2.12-20 i686) X-Accept-Language: en MIME-Version: 1.0 To: netdev@oss.sgi.com Subject: Linux2.2.12 and 2.0.36 TCP differences? Content-Type: text/plain; charset=gb2312 Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing I have written a program to send ICMP redirect messages to a linux2.2.12 host to change its route to a destination. It works for both UDP and ICMP packets: I tested it using traceroute and tftp, and both kinds of traffic went through the new route. But TCP connections, such as Telnet and Ftp, still stick to the default route, even if I start them after sending out the redirect message. I also tested it on a Linux2.0.36 host, where the TCP packets went through the new route as they are supposed to. So I have 2 questions: 1. The routing should be resolved at the IP level, and my experience on linux2.2.12 seems to conflict with that principle. So what's the reason? 2. Since Linux2.0.36 works fine.
I wonder what the difference is between the two versions' routing implementations. I cannot find any docs on the issue, so I am raising my question here before I delve into the code. Thanks a lot! Liang From owner-netdev@oss.sgi.com Fri Apr 28 16:49:58 2000 Received: by oss.sgi.com id ; Fri, 28 Apr 2000 16:49:49 -0700 Received: from laurin.munich.netsurf.de ([194.64.166.1]:9903 "EHLO laurin.munich.netsurf.de") by oss.sgi.com with ESMTP id ; Fri, 28 Apr 2000 16:49:25 -0700 Received: from fred.muc.de (none@ns1007.munich.netsurf.de [195.180.235.7]) by laurin.munich.netsurf.de (8.9.3/8.9.3) with ESMTP id BAA15401; Sat, 29 Apr 2000 01:49:19 +0200 (MET DST) Received: from andi by fred.muc.de with local (Exim 2.05 #1) id 12lK7L-0000Yc-00; Sat, 29 Apr 2000 01:24:15 +0200 Date: Sat, 29 Apr 2000 01:24:15 +0200 From: Andi Kleen To: Liang Han Cc: netdev@oss.sgi.com Subject: Re: Linux2.2.12 and 2.0.36 TCP differences? Message-ID: <20000429012414.A2137@fred.muc.de> References: <390A0E4A.236E4FBB@eos.ncsu.edu> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 0.95.4us In-Reply-To: <390A0E4A.236E4FBB@eos.ncsu.edu>; from Liang Han on Sat, Apr 29, 2000 at 12:20:23AM +0200 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Sat, Apr 29, 2000 at 12:20:23AM +0200, Liang Han wrote: > 1. The routing should be resolved at IP level. My experience on > linux2.2.12 seems conflict with the principle. So what's the reason. Most likely it is the TOS setting in your redirects. The TOS in the IP header in the redirect has to match the TOS of the TCP connection, and ftp and telnet both set TOS. Linux 2.2 has per-TOS routing, so it uses TOS-dependent routes. Checking TOS for redirects is arguably very pedantic, but that is how it is. > 2. Since Linux2.0.36 works fine. I wonder what's the difference > between the two versions on routing implementation.
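To illustrate Andi's point, here is a hypothetical C sketch of a route cache keyed on destination plus TOS. struct rt_entry and apply_redirect() are invented names, not the Linux 2.2 routing code. A redirect only updates entries whose TOS matches the TOS in the IP header carried by the redirect, so traffic sent with TOS 0 (as from traceroute or tftp) follows the new gateway, while a connection using, e.g., IPTOS_LOWDELAY (0x10) keeps its old route.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical route-cache entry keyed on (destination, TOS). */
struct rt_entry {
    uint32_t dst;      /* destination address, host byte order */
    uint8_t  tos;      /* TOS the route was looked up with */
    uint32_t gateway;  /* next hop currently in use */
};

/* Apply an ICMP redirect for dst. Only entries whose TOS equals the
 * TOS in the header quoted by the redirect are changed.
 * Returns the number of entries updated. */
static size_t apply_redirect(struct rt_entry *cache, size_t n,
                             uint32_t dst, uint8_t quoted_tos,
                             uint32_t new_gw)
{
    size_t changed = 0;

    for (size_t i = 0; i < n; i++) {
        if (cache[i].dst == dst && cache[i].tos == quoted_tos) {
            cache[i].gateway = new_gw;
            changed++;
        }
    }
    return changed;
}
```

With this keying, a redirect generated with TOS 0 leaves a TOS 0x10 entry for the same destination untouched, which matches the symptom Liang reports for telnet and ftp.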
Linux 2.2 has a nearly completely rewritten network stack compared to 2.0 (including a very sophisticated policy routing engine) -Andi -- This is like TV. I don't like TV. From owner-netdev@oss.sgi.com Sat Apr 29 10:01:17 2000 Received: by oss.sgi.com id ; Sat, 29 Apr 2000 10:00:57 -0700 Received: from lrcsun15.epfl.ch ([128.178.156.77]:44704 "EHLO lrcsun15.epfl.ch") by oss.sgi.com with ESMTP id ; Sat, 29 Apr 2000 10:00:36 -0700 Received: (from almesber@localhost) by lrcsun15.epfl.ch (8.8.X/EPFL-8.1a) id TAA03871 for netdev@oss.sgi.com; Sat, 29 Apr 2000 19:00:38 +0200 (MET DST) From: Werner Almesberger Message-Id: <200004291700.TAA03871@lrcsun15.epfl.ch> Subject: neighbour cache vs. invalid addresses To: netdev@oss.sgi.com Date: Sat, 29 Apr 2000 19:00:37 +0200 (MET DST) X-Mailer: ELM [version 2.5 PL2] MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing In non-broadcast multiple-access networks (NBMA) such as Classical IP over ATM (CLIP), neither broadcast nor multicast have any useful semantics. Right now, I catch this in neigh_table->constructor and return -EINVAL. Is this the right approach ? Or should I return success, accept the bogus neighbour entry (could this upset the neighbour cache ?), and blackhole the entire mess afterwards via neigh->ops and neigh->output ? (Background: some applications seem to insist on sending broadcast or multicast even on interfaces that have neither IFF_BROADCAST nor IFF_MULTICAST set. According to people who have such applications, the current approach makes the stack believe that there is a memory shortage, and shrink the neighbour cache, which is undesirable. Furthermore, the offending packets get killed before they show up on tcpdump, which makes it harder to debug the "network" problem.)
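A minimal sketch of the early-rejection check Werner describes, assuming the constructor has only the IPv4 address and the interface netmask to go on. nbma_neigh_check() is a hypothetical helper, not the CLIP code. It returns -EINVAL for multicast (224.0.0.0/4), the limited broadcast 255.255.255.255, and the subnet-directed broadcast, and 0 for anything that could be a unicast neighbour. It covers only the reject-early option; the alternative raised in the question is to accept the entry and discard the packets later via neigh->output.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical constructor-time check for an NBMA interface.
 * addr and netmask are IPv4 values in host byte order.
 * Returns 0 for a usable unicast neighbour, -EINVAL otherwise. */
static int nbma_neigh_check(uint32_t addr, uint32_t netmask)
{
    /* 224.0.0.0/4: class D multicast has no mapping here. */
    if ((addr & 0xf0000000u) == 0xe0000000u)
        return -EINVAL;

    /* Limited broadcast 255.255.255.255. */
    if (addr == 0xffffffffu)
        return -EINVAL;

    /* Subnet-directed broadcast: all host bits set. A /32 mask is
     * skipped, since every address would otherwise match. */
    if (netmask != 0xffffffffu && (addr | netmask) == 0xffffffffu)
        return -EINVAL;

    return 0;
}
```

For example, 10.1.2.255 on a /24 is rejected as a subnet broadcast, while 10.1.2.5 on the same subnet is accepted.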
- Werner -- _________________________________________________________________________ / Werner Almesberger, ICA, EPFL, CH werner.almesberger@ica.epfl.ch / /_IN_N_032__Tel_+41_21_693_6621__Fax_+41_21_693_6610_____________________/ From owner-netdev@oss.sgi.com Sat Apr 29 11:13:07 2000 Received: by oss.sgi.com id ; Sat, 29 Apr 2000 11:12:57 -0700 Received: from minus.inr.ac.ru ([193.233.7.97]:47113 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Sat, 29 Apr 2000 11:12:40 -0700 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id WAA06930; Sat, 29 Apr 2000 22:12:25 +0400 From: kuznet@ms2.inr.ac.ru Message-Id: <200004291812.WAA06930@ms2.inr.ac.ru> Subject: Re: neighbour cache vs. invalid addresses To: almesber@lrc.epfl.CH (Werner Almesberger) Date: Sat, 29 Apr 2000 22:12:25 +0400 (MSK DST) Cc: netdev@oss.sgi.com In-Reply-To: <200004291700.TAA03871@lrcsun15.epfl.ch> from "Werner Almesberger" at Apr 29, 0 09:13:11 pm X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Content-Length: 1748 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hello! > In non-broadcast multiple-access networks (NBMA) such as Classical IP > over ATM (CLIP), neither broadcast nor multicast have any useful > semantics. Right now, I catch this in neigh_table->constructor and > return -EINVAL. > > Is this the right approach ? Yes, it is right. In theory. > Or should I return success, accept the > bogus neighbour entry (could this upset the neighbour cache ?), and > blackhole the entire mess afterwards via neigh->ops and > neigh->output ? And this is always right. > (Background: some applications seem to insist on sending broadcast > or multicast even on interfaces that have neither IFF_BROADCAST nor > IFF_MULTICAST set. They are right. These flags do not mean that such packets are undeliverable. They are pure advice, and applications taking them seriously are, as a rule, buggy.
Actually, even CLIP could have some "broadcast router" on the subnet without any modifications to the protocol. > According to people who have such applications, > the current approach makes the stack believe that there is a memory > shortage, and shrink the neighbour cache, which is undesirable. This is news to me. Honestly. They are almost telling the truth: the neighbour cache itself is happy; it is IP that tries to help it by shrinking the routing cache aggressively. That is a bug. > Furthermore, the offending packets get killed before they show up > on tcpdump, which makes it harder to debug the "network" problem.) 8) Werner, do this in whatever way is convenient for you. I would prefer to see only _real_ packets with tcpdump. Tcpdump is still supposed to tap the device, rather than bogus packets wandering inside the stack. But taking into account the bug described above, you simply have no choice but to follow your preferred way. 8)8) Alexey From owner-netdev@oss.sgi.com Sat Apr 29 11:29:37 2000 Received: by oss.sgi.com id ; Sat, 29 Apr 2000 11:29:27 -0700 Received: from lrcsun15.epfl.ch ([128.178.156.77]:11197 "EHLO lrcsun15.epfl.ch") by oss.sgi.com with ESMTP id ; Sat, 29 Apr 2000 11:29:12 -0700 Received: (from almesber@localhost) by lrcsun15.epfl.ch (8.8.X/EPFL-8.1a) id UAA08071; Sat, 29 Apr 2000 20:29:14 +0200 (MET DST) From: Werner Almesberger Message-Id: <200004291829.UAA08071@lrcsun15.epfl.ch> Subject: Re: neighbour cache vs. invalid addresses To: kuznet@ms2.inr.ac.ru Date: Sat, 29 Apr 2000 20:29:13 +0200 (MET DST) Cc: netdev@oss.sgi.com In-Reply-To: <200004291812.WAA06930@ms2.inr.ac.ru> from "kuznet@ms2.inr.ac.ru" at Apr 29, 2000 10:12:25 PM X-Mailer: ELM [version 2.5 PL2] MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing kuznet@ms2.inr.ac.ru wrote: > They are right. These flags do not mean, that such packets are > indeliverable.
It is pure advise, and applications taking them > seriously are buggy as rule. Not nice for NBMA ... :-( > Actually, even CLIP could have some "broadcast router" on subnet without > any modifications to protocol. If it has some means to read the ATMARP server's table, yes. Normal ATMARP (RFC1577) doesn't let it do this. > But taking into account bug, described above, you simply have no choice > but to follow your preferred way. 8)8) The oracle has spoken ;-) My preferred way is of course to do as I do now, i.e. to return the error as early as possible. But you seem to suggest that I change my preference ? :) Cheers, Werner -- _________________________________________________________________________ / Werner Almesberger, ICA, EPFL, CH werner.almesberger@ica.epfl.ch / /_IN_N_032__Tel_+41_21_693_6621__Fax_+41_21_693_6610_____________________/ From owner-netdev@oss.sgi.com Sat Apr 29 11:41:37 2000 Received: by oss.sgi.com id ; Sat, 29 Apr 2000 11:41:18 -0700 Received: from minus.inr.ac.ru ([193.233.7.97]:63241 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Sat, 29 Apr 2000 11:40:56 -0700 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id WAA11029; Sat, 29 Apr 2000 22:40:49 +0400 From: kuznet@ms2.inr.ac.ru Message-Id: <200004291840.WAA11029@ms2.inr.ac.ru> Subject: Re: neighbour cache vs. invalid addresses To: almesber@lrc.epfl.ch (Werner Almesberger) Date: Sat, 29 Apr 2000 22:40:49 +0400 (MSK DST) Cc: netdev@oss.sgi.com In-Reply-To: <200004291829.UAA08071@lrcsun15.epfl.ch> from "Werner Almesberger" at Apr 29, 0 08:29:13 pm X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Content-Length: 1106 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hello! > Not nice for NBMA ... :-( Why? > > Actually, even CLIP could have some "broadcast router" on subnet without > > any modifications to protocol. > > If it has some means to read the ATMARP server's table, yes. Normal > ATMARP (RFC1577) doesn't let it do this. 
ATMARP is not even obliged to know about this. Configure it to answer requests for address x.y.z.u with MAC ABCD, and order machine ABCD to relay broadcasts to all known peers or to some multicast group. Then address x.y.z.u will be a genuine broadcast/multicast address. Actually, many NBMA media did exactly this, because life without broadcasts is sort of... mmm... not easy. > My preferred way is of course to do as I do > now, i.e. to return the error as early as possible. But you seem to > suggest that I change my preference ? :) Stop. But who said: > Furthermore, the offending packets get killed before they show up > on tcpdump, which makes it harder to debug the "network" problem.) 8)8) No, I suggest fixing that bug. By the way, we will then be able to get rid of that annoying, wrong "neighbour table overflow" message for loopback. Alexey From owner-netdev@oss.sgi.com Sat Apr 29 11:56:27 2000 Received: by oss.sgi.com id ; Sat, 29 Apr 2000 11:56:17 -0700 Received: from nero.doit.wisc.edu ([128.104.17.130]:16403 "EHLO nero.doit.wisc.edu") by oss.sgi.com with ESMTP id ; Sat, 29 Apr 2000 11:56:00 -0700 Received: (from jleu@localhost) by nero.doit.wisc.edu (8.8.7/8.8.7) id OAA05537; Sat, 29 Apr 2000 14:57:26 -0500 Message-ID: <20000429145725.A5529@doit.wisc.edu> Date: Sat, 29 Apr 2000 14:57:25 -0500 From: "James R. Leu" To: Werner Almesberger , netdev@oss.sgi.com Subject: Re: neighbour cache vs.
invalid addresses Reply-To: jleu@mindspring.com References: <200004291700.TAA03871@lrcsun15.epfl.ch> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 0.93.2 In-Reply-To: <200004291700.TAA03871@lrcsun15.epfl.ch>; from Werner Almesberger on Sat, Apr 29, 2000 at 07:00:37PM +0200 Organization: none Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Sat, Apr 29, 2000 at 07:00:37PM +0200, Werner Almesberger wrote: > In non-broadcast multiple-access networks (NBMA) such as Classical IP > over ATM (CLIP), neither broadcast nor multicast have any useful > semantics. Right now, I catch this in neigh_table->constructor and > return -EINVAL. I hope you mean that it is not a trivial mapping to the current neigh_table setup. Broadcast and multicast do have defined meanings on CLIP interfaces; mapping this meaning to the neigh_table is where the problem comes in. > Is this the right approach ? Or should I return success, accept the > bogus neighbour entry (could this upset the neighbour cache ?), and > blackhole the entire mess afterwards via neigh->ops and > neigh->output ? > > (Background: some applications seem to insist on sending broadcast > or multicast even on interfaces that have neither IFF_BROADCAST nor > IFF_MULTICAST set. According to people who have such applications, > the current approach makes the stack believe that there is a memory > shortage, and shrink the neighbour cache, which is undesirable. > Furthermore, the offending packets get killed before they show up > on tcpdump, which makes it harder to debug the "network" problem.) The reason I want multicast on CLIP (or another ATM interface type) is an application I maintain that uses it for neighbor discovery. Werner, is there a discussion on the ATM list about how multicast and broadcast will (could) work on ATM? -- James R.
Leu From owner-netdev@oss.sgi.com Sat Apr 29 15:31:09 2000 Received: by oss.sgi.com id ; Sat, 29 Apr 2000 15:30:49 -0700 Received: from lrcsun15.epfl.ch ([128.178.156.77]:26561 "EHLO lrcsun15.epfl.ch") by oss.sgi.com with ESMTP id ; Sat, 29 Apr 2000 15:30:32 -0700 Received: (from almesber@localhost) by lrcsun15.epfl.ch (8.8.X/EPFL-8.1a) id AAA09376; Sun, 30 Apr 2000 00:30:35 +0200 (MET DST) From: Werner Almesberger Message-Id: <200004292230.AAA09376@lrcsun15.epfl.ch> Subject: Re: neighbour cache vs. invalid addresses To: jleu@mindspring.com Date: Sun, 30 Apr 2000 00:30:35 +0200 (MET DST) Cc: netdev@oss.sgi.com In-Reply-To: <20000429145725.A5529@doit.wisc.edu> from "James R. Leu" at Apr 29, 2000 02:57:25 PM X-Mailer: ELM [version 2.5 PL2] MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing James R. Leu wrote: > Broadcast and multicast do have defined meanings on CLIP interfaces, > mapping this meaning to the neigh_table is where the problem comes in. RFC1577 and (RFC2225 update of 1577) have little encouragement for multicast (section 8 or 10) and make a rather fuzzy statement about broadcast (section 7 or 9). You probably mean MARS, RFC2022. That's a different story. > I know the reason I want multicast on CLIP (or another ATM interface type) > is because of an application I maintain that use it for neighbor discovery. With CLIP, only unicast can yield predictable results. I'm not sure if MARS is widely implemented or deployed. (Linux doesn't have it.) If you really require broadcast media like semantics on an ATM network, you're probably better off with LANE, which emulates a broadcast layer 2. > Werner, is there a discussion on the ATM list about how multicast and > broadcast will (could) work on ATM? 
It's more on how to make it fail gracefully :-) - Werner -- _________________________________________________________________________ / Werner Almesberger, ICA, EPFL, CH werner.almesberger@ica.epfl.ch / /_IN_N_032__Tel_+41_21_693_6621__Fax_+41_21_693_6610_____________________/ From owner-netdev@oss.sgi.com Sat Apr 29 15:40:39 2000 Received: by oss.sgi.com id ; Sat, 29 Apr 2000 15:40:19 -0700 Received: from nero.doit.wisc.edu ([128.104.17.130]:19987 "EHLO nero.doit.wisc.edu") by oss.sgi.com with ESMTP id ; Sat, 29 Apr 2000 15:40:04 -0700 Received: (from jleu@localhost) by nero.doit.wisc.edu (8.8.7/8.8.7) id SAA05701; Sat, 29 Apr 2000 18:41:40 -0500 Message-ID: <20000429184140.C5572@doit.wisc.edu> Date: Sat, 29 Apr 2000 18:41:40 -0500 From: "James R. Leu" To: Werner Almesberger Cc: netdev@oss.sgi.com Subject: Re: neighbour cache vs. invalid addresses Reply-To: jleu@mindspring.com References: <20000429145725.A5529@doit.wisc.edu> <200004292230.AAA09376@lrcsun15.epfl.ch> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 0.93.2 In-Reply-To: <200004292230.AAA09376@lrcsun15.epfl.ch>; from Werner Almesberger on Sun, Apr 30, 2000 at 12:30:35AM +0200 Organization: none Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Sun, Apr 30, 2000 at 12:30:35AM +0200, Werner Almesberger wrote: > James R. Leu wrote: > > Broadcast and multicast do have defined meanings on CLIP interfaces, > > mapping this meaning to the neigh_table is where the problem comes in. > > RFC1577 and (RFC2225 update of 1577) have little encouragement for > multicast (section 8 or 10) and make a rather fuzzy statement about > broadcast (section 7 or 9). > > You probably mean MARS, RFC2022. That's a different story. I was actually thinking of the way Cisco handles broadcast and multicast over static point-to-point or point-to-multipoint ATM sub interfaces. 
> It's more on how to make it fail gracefully :-) So does this mean there isn't any talk of adding this support? Jim -- James R. Leu From owner-netdev@oss.sgi.com Sat Apr 29 17:41:12 2000 Received: by oss.sgi.com id ; Sat, 29 Apr 2000 17:40:53 -0700 Received: from lrcsun15.epfl.ch ([128.178.156.77]:64450 "EHLO lrcsun15.epfl.ch") by oss.sgi.com with ESMTP id ; Sat, 29 Apr 2000 17:40:37 -0700 Received: (from almesber@localhost) by lrcsun15.epfl.ch (8.8.X/EPFL-8.1a) id CAA10618; Sun, 30 Apr 2000 02:40:40 +0200 (MET DST) From: Werner Almesberger Message-Id: <200004300040.CAA10618@lrcsun15.epfl.ch> Subject: Re: neighbour cache vs. invalid addresses To: kuznet@ms2.inr.ac.ru Date: Sun, 30 Apr 2000 02:40:40 +0200 (MET DST) Cc: netdev@oss.sgi.com In-Reply-To: <200004291840.WAA11029@ms2.inr.ac.ru> from "kuznet@ms2.inr.ac.ru" at Apr 29, 2000 10:40:49 PM X-Mailer: ELM [version 2.5 PL2] MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing kuznet@ms2.inr.ac.ru wrote: >> Not nice for NBMA ... :-( > > Why? It would be good if an application could determine quickly if multicast or broadcast may actually work. If it comes back after trying for five minutes with a complaint about lack of connectivity or such, this isn't very friendly. > ATMARP is not obliged even to know about this. Except for the ATMARP server. Also, the clients need to allow sending of packets to such addresses, which they may or may not do. (They probably also need to disambiguate 255.255.255.255 broadcasts, because the ATMARP server couldn't tell which LIS they are for.) > Actually, many NBMA media did exactly this, because life > without broadcasts is sort of... mmm... not easy. Life with ATM is supposed to be hard ;-) > Stop. 
But who did say: > >> Furthermore, the offending packets get killed before they show up >> on tcpdump, which makes it harder to debug the "network" problem.) Oops, there I was relaying a user comment ;-) > No, I suggest to fix that bug. By the way, we will be able to get rid of > that annoyning wrong "neighbour table overflow" for loopback. First try: does my (completely untested) patch at ftp://icaftp.epfl.ch/pub/people/almesber/junk/neigh-error-0.patch.gz look reasonable ? - Werner -- _________________________________________________________________________ / Werner Almesberger, ICA, EPFL, CH werner.almesberger@ica.epfl.ch / /_IN_N_032__Tel_+41_21_693_6621__Fax_+41_21_693_6610_____________________/ From owner-netdev@oss.sgi.com Sat Apr 29 17:49:32 2000 Received: by oss.sgi.com id ; Sat, 29 Apr 2000 17:49:23 -0700 Received: from lrcsun15.epfl.ch ([128.178.156.77]:11203 "EHLO lrcsun15.epfl.ch") by oss.sgi.com with ESMTP id ; Sat, 29 Apr 2000 17:49:19 -0700 Received: (from almesber@localhost) by lrcsun15.epfl.ch (8.8.X/EPFL-8.1a) id CAA10668; Sun, 30 Apr 2000 02:49:25 +0200 (MET DST) From: Werner Almesberger Message-Id: <200004300049.CAA10668@lrcsun15.epfl.ch> Subject: Re: neighbour cache vs. invalid addresses To: jleu@mindspring.com Date: Sun, 30 Apr 2000 02:49:25 +0200 (MET DST) Cc: netdev@oss.sgi.com In-Reply-To: <20000429184140.C5572@doit.wisc.edu> from "James R. Leu" at Apr 29, 2000 06:41:40 PM X-Mailer: ELM [version 2.5 PL2] MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing James R. Leu wrote: > I was actually thinking of the way Cisco handles broadcast and multicast > over static point-to-point or point-to-multipoint ATM sub interfaces. That's PVCs, right ? With PVCs, it's simpler than with SVCs, because all hosts on the same LIS must know each other. With SVCs, only the ATMARP server knows that it knows everybody.
> So does this mean there isn't any talk of adding this support? Multi-/broadcast for CLIP ? Very rarely. Seems that most people are using LANE for this. Multicast signaling for native ATM comes up a little more frequently, maybe every 1-3 months, but never really gets to the point where something useful emerges. - Werner -- _________________________________________________________________________ / Werner Almesberger, ICA, EPFL, CH werner.almesberger@ica.epfl.ch / /_IN_N_032__Tel_+41_21_693_6621__Fax_+41_21_693_6610_____________________/ From owner-netdev@oss.sgi.com Sat Apr 29 17:58:13 2000 Received: by oss.sgi.com id ; Sat, 29 Apr 2000 17:58:03 -0700 Received: from nero.doit.wisc.edu ([128.104.17.130]:22035 "EHLO nero.doit.wisc.edu") by oss.sgi.com with ESMTP id ; Sat, 29 Apr 2000 17:57:43 -0700 Received: (from jleu@localhost) by nero.doit.wisc.edu (8.8.7/8.8.7) id UAA05738; Sat, 29 Apr 2000 20:59:18 -0500 Message-ID: <20000429205918.A5719@doit.wisc.edu> Date: Sat, 29 Apr 2000 20:59:18 -0500 From: "James R. Leu" To: Werner Almesberger Cc: netdev@oss.sgi.com Subject: Re: neighbour cache vs. invalid addresses Reply-To: jleu@mindspring.com References: <20000429184140.C5572@doit.wisc.edu> <200004300049.CAA10668@lrcsun15.epfl.ch> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 0.93.2 In-Reply-To: <200004300049.CAA10668@lrcsun15.epfl.ch>; from Werner Almesberger on Sun, Apr 30, 2000 at 02:49:25AM +0200 Organization: none Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Sun, Apr 30, 2000 at 02:49:25AM +0200, Werner Almesberger wrote: > James R. Leu wrote: > > I was actually thinking of the way Cisco handles broadcast and multicast > > over static point-to-point or point-to-multipoint ATM sub interfaces. > > That's PVCs, right ? With PVCs, it's simpler than with SVCs, because all > hosts on the same LIS know must each other. With SVCs, only the ATMARP > server knows that it knows everybody. 
So why doesn't ATM for Linux support this then (or does it and I'm just clueless?) > > So does this mean there isn't any talk of adding this support? > > Multi-/broadcast for CLIP ? Very rarely. Seems that most people are > using LANE for this. Multicast signaling for native ATM comes up a > little more frequently, maybe every 1-3 months, but never really gets > to the point where something useful emerges. All I want to do is have one PVC between two boxes, and send packets addressed to the multicast all routers address. From what I can see in the neighbor processing code this isn't supported. -- James R. Leu From owner-netdev@oss.sgi.com Sat Apr 29 21:25:43 2000 Received: by oss.sgi.com id ; Sat, 29 Apr 2000 21:25:24 -0700 Received: from north.net.CSUChico.EDU ([132.241.66.18]:13067 "EHLO north.net.csuchico.edu") by oss.sgi.com with ESMTP id ; Sat, 29 Apr 2000 21:24:59 -0700 Received: (from warlock@localhost) by north.net.csuchico.edu (8.10.0.Beta11/8.10.0.Beta11) id e3U4Oxs18076 for netdev@oss.sgi.com; Sat, 29 Apr 2000 21:24:59 -0700 Date: Sat, 29 Apr 2000 21:24:59 -0700 From: John Kennedy To: netdev@oss.sgi.com Subject: SIOCGLIFCONF? Message-ID: <20000429212459.A18071@north.csuchico.edu> References: <200003291154.DAA01727@pizda.ninka.net> <20000428161032.B30351@north.csuchico.edu> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0.1i In-Reply-To: <20000428161032.B30351@north.csuchico.edu>; from jk@csuchico.edu on Fri, Apr 28, 2000 at 04:10:32PM -0700 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Has anybody already done any work towards hacking in SIOCGLIFCONF and the associated struct lif* data structures into the kernel?
From owner-netdev@oss.sgi.com Sun Apr 30 07:51:56 2000 Received: by oss.sgi.com id ; Sun, 30 Apr 2000 07:51:37 -0700 Received: from cerberus.nemoto.ecei.tohoku.ac.jp ([130.34.199.67]:13060 "EHLO cerberus.nemoto.ecei.tohoku.ac.jp") by oss.sgi.com with ESMTP id ; Sun, 30 Apr 2000 07:51:07 -0700 Received: from localhost (yoshfuji@localhost [127.0.0.1]) by cerberus.nemoto.ecei.tohoku.ac.jp (8.9.3+3.2W/8.9.3/Debian 8.9.3-21) with ESMTP id XAA00935 for ; Sun, 30 Apr 2000 23:50:55 +0900 To: netdev@oss.sgi.com Subject: Re: SIOCGLIFCONF? From: Hideaki YOSHIFUJI (=?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?=) In-Reply-To: <20000429212459.A18071@north.csuchico.edu> References: <200003291154.DAA01727@pizda.ninka.net> <20000428161032.B30351@north.csuchico.edu> <20000429212459.A18071@north.csuchico.edu> X-Mailer: Mew version 1.94 on Emacs 20.5 / Mule 4.1 =?iso-2022-jp?B?KBskQjAqGyhCKQ==?= X-URL: http://www.ecei.tohoku.ac.jp/%7Eyoshfuji/ X-Fingerprint: F7 31 65 99 5E B2 BB A7 15 15 13 23 18 06 A9 6F 57 00 6B 25 X-Pgp5-Key-Url: http://cerberus.nemoto.ecei.tohoku.ac.jp/%7Eyoshfuji/yoshfuji@ecei.tohoku.ac.jp.asc Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-Id: <20000430235055F.yoshfuji@cerberus.nemoto.ecei.tohoku.ac.jp> Date: Sun, 30 Apr 2000 23:50:55 +0900 X-Dispatcher: imput version 990905(IM130) Lines: 16 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hi, In article <20000429212459.A18071@north.csuchico.edu> (at Sat, 29 Apr 2000 21:24:59 -0700), John Kennedy says: > Has anybody already done any work towards hacking in SIOCGLIFCONF and > the associated struct lif* data structures into the kernel? I don't know, but I implemented BSDI's getifaddrs() using rtnetlink. I've sent it to the glibc people, but they don't pay attention...
-- Hideaki YOSHIFUJI Web Page: http://www.ecei.tohoku.ac.jp/%7Eyoshfuji/ PGP5i FP: F731 6599 5EB2 BBA7 1515 1323 1806 A96F 5700 6B25 From owner-netdev@oss.sgi.com Sun Apr 30 08:00:17 2000 Received: by oss.sgi.com id ; Sun, 30 Apr 2000 08:00:07 -0700 Received: from tazenda.demon.co.uk ([158.152.220.239]:11524 "EHLO kings-cross.london.uk.eu.org") by oss.sgi.com with ESMTP id ; Sun, 30 Apr 2000 07:59:59 -0700 Received: from localhost ([::ffff:127.0.0.1] helo=kings-cross.london.uk.eu.org ident=phil) by kings-cross.london.uk.eu.org with esmtp (Exim 3.11 #1) id 12lvCJ-0000Du-00; Sun, 30 Apr 2000 15:59:51 +0100 X-Mailer: exmh version 2.0.2 2/24/98 (debian) To: Hideaki YOSHIFUJI (=?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?=) cc: netdev@oss.sgi.com Subject: Re: SIOCGLIFCONF? In-Reply-To: Message from Hideaki YOSHIFUJI (=?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?=) of "Sun, 30 Apr 2000 23:50:55 +0900." <20000430235055F.yoshfuji@cerberus.nemoto.ecei.tohoku.ac.jp> References: <200003291154.DAA01727@pizda.ninka.net> <20000428161032.B30351@north.csuchico.edu> <20000429212459.A18071@north.csuchico.edu> <20000430235055F.yoshfuji@cerberus.nemoto.ecei.tohoku.ac.jp> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Sun, 30 Apr 2000 15:59:50 +0100 From: Philip Blundell Message-Id: Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing >I've sent this glibc people, they don't pay attention... It's more a case of nobody having time to look at your changes yet. Do you have a copyright assignment on file already? p. 
From owner-netdev@oss.sgi.com Sun Apr 30 08:09:17 2000 Received: by oss.sgi.com id ; Sun, 30 Apr 2000 08:08:57 -0700 Received: from cerberus.nemoto.ecei.tohoku.ac.jp ([130.34.199.67]:16900 "EHLO cerberus.nemoto.ecei.tohoku.ac.jp") by oss.sgi.com with ESMTP id ; Sun, 30 Apr 2000 08:08:49 -0700 Received: from localhost (yoshfuji@localhost [127.0.0.1]) by cerberus.nemoto.ecei.tohoku.ac.jp (8.9.3+3.2W/8.9.3/Debian 8.9.3-21) with ESMTP id AAA01083 for ; Mon, 1 May 2000 00:08:37 +0900 To: netdev@oss.sgi.com Subject: Re: SIOCGLIFCONF? From: Hideaki YOSHIFUJI (=?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?=) In-Reply-To: References: <20000430235055F.yoshfuji@cerberus.nemoto.ecei.tohoku.ac.jp> X-Mailer: Mew version 1.94 on Emacs 20.5 / Mule 4.1 =?iso-2022-jp?B?KBskQjAqGyhCKQ==?= X-URL: http://www.ecei.tohoku.ac.jp/%7Eyoshfuji/ X-Fingerprint: F7 31 65 99 5E B2 BB A7 15 15 13 23 18 06 A9 6F 57 00 6B 25 X-Pgp5-Key-Url: http://cerberus.nemoto.ecei.tohoku.ac.jp/%7Eyoshfuji/yoshfuji@ecei.tohoku.ac.jp.asc Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-Id: <20000501000837W.yoshfuji@cerberus.nemoto.ecei.tohoku.ac.jp> Date: Mon, 01 May 2000 00:08:37 +0900 X-Dispatcher: imput version 990905(IM130) Lines: 13 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing In article (at Sun, 30 Apr 2000 15:59:50 +0100), Philip Blundell says: > >I've sent this glibc people, they don't pay attention... >.. Do you have a copyright assignment on file already? ifaddrs.h and ifaddrs.3 are based on BSDI's; "BSDI License". ifaddrs.c is my own work; GPL2.
-- Hideaki YOSHIFUJI Web Page: http://www.ecei.tohoku.ac.jp/%7Eyoshfuji/ PGP5i FP: F731 6599 5EB2 BBA7 1515 1323 1806 A96F 5700 6B25 From owner-netdev@oss.sgi.com Sun Apr 30 08:17:17 2000 Received: by oss.sgi.com id ; Sun, 30 Apr 2000 08:16:57 -0700 Received: from cerberus.nemoto.ecei.tohoku.ac.jp ([130.34.199.67]:18180 "EHLO cerberus.nemoto.ecei.tohoku.ac.jp") by oss.sgi.com with ESMTP id ; Sun, 30 Apr 2000 08:16:52 -0700 Received: from localhost (yoshfuji@localhost [127.0.0.1]) by cerberus.nemoto.ecei.tohoku.ac.jp (8.9.3+3.2W/8.9.3/Debian 8.9.3-21) with ESMTP id AAA01109 for ; Mon, 1 May 2000 00:16:40 +0900 To: netdev@oss.sgi.com Subject: Re: SIOCGLIFCONF? From: Hideaki YOSHIFUJI (=?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?=) In-Reply-To: <20000501000837W.yoshfuji@cerberus.nemoto.ecei.tohoku.ac.jp> References: <20000501000837W.yoshfuji@cerberus.nemoto.ecei.tohoku.ac.jp> X-Mailer: Mew version 1.94 on Emacs 20.5 / Mule 4.1 =?iso-2022-jp?B?KBskQjAqGyhCKQ==?= X-URL: http://www.ecei.tohoku.ac.jp/%7Eyoshfuji/ X-Fingerprint: F7 31 65 99 5E B2 BB A7 15 15 13 23 18 06 A9 6F 57 00 6B 25 X-Pgp5-Key-Url: http://cerberus.nemoto.ecei.tohoku.ac.jp/%7Eyoshfuji/yoshfuji@ecei.tohoku.ac.jp.asc Mime-Version: 1.0 Content-Type: Text/Plain; charset=iso-2022-jp Content-Transfer-Encoding: 7bit Message-Id: <20000501001640H.yoshfuji@cerberus.nemoto.ecei.tohoku.ac.jp> Date: Mon, 01 May 2000 00:16:40 +0900 X-Dispatcher: imput version 990905(IM130) Lines: 13 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing In article <20000501000837W.yoshfuji@cerberus.nemoto.ecei.tohoku.ac.jp> (at Mon, 01 May 2000 00:08:37 +0900), Hideaki YOSHIFUJI (吉藤英明) says: > >.. Do you have a copyright assignment on file already? > > ifaddrs.h and ifaddrs.3 are based on BSDI's ones; "BSDI Lisence". > ifaddrs.c is my work; GPL2. I mean, yes.
-- Hideaki YOSHIFUJI Web Page: http://www.ecei.tohoku.ac.jp/%7Eyoshfuji/ PGP5i FP: F731 6599 5EB2 BBA7 1515 1323 1806 A96F 5700 6B25
From owner-netdev@oss.sgi.com Sun Apr 30 09:38:08 2000 Received: by oss.sgi.com id ; Sun, 30 Apr 2000 09:37:58 -0700 Received: from minus.inr.ac.ru ([193.233.7.97]:5901 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Sun, 30 Apr 2000 09:37:45 -0700 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id UAA21703; Sun, 30 Apr 2000 20:37:37 +0400 From: kuznet@ms2.inr.ac.ru Message-Id: <200004301637.UAA21703@ms2.inr.ac.ru> Subject: Re: neighbour cache vs. invalid addresses To: almesber@lrc.epfl.ch (Werner Almesberger) Date: Sun, 30 Apr 2000 20:37:37 +0400 (MSK DST) Cc: netdev@oss.sgi.com In-Reply-To: <200004300040.CAA10618@lrcsun15.epfl.ch> from "Werner Almesberger" at Apr 30, 0 02:40:40 am X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Content-Length: 1350 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hello! > First try: does my (completely untested) patch at > ftp://icaftp.epfl.ch/pub/people/almesber/junk/neigh-error-0.patch.gz > look reasonable ? It is reasonable to the highest extent. > It would be good if an application could determine quickly if multicast > or broadcast may actually work. If it comes back after trying for five > minutes with a complaint about lack of connectivity or such, this isn't > very friendly. Until now, this has only caused more trouble. E.g. multicasts were used mainly by routing control apps, which were supposed to start in a void state on bare iron and turn on all the lights. Relying on the flags, they really went into a coma instantly. 8) The reason is that the "multicast"/"broadcast" notions depend on the protocol. E.g. we can configure routing tables so that IP multicasts are routed to a gateway or to a specific exploder using normal unicasts, so that the device never sees multicasts and applications send multicasts transparently. After this, CLIP becomes truly multicasting, even though it is not multicasting at the MAC level or for protocols other than IP.
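Following this argument, an application can still read the device flag, but only as a hint for choosing between devices or modes, never as proof that multicast is impossible. A minimal user-space sketch, assuming a Linux/glibc system where the `SIOCGIFFLAGS` ioctl fills `ifr_flags` in `struct ifreq`; the helper name `multicast_hint()` is hypothetical:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* expose struct ifreq under strict -std modes */
#endif
#include <assert.h>
#include <net/if.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if the device advertises IFF_MULTICAST,
 * 0 if it does not, and -1 if the flags could not be read. A result of 0
 * is only advice (e.g. prefer another device); it does not prove that IP
 * multicast cannot be delivered through this device, for instance via
 * unicast routes to an exploder. */
static int multicast_hint(const char *ifname)
{
    struct ifreq ifr;
    int fd, ret;

    assert(ifname != NULL);

    fd = socket(AF_INET, SOCK_DGRAM, 0);   /* any socket will do for this ioctl */
    if (fd < 0)
        return -1;

    memset(&ifr, 0, sizeof ifr);
    strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1);  /* stays NUL-terminated */

    if (ioctl(fd, SIOCGIFFLAGS, &ifr) < 0)
        ret = -1;
    else
        ret = (ifr.ifr_flags & IFF_MULTICAST) ? 1 : 0;

    close(fd);
    return ret;
}
```

In this reading, a program with only one usable device would send its multicasts whatever the result; the hint only matters when it can pick among several devices or fall back to another mode of operation.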
Conclusion: IFF_MULTICAST should be used only as advice, when an application has different modes of operation or is able to choose among several devices to do the work. Otherwise, it must assume that multicast is available on any kind of media. Alexey From owner-netdev@oss.sgi.com Sun Apr 30 13:37:57 2000 Received: by oss.sgi.com id ; Sun, 30 Apr 2000 13:37:48 -0700 Received: from c855439-a.pinol1.sfba.home.com ([24.14.147.74]:26897 "EHLO despot.finemaltcoding.com") by oss.sgi.com with ESMTP id ; Sun, 30 Apr 2000 13:37:29 -0700 Received: from finemaltcoding.com (localhost.localdomain [127.0.0.1]) by despot.finemaltcoding.com (8.9.3/8.9.3) with ESMTP id NAA08543 for ; Sun, 30 Apr 2000 13:37:30 -0700 Message-ID: <390C998A.42873E21@finemaltcoding.com> Date: Sun, 30 Apr 2000 13:37:30 -0700 From: "Daniel L. Rall" Reply-To: dlr@collab.net Organization: "Fine Malt Coding" X-Mailer: Mozilla 4.7 [en] (X11; I; Linux 2.2.14 i686) X-Accept-Language: en MIME-Version: 1.0 To: netdev@oss.sgi.com Subject: PATCH 2.2.14 net/core/dev.c Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hello list. :) I rewrote the dev_alloc_name() function in the 2.2.14 Linux kernel's net/core/dev.c module. It had an apparently artificial limit on the number of network devices of the same type (100), and was not implemented in a very efficient manner (which would not be a big deal under normal circumstances, as this routine isn't called very often). Just another drop in the bucket. :) This patch has been through a few iterations, and I've been running on it for almost a month now (i.e. I consider it stable). I've received excellent feedback from Ben Canning, and look forward to more from the great programmers and engineers who are sure to be perusing this list (rip it up, ladies and gentlemen).
-- Daniel Rall

PATCH follows:

--- net/core/dev.c-ORIG	Mon Apr 17 02:50:31 2000
+++ net/core/dev.c	Sun Apr 30 13:21:06 2000
@@ -17,8 +17,12 @@
  *		David Hinds
  *		Alexey Kuznetsov
  *		Adam Sulmicki
+ *		Daniel Rall
  *
  *	Changes:
+ *		Daniel Rall	:	Support an unlimited number of devices
+ *					of the same type and improve the
+ *					efficiency of device name allocation.
  *		Marcelo Tosatti	:	dont accept mtu 0 or <
  *		Alan Cox	:	device private ioctl copies fields back.
  *		Alan Cox	:	Transmit queue code does relevant stunts to
@@ -66,6 +70,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -295,23 +300,60 @@
 }
 
 /*
- *	Passed a format string - eg "lt%d" it will try and find a suitable
- *	id. Not efficient for many devices, not called a lot..
+ *	Passed a format string - eg "lt%d" or "eth%d", locates and returns an
+ *	unused id. Assumes the provided device and name have been allocated
+ *	properly. Note that a device name is the concatenation of its type
+ *	and id.
  */
 
-int dev_alloc_name(struct device *dev, const char *name)
+int dev_alloc_name(struct device *device, const char *name_fmt)
 {
-	int i;
-	/*
-	 *	If you need over 100 please also fix the algorithm...
-	 */
-	for(i=0;i<100;i++)
+	char *ptr = strrchr(name_fmt, '%');
+
+	if (ptr != NULL)
 	{
-		sprintf(dev->name,name,i);
-		if(dev_get(dev->name)==NULL)
-			return i;
+		int id = 0;
+		size_t type_len = (size_t)(ptr - name_fmt);
+
+		register struct device *dev;
+		int n;
+
+		/*
+		 * Find the device matching the desired device
+		 * name in the master device list with the
+		 * highest id.
+		 */
+		for (dev = dev_base; dev != NULL; dev = dev->next)
+		{
+			/* Be suspicious of the other devices. Who knows what
+			   has happened to them since we saw them
+			   last. */
+			if (dev->name != NULL)
+			{
+				/* Check for matching device type. */
+				if (strncmp(name_fmt, dev->name, type_len) == 0
+				    && isdigit(dev->name[type_len]))
+				{
+					/* Device type matches, parse id. */
+					ptr = (char *)(dev->name + type_len);
+					n = (int)simple_strtoul(ptr, NULL, 0);
+
+					/* Remember the parsed id if it's the
+					   highest found. */
+					if (n > id)
+						id = n;
+				}
+			}
+		}
+
+		/* Write the new name. */
+		sprintf(device->name, name_fmt, id);
+
+		return id;
 	}
-	return -ENFILE;	/* Over 100 of the things .. bail out! */
+	else
+		/* No format string indicator found in arg name_fmt. */
+		return -EINVAL;
 }
 
 struct device *dev_alloc(const char *name, int *err)

From owner-netdev@oss.sgi.com Sun Apr 30 14:07:48 2000 Received: by oss.sgi.com id ; Sun, 30 Apr 2000 14:07:39 -0700 Received: from cx97923-a.phnx3.az.home.com ([24.9.112.194]:11020 "EHLO grok.yi.org") by oss.sgi.com with ESMTP id ; Sun, 30 Apr 2000 14:07:25 -0700 Received: from candelatech.com (IDENT:greear@localhost [127.0.0.1]) by grok.yi.org (8.9.3/8.9.3) with ESMTP id OAA25583; Sun, 30 Apr 2000 14:40:45 -0700 Message-ID: <390CA85D.20804F00@candelatech.com> Date: Sun, 30 Apr 2000 14:40:45 -0700 From: Ben Greear Organization: Candela Technologies X-Mailer: Mozilla 4.72 [en] (X11; U; Linux 2.2.14-5.0 i586) X-Accept-Language: en MIME-Version: 1.0 To: dlr@collab.net, "netdev@oss.sgi.com" Subject: Re: PATCH 2.2.14 net/core/dev.c References: <390C998A.42873E21@finemaltcoding.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing "Daniel L. Rall" wrote: > > Hello list. :) > > I rewrote the dev_alloc_name() function in the 2.2.14 Linux kernel's > net/core/dev.c module. It had an apparently artificial limitation on > number of network devices of the same type allowed (100), and was not > implemented in a very efficient manner (which would not be a big deal > under normal circumstances, as this routine wouldn't be called very > often). Just another drop in the bucket. :) I needed more for my VLAN implementation, and just upped the 100 to 8096 or so.
Due to people's varying tastes, I also implemented a few other methods of generating VLAN device names, and did NOT even use the dev_alloc_name method in these instances. So: 1) Will that screw your patch up? (I don't think it will...) 2) I'm curious which cases of device naming this patch would help. For example, if you are going to have 200 devices of some type, then you may not want them linearly named, because as soon as you blink you have forgotten what foo_dev198 referred to. At least from the comments I've read relating to VLANs, it seems that a more direct naming scheme, [base_dev]:[new_dev_id] (e.g. eth0:25 for VLAN 25 on eth0), or something similar, may be more desirable. Thanks, Ben -- Ben Greear (greearb@candelatech.com) http://www.candelatech.com Author of ScryMUD: scry.wanfear.com 4444 (Released under GPL) http://scry.wanfear.com http://scry.wanfear.com/~greear From owner-netdev@oss.sgi.com Sun Apr 30 15:15:08 2000 Received: by oss.sgi.com id ; Sun, 30 Apr 2000 15:14:58 -0700 Received: from north.net.CSUChico.EDU ([132.241.66.18]:60940 "EHLO north.net.csuchico.edu") by oss.sgi.com with ESMTP id ; Sun, 30 Apr 2000 15:14:38 -0700 Received: (from warlock@localhost) by north.net.csuchico.edu (8.10.0.Beta11/8.10.0.Beta11) id e3UMEUN30215; Sun, 30 Apr 2000 15:14:30 -0700 Date: Sun, 30 Apr 2000 15:14:30 -0700 From: John Kennedy To: Hideaki YOSHIFUJI Cc: netdev@oss.sgi.com Subject: Re: SIOCGLIFCONF?
Message-ID: <20000430151430.A26860@north.csuchico.edu> References: <200003291154.DAA01727@pizda.ninka.net> <20000428161032.B30351@north.csuchico.edu> <20000429212459.A18071@north.csuchico.edu> <20000430235055F.yoshfuji@cerberus.nemoto.ecei.tohoku.ac.jp> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0.1i In-Reply-To: <20000430235055F.yoshfuji@cerberus.nemoto.ecei.tohoku.ac.jp>; from yoshfuji@ecei.tohoku.ac.jp on Sun, Apr 30, 2000 at 11:50:55PM +0900 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Sun, Apr 30, 2000 at 11:50:55PM +0900, Hideaki YOSHIFUJI wrote: > In article <20000429212459.A18071@north.csuchico.edu> (at Sat, 29 Apr 2000 21:24:59 -0700), John Kennedy says: > > Has anybody already done any work towards hacking in SIOCGLIFCONF and > > the associated struct lif* data structures into the kernel? > > I don't know, but I implemented getifaddrs() from bsdi using rtnetlink. > > > I've sent this glibc people, they don't pay attention... Right now that host is refusing connections. I'm specifically looking for that API, since both BIND & sendmail prefer it when trying to pick up IPv6 interfaces. From owner-netdev@oss.sgi.com Sun Apr 30 15:25:08 2000 Received: by oss.sgi.com id ; Sun, 30 Apr 2000 15:24:58 -0700 Received: from c855439-a.pinol1.sfba.home.com ([24.14.147.74]:3603 "EHLO despot.finemaltcoding.com") by oss.sgi.com with ESMTP id ; Sun, 30 Apr 2000 15:24:34 -0700 Received: from finemaltcoding.com (localhost.localdomain [127.0.0.1]) by despot.finemaltcoding.com (8.9.3/8.9.3) with ESMTP id PAA09106; Sun, 30 Apr 2000 15:24:32 -0700 Message-ID: <390CB2A0.11EC291E@finemaltcoding.com> Date: Sun, 30 Apr 2000 15:24:32 -0700 From: "Daniel L. 
Rall" Reply-To: dlr@collab.net Organization: "Fine Malt Coding" X-Mailer: Mozilla 4.7 [en] (X11; I; Linux 2.2.14 i686) X-Accept-Language: en MIME-Version: 1.0 To: Ben Greear CC: linux-net@vger.rutgers.edu, "netdev@oss.sgi.com" , bdc@bdc.cx Subject: Re: PATCH 2.2.14 net/core/dev.c References: <390C998A.42873E21@finemaltcoding.com> <390CA85D.20804F00@candelatech.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Ben Greear wrote: > > "Daniel L. Rall" wrote: > > > > Hello list. :) > > > > I rewrote the dev_alloc_name() function in the 2.2.14 Linux kernel's > > net/core/dev.c module. It had an apparently artificial limitation on > > number of network devices of the same type allowed (100), and was not > > implemented in a very efficient manner (which would not be a big deal > > under normal circumstances, as this routine wouldn't be called very > > often). Just another drop in the bucket. :) > > I needed more for my VLAN implementation, and just upped the > 100 to 8096 or so. Due to people's varying tastes, I also implemented > a few other methods of generating VLAN device names, and did NOT > even use the dev_alloc_name method in these instances. > > So: > 1) Will that screw your patch up? (I don't think it will...) No, my patch actually totally replaces the existing implementation (though the API doesn't change at all). The post-patch device limit will be memory imposed only. > 2) I'm curious what instances of device naming this patch could help. > For example, if you are going to have 200 devices of some type, then > you may not want them linearly named because as soon as you blink you > have forgotten what foo_dev198 refered to. At least from the comments > I've read relating to VLANs, it seems that a more direct naming scheme: > [base_dev]:[new_dev_id], ie: eth0:25 for a VLAN 25 on eth0, or something > similar, may be more desirable. 
Whelp, I didn't change the existing API at all. (I wasn't comfortable doing that without just this type of input--thanks!) As far as naming devices in a non-linear fashion, I don't see either the previous implementation or my patched implementation supporting that. Your VLAN example is partially supported by the dev_alloc_name API (but you can only allocate a new VLAN in a linear fashion):

	int result = dev_alloc_name(device, "eth0:%d");

I agree with Ben Greear's comment about being able to specify a desired device name in dev_alloc_name() (as long as we don't duplicate existing functionality). We could support *both* the existing linear dev_max_id auto-increment and a caller-specified device naming scheme by checking for the format indicator '%' in the specified name and, if it is absent, allocating that exact name. If the specified name already exists, we return an error status and do a printk(). This keeps the existing API, and adds functionality. Here are the existing usages of the function:

dlr@despot:linux$ find . -name '*.c' | xargs grep -n dev_alloc_name
./net/core/dev.c:309:int dev_alloc_name(struct device *device, const char *name_fmt)
./net/core/dev.c:368: *err=dev_alloc_name(dev,name);
./net/netsyms.c:472:EXPORT_SYMBOL(dev_alloc_name);
./net/sched/sch_teql.c:463: err = dev_alloc_name(&the_master.dev, "teql%d");
./drivers/net/ppp.c:2847: if_num = dev_alloc_name(dev, "ppp%d");
./drivers/net/ppp.c:2849: printk(KERN_ERR "ppp: dev_alloc_name failed (%d)\n", if_num);
./drivers/net/ipddp.c:341: err=dev_alloc_name(&dev_ipddp, "ipddp%d");
./drivers/net/dummy.c:147: int err=dev_alloc_name(&dev_dummy,"dummy%d");
./drivers/net/shaper.c:674: int err=dev_alloc_name(&dev_shape,"shaper%d");
./drivers/net/ltpc.c:1304: err=dev_alloc_name(&dev_ltpc,"lt%d");
./drivers/net/cops.c:1056: err=dev_alloc_name(&cops0_dev, "lt%d");

Do other people find this useful, or consider this additional functionality bloat? I would be happy to add this functionality.
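The proposal above (auto-increment when the name contains '%', otherwise take the caller's exact name and fail if it is taken) can be modelled in user space. Note that, as posted, the patch's loop records the highest existing id and then reuses it, so with eth0 and eth1 registered it would write "eth1" again; this sketch returns one past the highest id instead. The function `alloc_dev_name()`, the string array standing in for the `dev_base` list, and the `-EEXIST`/`-EINVAL`/`-ENFILE` return codes are assumptions for illustration, not kernel API:

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical user-space model of the proposed dev_alloc_name() semantics.
 * `existing` stands in for the names on the kernel's dev_base list.
 * Returns the allocated id (0 for a literal name) or a negative errno. */
static int alloc_dev_name(char *out, size_t outlen, const char *fmt,
                          const char *const *existing, size_t count)
{
    const char *pct;
    size_t i, prefix_len;
    long next = 0;                      /* one past the highest id seen */

    assert(out != NULL && fmt != NULL);
    pct = strchr(fmt, '%');

    if (pct == NULL) {
        /* No format indicator: the caller asked for this exact name. */
        for (i = 0; i < count; i++)
            if (strcmp(existing[i], fmt) == 0)
                return -EEXIST;
        snprintf(out, outlen, "%s", fmt);
        return 0;
    }

    /* This sketch only handles a single trailing "%d", e.g. "eth%d". */
    if (strcmp(pct, "%d") != 0)
        return -EINVAL;
    prefix_len = (size_t)(pct - fmt);

    for (i = 0; i < count; i++) {
        const char *suffix, *p;
        long id;

        if (strncmp(existing[i], fmt, prefix_len) != 0)
            continue;
        suffix = existing[i] + prefix_len;   /* safe: prefix matched */
        if (*suffix == '\0')
            continue;
        /* Require an all-digit suffix so "eth0:1" is not read as "eth0". */
        for (p = suffix; isdigit((unsigned char)*p); p++)
            ;
        if (*p != '\0')
            continue;
        id = strtol(suffix, NULL, 10);
        if (id >= INT_MAX)
            return -ENFILE;             /* id space exhausted */
        if (id >= next)
            next = id + 1;
    }

    snprintf(out, outlen, fmt, (int)next);
    return (int)next;
}
```

For example, with "eth0", "eth1", "eth0:1" and "lo" registered, "eth%d" yields "eth2", "eth0:%d" yields "eth0:2", and the literal "lo" is refused. Unlike the original 2.2.14 loop, which returned the lowest unused id, this sketch never reuses gaps left by removed devices; either policy can work without the old limit of 100.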
Also, should I be posting this to linux-net@vger.rutgers.edu, or to netdev@oss.sgi.com? Or to both (as I did)? -- Daniel Rall From owner-netdev@oss.sgi.com Sun Apr 30 17:39:09 2000 Received: by oss.sgi.com id ; Sun, 30 Apr 2000 17:38:59 -0700 Received: from cerberus.nemoto.ecei.tohoku.ac.jp ([130.34.199.67]:32260 "EHLO cerberus.nemoto.ecei.tohoku.ac.jp") by oss.sgi.com with ESMTP id ; Sun, 30 Apr 2000 17:38:42 -0700 Received: from localhost (yoshfuji@localhost [127.0.0.1]) by cerberus.nemoto.ecei.tohoku.ac.jp (8.9.3+3.2W/8.9.3/Debian 8.9.3-21) with ESMTP id JAA02083; Mon, 1 May 2000 09:38:26 +0900 To: jk@csuchico.edu Cc: netdev@oss.sgi.com Subject: Re: SIOCGLIFCONF? From: Hideaki YOSHIFUJI (=?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?=) In-Reply-To: <20000430151430.A26860@north.csuchico.edu> References: <20000429212459.A18071@north.csuchico.edu> <20000430235055F.yoshfuji@cerberus.nemoto.ecei.tohoku.ac.jp> <20000430151430.A26860@north.csuchico.edu> X-Mailer: Mew version 1.94 on Emacs 20.5 / Mule 4.1 =?iso-2022-jp?B?KBskQjAqGyhCKQ==?= X-URL: http://www.ecei.tohoku.ac.jp/%7Eyoshfuji/ X-Fingerprint: F7 31 65 99 5E B2 BB A7 15 15 13 23 18 06 A9 6F 57 00 6B 25 X-Pgp5-Key-Url: http://cerberus.nemoto.ecei.tohoku.ac.jp/%7Eyoshfuji/yoshfuji@ecei.tohoku.ac.jp.asc Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-Id: <20000501093826K.yoshfuji@cerberus.nemoto.ecei.tohoku.ac.jp> Date: Mon, 01 May 2000 09:38:26 +0900 X-Dispatcher: imput version 990905(IM130) Lines: 34 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing In article <20000430151430.A26860@north.csuchico.edu> (at Sun, 30 Apr 2000 15:14:30 -0700), John Kennedy says: > > > Right now that host is refusing connections. I'm specifically > looking for that API, since both BIND & sendmail prefer it when > trying to pick up IPv6 interfaces. Hmm, I've restarted the httpd.
Also, you can get a copy from

Here's a simple piece of code that shows how to use getifaddrs(3):

	#include
	:
	struct ifaddrs *ifa0, *ifa;
	if (getifaddrs(&ifa0)){
		perror("ifaddrs");
		exit(1);
	}
	for (ifa=ifa0; ifa; ifa=ifa->ifa_next){
		/* Use ifa->ifa_addr, ifa->ifa_broadaddr etc., which point to
		   struct sockaddr, and ifa->ifa_flags */
		if (ifa->ifa_addr && ifa->ifa_addr == AF_INET6){
			:
		}
	}
	freeifaddrs(ifa0);

-- Hideaki YOSHIFUJI Web Page: http://www.ecei.tohoku.ac.jp/%7Eyoshfuji/ PGP5i FP: F731 6599 5EB2 BBA7 1515 1323 1806 A96F 5700 6B25 From owner-netdev@oss.sgi.com Sun Apr 30 20:44:10 2000 Received: by oss.sgi.com id ; Sun, 30 Apr 2000 20:44:00 -0700 Received: from cerberus.nemoto.ecei.tohoku.ac.jp ([130.34.199.67]:38404 "EHLO cerberus.nemoto.ecei.tohoku.ac.jp") by oss.sgi.com with ESMTP id ; Sun, 30 Apr 2000 20:43:35 -0700 Received: from localhost (yoshfuji@localhost [127.0.0.1]) by cerberus.nemoto.ecei.tohoku.ac.jp (8.9.3+3.2W/8.9.3/Debian 8.9.3-21) with ESMTP id MAA02530 for ; Mon, 1 May 2000 12:43:18 +0900 To: netdev@oss.sgi.com Subject: Re: SIOCGLIFCONF?
From: Hideaki YOSHIFUJI In-Reply-To: <20000501093826K.yoshfuji@cerberus.nemoto.ecei.tohoku.ac.jp> References: <20000430235055F.yoshfuji@cerberus.nemoto.ecei.tohoku.ac.jp> <20000430151430.A26860@north.csuchico.edu> <20000501093826K.yoshfuji@cerberus.nemoto.ecei.tohoku.ac.jp> X-Mailer: Mew version 1.94 on XEmacs 20.4 (Emerald) X-URL: http://www.ecei.tohoku.ac.jp/%7Eyoshfuji/ X-Fingerprint: F7 31 65 99 5E B2 BB A7 15 15 13 23 18 06 A9 6F 57 00 6B 25 X-Pgp5-Key-Url: http://cerberus.nemoto.ecei.tohoku.ac.jp/%7Eyoshfuji/yoshfuji@ecei.tohoku.ac.jp.asc Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-Id: <20000501124318G.yoshfuji@cerberus.nemoto.ecei.tohoku.ac.jp> Date: Mon, 01 May 2000 12:43:18 +0900 X-Dispatcher: imput version 990905(IM130) Lines: 15 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing In article <20000501093826K.yoshfuji@cerberus.nemoto.ecei.tohoku.ac.jp> (at Mon, 01 May 2000 09:38:26 +0900), Hideaki YOSHIFUJI says:

> 		if (ifa->ifa_addr && ifa->ifa_addr == AF_INET6){
> 			:
> 		}

Sorry, it should be

		if (ifa->ifa_addr && ifa->ifa_addr->sa_family == AF_INET6){
			:
		}

-- Hideaki YOSHIFUJI Web Page: http://www.ecei.tohoku.ac.jp/%7Eyoshfuji/ PGP5i FP: F731 6599 5EB2 BBA7 1515 1323 1806 A96F 5700 6B25
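Putting the corrected `sa_family` test back into the loop gives a complete fragment that prints each IPv6 address. This is a sketch assuming a libc that provides getifaddrs(3) (current glibc does); the helper names `count_family()` and `print_ipv6_addrs()` are hypothetical:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* expose getifaddrs()/inet_ntop() under strict -std modes */
#endif
#include <arpa/inet.h>
#include <assert.h>
#include <ifaddrs.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>

/* Count entries whose address belongs to `family`. getifaddrs(3) may
 * return entries with ifa_addr == NULL, so check the pointer first and
 * then compare sa_family (not the pointer itself) with the family. */
static int count_family(const struct ifaddrs *list, int family)
{
    const struct ifaddrs *ifa;
    int n = 0;

    for (ifa = list; ifa != NULL; ifa = ifa->ifa_next)
        if (ifa->ifa_addr != NULL && ifa->ifa_addr->sa_family == family)
            n++;
    return n;
}

/* Print "name address" for every IPv6 address; return the count, or -1. */
static int print_ipv6_addrs(void)
{
    struct ifaddrs *ifa0, *ifa;
    char buf[INET6_ADDRSTRLEN];
    int n;

    if (getifaddrs(&ifa0) != 0) {
        perror("getifaddrs");
        return -1;
    }
    for (ifa = ifa0; ifa != NULL; ifa = ifa->ifa_next) {
        const struct sockaddr_in6 *sin6;
        const char *s;

        if (ifa->ifa_addr == NULL || ifa->ifa_addr->sa_family != AF_INET6)
            continue;
        sin6 = (const struct sockaddr_in6 *)(const void *)ifa->ifa_addr;
        /* INET6_ADDRSTRLEN is always large enough for an AF_INET6 address. */
        s = inet_ntop(AF_INET6, &sin6->sin6_addr, buf, sizeof buf);
        assert(s != NULL);
        printf("%s %s\n", ifa->ifa_name, s);
    }
    n = count_family(ifa0, AF_INET6);
    freeifaddrs(ifa0);
    return n;
}
```

Unlike the fragment in the thread, this version returns -1 instead of calling exit(), so it could sit inside library code such as the BIND or sendmail interface scans mentioned above.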