From owner-netdev@oss.sgi.com Thu Mar 1 02:26:32 2001 Received: by oss.sgi.com id ; Thu, 1 Mar 2001 02:26:12 -0800 Received: from hugin.diku.dk ([130.225.96.144]:30468 "HELO hugin.diku.dk") by oss.sgi.com with SMTP id ; Thu, 1 Mar 2001 02:26:04 -0800 Received: (qmail 15016 invoked from network); 1 Mar 2001 10:25:52 -0000 Received: from ask.diku.dk (firefly@130.225.96.225) by hugin.diku.dk with QMQP; 1 Mar 2001 10:25:52 -0000 Date: Thu, 1 Mar 2001 11:25:51 +0100 (MET) From: Peter Finderup Lund To: netdev@oss.sgi.com Subject: Docs on the routing part of the network stack? Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Does anybody know where I can find some docs on how the routing part of the network stack works (only interested in IPv4)? In particular, something about the destination cache and the skbuff.dst field... -Peter A room without books is like a body without a soul. -- Marcus Tullius Cicero (106-43 B.C.) From owner-netdev@oss.sgi.com Thu Mar 1 05:56:45 2001 Received: by oss.sgi.com id ; Thu, 1 Mar 2001 05:56:35 -0800 Received: from nero.doit.wisc.edu ([128.104.17.130]:23564 "EHLO nero.doit.wisc.edu") by oss.sgi.com with ESMTP id ; Thu, 1 Mar 2001 05:56:14 -0800 Received: (from jleu@localhost) by nero.doit.wisc.edu (8.8.7/8.8.7) id HAA11611; Thu, 1 Mar 2001 07:55:43 -0600 Date: Thu, 1 Mar 2001 07:55:43 -0600 From: "James R. Leu" To: Rusty Russell Cc: Richard Guy Briggs , netfilter-devel@us5.samba.org, linux-ipsec@freeswan.org, netdev@oss.sgi.com Subject: Re: On Extending NFMark... 
Message-ID: <20010301075543.B11586@doit.wisc.edu>
Reply-To: jleu@mindspring.com
References: <20010220110113.C3910@grendel.conscoop.ottawa.on.ca>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2i
In-Reply-To: ; from rusty@linuxcare.com.au on Thu, Mar 01, 2001 at 03:44:28PM +1100
Organization: none
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing

Hello,

I haven't been following this thread from the beginning, but I have a
patch that might be related.

I wrote an MPLS implementation which needs to mark skbs and store a
piece of info with the skb until it reaches the mpls_output() function.

I created an array of void* on each skb. Each protocol that wants to use
an entry in this array claims an index in that array (via a header file).
There is also an array of function pointers; these are called when we
want to clean up the array.

The major downside of this technique is the overhead of always having
the array even though it may not be used.

I like the idea of having a linked list, except for the case when we
cannot allocate memory to add to the list, and because we would have
to search the list to find a particular protocol's info (these reasons
are why I chose an array).

> The classic nfmark field problem; that there is only one. See also,
> nfct.
>
> For a generic linked-list of blobs approach, there are several
> problems:
> 1) How do I tell which one is mine?
> 2) What happens when packet is copied?
> 3) What happens when packet is cloned?
> 4) What happens when packet is destroyed?
>

For the array technique:
1. By the protocol index.
2. Copy it.
3. Copy it.
4. Call the protocol's function pointer to clean up.

Jim
--
James R.
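The array technique described in this message can be sketched in user-space C roughly as follows. This is only an illustration under stated assumptions, not the actual MPLS patch: `struct pkt` stands in for the skb, and the slot names, `pkt_slot_ops`, `pkt_copy_priv()`, `pkt_destroy_priv()` and the `label_*` hooks are all hypothetical. The message only mentions a cleanup function pointer per slot; the copy hook is an assumption added here so that a copied packet does not share (and later double-free) the original's data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical slot indices; each protocol would claim one in a shared header. */
enum pkt_slot { PKT_SLOT_MPLS, PKT_SLOT_IPSEC, PKT_SLOT_MAX };

/* Per-protocol hooks. Only the cleanup hook comes from the message;
 * the copy hook is an assumption of this sketch. */
struct pkt_slot_ops {
    void *(*copy)(const void *data);   /* duplicate data for a copied packet */
    void (*destroy)(void *data);       /* release data when the packet dies */
};

static struct pkt_slot_ops slot_ops[PKT_SLOT_MAX];

/* Stand-in for the per-packet array that would live in the skb. */
struct pkt {
    void *priv[PKT_SLOT_MAX];
};

/* Questions 2 and 3: duplicate every occupied slot through its owner's hook. */
static void pkt_copy_priv(struct pkt *dst, const struct pkt *src)
{
    assert(dst != NULL && src != NULL);
    for (int i = 0; i < PKT_SLOT_MAX; i++) {
        dst->priv[i] = NULL;
        if (src->priv[i] != NULL && slot_ops[i].copy != NULL)
            dst->priv[i] = slot_ops[i].copy(src->priv[i]);
    }
}

/* Question 4: call each owner's cleanup hook, then clear the slot. */
static void pkt_destroy_priv(struct pkt *p)
{
    assert(p != NULL);
    for (int i = 0; i < PKT_SLOT_MAX; i++) {
        if (p->priv[i] != NULL && slot_ops[i].destroy != NULL)
            slot_ops[i].destroy(p->priv[i]);
        p->priv[i] = NULL;
    }
}

/* Example hooks for a protocol that stores a single unsigned label. */
static void *label_copy(const void *data)
{
    unsigned *n = malloc(sizeof *n);
    if (n != NULL)
        *n = *(const unsigned *)data;
    return n;
}

static void label_destroy(void *data)
{
    free(data);
}
```

Question 1 is answered by the index itself: a protocol reads and writes only `priv[PKT_SLOT_...]` for the slot it claimed, so no search is needed. The cost is the one noted in the message, namely that every packet carries the whole array whether or not any slot is used.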
Leu

From owner-netdev@oss.sgi.com Thu Mar 1 06:54:34 2001
Received: by oss.sgi.com id ; Thu, 1 Mar 2001 06:54:25 -0800
Received: from adsl-151-196-233-8.baltmd.adsl.bellatlantic.net ([151.196.233.8]:50475 "EHLO vaio.greennet") by oss.sgi.com with ESMTP id ; Thu, 1 Mar 2001 06:54:08 -0800
Received: from localhost (becker@localhost) by vaio.greennet (8.9.3/8.8.7) with ESMTP id JAA08400; Thu, 1 Mar 2001 09:54:19 -0500
Date: Thu, 1 Mar 2001 09:54:19 -0500 (EST)
From: Donald Becker
X-Sender: becker@vaio.greennet
To: Matti Aarnio
cc: Jeff Garzik , linux-net@vger.kernel.org, netdev@oss.sgi.com
Subject: Re: rx_copybreak...
In-Reply-To: <20010301082335.Q15688@mea-ext.zmailer.org>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing

On Thu, 1 Mar 2001, Matti Aarnio wrote:

> On Wed, Feb 28, 2001 at 09:21:24PM -0500, Jeff Garzik wrote:
> > Instead of unconditionally copying packet sizes over rx_copybreak, in
> > many ethernet drivers... is it worth it for the driver to check and see
> > if the packet is already aligned? Or is that such a rare case that it
> > shouldn't be worth it?
>
> To know if packet is aligned or not, one must look into
> the packet to analyze protocols.

..which breaks an abstraction boundary, and may not be worth the overhead.
It's very rare to see non-DIX packets, but the kernel should not crash or
be easily D-O-Sed from the unaligned accesses when it does happen.

> IMO it is far better to setup the card to do RX DMA into
> DIX aligned IP frame (the most common case), but hardware
> which is inherently unable to do that has drivers knowing it.

I always assume a DIX frame. For every one of my PCI drivers I receive
into a +2 offset if the hardware supports it. This is documented in the
operational description at the top of the driver file.
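The reason a +2 offset helps is simple arithmetic: a DIX Ethernet header is 14 bytes (6 destination, 6 source, 2 EtherType), so a frame received at offset 2 of a 4-byte-aligned buffer puts the IP header at offset 16. A minimal sketch of that calculation, where the names `ETH_HEADER_LEN` and `ip_header_misalignment()` are made up for this example and are not kernel identifiers:

```c
#include <assert.h>

#define ETH_HEADER_LEN 14u  /* DIX header: 6 dst + 6 src + 2 EtherType */

/* Distance of the IP header from the previous 4-byte boundary when the
 * frame starts `reserve` bytes into a 4-byte-aligned receive buffer.
 * A result of 0 means the IP header is 4-byte aligned. */
static unsigned ip_header_misalignment(unsigned reserve)
{
    return (reserve + ETH_HEADER_LEN) % 4u;
}
```

With no offset the IP header lands 2 bytes past a 4-byte boundary; with a reserve of 2 the result is 0. Hardware that can only start Rx DMA on a 4-byte boundary cannot use the 2-byte offset, which is why those drivers fall back to copying.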
It's usually easy to tell the chip capability even without reading the
documentation: for bus master drivers look for skb_reserve(skb, 2).

Arbitrary Tx and Rx alignment (best)
   3c59x.c        For 3c900 series chips, not ancient 3c590 chips
   epic100.c
   hamachi.c
   yellowfin.c
Tulip-like 4 byte alignment
   natsemi.c
   starfire.c
   tulip.c
   via-rhine.c    Also requires align-copy on Tx!
   winbond-840.c
The CPU must always copy
   ne2k-pci.c     PIO
   rtl8139.c      Must copy out of the Rx ring, and align copy on Tx.

Note: For the last category the 'rx_copybreak' parameter does not exist.

> My nonexhaustive list of cards says:
> - Tulip: RX DMA alignment: 4 bytes
> (buffer sizes have same alignment, one can't do tricks
> like receive 12 bytes to the first buffer, rest to
> the next)

This is always the case: if the chip doesn't have the byte shifting
hardware to start the Rx write to an arbitrary alignment, it doesn't have
the hardware to change the alignment during the transfer.

> - 3c59x: RX DMA alignment: 1 byte (all 3c90X cards)

Note: not the ancient 3c590/595 cards -- throw those away if you care
about performance. They were considered excellent in the days of the
P5-75 processor, but that was a long time ago.

> - eepro100: RX DMA alignment: unknown !
> (for lack of certain part of intel documents)

This is the single case where it's not obvious by inspecting the code.
The chip's capability depends on which Rx data structure is selected.
And with Intel, it's always a problem of lacking documentation ;->

Donald Becker                  becker@scyld.com
Scyld Computing Corporation    http://www.scyld.com
410 Severn Ave.
Suite 210 Second Generation Beowulf Clusters Annapolis MD 21403 410-990-9993 From owner-netdev@oss.sgi.com Thu Mar 1 11:48:34 2001 Received: by oss.sgi.com id ; Thu, 1 Mar 2001 11:48:24 -0800 Received: from zikova.cvut.cz ([147.32.235.100]:63237 "EHLO zikova.cvut.cz") by oss.sgi.com with ESMTP id ; Thu, 1 Mar 2001 11:48:15 -0800 Received: from vcnet.vc.cvut.cz (vcnet.vc.cvut.cz [147.32.240.61]) by zikova.cvut.cz (8.9.0.Beta5/8.9.0.Beta5) with ESMTP id UAA65076 for ; Thu, 1 Mar 2001 20:48:11 +0100 Received: from VCNET/SpoolDir by vcnet.vc.cvut.cz (Mercury 1.21); 1 Mar 101 20:48:12 MET-1MEST Received: from SpoolDir by VCNET (Mercury 1.30); 1 Mar 101 20:47:56 MET-1MEST From: "Petr Vandrovec" Organization: CC CTU Prague To: netdev@oss.sgi.com Date: Thu, 1 Mar 2001 20:47:50 MET-1 MIME-Version: 1.0 Content-type: text/plain; charset=US-ASCII Content-transfer-encoding: 7BIT Subject: Undo partial loss... X-mailer: Pegasus Mail v3.40 Message-ID: Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hi, anybody interested in Undo partial loss 147.32.240.81/22 c2 l1 ss2/2 p1 My end running 2.4.2-ac7, other end 2.4.0-test7-pre4. 
Best regards, Petr Vandrovec vandrove@vc.cvut.cz From owner-netdev@oss.sgi.com Thu Mar 1 12:31:35 2001 Received: by oss.sgi.com id ; Thu, 1 Mar 2001 12:31:25 -0800 Received: from [194.213.32.137] ([194.213.32.137]:4356 "EHLO bug.ucw.cz") by oss.sgi.com with ESMTP id ; Thu, 1 Mar 2001 12:31:13 -0800 Received: (from root@localhost) by bug.ucw.cz (8.8.8/8.8.5) id LAA00145; Thu, 1 Mar 2001 11:53:16 +0100 Date: Sat, 1 Jan 2000 00:19:15 +0000 From: Pavel Machek To: Andi Kleen Cc: Jeff Garzik , linux-kernel@vger.kernel.org, netdev@oss.sgi.com Subject: Re: New net features for added performance Message-ID: <20000101001915.A40@(none)> References: <3A9842DC.B42ECD7A@mandrakesoft.com> <3A984BDA.190B4D8E@mandrakesoft.com> <20010225011211.A23853@gruyere.muc.suse.de> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0.1i In-Reply-To: <20010225011211.A23853@gruyere.muc.suse.de>; from ak@suse.de on Sun, Feb 25, 2001 at 01:12:11AM +0100 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hi! > > an alloc of a PKT_BUF_SZ'd skb immediately follows a free of a > > same-sized skb. 100% of the time. > > Free/Alloc gives the mm the chance to throttle it by failing, and also to > recover from fragmentation by packing the slabs. If you don't do it you need > to add a hook somewhere that gets triggered on low memory situations and > frees the buffers. And what? It makes allocation longer lived. Our MM should survive that just fine. -- Philips Velo 1: 1"x4"x8", 300gram, 60, 12MB, 40bogomips, linux, mutt, details at http://atrey.karlin.mff.cuni.cz/~pavel/velo/index.html. 
From owner-netdev@oss.sgi.com Thu Mar 1 12:41:36 2001
Received: by oss.sgi.com id ; Thu, 1 Mar 2001 12:41:26 -0800
Received: from colin.muc.de ([193.149.48.1]:36109 "HELO colin.muc.de") by oss.sgi.com with SMTP id ; Thu, 1 Mar 2001 12:41:23 -0800
Received: by colin.muc.de id <140564-3>; Thu, 1 Mar 2001 21:41:08 +0100
Message-ID: <20010301214106.23243@colin.muc.de>
Date: Thu, 1 Mar 2001 21:41:07 +0100
From: Andi Kleen
To: Pavel Machek
Cc: Andi Kleen , Jeff Garzik , netdev@oss.sgi.com
Subject: Re: New net features for added performance
References: <3A9842DC.B42ECD7A@mandrakesoft.com> <3A984BDA.190B4D8E@mandrakesoft.com> <20010225011211.A23853@gruyere.muc.suse.de>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 0.88e
In-Reply-To: ; from Pavel Machek on Thu, Mar 01, 2001 at 09:32:15PM +0100
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing

On Thu, Mar 01, 2001 at 09:32:15PM +0100, Pavel Machek wrote:
> Hi!
>
> > > an alloc of a PKT_BUF_SZ'd skb immediately follows a free of a
> > > same-sized skb. 100% of the time.
> >
> > Free/Alloc gives the mm the chance to throttle it by failing, and also to
> > recover from fragmentation by packing the slabs. If you don't do it you need
> > to add a hook somewhere that gets triggered on low memory situations and
> > frees the buffers.
>
> And what? It makes allocation longer lived. Our MM should survive that just
> fine.

It's better for the MM if you reallocate regularly. This way it can
recover from fragmentation. A zone allocator like slab can only help
against fragmentation when all the objects have roughly similar
lifetimes. Private object caches prevent that. In addition, they need
extra mechanisms to drain them on memory shortage -- if you regularly
return buffers to slab, it will do that for you.
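The extra machinery that a private object cache needs can be sketched as below. This is a user-space illustration only, with `malloc`/`free` standing in for the slab allocator; `struct buf_cache`, `cache_get()`, `cache_put()`, `cache_drain()` and the sizes are hypothetical names and values chosen for this example, not kernel interfaces.

```c
#include <assert.h>
#include <stdlib.h>

#define BUF_SIZE  1536   /* assumed size of one packet buffer */
#define CACHE_MAX 16     /* assumed upper bound on privately held buffers */

/* A driver-private stack of free buffers that bypasses the allocator. */
struct buf_cache {
    void *free_bufs[CACHE_MAX];
    int count;
};

/* Reuse a cached buffer if there is one, otherwise ask the allocator. */
static void *cache_get(struct buf_cache *c)
{
    assert(c != NULL);
    if (c->count > 0)
        return c->free_bufs[--c->count];
    return malloc(BUF_SIZE);
}

/* Keep the buffer privately instead of freeing it, up to CACHE_MAX. */
static void cache_put(struct buf_cache *c, void *buf)
{
    assert(c != NULL);
    if (c->count < CACHE_MAX)
        c->free_bufs[c->count++] = buf;
    else
        free(buf);
}

/* The hook that must exist for low-memory situations: give everything
 * back to the allocator. Returns the number of buffers released. */
static int cache_drain(struct buf_cache *c)
{
    int released;

    assert(c != NULL);
    released = c->count;
    while (c->count > 0)
        free(c->free_bufs[--c->count]);
    return released;
}
```

Buffers sitting in `free_bufs` are invisible to the allocator until something calls `cache_drain()`, which is the point made above: the memory stays pinned and cannot be repacked unless a low-memory hook is wired up, whereas a plain free/alloc pair needs no such hook.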
-Andi From owner-netdev@oss.sgi.com Thu Mar 1 13:06:46 2001 Received: by oss.sgi.com id ; Thu, 1 Mar 2001 13:06:36 -0800 Received: from smtp1.cern.ch ([137.138.128.38]:64786 "EHLO smtp1.cern.ch") by oss.sgi.com with ESMTP id ; Thu, 1 Mar 2001 13:06:25 -0800 Received: from lxplus015.cern.ch (IDENT:root@lxplus015.cern.ch [137.138.161.112]) by smtp1.cern.ch (8.9.3/8.9.3) with ESMTP id WAA11830; Thu, 1 Mar 2001 22:06:18 +0100 (MET) Received: (from jes@localhost) by lxplus015.cern.ch (8.9.3/8.9.3) id WAA01183; Thu, 1 Mar 2001 22:06:16 +0100 To: Jeff Garzik Cc: netdev@oss.sgi.com, Linux Knernel Mailing List Subject: Re: New net features for added performance References: <3A9842DC.B42ECD7A@mandrakesoft.com> From: Jes Sorensen Date: 01 Mar 2001 22:06:16 +0100 In-Reply-To: Jeff Garzik's message of "Sat, 24 Feb 2001 18:25:16 -0500" Message-ID: Lines: 22 User-Agent: Gnus/5.070096 (Pterodactyl Gnus v0.96) Emacs/20.4 MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing >>>>> "Jeff" == Jeff Garzik writes: Jeff> 1) Rx Skb recycling. It would be nice to have skbs returned to Jeff> the driver after the net core is done with them, rather than Jeff> have netif_rx free the skb. Many drivers pre-allocate a number Jeff> of maximum-sized skbs into which the net card DMA's data. If Jeff> netif_rx returned the SKB instead of freeing it, the driver Jeff> could simply flip the DescriptorOwned bit for that buffer, Jeff> giving it immediately back to the net card. Jeff> Advantages: A de-allocation immediately followed by a Jeff> reallocation is eliminated, less L1 cache pollution during Jeff> interrupt handling. Potentially less DMA traffic between card Jeff> and host. Jeff> Disadvantages? I already tried this with the AceNIC GigE driver some time ago, and after Ingo came up with a per-CPU slab patch the gain was gone. I am not sure the complexity is worth it. 
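For readers unfamiliar with the descriptor-ring mechanics in the quoted proposal, the "flip the DescriptorOwned bit" step can be sketched as follows. This is a simplified user-space model under stated assumptions: the bit value, `struct rx_desc`, `struct rx_ring`, `rx_ring_peek()` and `rx_ring_recycle()` are hypothetical, and a real driver would also need DMA mapping and memory barriers, which are omitted here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE         4
#define DESC_OWNED_BY_NIC 0x80000000u  /* hypothetical ownership bit */

/* One receive descriptor: a status word plus the buffer it points at. */
struct rx_desc {
    uint32_t status;
    void *buf;
};

struct rx_ring {
    struct rx_desc desc[RING_SIZE];
    unsigned next;                     /* next descriptor the host will examine */
};

/* Return the buffer of the next completed descriptor, or NULL while the
 * NIC still owns it. */
static void *rx_ring_peek(const struct rx_ring *r)
{
    assert(r != NULL);
    const struct rx_desc *d = &r->desc[r->next];
    if (d->status & DESC_OWNED_BY_NIC)
        return NULL;
    return d->buf;
}

/* Recycling: hand the same buffer straight back to the NIC by setting the
 * ownership bit, then move on to the following descriptor. */
static void rx_ring_recycle(struct rx_ring *r)
{
    assert(r != NULL);
    r->desc[r->next].status = DESC_OWNED_BY_NIC;
    r->next = (r->next + 1) % RING_SIZE;
}
```

In the non-recycling case the driver would instead allocate a fresh buffer, store it in `buf`, and only then set the ownership bit; that allocation is the step the proposal tries to avoid, and the step that the reply says a per-CPU slab made cheap enough not to matter.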
Jes

From owner-netdev@oss.sgi.com Sat Mar 3 02:48:48 2001
Received: by oss.sgi.com id ; Sat, 3 Mar 2001 02:48:29 -0800
Received: from [63.84.169.221] ([63.84.169.221]:51205 "EHLO topgraphx.com") by oss.sgi.com with ESMTP id ; Sat, 3 Mar 2001 02:48:07 -0800
Received: from wapiti [193.248.251.91] by topgraphx.com with ESMTP (SMTPD32-6.05) id ABE01A28017E; Sat, 03 Mar 2001 04:48:00 -0600
From: "Bernard MAUDRY"
To: netdev@oss.sgi.com
Date: Sat, 3 Mar 2001 11:49:12 +0100
MIME-Version: 1.0
Content-type: text/plain; charset=US-ASCII
Content-transfer-encoding: 7BIT
Subject: PROBLEM: a local TCP socket close does not trigger a poll on the other end
Message-ID: <3AA0DA38.23766.573498DB@localhost>
X-mailer: Pegasus Mail for Win32 (v3.12c)
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing

Dear sirs,

[1.] A local TCP socket close does not trigger a poll on the other end

[2.] The steps are:
     a client connection is established with a local server
     in the client, a poll is made to detect the end of the communication
     after some data transfer has been done, the server closes its socket side
     in the client, the poll call does not indicate the end of the
     communication and can wait for a long time (more than 100 seconds,
     but maybe forever)

[3.] TCP socket layer in kernel

[4.] Linux version 2.2.5-15 and 2.2.17-21mdk

[5.] no Oops

[6.] Program showing the problem

On my Linux machines, the following program displays:
    Communication OK
    Socket closed, polling peer ...
    The peer is not informed that the socket is closed ???
where it should display:
    Communication OK
    Socket closed, polling peer ...
    Socket OK

/* Beginning of the test case */
#include /* needed by sys/socket.h and netinet/in.h */
#include /* for AF_INET, SOCK_STREAM, ... */
#include /* for FIONCLEX, FIONBIO, ... */
#ifndef FIOCLEX
#include /* for FIONCLEX, FIONBIO, ... */
#endif
#include /* struct sockaddr_in */
#include
#include
#include
#include
#include
#include
#ifndef SOL_TCP
#define SOL_TCP IPPROTO_TCP
#endif

int main (int argx, char ** argv)
{
    int ServerSocket;
    int ClientSocket;
    int AcceptedSocket;
    struct sockaddr_in sin;
    struct hostent *hp;
    char HostName[255];
    int ON = 1 ;
    int OFF = 0 ;
    struct sockaddr_in from;
    int len = sizeof (from);
    int Status;
    int Result;
    struct pollfd Fds[1] ;

    /* Setup server socket */
    ServerSocket = socket(AF_INET, SOCK_STREAM, 0);
    bzero((char *)&sin, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = INADDR_ANY;
    sin.sin_port = 10150;
    (void)setsockopt(ServerSocket, SOL_SOCKET, SO_REUSEADDR, (char *)&ON, sizeof (int));
    Status = bind(ServerSocket, (struct sockaddr *)&sin, sizeof(sin)) ;
    if (Status != 0) {
        printf ("Bind failed\n") ;
        exit (1) ;
    }
    Status = listen(ServerSocket, 5) ;
    (void)ioctl(ServerSocket, FIOCLEX, 0);
    (void)ioctl(ServerSocket, FIONBIO, &ON);

    /* Setup client socket */
    (void) gethostname(HostName, sizeof (HostName));
    hp = gethostbyname(HostName);
    sin.sin_family = AF_INET;
    bcopy((char *)hp->h_addr, (char *)&sin.sin_addr, hp->h_length);
    sin.sin_port = 10150;
    ClientSocket = socket(AF_INET, SOCK_STREAM, 0);
    (void)setsockopt(ClientSocket, SOL_TCP, TCP_NODELAY, (char *)&ON, sizeof (int));
    (void)ioctl(ClientSocket, FIONBIO, &ON);
    (void)connect(ClientSocket, (struct sockaddr *)&sin, sizeof(sin)) ;

    /* Setup accepted socket */
    AcceptedSocket = accept(ServerSocket, (struct sockaddr *)&from, &len);
    (void)setsockopt(AcceptedSocket, SOL_TCP, TCP_NODELAY, (char *)&ON, sizeof (int));
    (void)ioctl(ClientSocket, FIONBIO, &OFF);

    /* Wait for the connection to be established */
    sleep (1) ;

    /* Verify the communication */
    write(ClientSocket, &ON, 4) ;
    Result = 0 ;
    read(AcceptedSocket, &Result, 4) ;
    if (Result != 1)
        printf ("Bad socket\n") ;
    else
        printf ("Communication OK\n") ;

    /* Close the accepted socket */
    Status = close(AcceptedSocket) ;
    printf ("Socket closed, polling peer ...\n") ;
    Fds[0].fd = ClientSocket ;
    Fds[0].events = 0 ;
    Status = poll(Fds, 1, 10000) ;
    if (Status == 0) {
        printf ("The peer is not informed that the socket is closed ???\n") ;
    }
    else
        printf ("Socket OK\n") ;

    /* Close remaining sockets */
    Status = close(ClientSocket) ;
    Status = close(ServerSocket) ;
}
/* End of the test case */

What should I do to get the expected behavior (work-around)?
Or do you have a fix for this bug?

Thanks for your help.

Bernard.

+--------------------------------------+
| Bernard MAUDRY                       |
| Top Graph'X Customer Support         |
| 10, allee de la mare Jacob           |
| 91290 La Norville                    |
| FRANCE                               |
| Tel: (33) 1 69 26 97 88              |
| Fax: (33) 1 69 26 97 89              |
| email: support@topgraphx.com         |
+--------------------------------------+

From owner-netdev@oss.sgi.com Sat Mar 3 02:53:07 2001
Received: by oss.sgi.com id ; Sat, 3 Mar 2001 02:52:47 -0800
Received: from pizda.ninka.net ([216.101.162.242]:4992 "EHLO pizda.ninka.net") by oss.sgi.com with ESMTP id ; Sat, 3 Mar 2001 02:52:39 -0800
Received: (from davem@localhost) by pizda.ninka.net (8.9.3/8.9.3) id CAA00912; Sat, 3 Mar 2001 02:52:24 -0800
From: "David S. Miller"
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Message-ID: <15008.52456.168979.593594@pizda.ninka.net>
Date: Sat, 3 Mar 2001 02:52:24 -0800 (PST)
To: "Bernard MAUDRY"
Cc: netdev@oss.sgi.com
Subject: Re: PROBLEM: a local TCP socket close does not trigger a poll on the other end
In-Reply-To: <3AA0DA38.23766.573498DB@localhost>
References: <3AA0DA38.23766.573498DB@localhost>
X-Mailer: VM 6.75 under 21.1 (patch 13) "Crater Lake" XEmacs Lucid
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing

Set Fds[0].events to some value other than zero; as written, you are
telling the kernel you are interested in "no events". Read the poll()
man page for details.

Later,
David S.
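A minimal sketch of the advice in this reply, assuming the caller only wants to know whether the peer has closed: ask poll() for POLLIN, then peek at the socket, since a read of zero bytes on a stream socket means end of file. The helper name `peer_has_closed()` is made up for this example and is not part of the original test case.

```c
#include <assert.h>
#include <poll.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Wait up to timeout_ms for activity on a connected stream socket.
 * Returns 1 if the peer has closed its end, 0 on timeout or if data is
 * pending instead, and -1 on error. */
static int peer_has_closed(int fd, int timeout_ms)
{
    struct pollfd pfd;
    char byte;
    ssize_t n;
    int ready;

    assert(fd >= 0);
    pfd.fd = fd;
    pfd.events = POLLIN;   /* non-zero: readable data or end of file */
    pfd.revents = 0;

    ready = poll(&pfd, 1, timeout_ms);
    if (ready < 0)
        return -1;
    if (ready == 0)
        return 0;          /* timed out, nothing happened */

    /* Peek so that real data, if any, stays queued for the caller. */
    n = recv(fd, &byte, 1, MSG_PEEK);
    if (n < 0)
        return -1;
    return n == 0;         /* 0 bytes readable means the peer closed */
}
```

In the test case above, the equivalent change would be to set `Fds[0].events = POLLIN` before calling poll(); with `events` left at zero, poll() has nothing to report for a readable end of file, so it waits out the full timeout.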
Miller
davem@redhat.com

From owner-netdev@oss.sgi.com Sat Mar 3 09:10:40 2001
Received: by oss.sgi.com id ; Sat, 3 Mar 2001 09:10:21 -0800
Received: from netcore.fi ([193.94.160.1]:46602 "EHLO netcore.fi") by oss.sgi.com with ESMTP id ; Sat, 3 Mar 2001 09:09:56 -0800
Received: from localhost (pekkas@localhost) by netcore.fi (8.11.1/8.11.1) with ESMTP id f23H9rs13362 for ; Sat, 3 Mar 2001 19:09:53 +0200
Date: Sat, 3 Mar 2001 19:09:53 +0200 (EET)
From: Pekka Savola
To:
Subject: IPv6: if sitX used, and take eth down, infinite errors
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing

Hello all,

Using the latest Red Hat Linux Rawhide kernel, based on 2.4.1acX.

In short: If using IPv6 tunneling, don't ever run 'ifdown eth0' when a
tunnel is up unless you want to reset your box.

To reproduce:
1. boot as normal
2. enable a sit ipv6 tunnel to somewhere, at this point eth0 will also
   have ipv6 address configured automatically
3. take down ethernet interface with 'ifdown eth0'
4. rmmod eepro100 (or whatever your eth driver is)
   [this is very nasty if you have a cron job to remove unused modules..]

Now, your screen fills up with:

unregister_netdevice: waiting for eth0 to become free. Usage count = 2

'ifconfig', 'insmod' etc. will all freeze. You can't reboot because at
the last stage of reboot the kernel keeps hanging to that message. The
only thing you can do is reset.

FWIW, this also happens when I boot to single user-mode, ifup eth0, ifup
sit1 and do the steps above (no network daemons running).

Ideas? About to try USAGI soonish to see if this is a problem there too.

Please Cc:.

--
Pekka Savola                 "Tell me of difficulties surmounted,
Netcore Oy                   not those you stumble over and fall"
Systems. Networks. Security.
 -- Robert Jordan: A Crown of Swords

From owner-netdev@oss.sgi.com Sat Mar 3 12:24:41 2001
Received: by oss.sgi.com id ; Sat, 3 Mar 2001 12:24:22 -0800
Received: from mels.nol.net ([206.126.32.179]:31544 "HELO vdalinux.vdazone.org") by oss.sgi.com with SMTP id ; Sat, 3 Mar 2001 12:24:06 -0800
Received: (qmail 32006 invoked by uid 504); 3 Mar 2001 20:23:40 -0000
Date: Sat, 3 Mar 2001 14:23:40 -0600
From: Mario Lorenz
To: netdev@oss.sgi.com
Subject: 2.2.19pre16 TCP problems
Message-ID: <20010303142340.A31835@vdazone.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 0.95us
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing

Hi there,

it seems that the TCP fixes for 2.2.19pre16 do not fix all the TCP
problems. I still see lots of "Connection reset by peer" errors even
after upgrading to pre16.

I however see two possibly interesting patterns here:
- the problem happens more often when the PPP link is saturated
  (e.g. a download is going on)
- such connection resets only happen during the initial phase of a
  TCP session; that is, once established and working, the connection
  stays alive

A tcpdump is attached. It was created by an attempt to get to port 80 of
oss.sgi.com:

# telnet oss.sgi.com 80
Trying 216.32.174.190...
Connected to oss.sgi.com.
Escape character is '^]'.
Connection closed by foreign host.
#

tcpdump: listening on ppp0
21:13:16.086888 217.3.36.147.1697 > 216.32.174.190.80: S 3856418047:3856418047(0) win 32648 (DF) [tos 0x10]
21:13:19.085505 217.3.36.147.1697 > 216.32.174.190.80: S 3856418047:3856418047(0) win 32648 (DF) [tos 0x10]
21:13:21.515544 216.32.174.190.80 > 217.3.36.147.1697: S 3948337129:3948337129(0) ack 3856418048 win 32120 (DF)
21:13:21.515753 217.3.36.147.1697 > 216.32.174.190.80: . ack 1 win 32648 (DF) [tos 0x10]
21:13:24.695452 216.32.174.190.80 > 217.3.36.147.1697: S 3948337129:3948337129(0) ack 3856418048 win 32120 (DF)
21:13:24.695640 217.3.36.147.1697 > 216.32.174.190.80: R 3856418048:3856418048(0) win 0
21:13:25.265624 216.32.174.190.80 > 217.3.36.147.1697: S 3948337129:3948337129(0) ack 3856418048 win 32120 (DF)
21:13:25.265702 217.3.36.147.1697 > 216.32.174.190.80: R 3856418048:3856418048(0) win 0

If you need any more information, please email me. Please also CC: me as I
am not subscribed to netdev.

Regards,

Mario

From owner-netdev@oss.sgi.com Sat Mar 3 15:33:13 2001
Received: by oss.sgi.com id ; Sat, 3 Mar 2001 15:33:03 -0800
Received: from smtp1.cern.ch ([137.138.128.38]:55057 "EHLO smtp1.cern.ch") by oss.sgi.com with ESMTP id ; Sat, 3 Mar 2001 15:32:43 -0800
Received: from lxplus015.cern.ch (IDENT:root@lxplus015.cern.ch [137.138.161.112]) by smtp1.cern.ch (8.9.3/8.9.3) with ESMTP id AAA01346; Sun, 4 Mar 2001 00:32:26 +0100 (MET)
Received: (from jes@localhost) by lxplus015.cern.ch (8.9.3/8.9.3) id AAA14066; Sun, 4 Mar 2001 00:32:25 +0100
To: Noah Romer
Cc: Jeff Garzik , netdev@oss.sgi.com, Linux Knernel Mailing List
Subject: Re: New net features for added performance
References:
From: Jes Sorensen
Date: 04 Mar 2001 00:32:25 +0100
In-Reply-To: Noah Romer's message of "Sat, 24 Feb 2001 18:38:12 -0800 (PST)"
Message-ID:
Lines: 11
User-Agent: Gnus/5.070096 (Pterodactyl Gnus v0.96) Emacs/20.4
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing

>>>>> "Noah" == Noah Romer writes:

Noah> In my experience, Tx interrupt mitigation is of little
Noah> benefit. I actually saw a performance increase of ~20% when I
Noah> turned off Tx interrupt mitigation in my driver (could have been
Noah> poor implementation on my part).

You need to define performance increase here. TX interrupt coalescing
can still be a win in the systems load department.
Jes

From owner-netdev@oss.sgi.com Sat Mar 3 16:41:42 2001
Received: by oss.sgi.com id ; Sat, 3 Mar 2001 16:41:22 -0800
Received: from netcore.fi ([193.94.160.1]:64266 "EHLO netcore.fi") by oss.sgi.com with ESMTP id ; Sat, 3 Mar 2001 16:40:49 -0800
Received: from localhost (pekkas@localhost) by netcore.fi (8.11.1/8.11.1) with ESMTP id f240ekC15469 for ; Sun, 4 Mar 2001 02:40:46 +0200
Date: Sun, 4 Mar 2001 02:40:46 +0200 (EET)
From: Pekka Savola
To:
Subject: weird implementation of ipip and sit tunnels
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing

Hello all,

sit (IPv6 in IPv4, for example) tunnels, and consequently ipip, seem to
have been implemented in a very weird fashion.

For example, if you want to set up three tunnels to three sites, you must
do like [this could also be done using one pseudo-interface sit0, but it
has its own problems too]:

# ifconfig sit0 up
# ifconfig sit0 tunnel ::1.1.1.1
# ifconfig sit1 up
# ifconfig sit1 tunnel ::2.2.2.2
# ifconfig sit2 up
# ifconfig sit2 tunnel ::3.3.3.3

However, if you would want to disable e.g. sit0 temporarily, you'd have to
do:

# ifconfig sit0 up
# ifconfig sit0 tunnel ::2.2.2.2
# ifconfig sit1 up
# ifconfig sit1 tunnel ::3.3.3.3

I.e.: shift interfaces up so that there are no "free interface slots". You
seem to be able to allocate the next tunnel only after the previous one
has been used too.

This can be very annoying if you have any services, e.g. firewall rules,
routing protocols, or anything "bound" in configuration to specific
interfaces.

IMHO, tunnels should work more like interface aliases: you shouldn't need
to assign them in consecutive order.

FWIW, in FreeBSD tunnels ('gif') seem to work in a "sane" fashion.

Are there any technical hindrances for this kind of approach? Or didn't
anybody really need this so much as to spend time for doing it "right" ?
:-)

Please Cc:.
-- Pekka Savola "Tell me of difficulties surmounted, Netcore Oy not those you stumble over and fall" Systems. Networks. Security. -- Robert Jordan: A Crown of Swords From owner-netdev@oss.sgi.com Sat Mar 3 17:00:12 2001 Received: by oss.sgi.com id ; Sat, 3 Mar 2001 16:59:52 -0800 Received: from m201-2-p10.warwick.net ([208.242.201.65]:2820 "EHLO circuit.moureaux.com") by oss.sgi.com with ESMTP id ; Sat, 3 Mar 2001 16:59:24 -0800 Received: from localhost (IDENT:statux@localhost [127.0.0.1]) by circuit.moureaux.com (8.9.3/8.9.3) with ESMTP id UAA01049; Sat, 3 Mar 2001 20:02:26 -0500 Date: Sat, 3 Mar 2001 20:02:26 -0500 (EST) From: Statux X-Sender: To: Mario Lorenz cc: Subject: Re: 2.2.19pre16 TCP problems In-Reply-To: <20010303142340.A31835@vdazone.org> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing > If you need any more information, please email me. Please also CC: me as I > am not subscribed to netdev. How are you able to post to this list if you aren't subscribed (or is this some "different" kind of mail list)? 
:) From owner-netdev@oss.sgi.com Sun Mar 4 07:30:19 2001 Received: by oss.sgi.com id ; Sun, 4 Mar 2001 07:30:10 -0800 Received: from brutus.conectiva.com.br ([200.250.58.146]:21999 "EHLO imladris.rielhome.conectiva") by oss.sgi.com with ESMTP id ; Sun, 4 Mar 2001 07:29:49 -0800 Received: from localhost (riel@localhost) by imladris.rielhome.conectiva (8.11.1/8.11.1) with ESMTP id f24FP9p02223; Sun, 4 Mar 2001 12:25:09 -0300 X-Authentication-Warning: imladris.rielhome.conectiva: riel owned process doing -bs Date: Sun, 4 Mar 2001 12:25:09 -0300 (BRST) From: Rik van Riel X-Sender: riel@imladris.rielhome.conectiva To: Statux cc: Mario Lorenz , netdev@oss.sgi.com Subject: Re: 2.2.19pre16 TCP problems In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Sat, 3 Mar 2001, Statux wrote: > > If you need any more information, please email me. Please also CC: me as I > > am not subscribed to netdev. > > How are you able to post to this list if you aren't subscribed (or is > this some "different" kind of mail list)? :) IMHO mailing lists where only members can post are quite useless since they won't allow: - bug reports from non-subscribers - discussions cc'd to other lists - people sending stuff here because they were told to Personally I never send useful information to subscriber-only lists because of these problems. regards, Rik -- Virtual memory is like a game you can't win; However, without VM there's truly nothing to lose... 
http://www.surriel.com/
http://www.conectiva.com/ http://distro.conectiva.com.br/

From owner-netdev@oss.sgi.com Sun Mar 4 07:56:09 2001
Received: by oss.sgi.com id ; Sun, 4 Mar 2001 07:55:50 -0800
Received: from 2-219.cwb-adsl.telepar.net.br ([200.193.161.219]:45552 "HELO brinquedo.distro.conectiva") by oss.sgi.com with SMTP id ; Sun, 4 Mar 2001 07:55:30 -0800
Received: by brinquedo.distro.conectiva (Postfix, from userid 501) id 4E6DB2739; Sun, 4 Mar 2001 11:16:55 -0300 (EST)
Date: Sun, 4 Mar 2001 11:16:55 -0300
From: Arnaldo Carvalho de Melo
To: Rik van Riel
Cc: Statux , Mario Lorenz , netdev@oss.sgi.com
Subject: Re: 2.2.19pre16 TCP problems
Message-ID: <20010304111655.B9244@conectiva.com.br>
References:
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.3.14i
In-Reply-To: ; from riel@conectiva.com.br on Sun, Mar 04, 2001 at 12:25:09PM -0300
X-Url: http://advogato.org/person/acme
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing

On Sun, Mar 04, 2001 at 12:25:09PM -0300, Rik van Riel wrote:
> On Sat, 3 Mar 2001, Statux wrote:
> Personally I never send useful information to subscriber-only
> lists because of these problems.

OTOH it's a way some admins use to reduce SPAM for some kinds of lists :)

But this is kinda offtopic here. For this list, yes, non-subscriber
postings are welcome and useful.
- Arnaldo From owner-netdev@oss.sgi.com Sun Mar 4 10:34:00 2001 Received: by oss.sgi.com id ; Sun, 4 Mar 2001 10:33:41 -0800 Received: from minus.inr.ac.ru ([193.233.7.97]:28430 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Sun, 4 Mar 2001 10:33:20 -0800 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id VAA17757; Sun, 4 Mar 2001 21:33:12 +0300 From: kuznet@ms2.inr.ac.ru Message-Id: <200103041833.VAA17757@ms2.inr.ac.ru> Subject: Re: weird implementation of ipip and sit tunnels To: pekkas@netcore.FI (Pekka Savola) Date: Sun, 4 Mar 2001 21:33:12 +0300 (MSK) Cc: netdev@oss.sgi.com In-Reply-To: from "Pekka Savola" at Mar 4, 1 03:45:00 am X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Content-Length: 249 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hello! > Are there any technical hindrances for this kind of approach? No. > Or didn't > anybody really need this so much as to spend time for doing it "right" ? The time has been spent several years ago. Keyword is "ip tunnel". 
Alexey From owner-netdev@oss.sgi.com Sun Mar 4 13:32:22 2001 Received: by oss.sgi.com id ; Sun, 4 Mar 2001 13:32:03 -0800 Received: from u-155-18.karlsruhe.ipdial.viaginterkom.de ([62.180.18.155]:3076 "EHLO u-155-18.karlsruhe.ipdial.viaginterkom.de") by oss.sgi.com with ESMTP id ; Sun, 4 Mar 2001 13:31:59 -0800 Received: from dea ([193.98.169.28]:17792 "EHLO dea.waldorf-gmbh.de") by bacchus.dhis.org with ESMTP id ; Sun, 4 Mar 2001 22:31:49 +0100 Received: (from ralf@localhost) by dea.waldorf-gmbh.de (8.11.1/8.11.1) id f24LUmm17813; Sun, 4 Mar 2001 22:30:48 +0100 Date: Sun, 4 Mar 2001 22:30:48 +0100 From: Ralf Baechle To: Rik van Riel Cc: Statux , Mario Lorenz , netdev@oss.sgi.com Subject: Re: 2.2.19pre16 TCP problems Message-ID: <20010304223048.A17775@bacchus.dhis.org> References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: ; from riel@conectiva.com.br on Sun, Mar 04, 2001 at 12:25:09PM -0300 X-Accept-Language: de,en,fr Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Sun, Mar 04, 2001 at 12:25:09PM -0300, Rik van Riel wrote: > Date: Sun, 4 Mar 2001 12:25:09 -0300 (BRST) > From: Rik van Riel > To: Statux > cc: Mario Lorenz , netdev@oss.sgi.com > Subject: Re: 2.2.19pre16 TCP problems > > On Sat, 3 Mar 2001, Statux wrote: > > > > If you need any more information, please email me. Please also CC: me as I > > > am not subscribed to netdev. > > > > How are you able to post to this list if you aren't subscribed (or is > > this some "different" kind of mail list)? :) > > IMHO mailing lists where only members can post are quite > useless since they won't allow: > - bug reports from non-subscribers > - discussions cc'd to other lists > - people sending stuff here because they were told to > > Personally I never send useful information to subscriber-only > lists because of these problems. 
Dealing with spammers has worked well so far without highly restrictive measures such as a subscriber-only policy, and other abuse hasn't been an issue at all so far, so why the fsck should we make life more complex than absolutely necessary? Ralf (netdev-owner)
From owner-netdev@oss.sgi.com Sun Mar 4 23:31:08 2001 Received: by oss.sgi.com id ; Sun, 4 Mar 2001 23:30:49 -0800 Received: from mail.cscoms.net ([202.183.255.13]:59405 "EHLO csmail.cscoms.com") by oss.sgi.com with ESMTP id ; Sun, 4 Mar 2001 23:30:29 -0800 Received: from pegasus ([202.183.208.231]) by csmail.cscoms.com (8.10.2/8.10.2) with SMTP id f257TmW64133; Mon, 5 Mar 2001 14:29:49 +0700 (ICT) Reply-To: From: "Suchot Sinthsirimana" To: Cc: Subject: RE: Error with mrouted 3.9b3+ios12 Date: Mon, 5 Mar 2001 14:42:04 +0700 Message-ID: <000701c0a547$c68cc830$e7d0b7ca@pegasus> MIME-Version: 1.0 Content-Type: text/plain; charset="windows-874" Content-Transfer-Encoding: base64 X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook CWS, Build 9.0.2416 (9.0.2910.0) X-MimeOLE: Produced By Microsoft MimeOLE V5.50.4133.2400 In-Reply-To: Importance: Normal Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing

Rusty,

Hi, I was very surprised by your mail. I have now checked the 2.4.2 kernel. I found the multicast loopback in ip_output.c at the line "Multicasts are looped back for other local users". I commented out this case and tested it. Multicast can now pass.

This is not the right fix for the multicast loopback. I will correct this case again.

moo

-----Original Message-----
From: rusty@linuxcare.com.au [mailto:rusty@linuxcare.com.au]
Sent: Sunday, March 01, 2544 1:54 PM
To: suchots@thaicom.net
Cc: netdev@oss.sgi.com
Subject: Re: Error with mrouted 3.9b3+ios12

In message <000101c09d7e$9e06f4b0$e7d0b7ca@pegasus> you write:
>
> Feb 23 09:58:14 ns kernel: ip_dev_loopback_xmit: bad owned skb = cfc80f00: PRE_ROUTING FORWARD POST_ROUTING
> Feb 23 09:58:14 ns kernel: skb: pf=2 (unowned) dev=eth1 len=1500
> Feb 23 09:58:14 ns kernel: PROTO=17 202.183.208.204:1449 239.1.1.2:26911 L=1500 S=0x00 I=54455 F=0x2000 T=4

Hmm, this means that a packet went through PRE_ROUTING, FORWARD and
POST_ROUTING, then hit loopback.

It's kinda weird behaviour: multicast packets will pass through
PRE_ROUTING, FORWARD, POST_ROUTING (ip_mc_output), then net_rx,
net_rx_action, ip_rcv, then ???.

I'm surprised you don't get more messages, but they're fairly
harmless.

Rusty.
--
Premature optmztion is rt of all evl. --DK
From owner-netdev@oss.sgi.com Mon Mar 5 09:29:31 2001 Received: by oss.sgi.com id ; Mon, 5 Mar 2001 09:29:21 -0800 Received: from minus.inr.ac.ru ([193.233.7.97]:2316 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Mon, 5 Mar 2001 09:29:03 -0800 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id UAA03495; Mon, 5 Mar 2001 20:28:22 +0300 From: kuznet@ms2.inr.ac.ru Message-Id: <200103051728.UAA03495@ms2.inr.ac.ru> Subject: Re: 2.2.19pre16 TCP problems To: ml@vdazone.ORG (Mario Lorenz) Date: Mon, 5 Mar 2001 20:28:22 +0300 (MSK) Cc: davem@redhat.com (Dave Miller), netdev@oss.sgi.com In-Reply-To: <20010303142340.A31835@vdazone.org> from "Mario Lorenz" at Mar 3, 1 11:45:34 pm X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Content-Length: 618 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing

Hello!
> it seems that the TCP fixes for 2.2.19pre16 do not fix all the TCP problems.
> I still see lots of Connection reset by peer errors even after upgrading to
> to pre16.

Seems, I even remember this bug fixed in 2.3...

Alexey

--- linux/net/ipv4/tcp_input.c.ORIG Fri Feb 23 20:06:58 2001
+++ linux/net/ipv4/tcp_input.c Mon Mar 5 20:24:46 2001
@@ -2210,6 +2216,7 @@
 	tp->snd_wnd = htons(th->window);
 	tp->snd_wl1 = TCP_SKB_CB(skb)->seq;
 	tp->snd_wl2 = TCP_SKB_CB(skb)->ack_seq;
+	tp->syn_seq = TCP_SKB_CB(skb)->seq;
 	tp->fin_seq = TCP_SKB_CB(skb)->seq;

 	tcp_set_state(sk, TCP_ESTABLISHED);
From owner-netdev@oss.sgi.com Mon Mar 5 15:28:23 2001 Received: by oss.sgi.com id ; Mon, 5 Mar 2001 15:28:03 -0800 Received: from horus.its.uow.edu.au ([130.130.68.25]:32396 "EHLO horus.its.uow.edu.au") by oss.sgi.com with ESMTP id ; Mon, 5 Mar 2001 15:27:47 -0800 Received: from uow.edu.au (wumpus.its.uow.edu.au [130.130.68.12]) by horus.its.uow.edu.au (8.9.3/8.9.3) with ESMTP id KAA26929; Tue, 6 Mar 2001 10:27:30 +1100 (EST) Message-ID: <3AA420E0.D54D4160@uow.edu.au> Date: Mon, 05 Mar 2001 23:27:28 +0000 From: Andrew Morton X-Mailer: Mozilla 4.61 [en] (X11; I; Linux 2.4.1-pre10 i686) X-Accept-Language: en MIME-Version: 1.0 To: "kuznet@ms2.inr.ac.ru" CC: netdev@oss.sgi.com, Bob Felderman Subject: [Fwd: Re: possible bug x86 2.4.2 SMP in IP receive stack] Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing

Alexey,

Bob is consistently getting these oopses running netperf with UDP on 2.4.2. He's using the myrinet hardware and drivers. It's very high speed - over 100 mbytes/sec. I believe he's experiencing out-of-memory conditions.

Sometimes he also gets assertion failures from ip_frag_destroy. `del_timer == 0'.

Can you think of anything which would cause this to happen in an out-of-memory situation?

BTW - the drivers he's using aren't in the stock kernel.
So it's possible that this is a driver problem which we haven't seen before. I had a look at the driver he's using and the Rx path seems OK. Thanks. -------- Original Message -------- Subject: Re: possible bug x86 2.4.2 SMP in IP receive stack Date: Mon, 5 Mar 2001 14:47:34 -0800 (PST) From: Bob Felderman To: Andrew Morton CC: Bob Felderman > Bob Felderman wrote: > > > > I'll get an oops dump as soon as I get back into the office. > > Any simple way to get the oops into the log so I don't > > have to copy it down? > > . > this is with a stock linux-2.4.2 kernel ksymoops 0.7c on i686 2.4.2. Options used -V (default) -k /proc/ksyms (default) -l /proc/modules (default) -o /lib/modules/2.4.2/ (default) -m /usr/src/linux/System.map (default) Warning: You did not tell me where to find symbol information. I will assume that the log matches the kernel and modules that are running right now and I'll use the default options above for symbol resolution. If the current kernel and/or modules do not match the log, you can get more accurate output by telling me the kernel version and where to find map, modules, ksyms etc. ksymoops -h explains the options. Warning (compare_maps): mismatch on symbol __module_author , gm says d0888e40, sbin/gm says d088aaa0. Ignoring sbin/gm entry Warning (compare_maps): mismatch on symbol __module_description , gm says d0888e5f, sbin/gm says d088aabf. Ignoring sbin/gm entry Warning (compare_maps): mismatch on symbol __module_parm_gm_net_copy_threshold , gm says d0888ebc, sbin/gm says d088ab1c. Ignoring sbin/gm entry Warning (compare_maps): mismatch on symbol __module_parm_gmip_hw_checksum , gm says d0888ea4, sbin/gm says d088ab04. 
Ignoring sbin/gm entry Unable to handle kernel NULL pointer dereference at virtual address 00000004 c011bd45 *pde = 00000000 Oops: 0002 CPU: 1 EIP: 0010:[] Using defaults from ksymoops -t elf32-i386 -a i386 EFLAGS: 00010086 eax: cff05b84 ebx: 00000212 ecx: cff05b84 edx: 00000000 esi: 00000000 edi: 00000068 ebp: c02ed840 esp: c1449e54 ds: 0018 es: 0018 ss: 0018 Process swapper (pid: 0, stackpage=c1449000) Stack: 00000000 cff05b60 00000068 c01f63eb cff05b84 0006bc85 c23b8040 3c82c9c7 c23b8040 00001d5d 00000068 c01f64c2 0000001a cff05b60 c95e8011 c01f6af9 0000001a c23b8040 c95e80e0 c95e80e0 c95e80e0 c95e80e0 c23b8040 3782c9c7 Call Trace: [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] Code: 89 4a 04 89 11 89 41 04 89 08 c6 05 9c 98 28 c0 01 53 9d 89 >>EIP; c011bd45 <===== Trace; c01f63eb Trace; c01f64c2 Trace; c01f6af9 Trace; c01f5c50 Trace; c01f6011 Trace; c01ed73e Trace; c010a99a Trace; c01071c0 Trace; c01071c0 Trace; c010909c Trace; c01071c0 Trace; c01071c0 Trace; c0100018 Trace; c0107252 Trace; c01193aa Trace; c010a99a Code; c011bd45 00000000 <_EIP>: Code; c011bd45 <===== 0: 89 4a 04 mov %ecx,0x4(%edx) <===== Code; c011bd48 3: 89 11 mov %edx,(%ecx) Code; c011bd4a 5: 89 41 04 mov %eax,0x4(%ecx) Code; c011bd4d 8: 89 08 mov %ecx,(%eax) Code; c011bd4f a: c6 05 9c 98 28 c0 01 movb $0x1,0xc028989c Code; c011bd56 11: 53 push %ebx Code; c011bd57 12: 9d popf Code; c011bd58 13: 89 00 mov %eax,(%eax) Kernel panic: Aiee, killing interrupt handler! 5 warnings issued. Results may not be reliable. 
From owner-netdev@oss.sgi.com Mon Mar 5 15:32:13 2001 Received: by oss.sgi.com id ; Mon, 5 Mar 2001 15:31:53 -0800 Received: from citadel.myri.com ([199.120.212.1]:30350 "EHLO myri.com") by oss.sgi.com with ESMTP id ; Mon, 5 Mar 2001 15:31:48 -0800 Received: from orion.myri.com.myri.com (orion [199.120.212.245]) by myri.com (8.9.3+Sun/8.9.1) with SMTP id PAA26583; Mon, 5 Mar 2001 15:31:18 -0800 (PST) Date: Mon, 5 Mar 2001 15:31:18 -0800 (PST) From: Bob Felderman Message-Id: <200103052331.PAA26583@myri.com> To: andrewm@uow.edu.au, kuznet@ms2.inr.ac.ru Subject: Re: [Fwd: Re: possible bug x86 2.4.2 SMP in IP receive stack] Cc: feldy@myri.com, netdev@oss.sgi.com Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing => Alexey, => => Bob is consistently getting these oopses running netperf with => UDP on 2.4.2. He's using the myrinet hardware and drivers. It's => very high speed - over 100 mbytes/sec. I believe he's experiencing => out-of-memory conditions. => => Sometimes he also gets assertion failures from ip_frag_destroy. => `del_timer == 0'. => => Can you think of anything which would cause this to happen => in an out-of-memory situation? => => BTW - the drivers he's using aren't in the stock kernel. So => it's possible that this is a driver problem which we haven't => seen before. I had a look at the driver he's using and the => Rx path seems OK. If I only use a single CPU with the same kernel, I do NOT have any problems. 
From owner-netdev@oss.sgi.com Mon Mar 5 15:46:53 2001 Received: by oss.sgi.com id ; Mon, 5 Mar 2001 15:46:44 -0800 Received: from horus.its.uow.edu.au ([130.130.68.25]:8927 "EHLO horus.its.uow.edu.au") by oss.sgi.com with ESMTP id ; Mon, 5 Mar 2001 15:46:29 -0800 Received: from uow.edu.au (wumpus.its.uow.edu.au [130.130.68.12]) by horus.its.uow.edu.au (8.9.3/8.9.3) with ESMTP id KAA04415; Tue, 6 Mar 2001 10:46:07 +1100 (EST) Message-ID: <3AA4253E.6F0B0365@uow.edu.au> Date: Mon, 05 Mar 2001 23:46:06 +0000 From: Andrew Morton X-Mailer: Mozilla 4.61 [en] (X11; I; Linux 2.4.1-pre10 i686) X-Accept-Language: en MIME-Version: 1.0 To: Bob Felderman CC: kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com Subject: Re: [Fwd: Re: possible bug x86 2.4.2 SMP in IP receive stack] References: <200103052331.PAA26583@myri.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Bob Felderman wrote: > > If I only use a single CPU with the same kernel, I do NOT have > any problems. 
Here's a different, but similar oops: CPU: 0 EIP: 0010:[] Using defaults from ksymoops -t elf32-i386 -a i386 EFLAGS: 00010086 eax: 00000000 ebx: cff03704 ecx: 00000246 edx: c02dd03c esi: c02ed8b8 edi: 0000001e ebp: c030d3d4 esp: c82ddc7c ds: 0018 es: 0018 ss: 0018 Process netserver (pid: 1007, stackpage=c82dd000) Stack: cff036e0 c01f61e1 cff03704 cfd4baa0 cfd4baa0 c82dc000 cfd4baa0 c82dc000 c01f6a6a cfd4baa0 cfd4baa0 cfd4baa0 cfd4baa0 c8114040 c8114040 c82dc000 cfd4baa0 c01f5c50 cfd4baa0 cfd4baa0 c8114040 cfd4baa0 cfd4baa0 c01f6011 Call Trace: [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] Code: 89 10 b8 01 00 00 00 c7 43 04 00 00 00 00 c7 03 00 00 00 00 >>EIP; c011bd86 <===== Trace; c01f61e1 Trace; c01f6a6a Trace; c01f5c50 Trace; c01f6011 Trace; c01ed73e Trace; c01193aa Trace; c010909c Trace; c0225fc9 Trace; c01ebc09 Trace; c020f5a3 Trace; c01193aa Trace; c0214675 Trace; c01e88a1 Trace; c01e975d Trace; c01193aa Trace; c010a99a Trace; c010909c Trace; c01e97d6 Trace; c01e9e54 Trace; c0108fdb Code; c011bd86 00000000 <_EIP>: Code; c011bd86 <===== 0: 89 10 mov %edx,(%eax) <===== Code; c011bd88 2: b8 01 00 00 00 mov $0x1,%eax Code; c011bd8d 7: c7 43 04 00 00 00 00 movl $0x0,0x4(%ebx) Code; c011bd94 e: c7 03 00 00 00 00 movl $0x0,(%ebx) Oops: 0000 Kernel panic: Aiee, killing interrupt handler! 
CPU: 1 EIP: 0010:[] EFLAGS: 00010086 eax: 00000000 ebx: 00000206 ecx: cff03bc4 edx: 00000bb8 esi: 00000000 edi: 00000058 ebp: c02ed840 esp: c1449e54 ds: 0018 es: 0018 ss: 0018 Process swapper (pid: 0, stackpage=c1449000) Stack: 00000000 cff03ba0 00000058 c01f63eb cff03bc4 00013ff9 c8284040 3c82c9c7 c8284040 0000e12d 00000058 c01f64c2 00000016 cff03ba0 c7d4cd11 c01f6af9 00000016 c8284040 c7d4cde0 c7d4cde0 c7d4cde0 c7d4cde0 c8284040 3782c9c7 Call Trace: [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] Code: 8b 10 89 4a 04 89 11 89 41 04 89 08 c6 05 9c 98 28 c0 01 53 >>EIP; c011bd43 <===== Trace; c01f63eb Trace; c01f64c2 Trace; c01f6af9 Trace; c01f5c50 Trace; c01f6011 Trace; c01ed73e Trace; c010a99a Trace; c01071c0 Trace; c01071c0 Trace; c010909c Trace; c01071c0 Trace; c01071c0 Trace; c0100018 Trace; c0107252 Trace; c01193aa Trace; c010a99a Code; c011bd43 00000000 <_EIP>: Code; c011bd43 <===== 0: 8b 10 mov (%eax),%edx <===== Code; c011bd45 2: 89 4a 04 mov %ecx,0x4(%edx) Code; c011bd48 5: 89 11 mov %edx,(%ecx) Code; c011bd4a 7: 89 41 04 mov %eax,0x4(%ecx) Code; c011bd4d a: 89 08 mov %ecx,(%eax) Code; c011bd4f c: c6 05 9c 98 28 c0 01 movb $0x1,0xc028989c Code; c011bd56 13: 53 push %ebx 5 warnings issued. Results may not be reliable. 
From owner-netdev@oss.sgi.com Mon Mar 5 15:50:43 2001 Received: by oss.sgi.com id ; Mon, 5 Mar 2001 15:50:34 -0800 Received: from horus.its.uow.edu.au ([130.130.68.25]:47601 "EHLO horus.its.uow.edu.au") by oss.sgi.com with ESMTP id ; Mon, 5 Mar 2001 15:50:25 -0800 Received: from uow.edu.au (wumpus.its.uow.edu.au [130.130.68.12]) by horus.its.uow.edu.au (8.9.3/8.9.3) with ESMTP id KAA05841; Tue, 6 Mar 2001 10:50:11 +1100 (EST) Message-ID: <3AA42633.78235BD3@uow.edu.au> Date: Mon, 05 Mar 2001 23:50:11 +0000 From: Andrew Morton X-Mailer: Mozilla 4.61 [en] (X11; I; Linux 2.4.1-pre10 i686) X-Accept-Language: en MIME-Version: 1.0 To: Bob Felderman CC: kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com Subject: Re: [Fwd: Re: possible bug x86 2.4.2 SMP in IP receive stack] References: <200103052331.PAA26583@myri.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Bob Felderman wrote: > > If I only use a single CPU with the same kernel, I do NOT have > any problems. Oh. Your driver doesn't appear to have any SMP locking. The same skb can be fed to the network stack twice. 
From owner-netdev@oss.sgi.com Mon Mar 5 15:53:44 2001 Received: by oss.sgi.com id ; Mon, 5 Mar 2001 15:53:34 -0800 Received: from citadel.myri.com ([199.120.212.1]:10383 "EHLO myri.com") by oss.sgi.com with ESMTP id ; Mon, 5 Mar 2001 15:53:31 -0800 Received: from orion.myri.com.myri.com (orion [199.120.212.245]) by myri.com (8.9.3+Sun/8.9.1) with SMTP id PAA26806; Mon, 5 Mar 2001 15:53:18 -0800 (PST) Date: Mon, 5 Mar 2001 15:53:18 -0800 (PST) From: Bob Felderman Message-Id: <200103052353.PAA26806@myri.com> To: andrewm@uow.edu.au, feldy@myri.com Subject: Re: [Fwd: Re: possible bug x86 2.4.2 SMP in IP receive stack] Cc: kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing => Bob Felderman wrote: => > => > If I only use a single CPU with the same kernel, I do NOT have => > any problems. => => Oh. => => Your driver doesn't appear to have any SMP locking. The => same skb can be fed to the network stack twice. => I wondered the same thing. Earlier today (before I sent you the oopses) I added a mutex to surround my linux interrupt routine. So the oopses you have are with a mutex around the interrupt routine. I'm going to check again to make sure it is really enforcing serialization. 
Bob From owner-netdev@oss.sgi.com Mon Mar 5 16:00:33 2001 Received: by oss.sgi.com id ; Mon, 5 Mar 2001 16:00:23 -0800 Received: from citadel.myri.com ([199.120.212.1]:22671 "EHLO myri.com") by oss.sgi.com with ESMTP id ; Mon, 5 Mar 2001 16:00:16 -0800 Received: from frisbee.myri.com (frisbee.myri.com [199.120.212.209]) by myri.com (8.9.3+Sun/8.9.1) with ESMTP id QAA26865; Mon, 5 Mar 2001 16:00:15 -0800 (PST) Received: (from feldy@localhost) by frisbee.myri.com (8.9.3/8.9.1) id QAA21285; Mon, 5 Mar 2001 16:00:13 -0800 Date: Mon, 5 Mar 2001 16:00:13 -0800 From: Bob Felderman Message-Id: <200103060000.QAA21285@frisbee.myri.com> To: andrewm@uow.edu.au, feldy@myri.com Subject: Re: [Fwd: Re: possible bug x86 2.4.2 SMP in IP receive stack] Cc: kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing => From andrewm@uow.edu.au Mon Mar 5 15:50:15 2001 => Your driver doesn't appear to have any SMP locking. The => same skb can be fed to the network stack twice. 
The driver does (and always has had) if (test_and_set_bit(0, (void *) &is->arch.interrupt) != 0) { /* hmm, interrupt called twice */ [other stuff deleted] } From owner-netdev@oss.sgi.com Mon Mar 5 16:13:23 2001 Received: by oss.sgi.com id ; Mon, 5 Mar 2001 16:13:13 -0800 Received: from panic.ohr.gatech.edu ([130.207.47.194]:29406 "HELO havoc.gtf.org") by oss.sgi.com with SMTP id ; Mon, 5 Mar 2001 16:12:58 -0800 Received: from mandrakesoft.com (adsl-20-73-169.asm.bellsouth.net [66.20.73.169]) by havoc.gtf.org (Postfix) with ESMTP id 319BA1F6A; Mon, 5 Mar 2001 19:12:43 -0500 (EST) Message-ID: <3AA42B7B.7C562799@mandrakesoft.com> Date: Mon, 05 Mar 2001 19:12:43 -0500 From: Jeff Garzik Organization: MandrakeSoft X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.4.3-pre2 i686) X-Accept-Language: en MIME-Version: 1.0 To: Bob Felderman Cc: andrewm@uow.edu.au, kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com Subject: Re: [Fwd: Re: possible bug x86 2.4.2 SMP in IP receive stack] References: <200103060000.QAA21285@frisbee.myri.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Bob Felderman wrote: > > => From andrewm@uow.edu.au Mon Mar 5 15:50:15 2001 > => Your driver doesn't appear to have any SMP locking. The > => same skb can be fed to the network stack twice. > > The driver does (and always has had) > > if (test_and_set_bit(0, (void *) &is->arch.interrupt) != 0) { > /* hmm, interrupt called twice */ > [other stuff deleted] > } That is most definitely -not- good SMP locking. In normal drivers on normal hardware, the interrupt handler is never ever called twice anyway. This code is an artifact of Donald Becker's driver skeleton, which includes this check. A few of his drivers call the interrupt handler routine from normal driver code, thus requiring the check. Most drivers do not need this check. Read Documentation/networking/netdevices.txt in 2.4.x kernels. 
Also, since it doesn't cover interrupt handling and hardware, the basic rule is: * spin_lock around your Tx interrupt handling path. * spin_lock_irq around your dev->hard_start_xmit Tx submission code. Ideally your Rx interrupt handling path is independent of other code, and need not be locked. Jeff -- Jeff Garzik | "You see, in this world there's two kinds of Building 1024 | people, my friend: Those with loaded guns MandrakeSoft | and those who dig. You dig." --Blondie From owner-netdev@oss.sgi.com Mon Mar 5 16:28:03 2001 Received: by oss.sgi.com id ; Mon, 5 Mar 2001 16:27:54 -0800 Received: from horus.its.uow.edu.au ([130.130.68.25]:28400 "EHLO horus.its.uow.edu.au") by oss.sgi.com with ESMTP id ; Mon, 5 Mar 2001 16:27:43 -0800 Received: from uow.edu.au (wumpus.its.uow.edu.au [130.130.68.12]) by horus.its.uow.edu.au (8.9.3/8.9.3) with ESMTP id LAA19321; Tue, 6 Mar 2001 11:25:27 +1100 (EST) Message-ID: <3AA42E76.BE56B8BB@uow.edu.au> Date: Tue, 06 Mar 2001 00:25:26 +0000 From: Andrew Morton X-Mailer: Mozilla 4.61 [en] (X11; I; Linux 2.4.1-pre10 i686) X-Accept-Language: en MIME-Version: 1.0 To: Bob Felderman CC: kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com Subject: Re: [Fwd: Re: possible bug x86 2.4.2 SMP in IP receive stack] References: <200103060000.QAA21285@frisbee.myri.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Bob Felderman wrote: > > => From andrewm@uow.edu.au Mon Mar 5 15:50:15 2001 > => Your driver doesn't appear to have any SMP locking. The > => same skb can be fed to the network stack twice. > > The driver does (and always has had) > > if (test_and_set_bit(0, (void *) &is->arch.interrupt) != 0) { > /* hmm, interrupt called twice */ > [other stuff deleted] > } I see. I took a closer look and yes, the Rx path looks OK. I think your start_xmit could still race with the tx interrupt though? 
Could I suggest that you just lock the heck out of it? If the problems go away then it's the driver and we can stop spamming Alexey. Something like this:

--- gmip.c.orig Tue Mar 6 11:06:50 2001
+++ gmip.c Tue Mar 6 11:10:32 2001
@@ -25,6 +25,7 @@
 #include
 #include
 #include
+#include

 #include "gm_arch_def.h"
 #include "gm_klog_debug.h"
@@ -43,6 +44,8 @@
 static int gm_net_copy_threshold = 256+16;
 MODULE_PARM(gm_net_copy_threshold, "i");

+spinlock_t dumb_lock = SPIN_LOCK_UNLOCKED;
+
 /* should be a multiple of 16 bytes */
 static inline gm_size_t
@@ -151,8 +154,19 @@
 #endif

 static
+void _gmip_recv_interrupt (void *gmnetp, const unsigned int ulen, gm_u16_t csum);
+static
 void gmip_recv_interrupt (void *gmnetp, const unsigned int ulen, gm_u16_t csum)
 {
+	unsigned long flags;
+	spin_lock_irqsave(&dumb_lock, flags);
+	_gmip_recv_interrupt (gmnetp, ulen, csum);
+	spin_unlock_irqrestore(&dumb_lock, flags);
+}
+
+static
+void _gmip_recv_interrupt (void *gmnetp, const unsigned int ulen, gm_u16_t csum)
+{
 	gm_arch_net_info_t *gmnet = gmnetp;
 	struct net_device *dev = gmnet->dev;
 	int index = gmnet->rdone & GM_RECV_RING_MAX_INDEX;
@@ -298,8 +312,19 @@
 }

 static
+void _gmip_sent_interrupt(void *gmnetp);
+static
 void gmip_sent_interrupt(void *gmnetp)
 {
+	unsigned long flags;
+	spin_lock_irqsave(&dumb_lock, flags);
+	_gmip_sent_interrupt(gmnetp);
+	spin_unlock_irqrestore(&dumb_lock, flags);
+}
+
+static
+void _gmip_sent_interrupt(void *gmnetp)
+{
 	int index;
 	gm_arch_net_info_t *gmnet = gmnetp;
 	struct sk_buff *skb;
@@ -455,7 +480,19 @@
 static int gm_max_diff_ring = 0;

+static int _gmip_xmit(struct sk_buff *skb, struct net_device *dev);
 static int gmip_xmit(struct sk_buff *skb, struct net_device *dev)
+{
+	int ret;
+	unsigned long flags;
+
+	spin_lock_irqsave(&dumb_lock, flags);
+	ret = _gmip_xmit(skb, dev);
+	spin_unlock_irqrestore(&dumb_lock, flags);
+	return ret;
+}
+
+static int _gmip_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	gm_arch_net_info_t *gmnet = dev->priv;
 	struct ethhdr *eth = (struct ethhdr*)skb->data;
From owner-netdev@oss.sgi.com Mon Mar 5 16:29:03 2001 Received: by oss.sgi.com id ; Mon, 5 Mar 2001 16:28:55 -0800 Received: from shell.cyberus.ca ([209.195.95.7]:64900 "EHLO shell.cyberus.ca") by oss.sgi.com with ESMTP id ; Mon, 5 Mar 2001 16:28:42 -0800 Received: from localhost (hadi@localhost) by shell.cyberus.ca (8.9.3/666/Cyberus Online Inc.) with ESMTP id TAA26693; Mon, 5 Mar 2001 19:25:26 -0500 (EST) X-Authentication-Warning: shell.cyberus.ca: hadi owned process doing -bs Date: Mon, 5 Mar 2001 19:25:26 -0500 (EST) From: jamal To: Bob Felderman cc: , , Subject: Re: [Fwd: Re: possible bug x86 2.4.2 SMP in IP receive stack] In-Reply-To: <200103060000.QAA21285@frisbee.myri.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing

Now this is the problem with proprietary drivers that nobody sees the code for (or maybe hardware that nobody sees specs for). You caused people all the pain of trying to decode what your problem is only to find you are making some basic mistakes. There was a document URL posted on how to write drivers with the new 2.4 kernel (here on netdev). Look up the archive.

Sorry, nothing personal here. But this is where Open Source might have benefited you.

cheers, jamal

On Mon, 5 Mar 2001, Bob Felderman wrote:
> => From andrewm@uow.edu.au Mon Mar 5 15:50:15 2001
> => Your driver doesn't appear to have any SMP locking. The
> => same skb can be fed to the network stack twice.
> > The driver does (and always has had) > > if (test_and_set_bit(0, (void *) &is->arch.interrupt) != 0) { > /* hmm, interrupt called twice */ > [other stuff deleted] > } > From owner-netdev@oss.sgi.com Mon Mar 5 16:34:43 2001 Received: by oss.sgi.com id ; Mon, 5 Mar 2001 16:34:34 -0800 Received: from horus.its.uow.edu.au ([130.130.68.25]:47490 "EHLO horus.its.uow.edu.au") by oss.sgi.com with ESMTP id ; Mon, 5 Mar 2001 16:34:28 -0800 Received: from uow.edu.au (wumpus.its.uow.edu.au [130.130.68.12]) by horus.its.uow.edu.au (8.9.3/8.9.3) with ESMTP id LAA22677; Tue, 6 Mar 2001 11:34:07 +1100 (EST) Message-ID: <3AA4307F.21698B6C@uow.edu.au> Date: Tue, 06 Mar 2001 00:34:07 +0000 From: Andrew Morton X-Mailer: Mozilla 4.61 [en] (X11; I; Linux 2.4.1-pre10 i686) X-Accept-Language: en MIME-Version: 1.0 To: jamal CC: Bob Felderman , kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com Subject: Re: [Fwd: Re: possible bug x86 2.4.2 SMP in IP receive stack] References: Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing jamal wrote: > > Now this is the problem with proprietary drivers that nobody sees the code > for (or maybe hardware that nobody sees specs for) * Permission to use, copy, modify and distribute this software and its * * documentation in source and binary forms for non-commercial purposes * * and without fee is hereby granted, provided that the modified software * * is returned to Myricom, Inc. for redistribution. So it's not *too* sinful :) > You caused people all the pain of trying to decode what your problem is > only to find you are making some basic mistakes. I don't know if that's proven yet. 
From owner-netdev@oss.sgi.com Mon Mar 5 16:45:43 2001 Received: by oss.sgi.com id ; Mon, 5 Mar 2001 16:45:34 -0800 Received: from shell.cyberus.ca ([209.195.95.7]:901 "EHLO shell.cyberus.ca") by oss.sgi.com with ESMTP id ; Mon, 5 Mar 2001 16:45:16 -0800 Received: from localhost (hadi@localhost) by shell.cyberus.ca (8.9.3/666/Cyberus Online Inc.) with ESMTP id TAA26716; Mon, 5 Mar 2001 19:42:09 -0500 (EST) X-Authentication-Warning: shell.cyberus.ca: hadi owned process doing -bs Date: Mon, 5 Mar 2001 19:42:09 -0500 (EST) From: jamal To: Andrew Morton cc: Bob Felderman , , Subject: Re: [Fwd: Re: possible bug x86 2.4.2 SMP in IP receive stack] In-Reply-To: <3AA4307F.21698B6C@uow.edu.au> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Tue, 6 Mar 2001, Andrew Morton wrote: > jamal wrote: > > > > Now this is the problem with proprietary drivers that nobody sees the code > > for (or maybe hardware that nobody sees specs for) > > * Permission to use, copy, modify and distribute this software and its * > * documentation in source and binary forms for non-commercial purposes * > * and without fee is hereby granted, provided that the modified software * > * is returned to Myricom, Inc. for redistribution. > > So it's not *too* sinful :) > OK, sorry, i pulled the trigger too fast. Just evidence i really dislike non-GPL stuff. 
cheers, jamal
From owner-netdev@oss.sgi.com Mon Mar 5 17:09:33 2001 Received: by oss.sgi.com id ; Mon, 5 Mar 2001 17:09:23 -0800 Received: from smtp1.cern.ch ([137.138.128.38]:38927 "EHLO smtp1.cern.ch") by oss.sgi.com with ESMTP id ; Mon, 5 Mar 2001 17:09:21 -0800 Received: from lxplus015.cern.ch (IDENT:root@lxplus015.cern.ch [137.138.161.112]) by smtp1.cern.ch (8.9.3/8.9.3) with ESMTP id CAA22339; Tue, 6 Mar 2001 02:07:11 +0100 (MET) Received: (from jes@localhost) by lxplus015.cern.ch (8.9.3/8.9.3) id CAA07951; Tue, 6 Mar 2001 02:07:10 +0100 To: Jeff Garzik Cc: Bob Felderman , andrewm@uow.edu.au, kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com Subject: Re: [Fwd: Re: possible bug x86 2.4.2 SMP in IP receive stack] References: <200103060000.QAA21285@frisbee.myri.com> <3AA42B7B.7C562799@mandrakesoft.com> From: Jes Sorensen Date: 06 Mar 2001 02:07:10 +0100 In-Reply-To: Jeff Garzik's message of "Mon, 05 Mar 2001 19:12:43 -0500" Message-ID: Lines: 15 User-Agent: Gnus/5.070096 (Pterodactyl Gnus v0.96) Emacs/20.4 MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing

>>>>> "Jeff" == Jeff Garzik writes:

Jeff> * spin_lock around your Tx interrupt handling path. *
Jeff> spin_lock_irq around your dev->hard_start_xmit Tx submission
Jeff> code.

You don't need locks around this if the hardware is sane.

Jeff> Ideally your Rx interrupt handling path is independent of other
Jeff> code, and need not be locked.

Anyway, you only need the locks if two parts of the code can be invoked simultaneously.
Jes From owner-netdev@oss.sgi.com Mon Mar 5 17:20:44 2001 Received: by oss.sgi.com id ; Mon, 5 Mar 2001 17:20:34 -0800 Received: from panic.ohr.gatech.edu ([130.207.47.194]:49630 "HELO havoc.gtf.org") by oss.sgi.com with SMTP id ; Mon, 5 Mar 2001 17:20:16 -0800 Received: from mandrakesoft.com (adsl-20-73-169.asm.bellsouth.net [66.20.73.169]) by havoc.gtf.org (Postfix) with ESMTP id 00D7A1F6A; Mon, 5 Mar 2001 20:20:03 -0500 (EST) Message-ID: <3AA43B44.DE3089A2@mandrakesoft.com> Date: Mon, 05 Mar 2001 20:20:04 -0500 From: Jeff Garzik Organization: MandrakeSoft X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.4.3-pre2 i686) X-Accept-Language: en MIME-Version: 1.0 To: netdev@oss.sgi.com, linux-net@vger.rutgers.edu Subject: Zeroing interface stats? Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Just spotted this code in acenic.c. Should it be carried over to other net drivers? static int ace_open(struct net_device *dev) { [...] /* * Zero the stats when restarting the interface... */ memset(&ap->stats, 0, sizeof(ap->stats)); -- Jeff Garzik | "You see, in this world there's two kinds of Building 1024 | people, my friend: Those with loaded guns MandrakeSoft | and those who dig. You dig." 
--Blondie From owner-netdev@oss.sgi.com Mon Mar 5 18:15:03 2001 Received: by oss.sgi.com id ; Mon, 5 Mar 2001 18:14:44 -0800 Received: from cx97923-a.phnx3.az.home.com ([24.9.112.194]:31758 "EHLO grok.yi.org") by oss.sgi.com with ESMTP id ; Mon, 5 Mar 2001 18:14:40 -0800 Received: from candelatech.com (IDENT:greear@localhost.localdomain [127.0.0.1]) by grok.yi.org (8.11.0/8.11.0) with ESMTP id f262hFr06042; Mon, 5 Mar 2001 19:43:15 -0700 Message-ID: <3AA44EC2.DAE0B7A0@candelatech.com> Date: Mon, 05 Mar 2001 19:43:14 -0700 From: Ben Greear Organization: Candela Technologies X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.2.17-14 i686) X-Accept-Language: en MIME-Version: 1.0 To: Jeff Garzik CC: netdev@oss.sgi.com, linux-net@vger.rutgers.edu Subject: Re: Zeroing interface stats? References: <3AA43B44.DE3089A2@mandrakesoft.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Jeff Garzik wrote: > > Just spotted this code in acenic.c. Should it be carried over to other > net drivers? > > static int ace_open(struct net_device *dev) > { > [...] > /* > * Zero the stats when restarting the interface... > */ > memset(&ap->stats, 0, sizeof(ap->stats)); > > -- > Jeff Garzik | "You see, in this world there's two kinds of > Building 1024 | people, my friend: Those with loaded guns > MandrakeSoft | and those who dig. You dig." --Blondie Well, sounds like a pretty good idea, but I know it will break at least one piece of code I've written because I assume the (measly) 32-bit counters have wrapped if the current value is less than the last value read...) Now, if someone wanted to go make sure all counters were 64-bit, then that would be cool :) Of course, I can probably figure a way to re-write my code...but I can't think of a concise way offhand... 
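The wrap assumption described above (a current value lower than the last one read means the 32-bit counter wrapped) can be sketched as follows. This is a minimal user-space illustration, not code from any driver or from the program mentioned; the name `counter_delta32` is hypothetical. Unsigned subtraction is performed modulo 2^32, so the delta comes out right across a single wrap, but the same arithmetic cannot tell a wrap from a counter that was reset to zero.

```c
#include <stdint.h>

/* Hypothetical helper: number of events between two reads of a
 * free-running 32-bit counter. Unsigned arithmetic is modulo 2^32,
 * so one wrap between the two reads is handled automatically.
 * A counter that was zeroed in between is indistinguishable from a
 * wrap and yields a large bogus delta. */
static uint32_t counter_delta32(uint32_t prev, uint32_t cur)
{
    return (uint32_t)(cur - prev);
}
```

With these definitions, `counter_delta32(4294967290u, 10u)` yields 16 (a genuine wrap), while `counter_delta32(5000u, 0u)` yields 4294962296 if the counter was actually zeroed rather than wrapped.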
Ben -- Ben Greear (greearb@candelatech.com) http://www.candelatech.com Author of ScryMUD: scry.wanfear.com 4444 (Released under GPL) http://scry.wanfear.com http://scry.wanfear.com/~greear From owner-netdev@oss.sgi.com Mon Mar 5 18:20:14 2001 Received: by oss.sgi.com id ; Mon, 5 Mar 2001 18:20:03 -0800 Received: from mx1.eskimo.com ([204.122.16.48]:7432 "EHLO mx1.eskimo.com") by oss.sgi.com with ESMTP id ; Mon, 5 Mar 2001 18:19:48 -0800 Received: from eskimo.com (klevin@eskimo.com [204.122.16.13]) by mx1.eskimo.com (8.9.1a/8.8.8) with ESMTP id SAA30441; Mon, 5 Mar 2001 18:19:31 -0800 Received: from localhost (klevin@localhost) by eskimo.com (8.9.1a/8.9.1) with SMTP id SAA16400; Mon, 5 Mar 2001 18:19:31 -0800 (PST) X-Authentication-Warning: eskimo.com: klevin owned process doing -bs Date: Mon, 5 Mar 2001 18:19:30 -0800 (PST) From: Noah Romer To: Jeff Garzik cc: netdev@oss.sgi.com Subject: Re: [Fwd: Re: possible bug x86 2.4.2 SMP in IP receive stack] In-Reply-To: <3AA42B7B.7C562799@mandrakesoft.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Mon, 5 Mar 2001, Jeff Garzik wrote: > * spin_lock around your Tx interrupt handling path. > * spin_lock_irq around your dev->hard_start_xmit Tx submission code. > > Ideally your Rx interrupt handling path is independent of other code, > and need not be locked. Well, if the Rx code modifies anything that is shared by all instances of the Rx code (i.e. a queue or stack), you're going to need to lock around those access points in order to be SMP safe. If the Rx code is in an interrupt context, then you've got to spin_lock_irq. -- Noah Romer |"Everyone is more or less mad on one point." 
klevin@eskimo.com | - Rudyard Kipling
PGP key available |
by finger or email |

From owner-netdev@oss.sgi.com Mon Mar 5 18:37:23 2001 Received: by oss.sgi.com id ; Mon, 5 Mar 2001 18:37:13 -0800 Received: from horus.its.uow.edu.au ([130.130.68.25]:8613 "EHLO horus.its.uow.edu.au") by oss.sgi.com with ESMTP id ; Mon, 5 Mar 2001 18:36:47 -0800 Received: from uow.edu.au (wumpus.its.uow.edu.au [130.130.68.12]) by horus.its.uow.edu.au (8.9.3/8.9.3) with ESMTP id NAA06721; Tue, 6 Mar 2001 13:36:37 +1100 (EST) Message-ID: <3AA44D36.540CC5BE@uow.edu.au> Date: Tue, 06 Mar 2001 02:36:38 +0000 From: Andrew Morton X-Mailer: Mozilla 4.61 [en] (X11; I; Linux 2.4.1-pre10 i686) X-Accept-Language: en MIME-Version: 1.0 To: Jeff Garzik CC: netdev@oss.sgi.com, linux-net@vger.rutgers.edu Subject: Re: Zeroing interface stats? References: <3AA43B44.DE3089A2@mandrakesoft.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing

Jeff Garzik wrote:
>
> Just spotted this code in acenic.c. Should it be carried over to other
> net drivers?
>
> static int ace_open(struct net_device *dev)
> {
> [...]
>         /*
>          * Zero the stats when restarting the interface...
>          */
>         memset(&ap->stats, 0, sizeof(ap->stats));
>

It's a philosophical thing... Do we want the stats to tell us what's happened since the interface was created, or do we want them to tell us what's happened since the interface was opened?

I think I prefer the former: tell me what has happened since day one.

If we require the ability to zero out the stats then this can be done at a higher level

        case SIOCZAPSTATS:
                if (dev->get_stats) {
                        struct net_device_stats *stats = dev->get_stats(dev);
                        if (stats)
                                memset(stats, 0, sizeof(*stats));
                }

No?
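The two views (since creation versus since open) can also be reconciled on the reading side without clearing anything. The following is only a sketch under that assumption, not something proposed in the thread: the counters keep running from day one, and whoever wants the since-open view keeps a snapshot taken at open time and reports the difference. `struct if_counters` and `stats_since` are hypothetical names, not the kernel's `struct net_device_stats`.

```c
#include <stdint.h>

/* Hypothetical subset of per-interface counters (illustration only,
 * not the kernel's struct net_device_stats). */
struct if_counters {
    uint64_t rx_packets;
    uint64_t tx_packets;
};

/* Counters accumulated since `baseline` was captured (for example at
 * open time). The running totals in `now` are never modified, so
 * other readers still see monotonic values. */
static struct if_counters stats_since(const struct if_counters *baseline,
                                      const struct if_counters *now)
{
    struct if_counters d;

    d.rx_packets = now->rx_packets - baseline->rx_packets;
    d.tx_packets = now->tx_packets - baseline->tx_packets;
    return d;
}
```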
From owner-netdev@oss.sgi.com Mon Mar 5 19:04:14 2001 Received: by oss.sgi.com id ; Mon, 5 Mar 2001 19:04:04 -0800 Received: from shell.cyberus.ca ([209.195.95.7]:13445 "EHLO shell.cyberus.ca") by oss.sgi.com with ESMTP id ; Mon, 5 Mar 2001 19:03:47 -0800 Received: from localhost (hadi@localhost) by shell.cyberus.ca (8.9.3/666/Cyberus Online Inc.) with ESMTP id WAA27125; Mon, 5 Mar 2001 22:02:39 -0500 (EST) X-Authentication-Warning: shell.cyberus.ca: hadi owned process doing -bs Date: Mon, 5 Mar 2001 22:02:38 -0500 (EST) From: jamal To: Andrew Morton cc: Jeff Garzik , , Subject: Re: Zeroing interface stats? In-Reply-To: <3AA44D36.540CC5BE@uow.edu.au> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing

On Tue, 6 Mar 2001, Andrew Morton wrote:

> Jeff Garzik wrote:
> >
> > Just spotted this code in acenic.c. Should it be carried over to other
> > net drivers?
> >
> > static int ace_open(struct net_device *dev)
> > {
> > [...]
> >         /*
> >          * Zero the stats when restarting the interface...
> >          */
> >         memset(&ap->stats, 0, sizeof(ap->stats));
> >
>
> It's a philosophical thing... Do we want the stats
> to tell us what's happened since the interface was
> created, or do we want them to tell us what's
> happened since the interface was opened?
>
> I think I prefer the former: tell me what has
> happened since day one.
>

In fact i think what the acenic is doing is illegal. These stats are used by SNMP.
IIRC, if you have to zero the stats you also must reset the ifindex.
Someone double check the SNMP RFCs.
I think this removes any ambiguity since the ifindex was received at boot time.
OTOH, while i am almost agreeing with you with the proposal below i think it is dangerous because people will write proggies to use it.
It is useful for debugging etc > If we require the ability to zero out the stats > then this can be done at a higher level > > case SIOCZAPSTATS: > if (dev->get_stats) { > struct net_device_stats *stats = dev->get_stats(dev); > if (stats) > memset(stats, 0, sizeof(*stats)); > } > > No? > cheers, jamal From owner-netdev@oss.sgi.com Mon Mar 5 19:07:24 2001 Received: by oss.sgi.com id ; Mon, 5 Mar 2001 19:07:04 -0800 Received: from horus.its.uow.edu.au ([130.130.68.25]:2032 "EHLO horus.its.uow.edu.au") by oss.sgi.com with ESMTP id ; Mon, 5 Mar 2001 19:06:51 -0800 Received: from uow.edu.au (wumpus.its.uow.edu.au [130.130.68.12]) by horus.its.uow.edu.au (8.9.3/8.9.3) with ESMTP id OAA17889; Tue, 6 Mar 2001 14:06:41 +1100 (EST) Message-ID: <3AA45442.4980994B@uow.edu.au> Date: Tue, 06 Mar 2001 03:06:42 +0000 From: Andrew Morton X-Mailer: Mozilla 4.61 [en] (X11; I; Linux 2.4.1-pre10 i686) X-Accept-Language: en MIME-Version: 1.0 To: jamal CC: netdev@oss.sgi.com Subject: Re: Zeroing interface stats? References: Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing jamal wrote: > > On Tue, 6 Mar 2001, Andrew Morton wrote: > > Infact i think what the acenic is doing is illegal. These stats are used > by SNMP. > IIRC, if you have to zero the stats you also must reset the ifindex. > Someone double check the SNMP RFCs. I've seen you use this `ifindex' term before. What is it, and what is its role in life? Thanks. 
From owner-netdev@oss.sgi.com Mon Mar 5 19:27:24 2001 Received: by oss.sgi.com id ; Mon, 5 Mar 2001 19:27:04 -0800 Received: from cx97923-a.phnx3.az.home.com ([24.9.112.194]:49422 "EHLO grok.yi.org") by oss.sgi.com with ESMTP id ; Mon, 5 Mar 2001 19:26:55 -0800 Received: from candelatech.com (IDENT:greear@localhost.localdomain [127.0.0.1]) by grok.yi.org (8.11.0/8.11.0) with ESMTP id f263tOr14659; Mon, 5 Mar 2001 20:55:24 -0700 Message-ID: <3AA45FAC.EFB34E79@candelatech.com> Date: Mon, 05 Mar 2001 20:55:24 -0700 From: Ben Greear Organization: Candela Technologies X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.2.17-14 i686) X-Accept-Language: en MIME-Version: 1.0 To: jamal CC: Andrew Morton , Jeff Garzik , netdev@oss.sgi.com, linux-net@vger.rutgers.edu Subject: Re: Zeroing interface stats? References: Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing jamal wrote: > > On Tue, 6 Mar 2001, Andrew Morton wrote: > > > Jeff Garzik wrote: > > > > > > Just spotted this code in acenic.c. Should it be carried over to other > > > net drivers? > > > > > > static int ace_open(struct net_device *dev) > > > { > > > [...] > > > /* > > > * Zero the stats when restarting the interface... > > > */ > > > memset(&ap->stats, 0, sizeof(ap->stats)); > > > > > > > It's a philosophical thing... Do we want the stats > > to tell us what's happened since the interface was > > created, of so we want them to tell us what's > > happened since the interface was opened? > > > > I think I prefer the former: tell me what has > > happened since day one. > > > > Infact i think what the acenic is doing is illegal. These stats are used > by SNMP. > IIRC, if you have to zero the stats you also must reset the ifindex. > Someone double check the SNMP RFCs. > I think this removes any ambiguity since the ifindex was received at > boot time. 
> OTOH, while i am almost agreeing with you with the proposal below > i think it is dangerous because people will write proggies to use it. > It is useful for debugging etc Please, PLEASE, don't change the device index!! There needs to be some immutable thing (other than net_device*) to reliably point to a particular device. Changing that will completely screw up VLAN code as it's written now (because I didn't want pointers that could go stale..so I do hashed lookups based on device id...) And I do believe it would screw up SNMP for the same reason it screws up my code: Counter-wrap must be taken into account... Ben > > case SIOCZAPSTATS: > > if (dev->get_stats) { > > struct net_device_stats *stats = dev->get_stats(dev); > > if (stats) > > memset(stats, 0, sizeof(*stats)); > > } That sounds like a much better idea to me... (but make sure that the ip and ifconfig tools don't automagically 'help' you out by zero-ing them!!) Ben -- Ben Greear (greearb@candelatech.com) http://www.candelatech.com Author of ScryMUD: scry.wanfear.com 4444 (Released under GPL) http://scry.wanfear.com http://scry.wanfear.com/~greear From owner-netdev@oss.sgi.com Mon Mar 5 19:36:54 2001 Received: by oss.sgi.com id ; Mon, 5 Mar 2001 19:36:44 -0800 Received: from shell.cyberus.ca ([209.195.95.7]:16773 "EHLO shell.cyberus.ca") by oss.sgi.com with ESMTP id ; Mon, 5 Mar 2001 19:36:31 -0800 Received: from localhost (hadi@localhost) by shell.cyberus.ca (8.9.3/666/Cyberus Online Inc.) with ESMTP id WAA27236; Mon, 5 Mar 2001 22:35:25 -0500 (EST) X-Authentication-Warning: shell.cyberus.ca: hadi owned process doing -bs Date: Mon, 5 Mar 2001 22:35:25 -0500 (EST) From: jamal To: Andrew Morton cc: Subject: Re: Zeroing interface stats? 
In-Reply-To: <3AA45442.4980994B@uow.edu.au> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Tue, 6 Mar 2001, Andrew Morton wrote: > jamal wrote: > > > > On Tue, 6 Mar 2001, Andrew Morton wrote: > > > > Infact i think what the acenic is doing is illegal. These stats are used > > by SNMP. > > IIRC, if you have to zero the stats you also must reset the ifindex. > > Someone double check the SNMP RFCs. > > I've seen you use this `ifindex' term before. What is it, and > what is its role in life? -- interface index , retrievable via SIOCGIFINDEX ^ ^ ^^^^^ A unique identifier for a "net link". Represented in Linux as dev->ifindex Theoretically, also an index into a ifTable (a table of interfaces, on linux you walk a list of course) to retrieve an ifEntry ;-> an IfEntry holds all the SNMP parameters + a lot more in Linux. In Linux one can look at IfEntry as a netdev struct. cheers, jamal From owner-netdev@oss.sgi.com Mon Mar 5 19:41:14 2001 Received: by oss.sgi.com id ; Mon, 5 Mar 2001 19:40:54 -0800 Received: from shell.cyberus.ca ([209.195.95.7]:17797 "EHLO shell.cyberus.ca") by oss.sgi.com with ESMTP id ; Mon, 5 Mar 2001 19:40:39 -0800 Received: from localhost (hadi@localhost) by shell.cyberus.ca (8.9.3/666/Cyberus Online Inc.) with ESMTP id WAA27240; Mon, 5 Mar 2001 22:39:31 -0500 (EST) X-Authentication-Warning: shell.cyberus.ca: hadi owned process doing -bs Date: Mon, 5 Mar 2001 22:39:31 -0500 (EST) From: jamal To: Ben Greear cc: Andrew Morton , Jeff Garzik , , Subject: Re: Zeroing interface stats? In-Reply-To: <3AA45FAC.EFB34E79@candelatech.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Mon, 5 Mar 2001, Ben Greear wrote: > > Infact i think what the acenic is doing is illegal. These stats are used > > by SNMP. 
> > IIRC, if you have to zero the stats you also must reset the ifindex. > > Someone double check the SNMP RFCs. > > Please, PLEASE, don't change the device index!! There needs to be some > immutable thing (other than net_device*) to reliably point to a particular > device. Changing that will completely screw up VLAN code as it's written > now (because I didn't want pointers that could go stale..so I do hashed lookups > based on device id...) I dont think i was suggesting that. Merely stating a hypothesis: That if me remembers correctly, the only time zeroing the netstats is considered valid is when you also change the ifindex as well i.e it becomes a new device. Now i could be wrong. Someone more knowledgable on SNMP needs to make the qualification. cheers, jamal From owner-netdev@oss.sgi.com Mon Mar 5 19:49:14 2001 Received: by oss.sgi.com id ; Mon, 5 Mar 2001 19:48:54 -0800 Received: from horus.its.uow.edu.au ([130.130.68.25]:61144 "EHLO horus.its.uow.edu.au") by oss.sgi.com with ESMTP id ; Mon, 5 Mar 2001 19:48:49 -0800 Received: from uow.edu.au (wumpus.its.uow.edu.au [130.130.68.12]) by horus.its.uow.edu.au (8.9.3/8.9.3) with ESMTP id OAA04203; Tue, 6 Mar 2001 14:48:40 +1100 (EST) Message-ID: <3AA45E19.9E747A4F@uow.edu.au> Date: Tue, 06 Mar 2001 03:48:41 +0000 From: Andrew Morton X-Mailer: Mozilla 4.61 [en] (X11; I; Linux 2.4.1-pre10 i686) X-Accept-Language: en MIME-Version: 1.0 To: jamal CC: netdev@oss.sgi.com Subject: Re: Zeroing interface stats? References: Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing jamal wrote: > > On Tue, 6 Mar 2001, Andrew Morton wrote: > > > jamal wrote: > > > > > > On Tue, 6 Mar 2001, Andrew Morton wrote: > > > > > > Infact i think what the acenic is doing is illegal. These stats are used > > > by SNMP. > > > IIRC, if you have to zero the stats you also must reset the ifindex. 
> > > Someone double check the SNMP RFCs. > > > > I've seen you use this `ifindex' term before. What is it, and > > what is its role in life? > > -- interface index , retrievable via SIOCGIFINDEX > ^ ^ ^^^^^ > A unique identifier for a "net link". Represented in Linux as dev->ifindex > Theoretically, also an index into a ifTable (a table of interfaces, on > linux you walk a list of course) to retrieve an ifEntry ;-> > an IfEntry holds all the SNMP parameters + a lot more in Linux. > In Linux one can look at IfEntry as a netdev struct. I see. And this index is published to external applications and management systems as a reliable identifier with which to reference this interface on this host? And so we're not allowed to change it across the interface lifetime? Should we attempt to make an interface's ifindex constant across reinstantiations of the interface (ie: driver reloads?). From owner-netdev@oss.sgi.com Mon Mar 5 20:02:14 2001 Received: by oss.sgi.com id ; Mon, 5 Mar 2001 20:01:54 -0800 Received: from shell.cyberus.ca ([209.195.95.7]:20613 "EHLO shell.cyberus.ca") by oss.sgi.com with ESMTP id ; Mon, 5 Mar 2001 20:01:25 -0800 Received: from localhost (hadi@localhost) by shell.cyberus.ca (8.9.3/666/Cyberus Online Inc.) with ESMTP id XAA27280; Mon, 5 Mar 2001 23:00:16 -0500 (EST) X-Authentication-Warning: shell.cyberus.ca: hadi owned process doing -bs Date: Mon, 5 Mar 2001 23:00:16 -0500 (EST) From: jamal To: Andrew Morton cc: Subject: Re: Zeroing interface stats? In-Reply-To: <3AA45E19.9E747A4F@uow.edu.au> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Tue, 6 Mar 2001, Andrew Morton wrote: > I see. And this index is published to external applications > and management systems as a reliable identifier with which > to reference this interface on this host? And so we're not > allowed to change it across the interface lifetime? 
SNMP uses it and again IIRC, the net management system initializes it only at startup time i.e it doesnt continuously reinitialize after it gets started. Again this is just from some triggered brain blast i had. Someone needs to check the SNMP RFCs, i dont know which one. i.e for each ifindex, from SNMP view, the counters are monotonically incrementing values ....

>
> Should we attempt to make an interface's ifindex constant
> across reinstantiations of the interface (ie: driver reloads?).
>

This is a hard one. I suppose if you are getting a new ifindex, you should reset the counters. but it is probably good practice to just give it a new ifindex and reset counters.

cheers, jamal

From owner-netdev@oss.sgi.com Mon Mar 5 20:46:14 2001 Received: by oss.sgi.com id ; Mon, 5 Mar 2001 20:45:55 -0800 Received: from cx97923-a.phnx3.az.home.com ([24.9.112.194]:783 "EHLO grok.yi.org") by oss.sgi.com with ESMTP id ; Mon, 5 Mar 2001 20:45:38 -0800 Received: from candelatech.com (IDENT:greear@localhost.localdomain [127.0.0.1]) by grok.yi.org (8.11.0/8.11.0) with ESMTP id f265EBr24659; Mon, 5 Mar 2001 22:14:11 -0700 Message-ID: <3AA47222.1F94402@candelatech.com> Date: Mon, 05 Mar 2001 22:14:10 -0700 From: Ben Greear Organization: Candela Technologies X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.2.17-14 i686) X-Accept-Language: en MIME-Version: 1.0 To: jamal CC: Andrew Morton , netdev@oss.sgi.com Subject: TCP/IP question. References: Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing

I think I know the answer to this, but I'm having trouble convincing myself while looking at my code:

Once a TCP/IP connection has been established (say via accept and connect), the resulting connection is symmetric right?

I'm seeing something weird where the accepting side of the connection will not run near as fast as the connecting side. (say 15Mbps v/s 20Mbps).
The program on the slower side shows very little CPU usage. I'm using select and non-blocking IO, generally doing writes to the kernel of about 24k bytes. I'm also running a 30Mbps symetric UDP connection on a different set of ports, and a 10Mbps UDP connection on the same ports as the tcp-connection above. The UDP traffic seems to be having no problem at all... Also, either the drivers suck much more (possible, I'm using some wierd LNYX 4-port 21143 Tulip card), or something else is wierd, because I get worse performance in 2.4.3-pre1 than in 2.2.19-pre-ac-latest-er-something It does seem that these interfaces are running 100bt-HALF_DUPLEX, so that could be the problem... Any ideas?? THanks, Ben -- Ben Greear (greearb@candelatech.com) http://www.candelatech.com Author of ScryMUD: scry.wanfear.com 4444 (Released under GPL) http://scry.wanfear.com http://scry.wanfear.com/~greear From owner-netdev@oss.sgi.com Mon Mar 5 21:11:34 2001 Received: by oss.sgi.com id ; Mon, 5 Mar 2001 21:11:25 -0800 Received: from m202-4-p02.warwick.net ([208.242.202.157]:41733 "EHLO circuit.moureaux.com") by oss.sgi.com with ESMTP id ; Mon, 5 Mar 2001 21:11:04 -0800 Received: from localhost (IDENT:statux@localhost [127.0.0.1]) by circuit.moureaux.com (8.9.3/8.9.3) with ESMTP id AAA15113; Tue, 6 Mar 2001 00:12:56 -0500 Date: Tue, 6 Mar 2001 00:12:56 -0500 (EST) From: Statux X-Sender: To: Ben Greear cc: jamal , Andrew Morton , Subject: Re: TCP/IP question. In-Reply-To: <3AA47222.1F94402@candelatech.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing > Once a TCP/IP connection has been established (say via accept and connect), > the resulting connection is symetric right? Not always. There is a lot of asymmetric hardware out there... namely 56K modems and ADSL, etc. 
56K modems, for instance, upload at about 33.6K and download at 53K (based on FCC regulations in the USA, that 53K number might be different in other countries).

I think that I'm right with that answer :) Anyone else?

From owner-netdev@oss.sgi.com Mon Mar 5 21:13:34 2001 Received: by oss.sgi.com id ; Mon, 5 Mar 2001 21:13:14 -0800 Received: from oak.cats.ohiou.edu ([132.235.8.44]:58635 "EHLO oak.cats.ohiou.edu") by oss.sgi.com with ESMTP id ; Mon, 5 Mar 2001 21:13:03 -0800 Received: from dhcp-177-228.cns.ohiou.edu (dhcp-177-228.cns.ohiou.edu [132.235.177.228]) by oak.cats.ohiou.edu (8.9.3/8.9.3) with ESMTP id AAA257512; Tue, 6 Mar 2001 00:13:02 -0500 (EST) Received: (from elb@localhost) by dhcp-177-228.cns.ohiou.edu (8.11.0/8.11.0) id f265D2g02479; Tue, 6 Mar 2001 00:13:02 -0500 Date: Tue, 6 Mar 2001 00:13:02 -0500 From: Ethan Blanton To: Statux Cc: Ben Greear , jamal , Andrew Morton , netdev@oss.sgi.com Subject: Re: TCP/IP question. Message-ID: <20010306001302.A2462@dhcp-177-228.cns.ohiou.edu> Mail-Followup-To: Statux , Ben Greear , jamal , Andrew Morton , netdev@oss.sgi.com References: <3AA47222.1F94402@candelatech.com> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-md5; protocol="application/pgp-signature"; boundary="zYM0uCDKw75PZbzx" Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: ; from statux@bigfoot.com on Tue, Mar 06, 2001 at 12:12:56AM -0500 X-Operating-System: Linux X-GnuPG-Fingerprint: A290 14A8 C682 5C88 AE51 4787 AFD9 00F4 883C 1C14 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing

--zYM0uCDKw75PZbzx
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

Statux spake unto us the following wisdom:
> > Once a TCP/IP connection has been established (say via accept and connect),
> > the resulting connection is symmetric right?
>
> Not always. There is a lot of asymmetric hardware out there...
namely 56K
> modems and ADSL, etc.

Not to mention that the routes themselves could very well be (and are not unlikely to be) asymmetric if the packets are travelling very far.

Ethan

-- 
If I've told you once, I've told you once -- and once is all that you needed.
        -- The Refreshments, "Carefree"

--zYM0uCDKw75PZbzx
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.4 (GNU/Linux)
Comment: For info see http://www.gnupg.org

iD8DBQE6pHHdr9kA9Ig8HBQRAlOkAJ945x8F9XFngvUpgpm+dbs5Dtw3FACaA0wM
DZFkPa94fER8daNCUfwkCNU=
=vfov
-----END PGP SIGNATURE-----

--zYM0uCDKw75PZbzx--

From owner-netdev@oss.sgi.com Mon Mar 5 21:15:04 2001 Received: by oss.sgi.com id ; Mon, 5 Mar 2001 21:14:45 -0800 Received: from wirespeed.solidum.com ([207.35.224.226]:4026 "EHLO solidum.com") by oss.sgi.com with ESMTP id ; Mon, 5 Mar 2001 21:14:37 -0800 Received: from marajade.sandelman.ottawa.on.ca (marajade.solidum.com [192.168.1.24]) by solidum.com (8.8.7/8.8.7) with ESMTP id AAA07225 for ; Tue, 6 Mar 2001 00:14:33 -0500 Received: from marajade.sandelman.ottawa.on.ca (marajade.sandelman.ottawa.on.ca [127.0.0.1]) by marajade.sandelman.ottawa.on.ca (8.11.0/8.11.0) with ESMTP id f265DXL02367 for ; Tue, 6 Mar 2001 00:13:33 -0500 (EST) Message-Id: <200103060513.f265DXL02367@marajade.sandelman.ottawa.on.ca> To: netdev@oss.sgi.com Subject: Re: TCP/IP question. In-reply-to: Your message of "Tue, 06 Mar 2001 00:12:56 EST." Mime-Version: 1.0 (generated by tm-edit 7.108) Content-Type: text/plain; charset=US-ASCII Date: Tue, 06 Mar 2001 00:13:33 -0500 From: Michael Richardson Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing

>>>>> "Statux" == Statux writes:

    >> Once a TCP/IP connection has been established (say via accept and connect),
    >> the resulting connection is symmetric right?

    Statux> Not always. There is a lot of asymmetric hardware out there... namely 56K
    Statux> modems and ADSL, etc.
My impression was that he was speaking about protocol issues, not bandwidth ones. ] Train travel features AC outlets with no take-off restrictions|gigabit is no[ ] Michael Richardson, Solidum Systems Oh where, oh where has|problem with[ ] mcr@solidum.com www.solidum.com the little fishy gone?|PAX.port 1100[ ] panic("Just another NetBSD/notebook using, kernel hacking, security guy"); [ From owner-netdev@oss.sgi.com Mon Mar 5 21:21:34 2001 Received: by oss.sgi.com id ; Mon, 5 Mar 2001 21:21:15 -0800 Received: from pizda.ninka.net ([216.101.162.242]:46471 "EHLO pizda.ninka.net") by oss.sgi.com with ESMTP id ; Mon, 5 Mar 2001 21:21:00 -0800 Received: (from davem@localhost) by pizda.ninka.net (8.9.3/8.9.3) id VAA17978; Mon, 5 Mar 2001 21:20:22 -0800 From: "David S. Miller" MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <15012.29590.622343.706448@pizda.ninka.net> Date: Mon, 5 Mar 2001 21:20:22 -0800 (PST) To: Jeff Garzik Cc: netdev@oss.sgi.com, linux-net@vger.rutgers.edu Subject: Re: Zeroing interface stats? In-Reply-To: <3AA43B44.DE3089A2@mandrakesoft.com> References: <3AA43B44.DE3089A2@mandrakesoft.com> X-Mailer: VM 6.75 under 21.1 (patch 13) "Crater Lake" XEmacs Lucid Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Jeff Garzik writes: > Just spotted this code in acenic.c. Should it be carried over to other > net drivers? If the behavior is correct, no. If the behavior is incorrect, no. Never do something generic in every driver, do it in one place and be done with it :-) Someone needs to check out the RFCs wrt. the points about SNMP made by Jamal. Later, David S. 
Miller
davem@redhat.com

From owner-netdev@oss.sgi.com Mon Mar 5 21:22:04 2001 Received: by oss.sgi.com id ; Mon, 5 Mar 2001 21:21:44 -0800 Received: from cx97923-a.phnx3.az.home.com ([24.9.112.194]:11279 "EHLO grok.yi.org") by oss.sgi.com with ESMTP id ; Mon, 5 Mar 2001 21:21:28 -0800 Received: from candelatech.com (IDENT:greear@localhost.localdomain [127.0.0.1]) by grok.yi.org (8.11.0/8.11.0) with ESMTP id f265oAr29243; Mon, 5 Mar 2001 22:50:10 -0700 Message-ID: <3AA47A91.40F4A168@candelatech.com> Date: Mon, 05 Mar 2001 22:50:09 -0700 From: Ben Greear Organization: Candela Technologies X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.2.17-14 i686) X-Accept-Language: en MIME-Version: 1.0 To: Statux CC: jamal , Andrew Morton , netdev@oss.sgi.com Subject: Re: TCP/IP question. References: Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing

Statux wrote:
>
> > Once a TCP/IP connection has been established (say via accept and connect),
> > the resulting connection is symmetric right?
>
> Not always. There is a lot of asymmetric hardware out there... namely 56K
> modems and ADSL, etc.
>
> 56K modems, for instance, upload at about 33.6K and download at 53K
> (based on FCC regulations in the USA, that 53K number might be different
> in other countries).
>
> I think that I'm right with that answer :) Anyone else?

Well, my hardware is symmetric, and I've tried moving the accept'er from one side to the other..and the slow transmit side follows the guy that's accepting.

On a similar topic: I am sending large bunches of TCP/IP traffic, say 24kbytes at a time. However, on the receive, I never read more than 14k or so, even though the specified max to read is 40kB or so. (It's non-blocking reads...) Is that 14k a tunable parameter? UDP traffic doesn't suffer from that problem of course, because you receive the entire PDU at once.....
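For context on the read sizes: TCP delivers a byte stream, so a single read may return any amount up to the requested maximum, independent of how the sender grouped its writes. How much is returned depends on what has arrived so far and on buffer sizes, so a receiver that needs a complete application-level record normally loops until it has it. The following is a minimal sketch for a blocking descriptor; `read_full` is a hypothetical helper name, and with non-blocking sockets as described above one would go back to select() on EAGAIN instead of looping directly.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read exactly `len` bytes from a blocking
 * descriptor. Returns the number of bytes read (less than `len`
 * only if the peer closed the stream first), or -1 on error. */
static ssize_t read_full(int fd, void *buf, size_t len)
{
    size_t got = 0;

    while (got < len) {
        ssize_t n = read(fd, (char *)buf + got, len - got);

        if (n == 0)
            break;              /* end of stream */
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;
        }
        got += (size_t)n;       /* short read: keep going */
    }
    return (ssize_t)got;
}
```

Datagram sockets behave differently, as noted above: each receive returns one whole datagram, so no such loop is needed there.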
-- 
Ben Greear (greearb@candelatech.com) http://www.candelatech.com
Author of ScryMUD: scry.wanfear.com 4444 (Released under GPL)
http://scry.wanfear.com http://scry.wanfear.com/~greear

From owner-netdev@oss.sgi.com Mon Mar 5 22:17:55 2001 Received: by oss.sgi.com id ; Mon, 5 Mar 2001 22:17:35 -0800 Received: from citadel.myri.com ([199.120.212.1]:2198 "EHLO myri.com") by oss.sgi.com with ESMTP id ; Mon, 5 Mar 2001 22:17:13 -0800 Received: from orion.myri.com.myri.com (orion [199.120.212.245]) by myri.com (8.9.3+Sun/8.9.1) with SMTP id WAA29571; Mon, 5 Mar 2001 22:17:12 -0800 (PST) Received: from localhost by orion.myri.com.myri.com (4.1/SMI-4.1) id AA24803; Mon, 5 Mar 01 22:18:52 PST Date: Mon, 5 Mar 2001 22:18:52 -0800 (PST) From: Bob Felderman To: Andrew Morton Cc: Bob Felderman , jamal , kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com Subject: Re: [Fwd: Re: possible bug x86 2.4.2 SMP in IP receive stack] In-Reply-To: <3AA4307F.21698B6C@uow.edu.au> Message-Id: Mime-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing

On Tue, 6 Mar 2001, Andrew Morton wrote:

> jamal wrote:
> >
> > Now this is the problem with proprietary drivers that nobody sees the code
> > for (or maybe hardware that nobody sees specs for)
>
> * Permission to use, copy, modify and distribute this software and its *
> * documentation in source and binary forms for non-commercial purposes *
> * and without fee is hereby granted, provided that the modified software *
> * is returned to Myricom, Inc. for redistribution.
>
> So it's not *too* sinful :)
>
> > You caused people all the pain of trying to decode what your problem is
> > only to find you are making some basic mistakes.
>
> I don't know if that's proven yet.
>

Our code has always been "open source" since we started the company nearly 7 years ago. I feel pretty strongly about that and we do get lots of help from customers because of it.
One reason we don't GPL the code is that the Commerce Dept. classifies our hardware as "munitions" (I think) and we have some export restrictions, so they are happier if we pretend to not give our code away. OK, I've added in spin locking to serialize the interrupt routine and any transmits. I've done it my way and using a patch from Andrew. The effect is the same - no change in the basic behavior. Here's my most recent crash. It looks like both processors are panicing in the same place? ksymoops 0.7c on i686 2.4.2. Options used -V (default) -k /proc/ksyms (default) -l /proc/modules (default) -o /lib/modules/2.4.2/ (default) -m /usr/src/linux/System.map (default) Warning: You did not tell me where to find symbol information. I will assume that the log matches the kernel and modules that are running right now and I'll use the default options above for symbol resolution. If the current kernel and/or modules do not match the log, you can get more accurate output by telling me the kernel version and where to find map, modules, ksyms etc. ksymoops -h explains the options. Warning (compare_maps): mismatch on symbol __module_author , gm says d0888ee0, sbin/gm says d088ab60. Ignoring sbin/gm entry Warning (compare_maps): mismatch on symbol __module_description , gm says d0888eff, sbin/gm says d088ab7f. Ignoring sbin/gm entry Warning (compare_maps): mismatch on symbol __module_parm_gm_net_copy_threshold , gm says d0888f5c, sbin/gm says d088abdc. Ignoring sbin/gm entry Warning (compare_maps): mismatch on symbol __module_parm_gmip_hw_checksum , gm says d0888f44, sbin/gm says d088abc4. 
Ignoring sbin/gm entry invalid operand: 0000 CPU: 0 EIP: 0010:[] Using defaults from ksymoops -t elf32-i386 -a i386 EFLAGS: 00010286 eax: 0000001c ebx: c63fb280 ecx: c150c000 edx: 00000001 esi: cff07420 edi: c9639da0 ebp: 00000fe7 esp: c029fea0 ds: 0018 es: 0018 ss: 0018 Process swapper (pid: 0, stackpage=c029f000) Stack: c026dde5 c026df00 00000120 c9639da0 c01f612c c9510d20 cff07420 00000000 c01f6b4f cff07420 c9639da0 c9639da0 c9639da0 c9639da0 c92b0040 3782c9c7 c029e000 cc042a00 c01f5c50 c9639da0 c9639da0 c92b0040 c9639da0 c9639da0 Call Trace: [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] Code: 0f 0b 83 c4 0c e9 bc 00 00 00 8b 4a 28 85 c9 74 08 f0 ff 49 >>EIP; c01eb629 <__kfree_skb+31/fc> <===== Trace; c01f612c Trace; c01f6b4f Trace; c01f5c50 Trace; c01f6011 Trace; d08ef8b0 Trace; c01ed73e Trace; c010a99a Trace; c01071c0 Trace; c01071c0 Trace; c010909c Trace; c01071c0 Trace; c01071c0 Trace; c0100018 Trace; c0107252 Trace; c0105000 Trace; c01001cf Code; c01eb629 <__kfree_skb+31/fc> 00000000 <_EIP>: Code; c01eb629 <__kfree_skb+31/fc> <===== 0: 0f 0b ud2a <===== Code; c01eb62b <__kfree_skb+33/fc> 2: 83 c4 0c add $0xc,%esp Code; c01eb62e <__kfree_skb+36/fc> 5: e9 bc 00 00 00 jmp c6 <_EIP+0xc6> c01eb6ef <__kfree_skb+f7/fc> Code; c01eb633 <__kfree_skb+3b/fc> a: 8b 4a 28 mov 0x28(%edx),%ecx Code; c01eb636 <__kfree_skb+3e/fc> d: 85 c9 test %ecx,%ecx Code; c01eb638 <__kfree_skb+40/fc> f: 74 08 je 19 <_EIP+0x19> c01eb642 <__kfree_skb+4a/fc> Code; c01eb63a <__kfree_skb+42/fc> 11: f0 ff 49 00 lock decl 0x0(%ecx) invalid operand: 0000 Kernel panic: Aiee, killing interrupt handler! 
CPU: 1 EIP: 0010:[] EFLAGS: 00010292 eax: 0000001c ebx: c63fb280 ecx: c150c000 edx: 00000001 esi: cff07420 edi: c1575300 ebp: 00000fe7 esp: c1449e74 ds: 0018 es: 0018 ss: 0018 Process swapper (pid: 0, stackpage=c1449000) Stack: c026dde5 c026df00 00000120 c1575300 c01f612c c9510d20 cff07420 00000000 c01f6b4f cff07420 c1575300 c1575300 c1575300 c1575300 c8b54040 3782c9c7 c1448000 cc042a00 c01f5c50 c1575300 c1575300 c8b54040 c1575300 c1575300 Call Trace: [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] Code: 0f 0b 83 c4 0c e9 bc 00 00 00 8b 4a 28 85 c9 74 08 f0 ff 49 >>EIP; c01eb629 <__kfree_skb+31/fc> <===== Trace; c01f612c Trace; c01f6b4f Trace; c01f5c50 Trace; c01f6011 Trace; d08ef8b0 Trace; c01ed73e Trace; c010a99a Trace; c01071c0 Trace; c01071c0 Trace; c010909c Trace; c01071c0 Trace; c01071c0 Trace; c0100018 Trace; c0107252 Trace; c01193aa Trace; c010a99a Code; c01eb629 <__kfree_skb+31/fc> 00000000 <_EIP>: Code; c01eb629 <__kfree_skb+31/fc> <===== 0: 0f 0b ud2a <===== Code; c01eb62b <__kfree_skb+33/fc> 2: 83 c4 0c add $0xc,%esp Code; c01eb62e <__kfree_skb+36/fc> 5: e9 bc 00 00 00 jmp c6 <_EIP+0xc6> c01eb6ef <__kfree_skb+f7/fc> Code; c01eb633 <__kfree_skb+3b/fc> a: 8b 4a 28 mov 0x28(%edx),%ecx Code; c01eb636 <__kfree_skb+3e/fc> d: 85 c9 test %ecx,%ecx Code; c01eb638 <__kfree_skb+40/fc> f: 74 08 je 19 <_EIP+0x19> c01eb642 <__kfree_skb+4a/fc> Code; c01eb63a <__kfree_skb+42/fc> 11: f0 ff 49 00 lock decl 0x0(%ecx) 5 warnings issued. Results may not be reliable. 
From owner-netdev@oss.sgi.com Mon Mar 5 23:20:27 2001 Received: by oss.sgi.com id ; Mon, 5 Mar 2001 23:20:17 -0800 Received: from adsl-151-196-236-235.baltmd.adsl.bellatlantic.net ([151.196.236.235]:16436 "EHLO vaio.greennet") by oss.sgi.com with ESMTP id ; Mon, 5 Mar 2001 23:20:03 -0800 Received: from localhost (becker@localhost) by vaio.greennet (8.9.3/8.8.7) with ESMTP id CAA30590; Tue, 6 Mar 2001 02:13:49 -0500 Date: Tue, 6 Mar 2001 02:13:49 -0500 (EST) From: Donald Becker X-Sender: becker@vaio.greennet To: Bob Felderman cc: andrewm@uow.edu.au, kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com Subject: Re: [Fwd: Re: possible bug x86 2.4.2 SMP in IP receive stack] In-Reply-To: <3AA42B7B.7C562799@mandrakesoft.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Mon, 5 Mar 2001, Jeff Garzik wrote: > > The driver does (and always has had) > > > > if (test_and_set_bit(0, (void *) &is->arch.interrupt) != 0) { ... > That is most definitely -not- good SMP locking. In normal drivers on > normal hardware, the interrupt handler is never ever called twice > anyway. This code is an artifact of Donald Becker's driver skeleton, > which includes this check. A few of his drivers call the interrupt > handler routine from normal driver code, thus requiring the check. Most > drivers do not need this check. No, this check is to avoid a bug in the Linux interrupt dispatch code that called an interrupt handler simultaneously. It only occurred on SMP machines, but it was very bad. I can see the response coming.. it usually sounds like "there is no such bug, but it was fixed a long time ago". Much like a regression test, keeping the check in a few places assures us that the bug isn't recurring. > Read Documentation/networking/netdevices.txt in 2.4.x kernels.
Also, > since it doesn't cover interrupt handling and hardware, the basic rule > is: > > * spin_lock around your Tx interrupt handling path. > * spin_lock_irq around your dev->hard_start_xmit Tx submission code. With some hardware designs you shouldn't need these locks. Donald Becker becker@scyld.com Scyld Computing Corporation http://www.scyld.com 410 Severn Ave. Suite 210 Second Generation Beowulf Clusters Annapolis MD 21403 410-990-9993 From owner-netdev@oss.sgi.com Tue Mar 6 06:08:24 2001 Received: by oss.sgi.com id ; Tue, 6 Mar 2001 06:08:05 -0800 Received: from horus.its.uow.edu.au ([130.130.68.25]:10137 "EHLO horus.its.uow.edu.au") by oss.sgi.com with ESMTP id ; Tue, 6 Mar 2001 06:07:56 -0800 Received: from uow.edu.au (wumpus.its.uow.edu.au [130.130.68.12]) by horus.its.uow.edu.au (8.9.3/8.9.3) with ESMTP id BAA15390; Wed, 7 Mar 2001 01:05:43 +1100 (EST) Message-ID: <3AA4EED9.5832C994@uow.edu.au> Date: Wed, 07 Mar 2001 01:06:17 +1100 From: Andrew Morton X-Mailer: Mozilla 4.7 [en] (X11; I; Linux 2.4.2-pre2 i586) X-Accept-Language: en MIME-Version: 1.0 To: Bob Felderman CC: netdev@oss.sgi.com Subject: Re: [Fwd: Re: possible bug x86 2.4.2 SMP in IP receive stack] References: <3AA4307F.21698B6C@uow.edu.au> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Bob Felderman wrote: > > Here's my most recent crash. It looks like both processors > are panicing in the same place? What test tools are you using to make this happen? netperf? Could you please send the command line which you're using? 
From owner-netdev@oss.sgi.com Tue Mar 6 07:05:55 2001 Received: by oss.sgi.com id ; Tue, 6 Mar 2001 07:05:45 -0800 Received: from roma.axis.se ([193.13.178.2]:62985 "EHLO roma.axis.se") by oss.sgi.com with ESMTP id ; Tue, 6 Mar 2001 07:05:29 -0800 Received: from klatt.axis.se (klatt.axis.se [10.0.5.41]) by roma.axis.se (8.9.3/8.9.3) with ESMTP id QAA28447 for ; Tue, 6 Mar 2001 16:04:38 +0100 (MET) Received: by klatt.axis.se with Internet Mail Service (5.5.2653.19) id <1058PTHB>; Tue, 6 Mar 2001 16:03:52 +0100 Message-ID: From: Per Flock To: "'netdev@oss.sgi.com'" Subject: Proposed minor patch to net/core/iovec.c. Date: Tue, 6 Mar 2001 16:04:49 +0100 MIME-Version: 1.0 X-Mailer: Internet Mail Service (5.5.2653.19) Content-Type: text/plain Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hi, Attached is a patch that corrects what I believe is a minor flaw in copy_and_csum_toiovec(). Currently there is no distinction between a csum error and an I/O error. If a user application provides a bad buffer, this results in incoming segments being silently discarded (copy_and_csum_toiovec() returns -EINVAL). What should happen, to my understanding, is that the error from csum_and_copy_to_user() should be returned so that the user application is notified.
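To make the two failure modes concrete, here is a minimal user-space sketch in C. It is only a model under stated assumptions, not the kernel code: every model_* name is hypothetical, a NULL destination stands in for a bad user buffer, and the checksum is a plain 16-bit ones'-complement sum. The point is just that the copy fault is reported as -EFAULT before the checksum is ever folded, while a bad checksum alone yields -EINVAL.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for csum_and_copy_to_user(): copies len bytes and
 * accumulates a 16-bit ones'-complement sum. A NULL dst models a bad
 * user-space buffer and sets *err to -EFAULT. */
static uint32_t model_csum_and_copy(const uint8_t *src, uint8_t *dst,
                                    size_t len, uint32_t sum, int *err)
{
    size_t i;

    assert(err != NULL);
    if (dst == NULL) {
        *err = -EFAULT;
        return sum;
    }
    memcpy(dst, src, len);
    for (i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)((src[i] << 8) | src[i + 1]);
    if (i < len)                       /* odd trailing byte */
        sum += (uint32_t)(src[i] << 8);
    return sum;
}

/* Fold the sum to 16 bits and complement it; 0 means "checksum verified". */
static uint16_t model_csum_fold(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Returns 0 on success, -EFAULT for a bad buffer, -EINVAL for a bad
 * checksum. The two errors are kept apart, as the patch proposes. */
static int model_copy_and_csum(const uint8_t *pkt, size_t len, uint8_t *dst)
{
    int err = 0;
    uint32_t sum = model_csum_and_copy(pkt, dst, len, 0, &err);

    if (err)
        return err;                    /* caller learns about the bad buffer */
    if (model_csum_fold(sum))
        return -EINVAL;                /* only now is it a checksum error */
    return 0;
}
```

With the original combined test, both failures would collapse into the checksum-error path, which is exactly the behaviour the message above objects to.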
/Per

Index: net/core/iovec.c
===================================================================
RCS file: /n/cvsroot/os/linux/net/core/iovec.c,v
retrieving revision 1.2
diff -u -r1.2 iovec.c
--- net/core/iovec.c	2001/02/23 13:51:30	1.2
+++ net/core/iovec.c	2001/03/06 14:08:11
@@ -130,7 +130,9 @@
 			csum = csum_partial(skb->h.raw, hlen, skb->csum);
 			csum = csum_and_copy_to_user(skb->h.raw+hlen, iov->iov_base,
 						     chunk, csum, &err);
-			if (err || ((unsigned short)csum_fold(csum)))
+			if (err)
+				return err;
+			if ((unsigned short)csum_fold(csum))
 				goto csum_error;
 			iov->iov_len -= chunk;
 			iov->iov_base += chunk;

From owner-netdev@oss.sgi.com Tue Mar 6 07:49:25 2001 Received: by oss.sgi.com id ; Tue, 6 Mar 2001 07:49:15 -0800 Received: from robur.slu.se ([130.238.98.12]:26376 "EHLO robur.slu.se") by oss.sgi.com with ESMTP id ; Tue, 6 Mar 2001 07:48:55 -0800 Received: (from robert@localhost) by robur.slu.se (8.8.7/8.8.7) id QAA15989; Tue, 6 Mar 2001 16:48:45 +0100 From: Robert Olsson MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <15013.1757.416231.718582@robur.slu.se> Date: Tue, 6 Mar 2001 16:48:45 +0100 (CET) To: Andrew Morton Cc: jamal , netdev@oss.sgi.com Subject: Re: Zeroing interface stats? In-Reply-To: <3AA45442.4980994B@uow.edu.au> References: <3AA45442.4980994B@uow.edu.au> X-Mailer: VM 6.75 under Emacs 19.34.1 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Andrew Morton writes: > jamal wrote: > > > > On Tue, 6 Mar 2001, Andrew Morton wrote: > > > > Infact i think what the acenic is doing is illegal. These stats are used > > by SNMP. > > IIRC, if you have to zero the stats you also must reset the ifindex. > > Someone double check the SNMP RFCs. > > I've seen you use this `ifindex' term before. What is it, and > what is its role in life? BTW did you look into my effort of having a private clearable copy of the device stats for all open devices?
The format as /proc/net/dev plus an extra line for EWMA stats, bytes and pkts etc. SNMP is not aware of it but it could be possible. robur.slu.se: /pub/Linux/net-development/ends/ends.c Cheers. --ro From owner-netdev@oss.sgi.com Tue Mar 6 08:16:35 2001 Received: by oss.sgi.com id ; Tue, 6 Mar 2001 08:16:16 -0800 Received: from minus.inr.ac.ru ([193.233.7.97]:34310 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Tue, 6 Mar 2001 08:15:56 -0800 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id TAA19976; Tue, 6 Mar 2001 19:15:41 +0300 From: kuznet@ms2.inr.ac.ru Message-Id: <200103061615.TAA19976@ms2.inr.ac.ru> Subject: Re: Zeroing interface stats? To: jgarzik@mandrakesoft.COM (Jeff Garzik) Date: Tue, 6 Mar 2001 19:15:40 +0300 (MSK) Cc: netdev@oss.sgi.com In-Reply-To: <3AA43B44.DE3089A2@mandrakesoft.com> from "Jeff Garzik" at Mar 6, 1 04:45:02 am X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Content-Length: 216 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hello! > Just spotted this code in acenic.c. Should it be carried over to other > net drivers? Nooo!!! It is bug. Interface statistics must not be ever reset without interface destruction and recreation. 
Alexey From owner-netdev@oss.sgi.com Tue Mar 6 08:28:45 2001 Received: by oss.sgi.com id ; Tue, 6 Mar 2001 08:28:35 -0800 Received: from minus.inr.ac.ru ([193.233.7.97]:1031 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Tue, 6 Mar 2001 08:28:25 -0800 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id TAA20314; Tue, 6 Mar 2001 19:28:06 +0300 From: kuznet@ms2.inr.ac.ru Message-Id: <200103061628.TAA20314@ms2.inr.ac.ru> Subject: Re: [Fwd: Re: possible bug x86 2.4.2 SMP in IP receive stack] To: andrewm@uow.edu.au (Andrew Morton) Date: Tue, 6 Mar 2001 19:28:06 +0300 (MSK) Cc: netdev@oss.sgi.com, feldy@myri.com In-Reply-To: <3AA420E0.D54D4160@uow.edu.au> from "Andrew Morton" at Mar 5, 1 11:27:28 pm X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Content-Length: 842 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hello! > Bob is consistently getting these oopses running netperf with > UDP on 2.4.2. He's using the myrinet hardware and drivers. It's > very high speed - over 100 mbytes/sec. I believe he's experiencing > out-of-memory conditions. > > Sometimes he also gets assertion failures from ip_frag_destroy. > `del_timer == 0'. > > Can you think of anything which would cause this to happen > in an out-of-memory situation? I am looking now. Probably, it is some silly misprint in ip_fragment.c. The problem with Bob's original report was that in the first lines he reported an illegal kfree_skb with skb->list!=NULL, called from ip_rcv(). This can be only bug in driver, nothing more. Actually, Bob, if you will say me that you found why this happened, my enthusiasm in reauditing ip_fragment.c will grow just fantastically. 
8) Alexey From owner-netdev@oss.sgi.com Tue Mar 6 08:44:15 2001 Received: by oss.sgi.com id ; Tue, 6 Mar 2001 08:44:05 -0800 Received: from smtp1.cern.ch ([137.138.128.38]:7177 "EHLO smtp1.cern.ch") by oss.sgi.com with ESMTP id ; Tue, 6 Mar 2001 08:43:50 -0800 Received: from lxplus015.cern.ch (IDENT:root@lxplus015.cern.ch [137.138.161.112]) by smtp1.cern.ch (8.9.3/8.9.3) with ESMTP id RAA17046; Tue, 6 Mar 2001 17:43:40 +0100 (MET) Received: (from jes@localhost) by lxplus015.cern.ch (8.9.3/8.9.3) id RAA19491; Tue, 6 Mar 2001 17:43:39 +0100 To: Noah Romer Cc: Jeff Garzik , netdev@oss.sgi.com Subject: Re: [Fwd: Re: possible bug x86 2.4.2 SMP in IP receive stack] References: From: Jes Sorensen Date: 06 Mar 2001 17:43:38 +0100 In-Reply-To: Noah Romer's message of "Mon, 5 Mar 2001 18:19:30 -0800 (PST)" Message-ID: Lines: 23 User-Agent: Gnus/5.070096 (Pterodactyl Gnus v0.96) Emacs/20.4 MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing >>>>> "Noah" == Noah Romer writes: Noah> On Mon, 5 Mar 2001, Jeff Garzik wrote: >> * spin_lock around your Tx interrupt handling path. * >> spin_lock_irq around your dev->hard_start_xmit Tx submission code. >> >> Ideally your Rx interrupt handling path is independent of other >> code, and need not be locked. Noah> Well, if the Rx code modifies anything that is shared by all Noah> instances of the Rx code (i.e. a queue or stack), you're going Noah> to need to lock around those access points in order to be SMP Noah> safe. If the Rx code is in an interrupt context, then you've got Noah> to spin_lock_irq. You don't need to use spin_lock_irq() in an interrupt handler as you are always guaranteed that the current interrupt handler is never disturbed by anyone else. Using spin_lock_irqsave() is preferred to spin_lock_irq() as spin_lock_irq() is dodgy to implement on some architectures. 
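A small user-space model in C may help show one practical difference between the two lock variants. This is a sketch under assumptions, not kernel code: the model_* functions and the single irqs_enabled flag are hypothetical stand-ins for one CPU's interrupt state, and the property shown (the irq variant re-enables interrupts unconditionally on unlock, while the irqsave variant restores whatever state it found) is a commonly cited reason for the preference, not necessarily the architecture issue referred to above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one CPU's interrupt-enable flag. */
static bool irqs_enabled = true;

/* Models the spin_lock_irq()/spin_unlock_irq() pair: disable on lock,
 * enable unconditionally on unlock. */
static void model_lock_irq(void)   { irqs_enabled = false; }
static void model_unlock_irq(void) { irqs_enabled = true; }

/* Models spin_lock_irqsave(): remember the previous state, then disable.
 * (The real kernel macro takes the flags variable by name; returning it
 * here is a simplification.) */
static unsigned long model_lock_irqsave(void)
{
    unsigned long flags = irqs_enabled ? 1UL : 0UL;

    irqs_enabled = false;
    return flags;
}

/* Models spin_unlock_irqrestore(): put back exactly what was saved. */
static void model_unlock_irqrestore(unsigned long flags)
{
    assert(!irqs_enabled);             /* must still be inside the section */
    irqs_enabled = (flags != 0);
}

/* A helper that may be called with interrupts already disabled.
 * Returns the interrupt state it leaves behind. */
static bool helper_using_irq(void)
{
    model_lock_irq();
    /* ... critical section ... */
    model_unlock_irq();
    return irqs_enabled;
}

static bool helper_using_irqsave(void)
{
    unsigned long flags = model_lock_irqsave();

    /* ... critical section ... */
    model_unlock_irqrestore(flags);
    return irqs_enabled;
}
```

In this model, calling helper_using_irq() from a context that already had interrupts off leaves them switched on afterwards, whereas helper_using_irqsave() leaves the caller's state untouched.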
Jes From owner-netdev@oss.sgi.com Tue Mar 6 08:45:05 2001 Received: by oss.sgi.com id ; Tue, 6 Mar 2001 08:44:45 -0800 Received: from minus.inr.ac.ru ([193.233.7.97]:19719 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Tue, 6 Mar 2001 08:44:39 -0800 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id TAA20524; Tue, 6 Mar 2001 19:44:10 +0300 From: kuznet@ms2.inr.ac.ru Message-Id: <200103061644.TAA20524@ms2.inr.ac.ru> Subject: Re: Proposed minor patch to net/core/iovec.c. To: per.flock@axis.COM (Per Flock) Date: Tue, 6 Mar 2001 19:44:10 +0300 (MSK) Cc: netdev@oss.sgi.com In-Reply-To: from "Per Flock" at Mar 6, 1 06:15:01 pm X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Content-Length: 775 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hello! > Attached is a patch that corrects what I belive is a minor flaw in copy_and_csum_toiovec(). > Currently there is no distinction between a csum error and an io error. If a user application provides a bad buffer, this would result in incoming segments being silently discarded (copy_and_csum_toiovec() returns -EINVAL). What should happen, to my understanding, is that the error from csum_and_copy_to_user() should be returned so that the user application is notified. Damn, the bug is more profound. I forgot some details, when reusing this function (written for tcp) in UDP... In the case of tcp kind of error is inessential: all that we should make is to stop fast path. Case of EFAULT is handled in slow path then. But udp really does not sense error! 
Alexey From owner-netdev@oss.sgi.com Tue Mar 6 08:56:35 2001 Received: by oss.sgi.com id ; Tue, 6 Mar 2001 08:56:25 -0800 Received: from minus.inr.ac.ru ([193.233.7.97]:28935 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Tue, 6 Mar 2001 08:56:04 -0800 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id TAA20682; Tue, 6 Mar 2001 19:55:47 +0300 From: kuznet@ms2.inr.ac.ru Message-Id: <200103061655.TAA20682@ms2.inr.ac.ru> Subject: Re: Error with mrouted 3.9b3+ios12 To: rusty@linuxcare.COM.AU (Rusty Russell) Date: Tue, 6 Mar 2001 19:55:47 +0300 (MSK) Cc: netdev@oss.sgi.com In-Reply-To: from "Rusty Russell" at Mar 1, 1 10:15:00 am X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Content-Length: 539 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hello! > Hmm, this means that a packet went through PRE_ROUTING, FORWARD and > POST_ROUTING, then hit loopback. Yes, multicasts are looped back after forwarding for delivery to local listeners on other interfaces. Paul, I have suggested you to install some mrouting software at your workplace. It is serious suggestion. You never see multicasts on your network and this is main source of wrong assumptions. Please, remember that real users of mrouting will join looong after releasing 2.4, it may be too late to fix netfilter. Alexey From owner-netdev@oss.sgi.com Tue Mar 6 09:47:26 2001 Received: by oss.sgi.com id ; Tue, 6 Mar 2001 09:47:06 -0800 Received: from shell.cyberus.ca ([209.195.95.7]:36741 "EHLO shell.cyberus.ca") by oss.sgi.com with ESMTP id ; Tue, 6 Mar 2001 09:46:42 -0800 Received: from localhost (hadi@localhost) by shell.cyberus.ca (8.9.3/666/Cyberus Online Inc.) with ESMTP id MAA28490; Tue, 6 Mar 2001 12:45:19 -0500 (EST) X-Authentication-Warning: shell.cyberus.ca: hadi owned process doing -bs Date: Tue, 6 Mar 2001 12:45:18 -0500 (EST) From: jamal To: Robert Olsson cc: Andrew Morton , Subject: Re: Zeroing interface stats? 
In-Reply-To: <15013.1757.416231.718582@robur.slu.se> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Tue, 6 Mar 2001, Robert Olsson wrote: > BTW did you look into my effort of having a private clearable copy of > the device stats for all open devices? The format as /proc/net/dev > plus an extra line for EWMA stats, bytes and pkts etc. > > SNMP is not aware of it but it could be possible. > > robur.slu.se: > /pub/Linux/net-development/ends/ends.c > Sorry Robert, i forgot about this. This is definetely a sane and good view. cheers, jamal From owner-netdev@oss.sgi.com Tue Mar 6 10:54:55 2001 Received: by oss.sgi.com id ; Tue, 6 Mar 2001 10:54:46 -0800 Received: from citadel.myri.com ([199.120.212.1]:20898 "EHLO myri.com") by oss.sgi.com with ESMTP id ; Tue, 6 Mar 2001 10:54:34 -0800 Received: from orion.myri.com.myri.com (orion [199.120.212.245]) by myri.com (8.9.3+Sun/8.9.1) with SMTP id KAA03475; Tue, 6 Mar 2001 10:54:22 -0800 (PST) Date: Tue, 6 Mar 2001 10:54:22 -0800 (PST) From: Bob Felderman Message-Id: <200103061854.KAA03475@myri.com> To: andrewm@uow.edu.au, kuznet@ms2.inr.ac.ru Subject: Re: [Fwd: Re: possible bug x86 2.4.2 SMP in IP receive stack] Cc: feldy@myri.com, netdev@oss.sgi.com Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing => => I am looking now. Probably, it is some silly misprint in ip_fragment.c. => => => The problem with Bob's original report was that in the first lines => he reported an illegal kfree_skb with skb->list!=NULL, called => from ip_rcv(). This can be only bug in driver, nothing more. => => Actually, Bob, if you will say me that you found why this happened, => my enthusiasm in reauditing ip_fragment.c will grow just fantastically. 8) I have added in spinlocks for the interrupt routine and the transmit side. 
I don't see more stability and the oopses I sent yesterday are using the spinlocked code. If you don't see the skb->list anymore, it is probably because of the spinlocks, but I'm certainly seeing bad behavior.

I generate the problem with a udp test from netperf (http://www.netperf.org). I've attached my udp_range script. Run it as udp_range

I also do this

echo "1048576" > /proc/sys/net/core/rmem_max
echo "1048576" > /proc/sys/net/core/wmem_max
echo "1048576" > /proc/sys/net/core/wmem_default
echo "1048576" > /proc/sys/net/core/rmem_default
echo "1048576" > /proc/sys/net/core/optmem_max

Our network is really fast. When the machine is stable I can sustain a 1.5Gigabit/sec udp stream. It might be possible to reproduce this using 100Mbit ethernet, but it might require 1gbit ethernet with jumbo frames. I'm currently using a 9000-byte MTU. Our interrupt routine will deliver only a single ethernet packet to the higher levels for each interrupt, so maybe that also stresses the IP fragmentation code.

I don't see this problem on a linux-2.2 box and I don't see it when I remove one of the processors from the receiver on my setup.

#!/bin/sh
#
# udp_range
#
# generate a whole lot of numbers from netperf to see the effects
# of send size on thruput
#

#
# usage : udp_range hostname
#

if [ $# -gt 1 ]; then
  echo "try again, correctly -> udp_range hostname"
  exit 1
fi

#
# some params
#
if [ $# -eq 1 ]; then
  REMHOST=$1
else
  echo "try again, correctly -> udp_range hostname"
  exit 1
fi

# where is netperf
NETHOME=.

BUFSIZE="-s 2062144 -S 2062144"
#BUFSIZE="-s 2147484 -S 2147484"
#BUFSIZE="-s 1048576 -S 1048576"
#BUFSIZE="-s 524288 -S 524288"
#BUFSIZE="-s 262144 -S 262144"
#BUFSIZE="-s 131072 -S 131072"
#BUFSIZE="-s 65535 -S 65535"
#BUFSIZE="-s 49152 -S 49152"
#BUFSIZE="-s 49152 -S 131072"
#BUFSIZE="-S 65536"

TIME="10"

#
# some stuff for the arithmetic
#
# we start at START, and then add ADD and divide by DIV. by changing
# these numbers, we can halve each time, or step by a fixed amount
#
START=32768
END=4
DIV=2
ADD=0

# Do we wish to measure CPU utilization?
LOC_CPU=""
REM_CPU=""
#LOC_CPU="-c"
#REM_CPU="-C"

# If we are measuring CPU utilization, then we can save beaucoup
# time by saving the results of the CPU calibration and passing
# them in during the real tests. So, we execute the new CPU "tests"
# of netperf and put the values into shell vars.
case $LOC_CPU in
\-c) LOC_RATE=`$NETHOME/netperf -t LOC_CPU`;;
*) LOC_RATE=""
esac

case $REM_CPU in
\-C) REM_RATE=`$NETHOME/netperf -t REM_CPU -H $REMHOST`;;
*) REM_RATE=""
esac

# after the first datapoint, we don't want more headers
# but we want one for the first one
NO_HDR=""

MESSAGE=$START
while [ $MESSAGE -ge $END ]; do
  $NETHOME/netperf -p 9100 -l $TIME -H $REMHOST -t UDP_STREAM\
    $LOC_CPU $LOC_RATE $REM_CPU $REM_RATE $NO_HDR --\
    -m $MESSAGE $BUFSIZE
  NO_HDR="-P 0"
  MESSAGE=`expr $MESSAGE + $ADD`
  MESSAGE=`expr $MESSAGE \/ $DIV`
done

From owner-netdev@oss.sgi.com Tue Mar 6 11:16:16 2001 Received: by oss.sgi.com id ; Tue, 6 Mar 2001 11:16:06 -0800 Received: from mean.netppl.fi ([195.242.208.16]:62472 "EHLO mean.netppl.fi") by oss.sgi.com with ESMTP id ; Tue, 6 Mar 2001 11:15:46 -0800 Received: from evil.netppl.fi (root@evil.netppl.fi [195.242.209.201]) by mean.netppl.fi (8.9.3/8.9.3) with ESMTP id VAA02928; Tue, 6 Mar 2001 21:15:30 +0200 Received: (from pp@localhost) by evil.netppl.fi (8.9.3/8.9.3) id VAA14876; Tue, 6 Mar 2001 21:15:30 +0200 Date: Tue, 6 Mar 2001 21:15:30 +0200 From: Pekka Pietikainen To: Bob Felderman Cc: andrewm@uow.edu.au, kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com Subject: Re: [Fwd: Re: possible bug x86 2.4.2 SMP in IP receive stack] Message-ID: <20010306211530.A14203@netppl.fi> References: <200103061854.KAA03475@myri.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0pre3i In-Reply-To: <200103061854.KAA03475@myri.com> Sender: owner-netdev@oss.sgi.com Precedence: bulk
Return-Path: X-Orcpt: rfc822;netdev-outgoing On Tue, Mar 06, 2001 at 10:54:22AM -0800, Bob Felderman wrote: > a 1.5Gigabit/sec udp stream. It might be possible to reproduce this > using 100Mbit ethernet, but it might require 1gbit ethernet with jumbo > frames. I'm currently using a 9000 bytes MTU. Our interrupt routine > will deliver only a single ethernet packet to the higher levels > for each interrupt, so maybe that also stresses the IP fragmentation > code. Verified on Alteon GigE (1500 byte frames seemed to be ok, 9000 killed the receiver) using the script&/proc settings. I'd have to do some walking to get the Oops so I'll do that tomorrow :) -- Pekka Pietikainen From owner-netdev@oss.sgi.com Tue Mar 6 11:18:55 2001 Received: by oss.sgi.com id ; Tue, 6 Mar 2001 11:18:46 -0800 Received: from u-167-19.karlsruhe.ipdial.viaginterkom.de ([62.180.19.167]:35332 "EHLO u-167-19.karlsruhe.ipdial.viaginterkom.de") by oss.sgi.com with ESMTP id ; Tue, 6 Mar 2001 11:18:37 -0800 Received: from dea ([193.98.169.28]:42624 "EHLO dea.waldorf-gmbh.de") by bacchus.dhis.org with ESMTP id ; Tue, 6 Mar 2001 20:18:22 +0100 Received: (from ralf@localhost) by dea.waldorf-gmbh.de (8.11.1/8.11.1) id f26JDAJ08047; Tue, 6 Mar 2001 20:13:10 +0100 Date: Tue, 6 Mar 2001 20:13:10 +0100 From: Ralf Baechle To: Andrew Morton Cc: netdev@oss.sgi.com, Quentin Arce Subject: Re: 2.2.19pre16 TCP problems Message-ID: <20010306201310.A8016@bacchus.dhis.org> References: <20010304223048.A17775@bacchus.dhis.org> <3AA2BFAE.BA4D414B@uow.edu.au> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <3AA2BFAE.BA4D414B@uow.edu.au>; from andrewm@uow.edu.au on Sun, Mar 04, 2001 at 10:20:30PM +0000 X-Accept-Language: de,en,fr Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Sun, Mar 04, 2001 at 10:20:30PM +0000, Andrew Morton wrote: > Any chance of getting a web archive of the list? 
> The one at http://www.wcug.wwu.edu/lists/netdev/ > died ages ago. http://oss.sgi.com/projects/netdev/mail/netdev/maillist.html http://oss.sgi.com/projects/netdev/mail/netdev/threads.html Credits to Quentin Arce for setting this up. Ralf From owner-netdev@oss.sgi.com Tue Mar 6 13:54:36 2001 Received: by oss.sgi.com id ; Tue, 6 Mar 2001 13:54:26 -0800 Received: from mta5.snfc21.pbi.net ([206.13.28.241]:50638 "EHLO mta5.snfc21.pbi.net") by oss.sgi.com with ESMTP id ; Tue, 6 Mar 2001 13:54:02 -0800 Received: from kryptonite ([206.170.7.245]) by mta5.snfc21.pbi.net (Sun Internet Mail Server sims.3.5.2000.01.05.12.18.p9) with SMTP id <0G9S00C5TQ4E7A@mta5.snfc21.pbi.net> for netdev@oss.sgi.com; Tue, 6 Mar 2001 13:53:04 -0800 (PST) Date: Tue, 06 Mar 2001 13:50:54 -0800 From: David Brownell Subject: Re: Zeroing interface stats? To: Andrew Morton , jamal Cc: netdev@oss.sgi.com Message-id: <020b01c0a687$8568af40$6800000a@brownell.org> MIME-version: 1.0 X-Mailer: Microsoft Outlook Express 5.50.4133.2400 Content-type: text/plain; charset="iso-8859-1" Content-transfer-encoding: 7bit X-MSMail-Priority: Normal X-MIMEOLE: Produced By Microsoft MimeOLE V5.50.4133.2400 References: <3AA45E19.9E747A4F@uow.edu.au> X-Priority: 3 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing > > > I've seen you use this `ifindex' term before. What is it, and > > > what is its role in life? > > > > -- interface index , retrievable via SIOCGIFINDEX > > ^ ^ ^^^^^ > > A unique identifier for a "net link". ... > > I see. And this index is published to external applications > and management systems as a reliable identifier with which > to reference this interface on this host? And so we're not > allowed to change it across the interface lifetime? > > Should we attempt to make an interface's ifindex constant > across reinstantiations of the interface (ie: driver reloads?). Or system reboots ... 
shouldn't there be a firm association with the physical hardware and the "ifindex"? For example, the PCI slot. Or perhaps in some cases Ethernet addresses. I'd like to see stable identifiers "published to external apps and management systems ..." that are independent of the order in which driver modules happen to have been loaded or initialized this particular time. (Yes, tough for interfaces that don't have hardware, but that's a different issue.) - Dave From owner-netdev@oss.sgi.com Wed Mar 7 02:17:41 2001 Received: by oss.sgi.com id ; Wed, 7 Mar 2001 02:17:32 -0800 Received: from isis.its.uow.edu.au ([130.130.68.21]:16300 "EHLO isis.its.uow.edu.au") by oss.sgi.com with ESMTP id ; Wed, 7 Mar 2001 02:17:18 -0800 Received: from uow.edu.au (wumpus.its.uow.edu.au [130.130.68.12]) by isis.its.uow.edu.au (8.9.3/8.9.3) with ESMTP id VAA16929; Wed, 7 Mar 2001 21:14:43 +1100 (EST) Message-ID: <3AA60A35.F7C70C1@uow.edu.au> Date: Wed, 07 Mar 2001 21:15:17 +1100 From: Andrew Morton X-Mailer: Mozilla 4.7 [en] (X11; I; Linux 2.4.2-pre2 i586) X-Accept-Language: en MIME-Version: 1.0 To: Pekka Pietikainen CC: Bob Felderman , kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com Subject: Re: [Fwd: Re: possible bug x86 2.4.2 SMP in IP receive stack] References: <200103061854.KAA03475@myri.com>, <200103061854.KAA03475@myri.com> <20010306211530.A14203@netppl.fi> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Pekka Pietikainen wrote: > > On Tue, Mar 06, 2001 at 10:54:22AM -0800, Bob Felderman wrote: > > a 1.5Gigabit/sec udp stream. It might be possible to reproduce this > > using 100Mbit ethernet, but it might require 1gbit ethernet with jumbo > > frames. I'm currently using a 9000 bytes MTU. Our interrupt routine > > will deliver only a single ethernet packet to the higher levels > > for each interrupt, so maybe that also stresses the IP fragmentation > > code. 
> Verified on Alteon GigE (1500 byte frames seemed to be ok, 9000 killed the > receiver) using the script&/proc settings. I'd have to do some walking to > get the Oops so I'll do that tomorrow :) So you're saying that you can reproduce this crash with acenic? And that it's specific to jumbo frames, SMP and the /proc settings which Bob is using? Bob is using a non-zerocopy kernel. What were you using? From owner-netdev@oss.sgi.com Wed Mar 7 05:24:53 2001 Received: by oss.sgi.com id ; Wed, 7 Mar 2001 05:24:33 -0800 Received: from mean.netppl.fi ([195.242.208.16]:49672 "EHLO mean.netppl.fi") by oss.sgi.com with ESMTP id ; Wed, 7 Mar 2001 05:24:02 -0800 Received: from evil.netppl.fi (root@evil.netppl.fi [195.242.209.201]) by mean.netppl.fi (8.9.3/8.9.3) with ESMTP id PAA14607; Wed, 7 Mar 2001 15:23:48 +0200 Received: (from pp@localhost) by evil.netppl.fi (8.9.3/8.9.3) id PAA25803; Wed, 7 Mar 2001 15:23:48 +0200 Date: Wed, 7 Mar 2001 15:23:48 +0200 From: Pekka Pietikainen To: Andrew Morton Cc: Pekka Pietikainen , Bob Felderman , kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com Subject: Re: [Fwd: Re: possible bug x86 2.4.2 SMP in IP receive stack] Message-ID: <20010307152348.A23249@netppl.fi> References: <200103061854.KAA03475@myri.com>, <200103061854.KAA03475@myri.com> <20010306211530.A14203@netppl.fi> <3AA60A35.F7C70C1@uow.edu.au> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0pre3i In-Reply-To: <3AA60A35.F7C70C1@uow.edu.au> Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Wed, Mar 07, 2001 at 09:15:17PM +1100, Andrew Morton wrote: > So you're saying that you can reproduce this crash with acenic? > > And that it's specific to jumbo frames, SMP and the /proc settings > which Bob is using? > > Bob is using a non-zerocopy kernel. What were you using? Here's the kdb report from 2.4.2-ac13 (or rather just stacktrace since I don't have a serial console setup :) ). 
Jumbo frames seem to be necessary to reproduce the problem (or at least they make it a lot more probable). I haven't tried a non-SMP kernel yet. Unable to handle kernel paging request at 0x5a5a5a72 __free_pages+0x3 skb_release_data+0x4e kfree_skbmem+0xe __kfree_skb+0xe3 ip_frag_destroy+0x73 ip_defrag+0x197 ip_local_deliver+0x20 ret_from_intr Looks like refcnt might not be increased properly somewhere, I'll try to hunt down the problem a bit more... -- Pekka Pietikainen From owner-netdev@oss.sgi.com Wed Mar 7 08:06:56 2001 Received: by oss.sgi.com id ; Wed, 7 Mar 2001 08:06:47 -0800 Received: from minus.inr.ac.ru ([193.233.7.97]:1043 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Wed, 7 Mar 2001 08:06:36 -0800 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id SAA08688; Wed, 7 Mar 2001 18:59:38 +0300 From: kuznet@ms2.inr.ac.ru Message-Id: <200103071559.SAA08688@ms2.inr.ac.ru> Subject: Re: [Fwd: Re: possible bug x86 2.4.2 SMP in IP receive stack] To: andrewm@uow.edu.au (Andrew Morton) Date: Wed, 7 Mar 2001 18:59:38 +0300 (MSK) Cc: pp@evil.netppl.fi, feldy@myri.com, netdev@oss.sgi.com, davem@redhat.com (Dave Miller) In-Reply-To: <3AA60A35.F7C70C1@uow.edu.au> from "Andrew Morton" at Mar 7, 1 09:15:17 pm X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Content-Length: 2129 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hello! > So you're saying that you can reproduce this crash with acenic? Yes, I found capital logical bug there. Evictor on smp can corrupt queues. Shame on me, I _knew_ that this pattern is wrong, socket hash tables need the same, but they are right. So, fix is to work like sockets. Patch is appended. Please, test.
Alexey --- ../vger3-010306/linux/net/ipv4/ip_fragment.c Wed Dec 20 22:31:50 2000 +++ linux/net/ipv4/ip_fragment.c Wed Mar 7 18:52:45 2001 @@ -214,18 +214,17 @@ if (ipq_hash[i] == NULL) continue; - write_lock(&ipfrag_lock); + read_lock(&ipfrag_lock); if ((qp = ipq_hash[i]) != NULL) { /* find the oldest queue for this hash bucket */ while (qp->next) qp = qp->next; - __ipq_unlink(qp); - write_unlock(&ipfrag_lock); + atomic_inc(&qp->refcnt); + read_unlock(&ipfrag_lock); spin_lock(&qp->lock); - if (del_timer(&qp->timer)) - atomic_dec(&qp->refcnt); - qp->last_in |= COMPLETE; + if (!(qp->last_in&COMPLETE)) + ipq_kill(qp); spin_unlock(&qp->lock); ipq_put(qp); @@ -233,7 +232,7 @@ progress = 1; continue; } - write_unlock(&ipfrag_lock); + read_unlock(&ipfrag_lock); } } while (progress); } --- ../vger3-010306/linux/net/ipv6/reassembly.c Sat Jan 13 21:29:44 2001 +++ linux/net/ipv6/reassembly.c Wed Mar 7 18:52:46 2001 @@ -204,18 +204,17 @@ if (ip6_frag_hash[i] == NULL) continue; - write_lock(&ip6_frag_lock); + read_lock(&ip6_frag_lock); if ((fq = ip6_frag_hash[i]) != NULL) { /* find the oldest queue for this hash bucket */ while (fq->next) fq = fq->next; - __fq_unlink(fq); - write_unlock(&ip6_frag_lock); + atomic_inc(&fq->refcnt); + read_unlock(&ip6_frag_lock); spin_lock(&fq->lock); - if (del_timer(&fq->timer)) - atomic_dec(&fq->refcnt); - fq->last_in |= COMPLETE; + if (!(fq->last_in&COMPLETE)) + fq_kill(fq); spin_unlock(&fq->lock); fq_put(fq); @@ -223,7 +222,7 @@ progress = 1; continue; } - write_unlock(&ip6_frag_lock); + read_unlock(&ip6_frag_lock); } } while (progress); } From owner-netdev@oss.sgi.com Wed Mar 7 09:02:47 2001 Received: by oss.sgi.com id ; Wed, 7 Mar 2001 09:02:37 -0800 Received: from mean.netppl.fi ([195.242.208.16]:23558 "EHLO mean.netppl.fi") by oss.sgi.com with ESMTP id ; Wed, 7 Mar 2001 09:02:14 -0800 Received: from evil.netppl.fi (root@evil.netppl.fi [195.242.209.201]) by mean.netppl.fi (8.9.3/8.9.3) with ESMTP id TAA17520 for ; Wed, 7 Mar 2001 19:02:01 
+0200 Received: (from pp@localhost) by evil.netppl.fi (8.9.3/8.9.3) id TAA02970 for netdev@oss.sgi.com; Wed, 7 Mar 2001 19:02:01 +0200 Date: Wed, 7 Mar 2001 19:02:01 +0200 From: Pekka Pietikainen To: netdev@oss.sgi.com Subject: Re: [Fwd: Re: possible bug x86 2.4.2 SMP in IP receive stack] Message-ID: <20010307190201.A2205@netppl.fi> References: <3AA60A35.F7C70C1@uow.edu.au> <200103071559.SAA08688@ms2.inr.ac.ru> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0pre3i In-Reply-To: <200103071559.SAA08688@ms2.inr.ac.ru> Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Wed, Mar 07, 2001 at 06:59:38PM +0300, kuznet@ms2.inr.ac.ru wrote: > Hello! > > > So you're saying that you can reproduce this crash with acenic? > > Yes, I found capital logical bug there. Evictor on smp can corrupt queues. > Shame on me, I _knew_ that this pattern is wrong, socket hash tables need > the same, but they are right. So, fix is to work like sockets. > > Patch is appended. Please, test. Hi Works for me, thanks. From owner-netdev@oss.sgi.com Wed Mar 7 10:35:06 2001 Received: by oss.sgi.com id ; Wed, 7 Mar 2001 10:34:47 -0800 Received: from citadel.myri.com ([199.120.212.1]:25532 "EHLO myri.com") by oss.sgi.com with ESMTP id ; Wed, 7 Mar 2001 10:34:36 -0800 Received: from orion.myri.com.myri.com (orion [199.120.212.245]) by myri.com (8.9.3+Sun/8.9.1) with SMTP id KAA13649; Wed, 7 Mar 2001 10:34:19 -0800 (PST) Date: Wed, 7 Mar 2001 10:34:19 -0800 (PST) From: Bob Felderman Message-Id: <200103071834.KAA13649@myri.com> To: andrewm@uow.edu.au, kuznet@ms2.inr.ac.ru Subject: Re: [Fwd: Re: possible bug x86 2.4.2 SMP in IP receive stack] Cc: davem@redhat.com, feldy@myri.com, netdev@oss.sgi.com, pp@evil.netppl.fi Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing => > So you're saying that you can reproduce this crash with acenic? => => Yes, I found capital logical bug there. 
Evictor on smp can corrupt queues. => Shame on me, I _knew_ that this pattern is wrong, socket hash tables need => the same, but they are right. So, fix is to work like sockets. => => Patch is appended. Please, test. => => Alexey The patch appears to fix the problems I was seeing. I've been blasting UDP streams at the smp box for an hour or so. It used to die in a few seconds. I'm also not seeing the RX-ERR increment at all. I was often not able to get an skb. Definitely a BIG improvement from my perspective - Thanks! I'll be doing some more performance characterization shortly. At first glance, I think the udp receive rate is down quite a bit. rcc2 33% netstat -i Kernel Interface table Iface MTU Met RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flg eth0 1500 0 4834 0 0 0 607 0 0 0 BRU lo 16192 0 46 0 0 0 46 0 0 0 LRU myri0 9000 0 133058589 0 0 0 2867 0 0 0 BRU From owner-netdev@oss.sgi.com Wed Mar 7 10:55:58 2001 Received: by oss.sgi.com id ; Wed, 7 Mar 2001 10:55:37 -0800 Received: from minus.inr.ac.ru ([193.233.7.97]:48134 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Wed, 7 Mar 2001 10:55:21 -0800 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id VAA11377; Wed, 7 Mar 2001 21:48:25 +0300 From: kuznet@ms2.inr.ac.ru Message-Id: <200103071848.VAA11377@ms2.inr.ac.ru> Subject: Re: [Fwd: Re: possible bug x86 2.4.2 SMP in IP receive stack] To: feldy@myri.com (Bob Felderman) Date: Wed, 7 Mar 2001 21:48:25 +0300 (MSK) Cc: andrewm@uow.edu.au, davem@redhat.com, feldy@myri.com, netdev@oss.sgi.com, pp@evil.netppl.fi In-Reply-To: <200103071834.KAA13649@myri.com> from "Bob Felderman" at Mar 7, 1 10:34:19 am X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Content-Length: 303 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hello! > is down quite a bit. 
Actually, the fact that eviction happens during your tests implies that you should increase eviction threshold (ipfrag_high_thresh) to value, when eviction never happens on LAN tests. It should happen only if driver has lost some fragment... not good on LAN too. Alexey From owner-netdev@oss.sgi.com Wed Mar 7 11:03:27 2001 Received: by oss.sgi.com id ; Wed, 7 Mar 2001 11:03:08 -0800 Received: from citadel.myri.com ([199.120.212.1]:22717 "EHLO myri.com") by oss.sgi.com with ESMTP id ; Wed, 7 Mar 2001 11:02:53 -0800 Received: from orion.myri.com.myri.com (orion [199.120.212.245]) by myri.com (8.9.3+Sun/8.9.1) with SMTP id LAA14092; Wed, 7 Mar 2001 11:02:41 -0800 (PST) Date: Wed, 7 Mar 2001 11:02:41 -0800 (PST) From: Bob Felderman Message-Id: <200103071902.LAA14092@myri.com> To: feldy@myri.com, kuznet@ms2.inr.ac.ru Subject: Re: [Fwd: Re: possible bug x86 2.4.2 SMP in IP receive stack] Cc: andrewm@uow.edu.au, davem@redhat.com, netdev@oss.sgi.com, pp@evil.netppl.fi Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing => From kuznet@ms2.inr.ac.ru Wed Mar 7 10:55:04 2001 => Actually, the fact that eviction happens during your tests implies => that you should increase eviction threshold (ipfrag_high_thresh) => to value, when eviction never happens on LAN tests. It should happen => only if driver has lost some fragment... not good on LAN too. Is there a counter telling me when "eviction happens"? I rebooted both machines to get a clean start and I see good performance. The UDP sweet-spot is about 8KB (less than 1 mtu). 666MHz dual-processor P-II machines. 
rcc 31% udp_range rcc2-t UDP UNIDIRECTIONAL SEND TEST to rcc2-t Socket Message Elapsed Messages Size Size Time Okay Errors Throughput bytes bytes secs # # 10^6bits/sec 2097152 32768 10.00 62720 0 1644.89 2097152 10.00 7780 204.04 2097152 16384 11.00 119694 0 1426.33 2097152 11.00 19009 226.52 2097152 8192 6.86 201220 0 1922.43 2097152 6.86 166168 1587.55 2097152 4096 5.48 335840 0 2008.92 2097152 5.48 231505 1384.81 2097152 2048 9.26 504164 0 891.66 2097152 9.26 329154 582.14 2097152 1024 9.52 697557 0 600.01 2097152 9.52 517130 444.81 2097152 512 13.77 1146536 0 341.11 2097152 13.77 639733 190.33 2097152 256 11.00 1451247 0 270.30 2097152 11.00 640510 119.30 2097152 128 9.00 1432713 0 163.08 2097152 9.00 605639 68.94 2097152 64 4.73 1641784 0 177.88 2097152 4.73 625544 67.77 2097152 32 3.58 1436986 0 102.77 2097152 3.58 635321 45.44 2097152 16 9.99 1650593 0 21.14 2097152 9.99 642957 8.23 2097152 8 9.99 1410515 0 9.03 2097152 9.99 644561 4.13 2097152 4 6.36 1657061 0 8.33 2097152 6.36 643897 3.24 From owner-netdev@oss.sgi.com Wed Mar 7 11:16:28 2001 Received: by oss.sgi.com id ; Wed, 7 Mar 2001 11:16:08 -0800 Received: from minus.inr.ac.ru ([193.233.7.97]:7943 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Wed, 7 Mar 2001 11:16:03 -0800 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id WAA11723; Wed, 7 Mar 2001 22:09:24 +0300 From: kuznet@ms2.inr.ac.ru Message-Id: <200103071909.WAA11723@ms2.inr.ac.ru> Subject: Re: [Fwd: Re: possible bug x86 2.4.2 SMP in IP receive stack] To: feldy@myri.com (Bob Felderman) Date: Wed, 7 Mar 2001 22:09:24 +0300 (MSK) Cc: feldy@myri.com, andrewm@uow.edu.au, davem@redhat.com, netdev@oss.sgi.com, pp@evil.netppl.fi In-Reply-To: <200103071902.LAA14092@myri.com> from "Bob Felderman" at Mar 7, 1 11:02:41 am X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Content-Length: 262 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hello! 
> Is there a counter telling me when "eviction happens"? No separate count, but it is accounted in snmp IpReasmFails, together with timed out reassemblies. If I understand things correctly, we should not ever see IpReasmFails!=0 in LAN tests. Alexey From owner-netdev@oss.sgi.com Wed Mar 7 13:18:18 2001 Received: by oss.sgi.com id ; Wed, 7 Mar 2001 13:18:11 -0800 Received: from panic.ohr.gatech.edu ([130.207.47.194]:9357 "HELO havoc.gtf.org") by oss.sgi.com with SMTP id ; Wed, 7 Mar 2001 13:17:51 -0800 Received: from mandrakesoft.com (adsl-20-73-169.asm.bellsouth.net [66.20.73.169]) by havoc.gtf.org (Postfix) with ESMTP id 871E41FE8; Wed, 7 Mar 2001 16:17:36 -0500 (EST) Message-ID: <3AA6A570.57FF2D36@mandrakesoft.com> Date: Wed, 07 Mar 2001 16:17:36 -0500 From: Jeff Garzik Organization: MandrakeSoft X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.4.3-pre2 i686) X-Accept-Language: en MIME-Version: 1.0 To: Linux Knernel Mailing List Cc: Andrew Morton , Linus Torvalds , Alan Cox , "David S. Miller" , netdev@oss.sgi.com Subject: [PATCH] RFC: fix ethernet device initialization Content-Type: multipart/mixed; boundary="------------83E4F9CA0D3D41503ACE4163" Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing This is a multi-part message in MIME format. --------------83E4F9CA0D3D41503ACE4163 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit People from time to time point out a wart in ethernet initialization: The net_device is allocated and registered to the system in init_etherdev, which is usually one of the first things an ethernet driver probe function does. The net_device's final members are setup at some time between then and the exit of the probe function. There is never a clear point where the net device is available to the system for use. Our API already supports a solution -- setup the device, then call register_netdev. 
The patch below adds a helper, alloc_etherdev, to eliminate duplicate code in drivers. Ethernet device initialization, after the patch, should now look like

	dev = alloc_etherdev(sizeof(struct netdev_private));
	... initialize device ...
	... set up net_device struct members ...
	rc = register_netdevice(dev);
	if (rc) /* handle error */
	netif_start_queue(dev);

This makes the ethernet driver look and behave similar to other APIs in the kernel, and presents a nice and atomic present-this-netdev-to-the-system operation.

It should be noted that there is a net_device::init function. IIRC in the last discussion of this issue, ::init() was mentioned as a possible solution. I agree that init() can be used, but I do not like the idea of using init as a probe function, as the ISA drivers do it. This means the ethernet device has the potential of holding rtnl_lock for a long time, and staying in the probe function, inside register_netdevice, for a long time.

And... I agree that init() can be used as a constructor for filling in dev->xxx and dev->priv->xxx values, but I think doing so is pointless: you must fill in certain values before your "constructor" is called, just so your constructor has enough information to operate.

Patch description:
* Add alloc_etherdev, alloc_trdev, alloc_hippi_dev, alloc_fcdev, ...
* Use declarator macros to create init_etherdev, init_trdev, etc., and remove duplicate code.
* Move net_init EXPORT_SYMBOL from net/netsyms.c to net_init.c.
* Convert drivers/net/8139too.c to use alloc_etherdev, as an example.

Final word: mostly PCI drivers need this change. ISA drivers use the net_device::init constructor approach. IMHO they shouldn't stay in register_netdevice so long, but that is a wart. This patch fixes a bug: the huge span of time between init_etherdev and the indeterminate point in the future when the net device is ready for use.

Comments?
-- Jeff Garzik | "You see, in this world there's two kinds of Building 1024 | people, my friend: Those with loaded guns MandrakeSoft | and those who dig. You dig." --Blondie --------------83E4F9CA0D3D41503ACE4163 Content-Type: text/plain; charset=us-ascii; name="alloc-netdev-2.4.3.3.patch.txt" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="alloc-netdev-2.4.3.3.patch.txt" Index: drivers/net/8139too.c =================================================================== RCS file: /cvsroot/gkernel/linux_2_4/drivers/net/8139too.c,v retrieving revision 1.1.1.29 diff -u -r1.1.1.29 8139too.c --- drivers/net/8139too.c 2001/03/05 23:46:05 1.1.1.29 +++ drivers/net/8139too.c 2001/03/07 20:58:24 @@ -647,11 +647,43 @@ (RX_DMA_BURST << RxCfgDMAShift); +static void __rtl8139_cleanup_dev (struct net_device *dev) +{ + struct rtl8139_private *tp; + struct pci_dev *pdev; + + assert (dev != NULL); + assert (dev->priv != NULL); + + tp = dev->priv; + assert (tp->pci_dev != NULL); + pdev = tp->pci_dev; + +#ifndef USE_IO_OPS + if (tp->mmio_addr) + iounmap (tp->mmio_addr); +#endif /* !USE_IO_OPS */ + + /* it's ok to call this even if we have no regions to free */ + pci_release_regions (pdev); + +#ifndef RTL8139_NDEBUG + /* poison memory before freeing */ + memset (dev, 0xBC, + sizeof (struct net_device) + + sizeof (struct rtl8139_private)); +#endif /* RTL8139_NDEBUG */ + + kfree (dev); + + pci_set_drvdata (pdev, NULL); +} + + static int __devinit rtl8139_init_board (struct pci_dev *pdev, - struct net_device **dev_out, - void **ioaddr_out) + struct net_device **dev_out) { - void *ioaddr = NULL; + void *ioaddr; struct net_device *dev; struct rtl8139_private *tp; u8 tmp8; @@ -663,20 +695,19 @@ DPRINTK ("ENTER\n"); assert (pdev != NULL); - assert (ioaddr_out != NULL); - *ioaddr_out = NULL; *dev_out = NULL; - /* dev zeroed in init_etherdev */ - dev = init_etherdev (NULL, sizeof (*tp)); + /* dev and dev->priv zeroed in alloc_etherdev */ + dev = alloc_etherdev (sizeof 
(*tp)); if (dev == NULL) { - printk (KERN_ERR PFX "unable to alloc new ethernet\n"); + printk (KERN_ERR PFX "Unable to alloc new net device\n"); DPRINTK ("EXIT, returning -ENOMEM\n"); return -ENOMEM; } SET_MODULE_OWNER(dev); tp = dev->priv; + tp->pci_dev = pdev; /* enable device (incl. PCI PM wakeup and hotplug setup) */ rc = pci_enable_device (pdev); @@ -722,7 +753,7 @@ goto err_out; } - rc = pci_request_regions (pdev, dev->name); + rc = pci_request_regions (pdev, "8139too"); if (rc) goto err_out; @@ -731,14 +762,17 @@ #ifdef USE_IO_OPS ioaddr = (void *) pio_start; + dev->base_addr = pio_start; #else /* ioremap MMIO region */ ioaddr = ioremap (mmio_start, mmio_len); if (ioaddr == NULL) { printk (KERN_ERR PFX "cannot remap MMIO, aborting\n"); rc = -EIO; - goto err_out_free_res; + goto err_out; } + dev->base_addr = (long) ioaddr; + tp->mmio_addr = ioaddr; #endif /* USE_IO_OPS */ /* Soft reset the chip. */ @@ -766,12 +800,12 @@ if ((tmp8 & Cfg1_PIO) == 0) { printk (KERN_ERR PFX "PIO not enabled, Cfg1=%02X, aborting\n", tmp8); rc = -EIO; - goto err_out_iounmap; + goto err_out; } if ((tmp8 & Cfg1_MMIO) == 0) { printk (KERN_ERR PFX "MMIO not enabled, Cfg1=%02X, aborting\n", tmp8); rc = -EIO; - goto err_out_iounmap; + goto err_out; } /* identify chip attached to board */ @@ -795,20 +829,11 @@ rtl_chip_info[tp->chipset].name); DPRINTK ("EXIT, returning 0\n"); - *ioaddr_out = ioaddr; *dev_out = dev; return 0; -err_out_iounmap: - assert (ioaddr > 0); -#ifndef USE_IO_OPS - iounmap (ioaddr); -err_out_free_res: -#endif /* !USE_IO_OPS */ - pci_release_regions (pdev); err_out: - unregister_netdev (dev); - kfree (dev); + __rtl8139_cleanup_dev (dev); DPRINTK ("EXIT, returning %d\n", rc); return rc; } @@ -820,7 +845,7 @@ struct net_device *dev = NULL; struct rtl8139_private *tp; int i, addr_len, option; - void *ioaddr = NULL; + void *ioaddr; static int board_idx = -1; static int printed_version; u8 tmp; @@ -837,13 +862,14 @@ printed_version = 1; } - i = rtl8139_init_board (pdev, 
&dev, &ioaddr); + i = rtl8139_init_board (pdev, &dev); if (i < 0) { DPRINTK ("EXIT, returning %d\n", i); return i; } tp = dev->priv; + ioaddr = tp->mmio_addr; assert (ioaddr != NULL); assert (dev != NULL); @@ -865,20 +891,22 @@ dev->watchdog_timeo = TX_TIMEOUT; dev->irq = pdev->irq; - dev->base_addr = (unsigned long) ioaddr; /* dev->priv/tp zeroed and aligned in init_etherdev */ tp = dev->priv; /* note: tp->chipset set in rtl8139_init_board */ tp->drv_flags = board_info[ent->driver_data].hw_flags; - tp->pci_dev = pdev; tp->mmio_addr = ioaddr; spin_lock_init (&tp->lock); init_waitqueue_head (&tp->thr_wait); init_MUTEX_LOCKED (&tp->thr_exited); + + /* dev is fully set up and ready to use now */ + i = register_netdev (dev); + if (i) goto err_out; - pci_set_drvdata(pdev, dev); + pci_set_drvdata (pdev, dev); printk (KERN_INFO "%s: %s at 0x%lx, " "%2.2x:%2.2x:%2.2x:%2.2x:%2.2x:%2.2x, " @@ -954,6 +982,10 @@ DPRINTK ("EXIT - returning 0\n"); return 0; + +err_out: + __rtl8139_cleanup_dev (dev); + return i; } @@ -965,28 +997,12 @@ DPRINTK ("ENTER\n"); assert (dev != NULL); - - np = (struct rtl8139_private *) (dev->priv); + np = dev->priv; assert (np != NULL); unregister_netdev (dev); -#ifndef USE_IO_OPS - iounmap (np->mmio_addr); -#endif /* !USE_IO_OPS */ - - pci_release_regions (pdev); - -#ifndef RTL8139_NDEBUG - /* poison memory before freeing */ - memset (dev, 0xBC, - sizeof (struct net_device) + - sizeof (struct rtl8139_private)); -#endif /* RTL8139_NDEBUG */ - - kfree (dev); - - pci_set_drvdata (pdev, NULL); + __rtl8139_cleanup_dev (dev); DPRINTK ("EXIT\n"); } Index: drivers/net/Makefile =================================================================== RCS file: /cvsroot/gkernel/linux_2_4/drivers/net/Makefile,v retrieving revision 1.1.1.25.2.1 diff -u -r1.1.1.25.2.1 Makefile --- drivers/net/Makefile 2001/03/07 09:28:01 1.1.1.25.2.1 +++ drivers/net/Makefile 2001/03/07 20:58:24 @@ -15,8 +15,9 @@ # All of the (potential) objects that export symbols. 
# This list comes from 'grep -l EXPORT_SYMBOL *.[hc]'. -export-objs := 8390.o arlan.o aironet4500_core.o aironet4500_card.o ppp_async.o \ - ppp_generic.o slhc.o pppox.o auto_irq.o +export-objs := 8390.o arlan.o aironet4500_core.o aironet4500_card.o \ + ppp_async.o ppp_generic.o slhc.o pppox.o auto_irq.o \ + net_init.o ifeq ($(CONFIG_TULIP),y) obj-y += tulip/tulip.o Index: drivers/net/net_init.c =================================================================== RCS file: /cvsroot/gkernel/linux_2_4/drivers/net/net_init.c,v retrieving revision 1.1.1.8 diff -u -r1.1.1.8 net_init.c --- drivers/net/net_init.c 2001/02/27 03:03:50 1.1.1.8 +++ drivers/net/net_init.c 2001/03/07 20:58:25 @@ -28,10 +28,14 @@ up. We now share common code and have regularised name allocation setups. Abolished the 16 card limits. 03/19/2000 - jgarzik and Urban Widmark: init_etherdev 32-byte align + Mar 7, 2001 - jgarzik: Add alloc_dev for various protocols. + Coalesce duplicate functions into macro calls to + DECLARE_*DEV macros. */ #include +#include #include #include #include @@ -50,6 +54,7 @@ #include #include + /* The network devices currently exist only in the socket namespace, so these entries are unused. 
The only ones that make sense are open start the ethercard @@ -67,6 +72,69 @@ and a space waste] */ +#define DECLARE_INIT_DEV(suffix,mask,setup) \ + struct net_device *init_##suffix(struct net_device *dev, int sizeof_priv) \ + { return init_netdev(dev, sizeof_priv, mask, setup); } \ + EXPORT_SYMBOL(init_##suffix) + +#define DECLARE_ALLOC_DEV(suffix,mask,setup) \ + struct net_device *alloc_##suffix(unsigned int sizeof_priv) \ + { \ + struct net_device * dev; \ + dev = alloc_netdev(sizeof_priv, setup); \ + if (dev) \ + strcpy(dev->name, mask); \ + return dev; \ + } \ + EXPORT_SYMBOL(alloc_##suffix) + +#define DECLARE_REG_DEV(suffix) \ + int register_##suffix(struct net_device *dev) \ + { \ + dev_init_buffers(dev); \ + if (dev->init && dev->init(dev) != 0) { \ + unregister_##suffix(dev); \ + return -EIO; \ + } \ + return 0; \ + } \ + EXPORT_SYMBOL(register_##suffix) + +#define DECLARE_UNREG_DEV(suffix) \ + void unregister_##suffix(struct net_device *dev) \ + { unregister_netdev(dev); } \ + EXPORT_SYMBOL(unregister_##suffix) + +#define DECLARE_CHG_MTU(suffix,low,high) \ + static int suffix##_change_mtu(struct net_device *dev, int new_mtu) \ + { \ + if ((new_mtu < low) || (new_mtu > high)) \ + return -EINVAL; \ + dev->mtu = new_mtu; \ + return 0; \ + } + + + +static struct net_device *alloc_netdev(unsigned int sizeof_priv, + void (*setup)(struct net_device *)) +{ + struct net_device *dev; + int alloc_size; + + /* ensure 32-byte alignment of the private area */ + alloc_size = sizeof (*dev) + sizeof_priv + 31; + + dev = kmalloc (sizeof (*dev) + sizeof_priv, GFP_KERNEL); + if (!dev) + return NULL; + memset (dev, 0, sizeof (*dev) + sizeof_priv); + if (sizeof_priv) + dev->priv = (void *) (((long)(dev + 1) + 31) & ~31); + setup (dev); + return dev; +} + static struct net_device *init_alloc_dev(int sizeof_priv) { @@ -153,18 +221,16 @@ * * If no device structure is passed, a new one is constructed, complete with * a private data area of size @sizeof_priv. 
A 32-byte (not bit) - * alignment is enforced for this private data area. + * alignment is guaranteed for this private data area. * * If an empty string area is passed as dev->name, or a new structure is made, * a new name string is constructed. */ -struct net_device *init_etherdev(struct net_device *dev, int sizeof_priv) -{ - return init_netdev(dev, sizeof_priv, "eth%d", ether_setup); -} +DECLARE_INIT_DEV(etherdev, "eth%d", ether_setup); +DECLARE_ALLOC_DEV(etherdev, "eth%d", ether_setup); +DECLARE_CHG_MTU(eth, 68, 1500); - static int eth_mac_addr(struct net_device *dev, void *p) { struct sockaddr *addr=p; @@ -174,45 +240,14 @@ return 0; } -static int eth_change_mtu(struct net_device *dev, int new_mtu) -{ - if ((new_mtu < 68) || (new_mtu > 1500)) - return -EINVAL; - dev->mtu = new_mtu; - return 0; -} - #ifdef CONFIG_FDDI - -struct net_device *init_fddidev(struct net_device *dev, int sizeof_priv) -{ - return init_netdev(dev, sizeof_priv, "fddi%d", fddi_setup); -} - -static int fddi_change_mtu(struct net_device *dev, int new_mtu) -{ - if ((new_mtu < FDDI_K_SNAP_HLEN) || (new_mtu > FDDI_K_SNAP_DLEN)) - return(-EINVAL); - dev->mtu = new_mtu; - return(0); -} - +DECLARE_INIT_DEV(fddidev, "fddi%d", fddi_setup); +DECLARE_ALLOC_DEV(fddidev, "fddi%d", fddi_setup); +DECLARE_CHG_MTU(fddi, FDDI_K_SNAP_HLEN, FDDI_K_SNAP_DLEN); #endif /* CONFIG_FDDI */ #ifdef CONFIG_HIPPI -static int hippi_change_mtu(struct net_device *dev, int new_mtu) -{ - /* - * HIPPI's got these nice large MTUs. - */ - if ((new_mtu < 68) || (new_mtu > 65280)) - return -EINVAL; - dev->mtu = new_mtu; - return(0); -} - - /* * For HIPPI we will actually use the lower 4 bytes of the hardware * address as the I-FIELD rather than the actual hardware address. 
@@ -225,21 +260,11 @@ memcpy(dev->dev_addr, addr->sa_data, dev->addr_len); return 0; } - - -struct net_device *init_hippi_dev(struct net_device *dev, int sizeof_priv) -{ - return init_netdev(dev, sizeof_priv, "hip%d", hippi_setup); -} - - -void unregister_hipdev(struct net_device *dev) -{ - rtnl_lock(); - unregister_netdevice(dev); - rtnl_unlock(); -} +DECLARE_INIT_DEV(hippi_dev, "hip%d", hippi_setup); +DECLARE_ALLOC_DEV(hippi_dev, "hip%d", hippi_setup); +DECLARE_UNREG_DEV(hipdev); /* "hipdev" is not a typo -jgarzik */ +DECLARE_CHG_MTU(hippi, 68, 65280); static int hippi_neigh_setup_dev(struct net_device *dev, struct neigh_parms *p) { @@ -283,9 +308,9 @@ dev_init_buffers(dev); } +EXPORT_SYMBOL(ether_setup); #ifdef CONFIG_FDDI - void fddi_setup(struct net_device *dev) { /* @@ -312,13 +337,13 @@ return; } - +EXPORT_SYMBOL(fddi_setup); #endif /* CONFIG_FDDI */ #ifdef CONFIG_HIPPI void hippi_setup(struct net_device *dev) { - dev->set_multicast_list = NULL; + dev->set_multicast_list = NULL; dev->change_mtu = hippi_change_mtu; dev->hard_header = hippi_header; dev->rebuild_header = hippi_rebuild_header; @@ -349,8 +374,10 @@ dev_init_buffers(dev); } +EXPORT_SYMBOL(hippi_setup); #endif /* CONFIG_HIPPI */ + #if defined(CONFIG_ATALK) || defined(CONFIG_ATALK_MODULE) static int ltalk_change_mtu(struct net_device *dev, int mtu) @@ -363,7 +390,6 @@ return -EINVAL; } - void ltalk_setup(struct net_device *dev) { /* Fill in the fields of the device structure with localtalk-generic values. 
*/ @@ -387,9 +413,11 @@ dev_init_buffers(dev); } +EXPORT_SYMBOL(ltalk_setup); #endif /* CONFIG_ATALK || CONFIG_ATALK_MODULE */ + int register_netdev(struct net_device *dev) { int err; @@ -438,7 +466,10 @@ rtnl_unlock(); } +EXPORT_SYMBOL(register_netdev); +EXPORT_SYMBOL(unregister_netdev); + #ifdef CONFIG_TR static void tr_configure(struct net_device *dev) @@ -462,32 +493,17 @@ dev->flags = IFF_BROADCAST | IFF_MULTICAST ; } -struct net_device *init_trdev(struct net_device *dev, int sizeof_priv) -{ - return init_netdev(dev, sizeof_priv, "tr%d", tr_configure); -} +DECLARE_INIT_DEV(trdev, "tr%d", tr_configure); +DECLARE_ALLOC_DEV(trdev, "tr%d", tr_configure); +DECLARE_REG_DEV(trdev); +DECLARE_UNREG_DEV(trdev); void tr_setup(struct net_device *dev) { + tr_configure(dev); } +EXPORT_SYMBOL(tr_setup); -int register_trdev(struct net_device *dev) -{ - dev_init_buffers(dev); - - if (dev->init && dev->init(dev) != 0) { - unregister_trdev(dev); - return -EIO; - } - return 0; -} - -void unregister_trdev(struct net_device *dev) -{ - rtnl_lock(); - unregister_netdevice(dev); - rtnl_unlock(); -} #endif /* CONFIG_TR */ @@ -511,29 +527,11 @@ dev_init_buffers(dev); return; } +EXPORT_SYMBOL(fc_setup); +DECLARE_INIT_DEV(fcdev, "fc%d", fc_setup); +DECLARE_ALLOC_DEV(fcdev, "fc%d", fc_setup); +DECLARE_REG_DEV(fcdev); +DECLARE_UNREG_DEV(fcdev); -struct net_device *init_fcdev(struct net_device *dev, int sizeof_priv) -{ - return init_netdev(dev, sizeof_priv, "fc%d", fc_setup); -} - -int register_fcdev(struct net_device *dev) -{ - dev_init_buffers(dev); - if (dev->init && dev->init(dev) != 0) { - unregister_fcdev(dev); - return -EIO; - } - return 0; -} - -void unregister_fcdev(struct net_device *dev) -{ - rtnl_lock(); - unregister_netdevice(dev); - rtnl_unlock(); -} - #endif /* CONFIG_NET_FC */ - Index: include/linux/etherdevice.h =================================================================== RCS file: /cvsroot/gkernel/linux_2_4/include/linux/etherdevice.h,v retrieving revision 1.1.1.10 
diff -u -r1.1.1.10 etherdevice.h --- include/linux/etherdevice.h 2001/03/07 08:25:42 1.1.1.10 +++ include/linux/etherdevice.h 2001/03/07 20:58:26 @@ -39,6 +39,7 @@ extern int eth_header_parse(struct sk_buff *skb, unsigned char *haddr); extern struct net_device * init_etherdev(struct net_device *, int); +extern struct net_device * alloc_etherdev(unsigned int); static __inline__ void eth_copy_and_sum (struct sk_buff *dest, unsigned char *src, int len, int base) { Index: include/linux/fcdevice.h =================================================================== RCS file: /cvsroot/gkernel/linux_2_4/include/linux/fcdevice.h,v retrieving revision 1.1.1.1 diff -u -r1.1.1.1 fcdevice.h --- include/linux/fcdevice.h 2000/10/22 19:36:14 1.1.1.1 +++ include/linux/fcdevice.h 2001/03/07 20:58:26 @@ -34,6 +34,7 @@ //extern unsigned short fc_type_trans(struct sk_buff *skb, struct net_device *dev); extern struct net_device * init_fcdev(struct net_device *, int); +extern struct net_device * alloc_fcdev(unsigned int); #endif Index: include/linux/fddidevice.h =================================================================== RCS file: /cvsroot/gkernel/linux_2_4/include/linux/fddidevice.h,v retrieving revision 1.1.1.2 diff -u -r1.1.1.2 fddidevice.h --- include/linux/fddidevice.h 2000/10/22 20:44:24 1.1.1.2 +++ include/linux/fddidevice.h 2001/03/07 20:58:26 @@ -35,6 +35,7 @@ extern unsigned short fddi_type_trans(struct sk_buff *skb, struct net_device *dev); extern struct net_device * init_fddidev(struct net_device *, int); +extern struct net_device * alloc_fddidev(unsigned int); #endif #endif /* _LINUX_FDDIDEVICE_H */ Index: include/linux/hippidevice.h =================================================================== RCS file: /cvsroot/gkernel/linux_2_4/include/linux/hippidevice.h,v retrieving revision 1.1.1.1 diff -u -r1.1.1.1 hippidevice.h --- include/linux/hippidevice.h 2000/10/22 19:36:13 1.1.1.1 +++ include/linux/hippidevice.h 2001/03/07 20:58:26 @@ -52,6 +52,7 @@ void 
hippi_setup(struct net_device *dev); extern struct net_device *init_hippi_dev(struct net_device *, int); +extern struct net_device *alloc_hippi_dev(unsigned int); extern void unregister_hipdev(struct net_device *dev); #endif Index: include/linux/trdevice.h =================================================================== RCS file: /cvsroot/gkernel/linux_2_4/include/linux/trdevice.h,v retrieving revision 1.1.1.1 diff -u -r1.1.1.1 trdevice.h --- include/linux/trdevice.h 2000/10/22 19:36:03 1.1.1.1 +++ include/linux/trdevice.h 2001/03/07 20:58:26 @@ -34,6 +34,7 @@ extern int tr_rebuild_header(struct sk_buff *skb); extern unsigned short tr_type_trans(struct sk_buff *skb, struct net_device *dev); extern struct net_device * init_trdev(struct net_device *, int); +extern struct net_device * alloc_trdev(unsigned int); #endif Index: net/netsyms.c =================================================================== RCS file: /cvsroot/gkernel/linux_2_4/net/netsyms.c,v retrieving revision 1.1.1.19 diff -u -r1.1.1.19 netsyms.c --- net/netsyms.c 2001/03/07 08:27:32 1.1.1.19 +++ net/netsyms.c 2001/03/07 20:58:26 @@ -432,33 +432,19 @@ #endif /* CONFIG_INET */ #ifdef CONFIG_TR -EXPORT_SYMBOL(tr_setup); EXPORT_SYMBOL(tr_type_trans); -EXPORT_SYMBOL(register_trdev); -EXPORT_SYMBOL(unregister_trdev); -EXPORT_SYMBOL(init_trdev); #endif -#ifdef CONFIG_NET_FC -EXPORT_SYMBOL(register_fcdev); -EXPORT_SYMBOL(unregister_fcdev); -EXPORT_SYMBOL(init_fcdev); -#endif - /* Device callback registration */ EXPORT_SYMBOL(register_netdevice_notifier); EXPORT_SYMBOL(unregister_netdevice_notifier); /* support for loadable net drivers */ #ifdef CONFIG_NET -EXPORT_SYMBOL(init_etherdev); EXPORT_SYMBOL(loopback_dev); EXPORT_SYMBOL(register_netdevice); EXPORT_SYMBOL(unregister_netdevice); -EXPORT_SYMBOL(register_netdev); -EXPORT_SYMBOL(unregister_netdev); EXPORT_SYMBOL(netdev_state_change); -EXPORT_SYMBOL(ether_setup); EXPORT_SYMBOL(dev_new_index); EXPORT_SYMBOL(dev_get_by_index); 
EXPORT_SYMBOL(__dev_get_by_index); @@ -469,8 +455,6 @@ EXPORT_SYMBOL(eth_type_trans); #ifdef CONFIG_FDDI EXPORT_SYMBOL(fddi_type_trans); -EXPORT_SYMBOL(fddi_setup); -EXPORT_SYMBOL(init_fddidev); #endif /* CONFIG_FDDI */ #if 0 EXPORT_SYMBOL(eth_copy_and_sum); @@ -511,8 +495,6 @@ #ifdef CONFIG_HIPPI EXPORT_SYMBOL(hippi_type_trans); -EXPORT_SYMBOL(init_hippi_dev); -EXPORT_SYMBOL(unregister_hipdev); #endif #ifdef CONFIG_SYSCTL @@ -522,12 +504,6 @@ EXPORT_SYMBOL(sysctl_ip_default_ttl); #endif #endif - -#if defined(CONFIG_ATALK) || defined(CONFIG_ATALK_MODULE) -#include -EXPORT_SYMBOL(ltalk_setup); -#endif - /* Packet scheduler modules want these. */ EXPORT_SYMBOL(qdisc_destroy); --------------83E4F9CA0D3D41503ACE4163-- From owner-netdev@oss.sgi.com Wed Mar 7 13:24:09 2001 Received: by oss.sgi.com id ; Wed, 7 Mar 2001 13:23:59 -0800 Received: from panic.ohr.gatech.edu ([130.207.47.194]:20621 "HELO havoc.gtf.org") by oss.sgi.com with SMTP id ; Wed, 7 Mar 2001 13:23:45 -0800 Received: from mandrakesoft.com (adsl-20-73-169.asm.bellsouth.net [66.20.73.169]) by havoc.gtf.org (Postfix) with ESMTP id D88611FE5; Wed, 7 Mar 2001 16:23:33 -0500 (EST) Message-ID: <3AA6A6D6.70877AA3@mandrakesoft.com> Date: Wed, 07 Mar 2001 16:23:34 -0500 From: Jeff Garzik Organization: MandrakeSoft X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.4.3-pre2 i686) X-Accept-Language: en MIME-Version: 1.0 To: Linux Knernel Mailing List Cc: Andrew Morton , Linus Torvalds , Alan Cox , "David S. Miller" , netdev@oss.sgi.com Subject: Re: [PATCH] RFC: fix ethernet device initialization References: <3AA6A570.57FF2D36@mandrakesoft.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Oh, it should be noted that since this is intended as a stable 2.4 series change. The patch does not change any existing APIs, only adds a function. Existing 2.4 drivers are free to continue using init_etherdev... 
This bug, which I fix, isn't causing oops AFAIK, just exporting ugliness to user space etc. -- Jeff Garzik | "You see, in this world there's two kinds of Building 1024 | people, my friend: Those with loaded guns MandrakeSoft | and those who dig. You dig." --Blondie From owner-netdev@oss.sgi.com Wed Mar 7 14:17:39 2001 Received: by oss.sgi.com id ; Wed, 7 Mar 2001 14:17:30 -0800 Received: from panic.ohr.gatech.edu ([130.207.47.194]:26767 "HELO havoc.gtf.org") by oss.sgi.com with SMTP id ; Wed, 7 Mar 2001 14:17:08 -0800 Received: from mandrakesoft.com (adsl-20-73-169.asm.bellsouth.net [66.20.73.169]) by havoc.gtf.org (Postfix) with ESMTP id 6385A1FE5; Wed, 7 Mar 2001 17:16:56 -0500 (EST) Message-ID: <3AA6B359.83DDB814@mandrakesoft.com> Date: Wed, 07 Mar 2001 17:16:57 -0500 From: Jeff Garzik Organization: MandrakeSoft X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.4.3-pre2 i686) X-Accept-Language: en MIME-Version: 1.0 To: Linux Knernel Mailing List Cc: netdev@oss.sgi.com Subject: Re: [PATCH] RFC: fix ethernet device initialization References: <3AA6A570.57FF2D36@mandrakesoft.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Jeff Garzik wrote: > Our API already supports a solution -- setup the device, then call > register_netdev. The patch below adds a helper, alloc_etherdev, to > eliminate duplicate code in drivers. Ethernet device initialization, > after the patch, should now look like > > dev = alloc_etherdev(sizeof(struct netdev_private)); > ... initialize device ... > ... set up net_device struct members ... > rc = register_netdevice(dev); > if (rc) /* handle error */ > netif_start_queue(dev); Think-o in my example: netif_start_queue occurs in dev->open(), not in the probe phase. Simply ignore that line in the example and you are ok. 
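The corrected ordering described above (allocate, fill in the device, register last, start the queue only from open) can be sketched in userspace as follows. This is a minimal mock, not kernel code: `struct net_device` here is a stand-in with three fields, and `mock_alloc_etherdev`, `mock_register_netdev`, `xxx_open` and `xxx_probe` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-in for illustration only; NOT the kernel's struct. */
struct net_device {
    int (*open)(struct net_device *dev);
    int registered;   /* set once the device is visible to the system */
    void *priv;       /* driver-private area */
};

/* Mock of the proposed helper: allocate the device, do not register it. */
static struct net_device *mock_alloc_etherdev(size_t sizeof_priv)
{
    struct net_device *dev = calloc(1, sizeof(*dev));
    if (!dev)
        return NULL;
    dev->priv = calloc(1, sizeof_priv ? sizeof_priv : 1);
    if (!dev->priv) {
        free(dev);
        return NULL;
    }
    return dev;
}

/*
 * Mock registration. It refuses a device without an open handler purely
 * to make ordering mistakes visible; this check is an assumption of the
 * sketch, not something the email says the kernel does.
 */
static int mock_register_netdev(struct net_device *dev)
{
    if (!dev || !dev->open)
        return -1;
    dev->registered = 1;
    return 0;
}

/* The queue would be started here, in open, not in the probe phase. */
static int xxx_open(struct net_device *dev)
{
    assert(dev != NULL);
    return 0;
}

/* Probe: allocate, set up members, and only then register. */
static struct net_device *xxx_probe(void)
{
    struct net_device *dev = mock_alloc_etherdev(64);
    if (!dev)
        return NULL;
    dev->open = xxx_open;                 /* set up members first */
    if (mock_register_netdev(dev) != 0) { /* publish last */
        free(dev->priv);
        free(dev);
        return NULL;
    }
    return dev;
}
```

Calling `xxx_probe()` returns a device that is already fully set up at the moment it becomes registered, which is the property the patch is after.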
Jeff -- Jeff Garzik | "You see, in this world there's two kinds of Building 1024 | people, my friend: Those with loaded guns MandrakeSoft | and those who dig. You dig." --Blondie From owner-netdev@oss.sgi.com Wed Mar 7 15:06:20 2001 Received: by oss.sgi.com id ; Wed, 7 Mar 2001 15:06:11 -0800 Received: from smtp03.mrf.mail.rcn.net ([207.172.4.62]:15550 "EHLO smtp03.mrf.mail.rcn.net") by oss.sgi.com with ESMTP id ; Wed, 7 Mar 2001 15:05:54 -0800 Received: from 66-44-55-43.s297.tnt1.lnhva.md.dialup.rcn.com ([66.44.55.43] helo=STARL2455) by smtp03.mrf.mail.rcn.net with smtp (Exim 3.16 #5) id 14an08-0003s4-00 for netdev@oss.sgi.com; Wed, 07 Mar 2001 18:05:49 -0500 Message-ID: <009501c0a75b$0287a340$3365a8c0@StargazerGroup.com> From: "pop.erols.com" To: Subject: How to get bandwidth value from an interface Date: Wed, 7 Mar 2001 18:04:45 -0500 MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="----=_NextPart_000_0092_01C0A731.175D9920" X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 5.50.4133.2400 X-MimeOLE: Produced By Microsoft MimeOLE V5.50.4133.2400 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing This is a multi-part message in MIME format. ------=_NextPart_000_0092_01C0A731.175D9920 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable Hello, How do I get a bandwidth value for an interface from a user space = program ? I was able to retrieve other values such as MAC, IP addr and = so on ... but I don't know how to retrieve the bandwidth value. Any help would be greatly appreciated. Thanks, Hoa. ------=_NextPart_000_0092_01C0A731.175D9920 Content-Type: text/html; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable
------=_NextPart_000_0092_01C0A731.175D9920-- From owner-netdev@oss.sgi.com Wed Mar 7 15:14:20 2001 Received: by oss.sgi.com id ; Wed, 7 Mar 2001 15:14:11 -0800 Received: from horus.its.uow.edu.au ([130.130.68.25]:4753 "EHLO horus.its.uow.edu.au") by oss.sgi.com with ESMTP id ; Wed, 7 Mar 2001 15:14:07 -0800 Received: from uow.edu.au (wumpus.its.uow.edu.au [130.130.68.12]) by horus.its.uow.edu.au (8.9.3/8.9.3) with ESMTP id KAA08265; Thu, 8 Mar 2001 10:13:05 +1100 (EST) Message-ID: <3AA6C080.99D35298@uow.edu.au> Date: Wed, 07 Mar 2001 23:13:04 +0000 From: Andrew Morton X-Mailer: Mozilla 4.61 [en] (X11; I; Linux 2.4.1-pre10 i686) X-Accept-Language: en MIME-Version: 1.0 To: Jeff Garzik CC: Linux Knernel Mailing List , Linus Torvalds , Alan Cox , "David S. Miller" , netdev@oss.sgi.com, Arjan van de Ven Subject: Re: [PATCH] RFC: fix ethernet device initialization References: <3AA6A570.57FF2D36@mandrakesoft.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Jeff Garzik wrote: > > People from time to time point out a wart in ethernet initialization: > They sure do. You were away at the time, but I had a 94 file, 140k patch late last year which fixed all this. It's at http://www.uow.edu.au/~andrewm/linux/netdevice.patch and the design doc is at http://www.uow.edu.au/~andrewm/linux/netdevice2.txt >From a quick look, I think the only substantive difference here is that my `prepare_etherdev()' function allocates and reserves the device's name (eth0), but prevents it from being available in netdevice namespace lookups. This was done because lots of drivers wanted to do: init_etherdev(); (Replaced with prepare_etherdev()) printk("%s: something", dev->name); The changes to dev.c and net_init.c were fairly subtle and took some thinking about - we should revisit them if you want to go ahead with this. 
The patch all worked OK, was back-compatible with unaltered drivers, and indeed altered all the drivers. But it kind of got lost. Too big, too late and dev_probe_lock() was there. Now, Arjan says that this race is causing oopses. This surprises me, because current kernels have the the dev_probe_lock() hack which I put in. This fixes the problem for PCI and Cardbus drivers. The ISA drivers generally use the dev->init() technique which is not racy. There isn't a lot left over. Arjan? Which driver? The other reason I'm surprised that it's causing oopses: most racy drivers do this: xxx_probe() { init_etherdev(); dev->open = xxx_open; return; } So the vastly most probably failure mode if the race occurs is this: the interface is opened while dev->open is NULL. This won't oops. Sure, the interface is screwed because the open() routine hasn't been called, but it should hang in there. A subsequent close() of the interface *will* call dev->close, and I guess the driver is likely to get upset if its close() routine is called without a corresponding open(). Yes, we can fix this if we want, and kill off dev_probe_lock(). It'll only take a few days. Do we want? If not, we can extend the dev_probe_lock() thing to cover probes for other busses. USB, I guess. 
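The failure mode described above (interface opened while dev->open is still NULL, then a later close that does reach the driver) can be replayed with a small userspace model. Everything here is hypothetical: `struct mock_dev`, `mock_dev_up`, `mock_dev_down` and `replay_race` are illustration-only names, and the field is called `close` only to mirror the wording of the email.

```c
#include <assert.h>
#include <stddef.h>

/* Illustration-only device model; not the kernel's net_device. */
struct mock_dev {
    int (*open)(struct mock_dev *d);
    int (*close)(struct mock_dev *d);
    int up;      /* interface is marked up */
    int opens;   /* times the driver's open routine ran */
    int closes;  /* times the driver's close routine ran */
};

static int drv_open(struct mock_dev *d)  { d->opens++;  return 0; }
static int drv_close(struct mock_dev *d) { d->closes++; return 0; }

/* Bring the interface up: a missing open handler is silently skipped. */
static int mock_dev_up(struct mock_dev *d)
{
    int rc = 0;
    assert(d != NULL);
    if (d->open)
        rc = d->open(d);
    if (rc == 0)
        d->up = 1;
    return rc;
}

/* Bring the interface down: calls the close handler if one is set. */
static int mock_dev_down(struct mock_dev *d)
{
    assert(d != NULL);
    if (!d->up)
        return 0;
    d->up = 0;
    return d->close ? d->close(d) : 0;
}

/* Replays the racy interleaving; returns closes minus opens. */
static int replay_race(void)
{
    struct mock_dev d = {0};

    /* The device is already visible, but the probe has not set handlers. */
    mock_dev_up(&d);        /* someone ups the interface in the window */

    d.open = drv_open;      /* the probe function continues */
    d.close = drv_close;

    mock_dev_down(&d);      /* later: close without a matching open */
    return d.closes - d.opens;
}
```

In this model `replay_race()` returns 1: nothing crashes at open time, but the driver sees one close call with no corresponding open.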
- From owner-netdev@oss.sgi.com Wed Mar 7 15:25:31 2001 Received: by oss.sgi.com id ; Wed, 7 Mar 2001 15:25:20 -0800 Received: from [199.183.24.200] ([199.183.24.200]:29096 "EHLO devserv.devel.redhat.com") by oss.sgi.com with ESMTP id ; Wed, 7 Mar 2001 15:25:04 -0800 Received: (from alan@localhost) by devserv.devel.redhat.com (8.11.0/8.11.0) id f27NNYl21255; Wed, 7 Mar 2001 18:23:34 -0500 From: Alan Cox Message-Id: <200103072323.f27NNYl21255@devserv.devel.redhat.com> Subject: Re: [PATCH] RFC: fix ethernet device initialization To: andrewm@uow.edu.au (Andrew Morton) Date: Wed, 7 Mar 2001 18:23:34 -0500 (EST) Cc: jgarzik@mandrakesoft.com (Jeff Garzik), linux-kernel@vger.kernel.org (Linux Knernel Mailing List), torvalds@transmeta.com (Linus Torvalds), alan@redhat.com (Alan Cox), davem@redhat.com (David S. Miller), netdev@oss.sgi.com, arjan@fenrus.demon.nl (Arjan van de Ven) In-Reply-To: <3AA6C080.99D35298@uow.edu.au> from "Andrew Morton" at Mar 07, 2001 11:13:04 PM X-Mailer: ELM [version 2.5 PL3] MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing > It'll only take a few days. Do we want? If not, we can > extend the dev_probe_lock() thing to cover probes for > other busses. USB, I guess. cardbus.. usb.. 
insmod/rmmod I'd like it fixed, but you have to convince DaveM From owner-netdev@oss.sgi.com Thu Mar 8 00:10:53 2001 Received: by oss.sgi.com id ; Thu, 8 Mar 2001 00:10:44 -0800 Received: from host000012.arnet.net.ar ([200.45.0.12]:51208 "HELO smtp1.arnet.com.ar") by oss.sgi.com with SMTP id ; Thu, 8 Mar 2001 00:10:22 -0800 Received: (qmail 20280 invoked from network); 8 Mar 2001 07:50:01 -0000 Received: ThePolice Version 0.02 by GCM Received: from host000005.arnet.net.ar (HELO mail2.arnet.com.ar) (200.45.0.5) by host000012.arnet.net.ar with SMTP; 8 Mar 2001 07:50:01 -0000 Received: from mail pickup service by mail2.arnet.com.ar with Microsoft SMTPSVC; Thu, 8 Mar 2001 04:47:46 -0300 Received: from recife.arnet.com.ar ([200.45.0.70]) by mail2.arnet.com.ar with Microsoft SMTPSVC(5.5.1877.357.35); Wed, 7 Mar 2001 16:03:29 -0300 Received: (qmail 7168 invoked from network); 7 Mar 2001 18:59:10 -0000 Received: from oss.sgi.com (216.32.174.190) by recife.arnet.com.ar with SMTP; 7 Mar 2001 18:59:09 -0000 Received: by oss.sgi.com id ; Wed, 7 Mar 2001 11:03:08 -0800 Received: from citadel.myri.com ([199.120.212.1]:22717 "EHLO myri.com") by oss.sgi.com with ESMTP id ; Wed, 7 Mar 2001 11:02:53 -0800 Received: from orion.myri.com.myri.com (orion [199.120.212.245]) by myri.com (8.9.3+Sun/8.9.1) with SMTP id LAA14092; Wed, 7 Mar 2001 11:02:41 -0800 (PST) Date: Wed, 7 Mar 2001 11:02:41 -0800 (PST) From: Bob Felderman Message-Id: <200103071902.LAA14092@myri.com> To: feldy@myri.com, kuznet@ms2.inr.ac.ru Subject: Re: [Fwd: Re: possible bug x86 2.4.2 SMP in IP receive stack] Cc: andrewm@uow.edu.au, davem@redhat.com, netdev@oss.sgi.com, pp@evil.netppl.fi Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing => From kuznet@ms2.inr.ac.ru Wed Mar 7 10:55:04 2001 => Actually, the fact that eviction happens during your tests implies => that you should increase eviction threshold (ipfrag_high_thresh) => to value, when eviction never happens on LAN 
tests. It should happen => only if driver has lost some fragment... not good on LAN too.

Is there a counter telling me when "eviction happens"? I rebooted both machines to get a clean start and I see good performance. The UDP sweet-spot is about 8KB (less than 1 mtu). 666MHz dual-processor P-II machines.

rcc 31% udp_range rcc2-t
UDP UNIDIRECTIONAL SEND TEST to rcc2-t
Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

2097152   32768   10.00       62720      0    1644.89
2097152           10.00        7780            204.04

2097152   16384   11.00      119694      0    1426.33
2097152           11.00       19009            226.52

2097152    8192    6.86      201220      0    1922.43
2097152            6.86      166168           1587.55

2097152    4096    5.48      335840      0    2008.92
2097152            5.48      231505           1384.81

2097152    2048    9.26      504164      0     891.66
2097152            9.26      329154            582.14

2097152    1024    9.52      697557      0     600.01
2097152            9.52      517130            444.81

2097152     512   13.77     1146536      0     341.11
2097152           13.77      639733            190.33

2097152     256   11.00     1451247      0     270.30
2097152           11.00      640510            119.30

2097152     128    9.00     1432713      0     163.08
2097152            9.00      605639             68.94

2097152      64    4.73     1641784      0     177.88
2097152            4.73      625544             67.77

2097152      32    3.58     1436986      0     102.77
2097152            3.58      635321             45.44

2097152      16    9.99     1650593      0      21.14
2097152            9.99      642957              8.23

2097152       8    9.99     1410515      0       9.03
2097152            9.99      644561              4.13

2097152       4    6.36     1657061      0       8.33
2097152            6.36      643897              3.24

From owner-netdev@oss.sgi.com Thu Mar 8 00:21:03 2001 Received: by oss.sgi.com id ; Thu, 8 Mar 2001 00:20:54 -0800 Received: from host000012.arnet.net.ar ([200.45.0.12]:26884 "HELO smtp1.arnet.com.ar") by oss.sgi.com with SMTP id ; Thu, 8 Mar 2001 00:20:38 -0800 Received: (qmail 3076 invoked from network); 8 Mar 2001 07:25:41 -0000 Received: ThePolice Version 0.02 by GCM Received: from host000005.arnet.net.ar (HELO mail2.arnet.com.ar) (200.45.0.5) by host000012.arnet.net.ar with SMTP; 8 Mar 2001 07:25:41 -0000 Received: from mail pickup service by mail2.arnet.com.ar with Microsoft SMTPSVC; Thu, 8 Mar 2001 04:24:05 -0300 Received: from recife.arnet.com.ar ([200.45.0.70])
by mail2.arnet.com.ar with Microsoft SMTPSVC(5.5.1877.357.35); Wed, 7 Mar 2001 15:56:00 -0300 Received: (qmail 31890 invoked from network); 7 Mar 2001 18:51:41 -0000 Received: from oss.sgi.com (216.32.174.190) by recife.arnet.com.ar with SMTP; 7 Mar 2001 18:51:41 -0000 Received: by oss.sgi.com id ; Wed, 7 Mar 2001 10:55:37 -0800 Received: from minus.inr.ac.ru ([193.233.7.97]:48134 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Wed, 7 Mar 2001 10:55:21 -0800 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id VAA11377; Wed, 7 Mar 2001 21:48:25 +0300 From: kuznet@ms2.inr.ac.ru Message-Id: <200103071848.VAA11377@ms2.inr.ac.ru> Subject: Re: [Fwd: Re: possible bug x86 2.4.2 SMP in IP receive stack] To: feldy@myri.com (Bob Felderman) Date: Wed, 7 Mar 2001 21:48:25 +0300 (MSK) Cc: andrewm@uow.edu.au, davem@redhat.com, feldy@myri.com, netdev@oss.sgi.com, pp@evil.netppl.fi In-Reply-To: <200103071834.KAA13649@myri.com> from "Bob Felderman" at Mar 7, 1 10:34:19 am X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Content-Length: 303 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hello! > is down quite a bit. Actually, the fact that eviction happens during your tests implies that you should increase eviction threshold (ipfrag_high_thresh) to value, when eviction never happens on LAN tests. It should happen only if driver has lost some fragment... not good on LAN too. 
Alexey From owner-netdev@oss.sgi.com Thu Mar 8 03:28:44 2001 Received: by oss.sgi.com id ; Thu, 8 Mar 2001 03:28:34 -0800 Received: from host000012.arnet.net.ar ([200.45.0.12]:64783 "HELO smtp1.arnet.com.ar") by oss.sgi.com with SMTP id ; Thu, 8 Mar 2001 03:28:12 -0800 Received: (qmail 8552 invoked from network); 8 Mar 2001 11:16:57 -0000 Received: ThePolice Version 0.02 by GCM Received: from host000005.arnet.net.ar (HELO mail2.arnet.com.ar) (200.45.0.5) by host000012.arnet.net.ar with SMTP; 8 Mar 2001 11:16:57 -0000 Received: from mail pickup service by mail2.arnet.com.ar with Microsoft SMTPSVC; Thu, 8 Mar 2001 08:12:56 -0300 Received: from recife.arnet.com.ar ([200.45.0.70]) by mail2.arnet.com.ar with Microsoft SMTPSVC(5.5.1877.357.35); Wed, 7 Mar 2001 16:16:31 -0300 Received: (qmail 21899 invoked from network); 7 Mar 2001 19:12:11 -0000 Received: from oss.sgi.com (216.32.174.190) by recife.arnet.com.ar with SMTP; 7 Mar 2001 19:12:11 -0000 Received: by oss.sgi.com id ; Wed, 7 Mar 2001 11:16:08 -0800 Received: from minus.inr.ac.ru ([193.233.7.97]:7943 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Wed, 7 Mar 2001 11:16:03 -0800 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id WAA11723; Wed, 7 Mar 2001 22:09:24 +0300 From: kuznet@ms2.inr.ac.ru Message-Id: <200103071909.WAA11723@ms2.inr.ac.ru> Subject: Re: [Fwd: Re: possible bug x86 2.4.2 SMP in IP receive stack] To: feldy@myri.com (Bob Felderman) Date: Wed, 7 Mar 2001 22:09:24 +0300 (MSK) Cc: feldy@myri.com, andrewm@uow.edu.au, davem@redhat.com, netdev@oss.sgi.com, pp@evil.netppl.fi In-Reply-To: <200103071902.LAA14092@myri.com> from "Bob Felderman" at Mar 7, 1 11:02:41 am X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Content-Length: 262 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hello! > Is there a counter telling me when "eviction happens"? 
No separate count, but it is accounted in snmp IpReasmFails, together with timed out reassemblies. If I understand things correctly, we should not ever see IpReasmFails!=0 in LAN tests. Alexey From owner-netdev@oss.sgi.com Thu Mar 8 05:39:05 2001 Received: by oss.sgi.com id ; Thu, 8 Mar 2001 05:38:55 -0800 Received: from laurin.munich.netsurf.de ([194.64.166.1]:17643 "EHLO laurin.munich.netsurf.de") by oss.sgi.com with ESMTP id ; Thu, 8 Mar 2001 05:38:49 -0800 Received: from fred.muc.de (noidentity@ns1032.munich.netsurf.de [195.180.235.32]) by laurin.munich.netsurf.de (8.9.3/8.9.3) with ESMTP id OAA23405; Thu, 8 Mar 2001 14:33:55 +0100 (MET) Received: by fred.muc.de (Postfix, from userid 500) id 599DAE3911; Thu, 8 Mar 2001 14:41:34 +0100 (CET) Date: Thu, 8 Mar 2001 14:41:34 +0100 From: Andi Kleen To: kuznet@ms2.inr.ac.ru Cc: Bob Felderman , andrewm@uow.edu.au, davem@redhat.com, netdev@oss.sgi.com, pp@evil.netppl.fi Subject: Re: [Fwd: Re: possible bug x86 2.4.2 SMP in IP receive stack] Message-ID: <20010308144134.A2382@fred.local> References: <200103071834.KAA13649@myri.com> <200103071848.VAA11377@ms2.inr.ac.ru> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0.1i In-Reply-To: <200103071848.VAA11377@ms2.inr.ac.ru>; from kuznet@ms2.inr.ac.ru on Wed, Mar 07, 2001 at 07:48:25PM +0100 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Wed, Mar 07, 2001 at 07:48:25PM +0100, A.N.Kuznetsov wrote: > Hello! > > > is down quite a bit. > > Actually, the fact that eviction happens during your tests implies > that you should increase eviction threshold (ipfrag_high_thresh) > to value, when eviction never happens on LAN tests. It should happen > only if driver has lost some fragment... not good on LAN too. I think the current default size of the defrag buffer is far too low anyways. It should probably scale by numbers of interfaces also and better by bandwidth of the interface. 
Something like 1-2MB is much more reasonable with today's memory sizes. In my experience upping it can increase NFS performance a lot. -Andi -- This is like TV. I don't like TV. From owner-netdev@oss.sgi.com Thu Mar 8 09:36:59 2001 Received: by oss.sgi.com id ; Thu, 8 Mar 2001 09:36:49 -0800 Received: from smtp1.cern.ch ([137.138.128.38]:6413 "EHLO smtp1.cern.ch") by oss.sgi.com with ESMTP id ; Thu, 8 Mar 2001 09:36:34 -0800 Received: from lxplus012.cern.ch (IDENT:root@lxplus012.cern.ch [137.138.161.115]) by smtp1.cern.ch (8.9.3/8.9.3) with ESMTP id SAA13998; Thu, 8 Mar 2001 18:35:55 +0100 (MET) Received: (from jes@localhost) by lxplus012.cern.ch (8.9.3/8.9.3) id SAA14119; Thu, 8 Mar 2001 18:35:54 +0100 To: Jeff Garzik Cc: Linux Knernel Mailing List , netdev@oss.sgi.com Subject: Re: [PATCH] RFC: fix ethernet device initialization References: <3AA6A570.57FF2D36@mandrakesoft.com> From: Jes Sorensen Date: 08 Mar 2001 18:35:54 +0100 In-Reply-To: Jeff Garzik's message of "Wed, 07 Mar 2001 16:17:36 -0500" Message-ID: Lines: 21 User-Agent: Gnus/5.070096 (Pterodactyl Gnus v0.96) Emacs/20.4 MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing >>>>> "Jeff" == Jeff Garzik writes: Jeff> People from time to time point out a wart in ethernet Jeff> initialization: The net_device is allocated and registered to Jeff> the system in init_etherdev, which is usually one of the first Jeff> things an ethernet driver probe function does. The net_device's Jeff> final members are setup at some time between then and the exit Jeff> of the probe function. There is never a clear point where the Jeff> net device is available to the system for use. I don't like the way you declare all the code in obscure macros in there. +#define DECLARE_CHG_MTU(suffix,low,high) \ + static int suffix##_change_mtu(struct net_device *dev, int new_mtu) \ ...... 
All it does is to make the code harder to read and debug for little/no gain. Jes From owner-netdev@oss.sgi.com Thu Mar 8 09:50:20 2001 Received: by oss.sgi.com id ; Thu, 8 Mar 2001 09:50:09 -0800 Received: from minus.inr.ac.ru ([193.233.7.97]:28686 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Thu, 8 Mar 2001 09:49:43 -0800 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id UAA25910; Thu, 8 Mar 2001 20:42:51 +0300 From: kuznet@ms2.inr.ac.ru Message-Id: <200103081742.UAA25910@ms2.inr.ac.ru> Subject: Re: [Fwd: Re: possible bug x86 2.4.2 SMP in IP receive stack] To: ak@muc.de (Andi Kleen) Date: Thu, 8 Mar 2001 20:42:51 +0300 (MSK) Cc: feldy@myri.com, andrewm@uow.edu.au, davem@redhat.com, netdev@oss.sgi.com, pp@evil.netppl.fi In-Reply-To: <20010308144134.A2382@fred.local> from "Andi Kleen" at Mar 8, 1 02:41:34 pm X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Content-Length: 341 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hello! > In my experience upping it can increase NFS performance a lot. Indeed, 256K is too low. Even TCP needs ~512K window on gige. Actually, gentle hint to Bob was to estimate minimal threshold, when evicton completely stops on myrinet. This can be taken as new default, adjusted to lower values for low memory configurations. 
Alexey From owner-netdev@oss.sgi.com Thu Mar 8 10:55:50 2001 Received: by oss.sgi.com id ; Thu, 8 Mar 2001 10:55:40 -0800 Received: from citadel.myri.com ([199.120.212.1]:13012 "EHLO myri.com") by oss.sgi.com with ESMTP id ; Thu, 8 Mar 2001 10:55:22 -0800 Received: from orion.myri.com.myri.com (orion [199.120.212.245]) by myri.com (8.9.3+Sun/8.9.1) with SMTP id KAA24522; Thu, 8 Mar 2001 10:55:00 -0800 (PST) Received: from localhost by orion.myri.com.myri.com (4.1/SMI-4.1) id AA05154; Thu, 8 Mar 01 10:56:43 PST Date: Thu, 8 Mar 2001 10:56:43 -0800 (PST) From: Bob Felderman To: kuznet@ms2.inr.ac.ru Cc: Bob Felderman , Andi Kleen , andrewm@uow.edu.au, davem@redhat.com, netdev@oss.sgi.com, pp@evil.netppl.fi Subject: Re: [Fwd: Re: possible bug x86 2.4.2 SMP in IP receive stack] In-Reply-To: <200103081742.UAA25910@ms2.inr.ac.ru> Message-Id: Mime-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Thu, 8 Mar 2001 kuznet@ms2.inr.ac.ru wrote: > Hello! > > > In my experience upping it can increase NFS performance a lot. > > Indeed, 256K is too low. Even TCP needs ~512K window on gige. > > Actually, gentle hint to Bob was to estimate minimal threshold, > when evicton completely stops on myrinet. This can be taken as > new default, adjusted to lower values for low memory configurations. > > Alexey > I tried a little yesterday, but here's more info. Upping the threshold does not help a great deal. I went all the way to 4Meg and still saw similar behavior of reassem fails. Between 1 and 2 echo "1048576" > /proc/sys/net/ipv4/ipfrag_high_thresh between 2 and 3 echo "2097152" > /proc/sys/net/ipv4/ipfrag_high_thresh between 3 and 4 echo "4194304" > /proc/sys/net/ipv4/ipfrag_high_thresh Then I ran a netperf udp test of about the same length. 
grep "reassembles failed" /tmp/1 /tmp/2 /tmp/3 /tmp/4;
/tmp/1: 608545 packet reassembles failed
/tmp/2: 881139 packet reassembles failed
/tmp/3: 1128584 packet reassembles failed
/tmp/4: 1322635 packet reassembles failed

I think the culprit here is the packet receive errors from UDP. These are socket overflows I think. I'm trying to track it down. When I used DaveM's zero-copy patches on a linux-2.4.0 kernel, most, if not all, of these packet receive errors went away.

grep "packet receive errors" /tmp/1 /tmp/2 /tmp/3 /tmp/4
/tmp/1: 621834 packet receive errors
/tmp/2: 826133 packet receive errors
/tmp/3: 985257 packet receive errors
/tmp/4: 1140693 packet receive errors

From owner-netdev@oss.sgi.com Thu Mar 8 11:21:19 2001 Received: by oss.sgi.com id ; Thu, 8 Mar 2001 11:21:09 -0800 Received: from minus.inr.ac.ru ([193.233.7.97]:39951 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Thu, 8 Mar 2001 11:20:52 -0800 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id WAA26654; Thu, 8 Mar 2001 22:14:00 +0300 From: kuznet@ms2.inr.ac.ru Message-Id: <200103081914.WAA26654@ms2.inr.ac.ru> Subject: Re: [Fwd: Re: possible bug x86 2.4.2 SMP in IP receive stack] To: feldy@myri.com (Bob Felderman) Date: Thu, 8 Mar 2001 22:14:00 +0300 (MSK) Cc: feldy@myri.com, ak@muc.de, andrewm@uow.edu.au, davem@redhat.com, netdev@oss.sgi.com, pp@evil.netppl.fi In-Reply-To: from "Bob Felderman" at Mar 8, 1 10:56:43 am X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Content-Length: 959 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing

Hello!

> grep "reassembles failed" /tmp/1 /tmp/2 /tmp/3 /tmp/4;
> /tmp/1: 608545 packet reassembles failed
> /tmp/2: 881139 packet reassembles failed
> /tmp/3: 1128584 packet reassembles failed
> /tmp/4: 1322635 packet reassembles failed

Very interesting result. The dependence is inverted. 8)8) Probably fragments are dropped due to backlog overflows.
The more reassemblies succeed is the more time we lose merging, the more pressure on backlog. Negative feedback, equilibrium value is unpredicatble. 8) Look into /proc/net/softnet_stat, the second column. What does it show? > track it down. When I used DaveM's zero-copy patches on Wow! Did you test _without_ it? How was zerocopyless kernel able to hold 1.5Gig? It copies all twice! > a linux-2.4.0 kernel, most, if not all, of these > packet receive errors went away. They must disappear completely. Each loss on LAN is bug. User has right to expect that no losses happen. Alexey From owner-netdev@oss.sgi.com Thu Mar 8 11:36:59 2001 Received: by oss.sgi.com id ; Thu, 8 Mar 2001 11:36:40 -0800 Received: from citadel.myri.com ([199.120.212.1]:20949 "EHLO myri.com") by oss.sgi.com with ESMTP id ; Thu, 8 Mar 2001 11:36:33 -0800 Received: from orion.myri.com.myri.com (orion [199.120.212.245]) by myri.com (8.9.3+Sun/8.9.1) with SMTP id LAA25028; Thu, 8 Mar 2001 11:36:25 -0800 (PST) Received: from localhost by orion.myri.com.myri.com (4.1/SMI-4.1) id AA05228; Thu, 8 Mar 01 11:38:08 PST Date: Thu, 8 Mar 2001 11:38:08 -0800 (PST) From: Bob Felderman To: kuznet@ms2.inr.ac.ru Cc: Bob Felderman , ak@muc.de, andrewm@uow.edu.au, davem@redhat.com, netdev@oss.sgi.com, pp@evil.netppl.fi Subject: Re: [Fwd: Re: possible bug x86 2.4.2 SMP in IP receive stack] In-Reply-To: <200103081914.WAA26654@ms2.inr.ac.ru> Message-Id: Mime-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Thu, 8 Mar 2001 kuznet@ms2.inr.ac.ru wrote: > the more pressure on backlog. Negative feedback, equilibrium value > is unpredicatble. 8) Look into /proc/net/softnet_stat, > the second column. What does it show? 
[root@rcc intel_linux]# cat /proc/net/softnet_stat
00fe20ec 0000ae4e 00000ece 000000b6 00000000 00000000 00000000 00000000 0012e228
00eff495 0000b941 000010d0 000000cc 00000000 00000000 00000000 00000000 0006f8bd

> > track it down. When I used DaveM's zero-copy patches on
>
> Wow! Did you test _without_ it?
> How was zerocopyless kernel able to hold 1.5Gig? It copies all twice!

linux-2.4.0 has always been SIGNIFICANTLY better than linux-2.2 even with a single processor, I'm able to get above 1.5Gbit/sec UDP and nearly 1.0Gbit/sec tcp.

> > a linux-2.4.0 kernel, most, if not all, of these
> > packet receive errors went away.
>
> They must disappear completely. Each loss on LAN is bug.
> User has right to expect that no losses happen.

I'd like them to go away, but UDP losses due to socket overflows are quite common on most operating systems.

From owner-netdev@oss.sgi.com Thu Mar 8 11:56:29 2001 Received: by oss.sgi.com id ; Thu, 8 Mar 2001 11:56:10 -0800 Received: from minus.inr.ac.ru ([193.233.7.97]:54799 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Thu, 8 Mar 2001 11:55:58 -0800 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id WAA26988; Thu, 8 Mar 2001 22:49:01 +0300 From: kuznet@ms2.inr.ac.ru Message-Id: <200103081949.WAA26988@ms2.inr.ac.ru> Subject: Re: [Fwd: Re: possible bug x86 2.4.2 SMP in IP receive stack] To: feldy@myri.com (Bob Felderman) Date: Thu, 8 Mar 2001 22:49:01 +0300 (MSK) Cc: feldy@myri.com, ak@muc.de, andrewm@uow.edu.au, davem@redhat.com, netdev@oss.sgi.com, pp@evil.netppl.fi In-Reply-To: from "Bob Felderman" at Mar 8, 1 11:38:08 am X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Content-Length: 755 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing

Hello!
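For reference, the two /proc/net/softnet_stat rows Bob posted can be decoded with a few lines of C. This is a sketch under one assumption that the thread only implies: the first hexadecimal column counts frames processed on that CPU and the second counts frames dropped. The names `softnet_row`, `parse_softnet_row` and `frames_per_drop` are made up for the illustration.

```c
#include <assert.h>
#include <stdio.h>

/* First two columns of one softnet_stat row (assumed meaning, see above). */
struct softnet_row {
    unsigned long total;    /* frames processed (assumption) */
    unsigned long dropped;  /* frames dropped, the "second column" */
};

/* Parse the first two hex columns of a row; returns 0 on success, -1 on error. */
static int parse_softnet_row(const char *line, struct softnet_row *out)
{
    assert(line != NULL && out != NULL);
    if (sscanf(line, "%lx %lx", &out->total, &out->dropped) != 2)
        return -1;
    return 0;
}

/* Frames processed per dropped frame (integer division); 0 if nothing was dropped. */
static unsigned long frames_per_drop(const struct softnet_row *r)
{
    assert(r != NULL);
    return r->dropped ? r->total / r->dropped : 0;
}
```

Applied to the posted rows, 00fe20ec/0000ae4e is 16654572 processed and 44622 dropped, about one drop per 373 frames; 00eff495/0000b941 is 15725717 and 47425, about one per 331.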
> [root@rcc intel_linux]# cat /proc/net/softnet_stat > 00fe20ec 0000ae4e 00000ece 000000b6 00000000 00000000 00000000 00000000 > 0012e228 > 00eff495 0000b941 000010d0 000000cc 00000000 00000000 00000000 00000000 > 0006f8bd Yes, statistics is sad. Each ~300th frame is lost, serious latency problems. Is this without zerocopy? I still hope, it looks better with it. > I'd like them to go away, but UDP losses due to socket overflows > are quite common on most operating systems. Well, provided we do not attach gigabit interface to 20MHz i386 and do not experience floods of tiny frames, all the rest is problem of user. If he selects high enough rcvbuf and does not sleep instead of working, he may expect that kernel will not lose. Alexey From owner-netdev@oss.sgi.com Thu Mar 8 13:42:30 2001 Received: by oss.sgi.com id ; Thu, 8 Mar 2001 13:42:12 -0800 Received: from ottawa.linuxcare.com ([216.208.98.2]:36851 "EHLO itanic.thepuffingroup.com") by oss.sgi.com with ESMTP id ; Thu, 8 Mar 2001 13:42:07 -0800 Received: (from jes@localhost) by itanic.thepuffingroup.com (8.11.1/8.11.1) id f28LlS301042; Thu, 8 Mar 2001 16:47:28 -0500 Date: Thu, 8 Mar 2001 16:47:28 -0500 Message-Id: <200103082147.f28LlS301042@itanic.thepuffingroup.com> X-Authentication-Warning: itanic.thepuffingroup.com: jes set sender to jes@linuxcare.com using -f From: Jes Sorensen To: netdev@oss.sgi.com CC: davem@redhat.com Subject: initial acenic ZC cleanup Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hi Here is an initial cleanup of the AceNIC ZC patch. This is far from where I want the code to go, but this is at least a starting point. It also includes a couple of changes that were lost since the ZC patch branched off from the official driver and removes the clearing of statistics on open(). Dave, please apply.
Jes --- bork/drivers/net/acenic.c Thu Mar 8 15:17:43 2001 +++ linux-2.4.2-zc-010228/drivers/net/acenic.c Thu Mar 8 16:40:39 2001 @@ -2,7 +2,7 @@ * acenic.c: Linux driver for the Alteon AceNIC Gigabit Ethernet card * and other Tigon based cards. * - * Copyright 1998-2000 by Jes Sorensen, . + * Copyright 1998-2001 by Jes Sorensen, . * * Thanks to Alteon and 3Com for providing hardware and documentation * enabling me to write this driver. @@ -208,10 +208,20 @@ #if (LINUX_VERSION_CODE < 0x02032b) /* * SoftNet + * + * For pre-softnet kernels we need to tell the upper layer not to + * re-enter start_xmit() while we are in there. However softnet + * guarantees not to enter while we are in there so there is no need + * to do the netif_stop_queue() dance unless the transmit queue really + * gets stuck. This should also improve performance according to tests + * done by Aman Singla. */ -#define dev_kfree_skb_irq(a) dev_kfree_skb(a) -#define netif_wake_queue(dev) do { clear_bit(0, &dev->tbusy); mark_bh(NET_BH); } while (0) -#define netif_stop_queue(dev) set_bit(0, &dev->tbusy) +#define dev_kfree_skb_irq(a) dev_kfree_skb(a) +#define netif_wake_queue(dev) clear_bit(0, &dev->tbusy) +#define netif_stop_queue(dev) set_bit(0, &dev->tbusy) +#define late_stop_netif_stop_queue(dev) {do{} while(0);} +#define early_stop_netif_stop_queue(dev) test_and_set_bit(0,&dev->tbusy) +#define early_stop_netif_wake_queue(dev) netif_wake_queue(dev) static inline void netif_start_queue(struct net_device *dev) { @@ -220,9 +230,10 @@ dev->start = 1; } -#define netif_queue_stopped(dev) dev->tbusy -#define netif_running(dev) dev->start -#define ace_if_down(dev) {do{dev->start = 0;}while (0);} +#define ace_mark_net_bh() mark_bh(NET_BH) +#define netif_queue_stopped(dev) dev->tbusy +#define netif_running(dev) dev->start +#define ace_if_down(dev) {do{dev->start = 0;} while(0);} #define tasklet_struct tq_struct static inline void tasklet_schedule(struct tasklet_struct *tasklet) @@ -242,7 +253,11 @@ } #define 
tasklet_kill(tasklet) {do{} while(0);} #else -#define ace_if_down(dev) {do{} while(0);} +#define late_stop_netif_stop_queue(dev) netif_stop_queue(dev) +#define early_stop_netif_stop_queue(dev) 0 +#define early_stop_netif_wake_queue(dev) {do{} while(0);} +#define ace_mark_net_bh() {do{} while(0);} +#define ace_if_down(dev) {do{} while(0);} #endif #if (LINUX_VERSION_CODE >= 0x02031b) @@ -459,14 +474,15 @@ #define DEF_RX_MAX_DESC 25 #define DEF_TX_RATIO 21 /* 24 */ -#define DEF_JUMBO_TX_COAL DEF_TX_COAL /* was 20 */ -#define DEF_JUMBO_TX_MAX_DESC DEF_TX_MAX_DESC -#define DEF_JUMBO_RX_COAL DEF_RX_COAL /* was 30 */ +#define DEF_JUMBO_TX_COAL 20 +#define DEF_JUMBO_TX_MAX_DESC 60 +#define DEF_JUMBO_RX_COAL 30 #define DEF_JUMBO_RX_MAX_DESC 6 #define DEF_JUMBO_TX_RATIO 21 #if tigon2FwReleaseLocal < 20001118 -/* Standard firmware and early modifications duplicate +/* + * Standard firmware and early modifications duplicate * IRQ load without this flag (coal timer is never reset). * Note that with this flag tx_coal should be less than * time to xmit full tx ring. @@ -474,12 +490,16 @@ */ #define TX_COAL_INTS_ONLY 1 /* worth it */ #else -/* With modified firmware, this is not necessary, but still useful. */ +/* + * With modified firmware, this is not necessary, but still useful. 
+ */ #define TX_COAL_INTS_ONLY 1 #endif + #define DEF_TRACE 0 #define DEF_STAT (2 * TICKS_PER_SEC) + static int link[ACE_MAX_MOD_PARMS]; static int trace[ACE_MAX_MOD_PARMS]; static int tx_coal_tick[ACE_MAX_MOD_PARMS]; @@ -490,7 +510,7 @@ static int dis_pci_mem_inval[ACE_MAX_MOD_PARMS] = {1, 1, 1, 1, 1, 1, 1, 1}; static char version[] __initdata = - "acenic.c: v0.50 02/02/2001 Jes Sorensen, linux-acenic@SunSITE.dk\n" + "acenic.c: v0.80 03/08/2001 Jes Sorensen, linux-acenic@SunSITE.dk\n" " http://home.cern.ch/~jes/gige/acenic.html\n"; static struct net_device *root_dev; @@ -932,14 +952,12 @@ if (ap->evt_ring == NULL) goto fail; -#if USE_HOST_TX_RING size = (sizeof(struct tx_desc) * TX_RING_ENTRIES); ap->tx_ring = pci_alloc_consistent(ap->pdev, size, &ap->tx_ring_dma); if (ap->tx_ring == NULL) goto fail; -#endif ap->evt_prd = pci_alloc_consistent(ap->pdev, sizeof(u32), &ap->evt_prd_dma); @@ -1250,7 +1268,7 @@ #endif writel(tmp, &regs->PciState); -#if 1 +#if 0 /* * I have received reports from people having problems when this * bit is enabled.
@@ -1409,34 +1427,29 @@ *(ap->rx_ret_prd) = 0; writel(TX_RING_BASE, &regs->WinBase); -#if USE_HOST_TX_RING memset(ap->tx_ring, 0, TX_RING_ENTRIES * sizeof(struct tx_desc)); set_aceaddr(&info->tx_ctrl.rngptr, ap->tx_ring_dma); info->tx_ctrl.max_len = TX_RING_ENTRIES; - info->tx_ctrl.flags = RCB_FLG_TCP_UDP_SUM|RCB_FLG_NO_PSEUDO_HDR|RCB_FLG_TX_HOST_RING; -#else - ap->tx_ring = (struct tx_desc *)regs->Window; - for (i = 0; i < (TX_RING_ENTRIES * sizeof(struct tx_desc) / 4); i++) { - writel(0, (unsigned long)ap->tx_ring + i * 4); - } - - set_aceaddr(&info->tx_ctrl.rngptr, TX_RING_BASE); - info->tx_ctrl.max_len = TX_RING_ENTRIES; - info->tx_ctrl.flags = RCB_FLG_TCP_UDP_SUM|RCB_FLG_NO_PSEUDO_HDR; -#endif + tmp = RCB_FLG_TCP_UDP_SUM|RCB_FLG_NO_PSEUDO_HDR|RCB_FLG_TX_HOST_RING; #if TX_COAL_INTS_ONLY - info->tx_ctrl.flags |= RCB_FLG_COAL_INT_ONLY; + tmp |= RCB_FLG_COAL_INT_ONLY; #endif + info->tx_ctrl.flags = tmp; set_aceaddr(&info->tx_csm_ptr, ap->tx_csm_dma); /* * Potential item for tuning parameter */ +#if 0 /* NO */ writel(DMA_THRESH_16W, &regs->DmaReadCfg); writel(DMA_THRESH_16W, &regs->DmaWriteCfg); +#else + writel(DMA_THRESH_8W, &regs->DmaReadCfg); + writel(DMA_THRESH_8W, &regs->DmaWriteCfg); +#endif writel(0, &regs->MaskInt); writel(1, &regs->IfIdx); @@ -1525,10 +1538,8 @@ if (ap->version >= 2) writel(tmp, &regs->TuneFastLink); -#ifndef CONFIG_ACENIC_OMIT_TIGON_I if (ACE_IS_TIGON_I(ap)) writel(tigonFwStartAddr, &regs->Pc); -#endif if (ap->version == 2) writel(tigon2FwStartAddr, &regs->Pc); @@ -1548,6 +1559,11 @@ writel(0, &regs->RxRetCsm); /* + * Zero the stats before starting the interface + */ + memset(&ap->stats, 0, sizeof(ap->stats)); + + /* * Start the NIC CPU */ writel(readl(&regs->CpuCtrl) & ~(CPU_HALT|CPU_TRACE), &regs->CpuCtrl); @@ -1648,8 +1664,10 @@ } } -static void ace_watchdog(struct net_device *dev) + +static void ace_watchdog(struct net_device *data) { + struct net_device *dev = data; struct ace_private *ap = dev->priv; struct ace_regs *regs = ap->regs; @@ -1910,8 +1928,8 @@ return;
error_out: if (net_ratelimit()) - printk(KERN_INFO "Out of memory when allocating " - "jumbo receive buffers\n"); + printk(KERN_INFO "Out of memory when allocating " + "jumbo receive buffers\n"); goto out; } @@ -1944,11 +1962,12 @@ case E_C_LINK_UP: { u32 state = readl(&ap->regs->GigLnkState); - printk(KERN_WARNING "%s: Optical link UP (FD%c; FC: TX%c, RX%c)\n", + printk(KERN_WARNING "%s: Optical link UP " + "(%s Duplex, Flow Control: %s%s)\n", dev->name, - state&LNK_FULL_DUPLEX ? '+' : '-', - state&LNK_TX_FLOW_CTL_Y ? '+' : '-', - state&LNK_RX_FLOW_CTL_Y ? '+' : '-'); + state & LNK_FULL_DUPLEX ? "Full":"Half", + state & LNK_TX_FLOW_CTL_Y ? "TX " : "", + state & LNK_RX_FLOW_CTL_Y ? "RX" : ""); break; } case E_C_LINK_DOWN: @@ -2099,7 +2118,8 @@ skb->dev = dev; skb->protocol = eth_type_trans(skb, dev); - /* So, instead of forcing poor tigon mips cpu to calculate + /* + * Instead of forcing the poor tigon mips cpu to calculate * pseudo hdr checksum, we do this ourselves. */ if (bd_flags & BD_FLG_TCP_UDP_SUM) { @@ -2208,6 +2228,7 @@ */ } + static void ace_interrupt(int irq, void *dev_id, struct pt_regs *ptregs) { struct ace_private *ap; @@ -2228,7 +2249,8 @@ if (!(readl(&regs->HostCtrl) & IN_INT)) return; - /* ACK intr now. Otherwise we will lose updates to rx_ret_prd, + /* + * ACK intr now. Otherwise we will lose updates to rx_ret_prd, * which happened _after_ rxretprd = *ap->rx_ret_prd; but before * writel(0, &regs->Mb0Lo). * @@ -2254,15 +2276,14 @@ idx = ap->tx_ret_csm; if (txcsm != idx) { -#if MAX_SKB_FRAGS - /* If each skb takes only one descriptor this check degenerates + /* + * If each skb takes only one descriptor this check degenerates * to identity, because new space has just been opened. * But if skbs are fragmented we must check that this index * update releases enough of space, otherwise we just * wait for device to make more work.
*/ if (!tx_ring_full(txcsm, ap->tx_prd)) -#endif ace_tx_int(dev, txcsm, idx); } @@ -2333,6 +2354,7 @@ } } + static int ace_open(struct net_device *dev) { struct ace_private *ap; @@ -2349,11 +2371,6 @@ writel(dev->mtu + ETH_HLEN + 4, &regs->IfMtu); - /* - * Zero the stats when restarting the interface... - */ - memset(&ap->stats, 0, sizeof(ap->stats)); - cmd.evt = C_CLEAR_STATS; cmd.code = 0; cmd.idx = 0; @@ -2390,6 +2407,9 @@ ACE_MOD_INC_USE_COUNT; + /* + * Setup the bottom half rx ring refill handler + */ tasklet_init(&ap->ace_tasklet, ace_tasklet, (unsigned long)dev); return 0; } @@ -2405,7 +2425,8 @@ ace_if_down(dev); - /* Without (or before) releasing irq and stopping hardware, this + /* + * Without (or before) releasing irq and stopping hardware, this * is an absolute non-sense, by the way. It will be reset instantly * by the first irq. */ @@ -2449,13 +2470,7 @@ mapping = info->mapping; if (mapping) { -#if USE_HOST_TX_RING memset(ap->tx_ring+i, 0, sizeof(struct tx_desc)); -#else - writel(0, &ap->tx_ring[i].addr.addrhi); - writel(0, &ap->tx_ring[i].addr.addrlo); - writel(0, &ap->tx_ring[i].flagsize); -#endif pci_unmap_single(ap->pdev, mapping, info->maplen, PCI_DMA_TODEVICE); info->mapping = 0; @@ -2480,7 +2495,9 @@ return 0; } -/* Following below should be (in more clean form!) in arch/ARCH/kernel/pci_*. + +/* + * Following below should be (in more clean form!) in arch/ARCH/kernel/pci_*. * For now, let it stay here.
*/ #if defined(CONFIG_HIGHMEM) && MAX_SKB_FRAGS @@ -2489,43 +2506,39 @@ #endif #if defined(CONFIG_X86) -#define BITS_PER_DMAADDR 64 #define DMAADDR_OFFSET 0 typedef unsigned long long dmaaddr_high_t; #elif defined(CONFIG_PPC) -#define BITS_PER_DMAADDR BITS_PER_LONG #define DMAADDR_OFFSET PCI_DRAM_OFFSET typedef unsigned long dmaaddr_high_t; #endif + static inline dmaaddr_high_t pci_map_single_high(struct pci_dev *hwdev, struct page *page, int offset, size_t size, int dir) { dmaaddr_high_t phys; - phys = (page-mem_map) * - (dmaaddr_high_t) PAGE_SIZE + - offset; + phys = (page-mem_map) * (dmaaddr_high_t) PAGE_SIZE + offset; - return phys+DMAADDR_OFFSET; + return (phys + DMAADDR_OFFSET); } #else typedef unsigned long dmaaddr_high_t; -#define BITS_PER_DMAADDR BITS_PER_LONG - static inline dmaaddr_high_t pci_map_single_high(struct pci_dev *hwdev, struct page *page, int offset, size_t size, int dir) { - return pci_map_single(hwdev, page_address(page)+offset, size, dir); + return pci_map_single(hwdev, page_address(page) + offset, size, dir); } #endif + static inline dmaaddr_high_t ace_map_tx_skb(struct ace_private *ap, struct sk_buff *skb, struct sk_buff *tail, u32 idx) @@ -2544,6 +2557,7 @@ return addr; } + static inline void ace_load_tx_bd(struct tx_desc *desc, dmaaddr_high_t addr, u32 flagsize) { @@ -2551,22 +2565,14 @@ flagsize &= ~BD_FLG_COAL_NOW; #endif -#if !USE_HOST_TX_RING - /* Do not reload addrhi, when it is zero. 
*/ -#if (BITS_PER_DMAADDR == 64) - writel(addr >> 32, &desc->addr.addrhi); -#endif - writel(addr & 0xffffffff, &desc->addr.addrlo); - writel(flagsize, &desc->flagsize); -#else -#if (BITS_PER_DMAADDR == 64) - desc->addr.addrhi = addr>>32; +#ifdef ACE_64BIT_PTR + desc->addr.addrhi = addr >> 32; #endif desc->addr.addrlo = addr; desc->flagsize = flagsize; -#endif } + static int ace_start_xmit(struct sk_buff *skb, struct net_device *dev) { struct ace_private *ap = dev->priv; @@ -2574,13 +2580,11 @@ struct tx_desc *desc; u32 idx, flagsize; -#if (LINUX_VERSION_CODE < 0x02032b) /* * This only happens with pre-softnet, ie. 2.2.x kernels. */ - if (netif_queue_stopped(dev)) + if (early_stop_netif_stop_queue(dev)) return 1; -#endif restart: @@ -2631,7 +2635,10 @@ info = ap->skb->tx_skbuff + idx; desc = ap->tx_ring + idx; - phys = pci_map_single_high(ap->pdev, frag->page, frag->page_offset, frag->size, PCI_DMA_TODEVICE); + phys = pci_map_single_high(ap->pdev, frag->page, + frag->page_offset, + frag->size, + PCI_DMA_TODEVICE); flagsize = (frag->size << 16); if (skb->ip_summed == CHECKSUM_HW) @@ -2664,7 +2671,7 @@ ap->tx_prd = idx; ace_set_txprd(regs, ap, idx); - if (flagsize&BD_FLG_COAL_NOW) { + if (flagsize & BD_FLG_COAL_NOW) { netif_stop_queue(dev); /* @@ -2681,8 +2688,9 @@ return 0; overflow: - /* This race condition is unavoidable with lock-free drivers. - * We wake up queue _before_ tx_prd is advanced, so that we can + /* + * This race condition is unavoidable with lock-free drivers. + * We wake up the queue _before_ tx_prd is advanced, so that we can * enter hard_start_xmit too early, while tx ring still looks closed. * This happens ~1-4 times per 100000 packets, so that we can allow * to loop syncing to other CPU. 
Probably, we need an additional @@ -2700,6 +2708,7 @@ goto restart; } + static int ace_change_mtu(struct net_device *dev, int new_mtu) { struct ace_private *ap = dev->priv; @@ -2745,53 +2754,7 @@ #ifdef SIOCETHTOOL struct ethtool_cmd ecmd; u32 link, speed; -#endif - if (cmd == (SIOCDEVPRIVATE+0x01)) { - int p[5]; - p[0] = readl(&regs->TuneTxCoalTicks); - p[1] = readl(&regs->TuneMaxTxDesc); - p[2] = readl(&regs->TuneRxCoalTicks); - p[3] = readl(&regs->TuneMaxRxDesc); - p[4] = readl(&regs->TxBufRat); - - if (copy_to_user(ifr->ifr_data, &p, sizeof(p))) - return -EFAULT; - return 0; - } else if (cmd == (SIOCDEVPRIVATE+0x02)) { - int p[5]; - if (copy_from_user(&p, ifr->ifr_data, sizeof(p))) - return -EFAULT; - writel(p[0], &regs->TuneTxCoalTicks); - writel(p[1], &regs->TuneMaxTxDesc); - writel(p[2], &regs->TuneRxCoalTicks); - writel(p[3], &regs->TuneMaxRxDesc); - /* This register not used after tigon is booted. - * We still allow to modify it, because it gives free - * variable, which could be used by modified firmware - * for another purposes.
- */ - writel(p[4], &regs->TxBufRat); - return 0; - } else if (cmd == (SIOCDEVPRIVATE+0x03)) { - if (copy_to_user(ifr->ifr_data, &ap->info->s, sizeof(ap->info->s))) - return -EFAULT; - return 0; - } else if (cmd == (SIOCDEVPRIVATE+0x0e)) { - printk(KERN_NOTICE "%s: dumping debug info\n", dev->name); - printk(KERN_NOTICE "%s: tbusy %d, tx_csm %i tx_ret_csm %i, " - "tx_prd %i\n", dev->name, netif_queue_stopped(dev), - *ap->tx_csm, ap->tx_ret_csm, ap->tx_prd); - printk(KERN_NOTICE "%s: cur_rx %i, std_refill %li, " - "mini_rx %i, mini_refill %li\n", dev->name, - atomic_read(&ap->cur_rx_bufs), ap->std_refill_busy, - atomic_read(&ap->cur_mini_bufs), ap->mini_refill_busy); - printk(KERN_NOTICE "%s: CpuCtrl %08x\n", - dev->name, readl(&regs->CpuCtrl)); - return 0; - } - -#ifdef SIOCETHTOOL if (cmd != SIOCETHTOOL) return -EOPNOTSUPP; if (copy_from_user(&ecmd, ifr->ifr_data, sizeof(ecmd))) @@ -2928,7 +2891,8 @@ regs = ((struct ace_private *)dev->priv)->regs; writel(da[0] << 8 | da[1], &regs->MacAddrHi); - writel((da[2] << 24) | (da[3] << 16) | (da[4] << 8) | da[5] , &regs->MacAddrLo); + writel((da[2] << 24) | (da[3] << 16) | (da[4] << 8) | da[5], + &regs->MacAddrLo); cmd.evt = C_SET_MAC_ADDR; cmd.code = 0; @@ -3003,11 +2967,12 @@ ap->stats.multicast = readl(&mac_stats->kept_mc); ap->stats.collisions = readl(&mac_stats->coll); - return(&ap->stats); + return &ap->stats; } -static void __init ace_copy(struct ace_regs *regs, void *src, u32 dest, int size) +static void __init ace_copy(struct ace_regs *regs, void *src, + u32 dest, int size) { unsigned long tdest; u32 *wsrc; @@ -3092,14 +3057,12 @@ */ ace_clear(regs, 0x2000, 0x80000-0x2000); if (ACE_IS_TIGON_I(ap)) { -#ifndef CONFIG_ACENIC_OMIT_TIGON_I ace_copy(regs, tigonFwText, tigonFwTextAddr, tigonFwTextLen); ace_copy(regs, tigonFwData, tigonFwDataAddr, tigonFwDataLen); ace_copy(regs, tigonFwRodata, tigonFwRodataAddr, tigonFwRodataLen); ace_clear(regs, tigonFwBssAddr, tigonFwBssLen); ace_clear(regs, tigonFwSbssAddr, tigonFwSbssLen); -#endif
}else if (ap->version == 2) { ace_clear(regs, tigon2FwBssAddr, tigon2FwBssLen); ace_clear(regs, tigon2FwSbssAddr, tigon2FwSbssLen); --- bork/drivers/net/acenic.h Thu Mar 8 15:17:43 2001 +++ linux-2.4.2-zc-010228/drivers/net/acenic.h Thu Mar 8 16:31:24 2001 @@ -1,14 +1,9 @@ #ifndef _ACENIC_H_ #define _ACENIC_H_ -/* Use TX ring based in host memory. This avoids PIO stalling - * cpu, but results in more bus traffic per packet and increase - * of latency by up to 1 usec (sigh, alteon firmware is slooow). - * With normal traffic the profit overweights. - */ -#define USE_HOST_TX_RING 1 -/* Generate TX index update each time, when TX ring is closed. +/* + * Generate TX index update each time, when TX ring is closed. * Normally, this is not useful, because results in more dma (and irqs * without TX_COAL_INTS_ONLY). */ @@ -18,6 +13,7 @@ #define MAX_SKB_FRAGS 0 #endif + /* * Addressing: * @@ -442,6 +438,10 @@ /* * TX ring */ +#define TX_RING_ENTRIES 256 +#define TX_RING_SIZE (TX_RING_ENTRIES * sizeof(struct tx_desc)) +#define TX_RING_BASE 0x3800 + struct tx_desc{ aceaddr addr; u32 flagsize; @@ -465,14 +465,6 @@ u32 vlanres; }; -/* - * TX ring size can be 128, 256 or 512. - * (any other value will result in a crash.) 
- */ -#define TX_RING_ENTRIES 128 -#define TX_RING_SIZE (TX_RING_ENTRIES * sizeof(struct tx_desc)) -#define TX_RING_END 0x4000 -#define TX_RING_BASE (TX_RING_END - sizeof(struct tx_desc)*TX_RING_ENTRIES) #define RX_STD_RING_ENTRIES 512 #define RX_STD_RING_SIZE (RX_STD_RING_ENTRIES * sizeof(struct rx_desc)) @@ -679,9 +671,7 @@ volatile u32 *evt_prd, *rx_ret_prd, *tx_csm; -#ifdef USE_HOST_TX_RING dma_addr_t tx_ring_dma; /* 32/64 bit */ -#endif dma_addr_t rx_ring_base_dma; dma_addr_t evt_ring_dma; dma_addr_t evt_prd_dma, rx_ret_prd_dma, tx_csm_dma; @@ -702,16 +692,21 @@ struct net_device_stats stats; }; + #define TX_RESERVED MAX_SKB_FRAGS static inline int tx_space (u32 csm, u32 prd) { - return (csm - prd - 1) & (TX_RING_ENTRIES-1); + return (csm - prd - 1) & (TX_RING_ENTRIES - 1); } -#define tx_free(ap) tx_space((ap)->tx_ret_csm, (ap)->tx_prd) +#define tx_free(ap) tx_space((ap)->tx_ret_csm, (ap)->tx_prd) -#define tx_ring_full(csm, prd) (tx_space(csm, prd) <= TX_RESERVED) +#if MAX_SKB_FRAGS +#define tx_ring_full(csm, prd) (tx_space(csm, prd) <= TX_RESERVED) +#else +#define tx_ring_full 0 +#endif static inline void set_aceaddr(aceaddr *aa, dma_addr_t addr) @@ -778,9 +773,6 @@ static void ace_dump_trace(struct ace_private *ap); static void ace_set_multicast_list(struct net_device *dev); static int ace_change_mtu(struct net_device *dev, int new_mtu); -#ifdef SKB_RECYCLE -extern int ace_recycle(struct sk_buff *skb); -#endif static int ace_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd); static int ace_set_mac_addr(struct net_device *dev, void *p); static void ace_set_rxtx_parms(struct net_device *dev, int jumbo); --- bork/drivers/net/acenic_firmware.h Thu Mar 8 15:51:25 2001 +++ linux-2.4.2-zc-010228/drivers/net/acenic_firmware.h Thu Mar 8 16:34:19 2001 @@ -17,10 +17,11 @@ #define tigonFwSbssLen 0x38 #define tigonFwBssAddr 0x00015dd0 #define tigonFwBssLen 0x2080 -u32 tigonFwText[]; -u32 tigonFwData[]; -u32 tigonFwRodata[]; #ifndef CONFIG_ACENIC_OMIT_TIGON_I 
+#define tigonFwText 0 +#define tigonFwData 0 +#define tigonFwRodata 0 +#else /* Generated by genfw.c */ u32 tigonFwText[(MAX_TEXT_LEN/4) + 1] __initdata = { 0x10000003, From owner-netdev@oss.sgi.com Thu Mar 8 13:49:19 2001 Received: by oss.sgi.com id ; Thu, 8 Mar 2001 13:49:10 -0800 Received: from pizda.ninka.net ([216.101.162.242]:23936 "EHLO pizda.ninka.net") by oss.sgi.com with ESMTP id ; Thu, 8 Mar 2001 13:49:04 -0800 Received: (from davem@localhost) by pizda.ninka.net (8.9.3/8.9.3) id NAA21375; Thu, 8 Mar 2001 13:48:52 -0800 From: "David S. Miller" MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <15015.65092.349145.143015@pizda.ninka.net> Date: Thu, 8 Mar 2001 13:48:52 -0800 (PST) To: Jes Sorensen Cc: netdev@oss.sgi.com Subject: Re: initial acenic ZC cleanup In-Reply-To: <200103082147.f28LlS301042@itanic.thepuffingroup.com> References: <200103082147.f28LlS301042@itanic.thepuffingroup.com> X-Mailer: VM 6.75 under 21.1 (patch 13) "Crater Lake" XEmacs Lucid Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Jes Sorensen writes: > Dave, please apply. I don't want to take a patch which removes the SIOCDEVPRIVATE debugging hacks and does not replace them with something which can provide equivalent information. Please, retain them, or provide new ones which are in a style you like. But it cannot be flatly removed. Later, David S. 
Miller davem@redhat.com From owner-netdev@oss.sgi.com Thu Mar 8 13:57:20 2001 Received: by oss.sgi.com id ; Thu, 8 Mar 2001 13:57:00 -0800 Received: from schmee.sfgoth.com ([63.205.85.133]:63494 "EHLO schmee.sfgoth.com") by oss.sgi.com with ESMTP id ; Thu, 8 Mar 2001 13:56:51 -0800 Received: (from mitch@localhost) by schmee.sfgoth.com (8.9.3/8.9.3) id NAA46722; Thu, 8 Mar 2001 13:56:49 -0800 (PST) Date: Thu, 8 Mar 2001 13:56:49 -0800 From: Mitchell Blank Jr To: Jes Sorensen Cc: netdev@oss.sgi.com Subject: Re: initial acenic ZC cleanup Message-ID: <20010308135649.C43038@sfgoth.com> References: <200103082147.f28LlS301042@itanic.thepuffingroup.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0i In-Reply-To: <200103082147.f28LlS301042@itanic.thepuffingroup.com>; from jes@linuxcare.com on Thu, Mar 08, 2001 at 04:47:28PM -0500 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Jes Sorensen wrote: > +#define late_stop_netif_stop_queue(dev) {do{} while(0);} (Warning: minor nit pick ahead) What is the point of using the "do{}while(0)" construct and then surrounding it with braces? Remember, the entire point of using the "do { /*...*/ } while(0)" is to make macros safe for parsing in the case of "if (foo) macro(); else whatever();" since you need something that will consume the semicolon following the macro call. But putting an extra set of braces around it breaks this anyway, making the do/while pointless.
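To make the nit concrete, here is a minimal sketch of the idiom. The macro name STOP_QUEUE and the helper demo() are hypothetical illustrations, not part of the acenic driver.

```c
/*
 * Statement-like macro: the trailing "while (0)" has no semicolon of its
 * own, so the caller's ';' completes the statement and an "else" may follow.
 * STOP_QUEUE is a made-up name for illustration only.
 */
#define STOP_QUEUE(flag) do { (flag) = 1; } while (0)

/*
 * The braced form from the patch would expand to "{ do { ... } while (0); };"
 * and the extra ';' would terminate the if statement, so this would not
 * compile:
 *
 *     #define STOP_QUEUE_BRACED(flag) {do { (flag) = 1; } while (0);}
 *     if (cond) STOP_QUEUE_BRACED(flag); else flag = -1;
 */

/* Returns 1 when cond is non-zero (macro branch taken), -1 otherwise. */
static int demo(int cond)
{
    int flag = 0;

    if (cond)
        STOP_QUEUE(flag);
    else
        flag = -1;

    return flag;
}
```

An empty no-op variant would follow the same shape, "do { } while (0)" with no outer braces and no trailing semicolon inside the macro body.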
-Mitch From owner-netdev@oss.sgi.com Thu Mar 8 13:59:49 2001 Received: by oss.sgi.com id ; Thu, 8 Mar 2001 13:59:29 -0800 Received: from smtp1.cern.ch ([137.138.128.38]:10758 "EHLO smtp1.cern.ch") by oss.sgi.com with ESMTP id ; Thu, 8 Mar 2001 13:59:20 -0800 Received: from lxplus012.cern.ch (IDENT:root@lxplus012.cern.ch [137.138.161.115]) by smtp1.cern.ch (8.9.3/8.9.3) with ESMTP id WAA18388; Thu, 8 Mar 2001 22:59:09 +0100 (MET) Received: (from jes@localhost) by lxplus012.cern.ch (8.9.3/8.9.3) id WAA24185; Thu, 8 Mar 2001 22:59:08 +0100 To: "David S. Miller" Cc: netdev@oss.sgi.com Subject: Re: initial acenic ZC cleanup References: <200103082147.f28LlS301042@itanic.thepuffingroup.com> <15015.65092.349145.143015@pizda.ninka.net> From: Jes Sorensen Date: 08 Mar 2001 22:59:08 +0100 In-Reply-To: "David S. Miller"'s message of "Thu, 8 Mar 2001 13:48:52 -0800 (PST)" Message-ID: Lines: 26 User-Agent: Gnus/5.070096 (Pterodactyl Gnus v0.96) Emacs/20.4 MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing >>>>> "David" == David S Miller writes: David> Jes Sorensen writes: >> Dave, please apply. David> I don't want to take a patch which removes the SIOCDEVPRIVATE David> debugging hacks and does not replace them with something which David> can provide equivalent information. David> Please, retain them, or provide new ones which are in a style David> you like. But it cannot be flatly removed. The functionality has been there for ages via the ethtool interface, however it was never added to the official ethtool since you at the time seemed reluctant to move ethtool to include/linux from include/asm-sparc. The SIOCDEVPRIVATE ioctls that someone put in there are gross and undocumented. The only thing missing right now is the interface to read the current amount of free buffers available in the ring. 
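For the TX side at least, the free-slot count can be derived from the producer and consumer indices alone. The sketch below restates the tx_space() helper from the acenic.h hunk in the patch as standalone C; using uint32_t in place of the kernel's u32 is an assumption made so it compiles outside the kernel, and it says nothing about the RX buffer counters.

```c
#include <stdint.h>

/* Ring size from the patch; it must be a power of two for the mask below. */
#define TX_RING_ENTRIES 256

/*
 * Number of descriptors that can still be queued, given the consumer index
 * (csm) and the producer index (prd). One slot is always left unused so
 * that a full ring can be told apart from an empty one.
 */
static inline uint32_t tx_space(uint32_t csm, uint32_t prd)
{
    return (csm - prd - 1) & (TX_RING_ENTRIES - 1);
}
```

Because the subtraction is unsigned and masked, the result stays correct when the producer index has wrapped past the consumer index.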
Last it was brought up, Alexey mentioned that most of that stuff wasn't useful any longer anyway, it had only been used back when the ZC patch was originally implemented. Jes From owner-netdev@oss.sgi.com Thu Mar 8 14:16:20 2001 Received: by oss.sgi.com id ; Thu, 8 Mar 2001 14:16:10 -0800 Received: from panic.ohr.gatech.edu ([130.207.47.194]:13502 "HELO havoc.gtf.org") by oss.sgi.com with SMTP id ; Thu, 8 Mar 2001 14:15:47 -0800 Received: from mandrakesoft.com (adsl-20-73-169.asm.bellsouth.net [66.20.73.169]) by havoc.gtf.org (Postfix) with ESMTP id 260571F6A; Thu, 8 Mar 2001 17:15:35 -0500 (EST) Message-ID: <3AA80487.3C7E26A6@mandrakesoft.com> Date: Thu, 08 Mar 2001 17:15:35 -0500 From: Jeff Garzik Organization: MandrakeSoft X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.4.3-pre3 i686) X-Accept-Language: en MIME-Version: 1.0 To: Jes Sorensen Cc: "David S. Miller" , netdev@oss.sgi.com Subject: Re: initial acenic ZC cleanup References: <200103082147.f28LlS301042@itanic.thepuffingroup.com> <15015.65092.349145.143015@pizda.ninka.net> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Jes Sorensen wrote: > The functionality has been there for ages via the ethtool interface, > however it was never added to the official ethtool since you at the > time seemed reluctant to move ethtool to include/linux from > include/asm-sparc. ethtool is now public for all arches in 2.4.0. Does ethtool need to be updated for you? Also, due to the great base of existing code for MII phys, I would like to (if possible) write a single mii_phy.c that takes function pointers from drivers' mdio_{read,write} functions. mii_phy.c would implement ethtool and dev->if_port media interfaces -once-. -- Jeff Garzik | "You see, in this world there's two kinds of Building 1024 | people, my friend: Those with loaded guns MandrakeSoft | and those who dig. You dig." 
--Blondie From owner-netdev@oss.sgi.com Thu Mar 8 15:28:40 2001 Received: by oss.sgi.com id ; Thu, 8 Mar 2001 15:28:31 -0800 Received: from citadel.myri.com ([199.120.212.1]:56281 "EHLO myri.com") by oss.sgi.com with ESMTP id ; Thu, 8 Mar 2001 15:28:06 -0800 Received: from orion.myri.com.myri.com (orion [199.120.212.245]) by myri.com (8.9.3+Sun/8.9.1) with SMTP id PAA27101; Thu, 8 Mar 2001 15:27:10 -0800 (PST) Date: Thu, 8 Mar 2001 15:27:10 -0800 (PST) From: Bob Felderman Message-Id: <200103082327.PAA27101@myri.com> To: feldy@myri.com, kuznet@ms2.inr.ac.ru Subject: Re: [Fwd: Re: possible bug x86 2.4.2 SMP in IP receive stack] Cc: ak@muc.de, andrewm@uow.edu.au, davem@redhat.com, netdev@oss.sgi.com, pp@evil.netppl.fi Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing => From kuznet@ms2.inr.ac.ru Thu Mar 8 11:55:39 2001 => => Yes, statistics is sad. Each ~300th frame is lost, serious latency => problems. => => Is this without zerocopy? I still hope, it looks better with it. Yes - this is with linux-2.4.2 with your patch for fixing the skb panics. 
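The "each ~300th frame" estimate can be checked against the two softnet_stat lines quoted earlier in this thread. A minimal sketch, assuming (as the discussion implies) that the first hex column counts frames processed and the second counts frames dropped; frames_per_drop() is a hypothetical helper, not a kernel interface.

```c
#include <stdio.h>

/*
 * Parse the first two hex columns of one /proc/net/softnet_stat line and
 * return how many frames were processed per dropped frame. Returns 0 when
 * the line cannot be parsed or nothing was dropped.
 * Assumption: column 1 = processed, column 2 = dropped.
 */
static unsigned long frames_per_drop(const char *line)
{
    unsigned long total, dropped;

    if (sscanf(line, "%lx %lx", &total, &dropped) != 2 || dropped == 0)
        return 0;

    return total / dropped;
}
```

For the posted figures, 00fe20ec/0000ae4e gives 373 and 00eff495/0000b941 gives 331, which is consistent with the rough one-in-300 figure.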
From owner-netdev@oss.sgi.com Thu Mar 8 15:40:51 2001 Received: by oss.sgi.com id ; Thu, 8 Mar 2001 15:40:42 -0800 Received: from ash25.internode.on.net ([203.16.214.248]:49678 "EHLO ash25.adelaide.on.net") by oss.sgi.com with ESMTP id ; Thu, 8 Mar 2001 15:40:26 -0800 Received: from w95vmware.ns.com ([150.101.232.161]) by internode.on.net (PMDF V6.0-24 #37831) with SMTP id <01K0ZLMTMH9Q009CEB@internode.on.net> for netdev@oss.sgi.com; Fri, 09 Mar 2001 09:49:02 +1030 Date: Fri, 09 Mar 2001 09:59:54 +1000 From: Richard Sharpe Subject: Patches to SIS900 driver for Linux 2.2.x for SIS630S support X-Sender: ns@203.16.214.248 To: netdev@oss.sgi.com Message-id: <3.0.6.32.20010309095954.01208550@203.16.214.248> MIME-version: 1.0 X-Mailer: QUALCOMM Windows Eudora Light Version 3.0.6 (32) Content-type: multipart/mixed; boundary="=====================_984058194==_" Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing --=====================_984058194==_ Content-Type: text/plain; charset="us-ascii" Hi, I recently got a PCChips M810MLR mobo with what it said was a SIS900 on board ... Turns out to be a bit different. Here is a patch with bits from the latest (2.4.x only driver it seems) ported back so that the driver for 2.2.x now understands that chip. Hope these are useful. --=====================_984058194==_ Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable Content-Disposition: attachment; filename="sis900-2.2.x.patch" --- sis900.c Fri Mar 9 07:44:11 2001 +++ sis900-new.c Fri Mar 9 07:42:29 2001 @@ -57,7 +57,7 @@ static int multicast_filter_limit = 128; #define sis900_debug debug -static int sis900_debug = 0; +static int sis900_debug = 1; /* Time in jiffies before concluding the transmitter is hung.
*/ #define TX_TIMEOUT (4*HZ) @@ -82,6 +82,7 @@ static void sis900_read_mode(struct device *net_dev, int phy_addr, int *speed, int *duplex); static void amd79c901_read_mode(struct device *net_dev, int phy_addr, int *speed, int *duplex); +static void ics1893_read_mode(struct device *net_dev, int phy_addr, int *speed, int *duplex); static struct mii_chip_info { const char * name; @@ -93,6 +94,7 @@ {"SiS 7014 Physical Layer Solution", 0x0016, 0xf830,sis900_read_mode}, {"AMD 79C901 10BASE-T PHY", 0x0000, 0x35b9, amd79c901_read_mode}, {"AMD 79C901 HomePNA PHY", 0x0000, 0x35c8, amd79c901_read_mode}, + {"ICS 1893 Integrated PHYciever", 0x0015, 0xf441, ics1893_read_mode}, {0,}, }; @@ -226,7 +228,7 @@ if (signature == 0xffff || signature == 0x0000) { printk (KERN_INFO "%s: Error EERPOM read %x\n", net_dev->name, signature); - return 0; + /*return 0;*/ } /* get MAC address from EEPROM */ @@ -299,7 +301,7 @@ return NULL; pci_read_config_byte(pci_dev, PCI_CLASS_REVISION, &revision); - if (revision == SIS630E_REV) + if (revision == SIS630E_REV || revision == SIS630S_REV) ret = sis630e_get_mac_addr(pci_dev, net_dev); else if (revision == SIS630EA1_REV) { ret = sis630e_get_mac_addr(pci_dev, net_dev); @@ -956,6 +958,38 @@ printk(KERN_INFO "%s: Media Link Off\n", net_dev->name); } } + +/* ICS1893 PHY use Quick Poll Detailed Status Register to get its status */ +static void ics1893_read_mode(struct device *net_dev, int phy_addr, int *speed, int *duplex) +{ + int i = 0; + u32 status; + + /* MII_QPDSTS is Latched, read twice in succession will reflect the current state */ + for (i = 0; i < 2; i++) + status = mdio_read(net_dev, phy_addr, MII_QPDSTS); + + if (status & MII_STSICS_SPD) + *speed = HW_SPEED_100_MBPS; + else + *speed = HW_SPEED_10_MBPS; + + if (status & MII_STSICS_DPLX) + *duplex = FDX_CAPABLE_FULL_SELECTED; + else + *duplex = FDX_CAPABLE_HALF_SELECTED; + + if (status & MII_STSICS_LINKSTS) + printk(KERN_INFO "%s: Media Link On 
%s %s-duplex \n", + net_dev->name, + *speed == HW_SPEED_100_MBPS ? + "100mbps" : "10mbps", + *duplex == FDX_CAPABLE_FULL_SELECTED ? + "full" : "half"); + else + printk(KERN_INFO "%s: Media Link Off\n", net_dev->name); +} + static void sis900_tx_timeout(struct device *net_dev) { struct sis900_private *sis_priv = (struct sis900_private *)net_dev->priv; --- sis900.h Fri Mar 9 07:44:11 2001 +++ sis900-new.h Fri Mar 9 07:42:29 2001 @@ -168,6 +168,12 @@ MII_MASK = 0x0013, MII_RESV = 0x0014 }; +/* mii registers specific to the ICS 1893 */ +enum ics_mii_registers { + MII_EXTCTRL = 0x0010, MII_QPDSTS = 0x0011, MII_10BTOP = 0x0012, + MII_EXTCTRL2 = 0x0013 +}; + /* mii registers specific to AMD 79C901 */ enum amd_mii_registers { MII_STATUS_SUMMARY = 0x0018 @@ -212,13 +218,19 @@ MII_STSOUT_SPD = 0x0080, MII_STSOUT_DPLX = 0x0040 }; +enum mii_stsics_register_bits { + MII_STSICS_SPD = 0x8000, MII_STSICS_DPLX = 0x4000, + MII_STSICS_LINKSTS = 0x0001 +}; + enum mii_stssum_register_bits { MII_STSSUM_LINK = 0x0008, MII_STSSUM_DPLX = 0x0004, MII_STSSUM_AUTO = 0x0002, MII_STSSUM_SPD = 0x0001 }; enum sis630_revision_id { - SIS630E_REV = 0x81, SIS630EA1_REV = 0x83 + SIS630E_REV = 0x81, SIS630EA1_REV = 0x83, + SIS630S_REV = 0x82 }; #define FDX_CAPABLE_DUPLEX_UNKNOWN 0 --=====================_984058194==_ Content-Type: text/plain; charset="us-ascii" Regards ------- Richard Sharpe, sharpe@ns.aus.com Samba (Team member, www.samba.org), Ethereal (Team member, www.ethereal.com) Contributing author, SAMS Teach Yourself Samba in 24 Hours Author, Special Edition, Using Samba --=====================_984058194==_-- From owner-netdev@oss.sgi.com Thu Mar 8 17:06:32 2001 Received: by oss.sgi.com id ; Thu, 8 Mar 2001 17:06:22 -0800 Received: from enst.enst.fr ([137.194.2.16]:4242 "HELO enst.enst.fr") by oss.sgi.com with SMTP id ; Thu, 8 Mar 2001 17:06:15 -0800 Received: from email.enst.fr (muse.enst.fr [137.194.2.33]) by enst.enst.fr (Postfix) with
ESMTP id 0BC511C911 for ; Fri, 9 Mar 2001 02:06:13 +0100 (MET) Received: from FABRICE (droopy.maisel1.rezel.enst.fr [137.194.8.239]) by email.enst.fr (8.9.3/8.9.3) with ESMTP id CAA29018 for ; Fri, 9 Mar 2001 02:06:12 +0100 (MET) Date: Fri, 09 Mar 2001 02:05:21 +0100 From: Fabrice Gautier To: netdev@oss.sgi.com Subject: Command line parameters for Kernel Level autoconfig Message-Id: <20010309015802.6CF0.GAUTIER@email.enst.fr> MIME-Version: 1.0 Content-Type: text/plain; charset="US-ASCII" Content-Transfer-Encoding: 7bit X-Mailer: Becky! ver. 2.00.01 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hi, The help topic for the kernel option "IP: kernel level autoconfiguration" (CONFIG_IP_PNP) says that it is possible to supply configuration information to the kernel on the command line at boot time. But I could not find which parameters I have to pass. Is there a document in which they are described (or, if not, a source file)? I've also read that Kernel Level Configuration may be deprecated in the future, but I would like to use it while it is still here (or am I too late?)
Thanks -- Fabrice Gautier From owner-netdev@oss.sgi.com Thu Mar 8 18:22:13 2001 Received: by oss.sgi.com id ; Thu, 8 Mar 2001 18:22:04 -0800 Received: from smtp1.cern.ch ([137.138.128.38]:29455 "EHLO smtp1.cern.ch") by oss.sgi.com with ESMTP id ; Thu, 8 Mar 2001 18:21:46 -0800 Received: from lxplus015.cern.ch (IDENT:root@lxplus015.cern.ch [137.138.161.112]) by smtp1.cern.ch (8.9.3/8.9.3) with ESMTP id DAA25431; Fri, 9 Mar 2001 03:21:33 +0100 (MET) Received: (from jes@localhost) by lxplus015.cern.ch (8.9.3/8.9.3) id DAA06680; Fri, 9 Mar 2001 03:21:33 +0100 To: Jeff Garzik Cc: netdev@oss.sgi.com Subject: Re: initial acenic ZC cleanup References: <200103082147.f28LlS301042@itanic.thepuffingroup.com> <15015.65092.349145.143015@pizda.ninka.net> <3AA80487.3C7E26A6@mandrakesoft.com> From: Jes Sorensen Date: 09 Mar 2001 03:21:33 +0100 In-Reply-To: Jeff Garzik's message of "Thu, 08 Mar 2001 17:15:35 -0500" Message-ID: Lines: 18 User-Agent: Gnus/5.070096 (Pterodactyl Gnus v0.96) Emacs/20.4 MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing >>>>> "Jeff" == Jeff Garzik writes: Jeff> Jes Sorensen wrote: >> The functionality has been there for ages via the ethtool >> interface, however it was never added to the official ethtool since >> you at the time seemed reluctant to move ethtool to include/linux >> from include/asm-sparc. Jeff> ethtool is now public for all arches in 2.4.0. Jeff> Does ethtool need to be updated for you? A little; it needs to be taught about the tuning parameters for the AceNIC. We probably should put a list of NIC-specific tuning items in there so it knows how to grok the different cards' different parameters.
Jes From owner-netdev@oss.sgi.com Fri Mar 9 05:34:40 2001 Received: by oss.sgi.com id ; Fri, 9 Mar 2001 05:34:31 -0800 Received: from laurin.munich.netsurf.de ([194.64.166.1]:33408 "EHLO laurin.munich.netsurf.de") by oss.sgi.com with ESMTP id ; Fri, 9 Mar 2001 05:34:11 -0800 Received: from fred.muc.de (noidentity@ns1102.munich.netsurf.de [195.180.235.102]) by laurin.munich.netsurf.de (8.9.3/8.9.3) with ESMTP id OAA09168; Fri, 9 Mar 2001 14:29:12 +0100 (MET) Received: by fred.muc.de (Postfix, from userid 500) id A026DE3C87; Fri, 9 Mar 2001 14:34:11 +0100 (CET) Date: Fri, 9 Mar 2001 14:34:11 +0100 From: Andi Kleen To: Bob Felderman Cc: kuznet@ms2.inr.ac.ru, Andi Kleen , andrewm@uow.edu.au, davem@redhat.com, netdev@oss.sgi.com, pp@evil.netppl.fi Subject: Re: [Fwd: Re: possible bug x86 2.4.2 SMP in IP receive stack] Message-ID: <20010309143411.A2181@fred.local> References: <200103081742.UAA25910@ms2.inr.ac.ru> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0.1i In-Reply-To: ; from feldy@myri.com on Thu, Mar 08, 2001 at 07:56:43PM +0100 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Thu, Mar 08, 2001 at 07:56:43PM +0100, Bob Felderman wrote: > I think the culprit here is the packet receive errors from > UDP. These are socket overflows I think. I'm trying to > track it down. When I used DaveM's zero-copy patches on > a linux-2.4.0 kernel, most, if not all, of these > packet receive errors went away. So does it help when you increase the socket receive buffer of the receiver ? (via /proc/sys/net/core/rmem_{default,max}) -Andi > From owner-netdev@oss.sgi.com Fri Mar 9 05:35:40 2001 Received: by oss.sgi.com id ; Fri, 9 Mar 2001 05:35:30 -0800 Received: from imo-m01.mx.aol.com ([64.12.136.4]:25044 "EHLO imo-m01.mx.aol.com") by oss.sgi.com with ESMTP id ; Fri, 9 Mar 2001 05:35:24 -0800 Received: from Ganeshn@aol.com by imo-m01.mx.aol.com (mail_out_v29.5.) 
id 4.7b.114a2ee2 (16790) for ; Fri, 9 Mar 2001 08:35:09 -0500 (EST) From: Ganeshn@aol.com Message-ID: <7b.114a2ee2.27da360c@aol.com> Date: Fri, 9 Mar 2001 08:35:08 EST Subject: Problems in Linux 2.4 To: netdev@oss.sgi.com MIME-Version: 1.0 Content-Type: text/plain; charset="US-ASCII" Content-Transfer-Encoding: 7bit X-Mailer: AOL 5.0 for Windows sub 129 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hi, I am Ganesan Narayanasamy from New York. I have been using Linux for the past couple of months and trying to run many socket programs. I have a problem with a program that opens a Unix domain socket, binds it, listens for connections ( listen(socket fd,5), where 5 is the max queue size ) and accepts connections from clients. When I tested this program, listen() took more than 5 clients in the queue. If more than 5 clients are waiting, I would expect to get "connection refused" on the client side. Could you tell me why this is so? Also, I see two identical entries in the same directory under /proc/sys/net/unix:

total 0
dr-xr-xr-x 2 root root 0 Mar 9 08:29 .
dr-xr-xr-x 7 root root 0 Mar 9 07:35 ..
-rw------- 1 root root 0 Mar 9 08:30 max_dgram_qlen
-rw------- 1 root root 0 Mar 9 08:30 max_dgram_qlen

Third, I don't see the following entry, and setting net.ipv4.tcp_syncookies = 1 fails: /proc/sys/net/ipv4/tcp_syncookies Expecting your mail soon. Thanks, Ganesh ganeshn@aol.com From owner-netdev@oss.sgi.com Fri Mar 9 06:17:20 2001 Received: by oss.sgi.com id ; Fri, 9 Mar 2001 06:17:11 -0800 Received: from shell.cyberus.ca ([209.195.95.7]:37512 "EHLO shell.cyberus.ca") by oss.sgi.com with ESMTP id ; Fri, 9 Mar 2001 06:16:49 -0800 Received: from localhost (hadi@localhost) by shell.cyberus.ca (8.9.3/666/Cyberus Online Inc.)
with ESMTP id JAA06994; Fri, 9 Mar 2001 09:13:17 -0500 (EST) X-Authentication-Warning: shell.cyberus.ca: hadi owned process doing -bs Date: Fri, 9 Mar 2001 09:13:16 -0500 (EST) From: jamal To: Andi Kleen cc: Bob Felderman , , , , , Subject: Re: [Fwd: Re: possible bug x86 2.4.2 SMP in IP receive stack] In-Reply-To: <20010309143411.A2181@fred.local> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Alexey's assessment is that the culprit might be that he is losing some of those frags in the backlog. What kind of hardware is this thing running on? 1.5 G seems a lot on a regular PC. Very large MTUs? Bob, one of the things that could be done is to have the driver stop passing packets up once the top layer gets congested (the return value is NET_RX_CN_HIGH). linux/include/linux/netdevice.h has the NET_RX_* values which are returned by netif_rx(). cheers, jamal On Fri, 9 Mar 2001, Andi Kleen wrote: > On Thu, Mar 08, 2001 at 07:56:43PM +0100, Bob Felderman wrote: > > I think the culprit here is the packet receive errors from > > UDP. These are socket overflows I think. I'm trying to > > track it down. When I used DaveM's zero-copy patches on > > a linux-2.4.0 kernel, most, if not all, of these > > packet receive errors went away. > > So does it help when you increase the socket receive buffer of the receiver ?
> > (via /proc/sys/net/core/rmem_{default,max}) > > > -Andi > > > From owner-netdev@oss.sgi.com Fri Mar 9 10:08:53 2001 Received: by oss.sgi.com id ; Fri, 9 Mar 2001 10:08:42 -0800 Received: from minus.inr.ac.ru ([193.233.7.97]:20755 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Fri, 9 Mar 2001 10:08:30 -0800 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id VAA25902; Fri, 9 Mar 2001 21:01:28 +0300 From: kuznet@ms2.inr.ac.ru Message-Id: <200103091801.VAA25902@ms2.inr.ac.ru> Subject: Re: [Fwd: Re: possible bug x86 2.4.2 SMP in IP receive stack] To: hadi@cyberus.ca (jamal) Date: Fri, 9 Mar 2001 21:01:28 +0300 (MSK) Cc: ak@muc.de, feldy@myri.com, andrewm@uow.edu.au, davem@redhat.com, netdev@oss.sgi.com, pp@evil.netppl.fi In-Reply-To: from "jamal" at Mar 9, 1 09:13:16 am X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Content-Length: 457 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hello! > Bob, One of the things that could be done is have the driver stop sending > packets after the toplayer gets congested (return is NET_RX_CN_HIGH). > linux/include/linux/netdevice.h has the NET_RX_* which are returned > by netif_rx() Bulk jumbo transfers at giga rates is one of the cases, when the engine should and can be tuned to _never_ slowdown or to drop in backlog. We do not make any work at BHs, data is not accessed. It must fly. 
Alexey From owner-netdev@oss.sgi.com Fri Mar 9 11:02:13 2001 Received: by oss.sgi.com id ; Fri, 9 Mar 2001 11:02:03 -0800 Received: from citadel.myri.com ([199.120.212.1]:42727 "EHLO myri.com") by oss.sgi.com with ESMTP id ; Fri, 9 Mar 2001 11:01:46 -0800 Received: from orion.myri.com.myri.com (orion [199.120.212.245]) by myri.com (8.9.3+Sun/8.9.1) with SMTP id LAA04730; Fri, 9 Mar 2001 11:00:47 -0800 (PST) Date: Fri, 9 Mar 2001 11:00:47 -0800 (PST) From: Bob Felderman Message-Id: <200103091900.LAA04730@myri.com> To: ak@muc.de, feldy@myri.com Subject: Re: [Fwd: Re: possible bug x86 2.4.2 SMP in IP receive stack] Cc: andrewm@uow.edu.au, davem@redhat.com, kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com, pp@evil.netppl.fi Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing => On Thu, Mar 08, 2001 at 07:56:43PM +0100, Bob Felderman wrote: => > I think the culprit here is the packet receive errors from => > UDP. These are socket overflows I think. I'm trying to => > track it down. When I used DaveM's zero-copy patches on => > a linux-2.4.0 kernel, most, if not all, of these => > packet receive errors went away. => => So does it help when you increase the socket receive buffer of the receiver ? => => (via /proc/sys/net/core/rmem_{default,max}) I don't see a big change here either. Once I'm up to 512KB-1meg or so, I don't see much improvement. 
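A minimal sketch of the per-socket side of the buffer tuning discussed above, assuming a plain UDP socket: the receiver asks for a larger buffer with SO_RCVBUF and reads back what the kernel actually granted. The helper name probe_rcvbuf is made up for illustration. The kernel caps the request at /proc/sys/net/core/rmem_max, and the value reported back may differ from the value requested, so reading it back is the only way to know what the socket really got.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: request a receive buffer of `want` bytes on a fresh
 * UDP socket and return the size the kernel reports via getsockopt(),
 * or -1 if any step fails. */
static int probe_rcvbuf(int want)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    int got = -1;
    socklen_t len = sizeof(got);

    if (fd < 0)
        return -1;

    /* The request is silently limited by rmem_max; it does not fail. */
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &want, sizeof(want)) < 0 ||
        getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &got, &len) < 0)
        got = -1;

    close(fd);
    return got;
}
```

Calling probe_rcvbuf(1048576) before and after raising rmem_max shows whether the larger limit took effect for a given socket.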
From owner-netdev@oss.sgi.com Fri Mar 9 11:14:22 2001 Received: by oss.sgi.com id ; Fri, 9 Mar 2001 11:14:13 -0800 Received: from citadel.myri.com ([199.120.212.1]:58855 "EHLO myri.com") by oss.sgi.com with ESMTP id ; Fri, 9 Mar 2001 11:13:53 -0800 Received: from orion.myri.com.myri.com (orion [199.120.212.245]) by myri.com (8.9.3+Sun/8.9.1) with SMTP id LAA04899; Fri, 9 Mar 2001 11:12:48 -0800 (PST) Date: Fri, 9 Mar 2001 11:12:48 -0800 (PST) From: Bob Felderman Message-Id: <200103091912.LAA04899@myri.com> To: ak@muc.de, hadi@cyberus.ca Subject: Re: [Fwd: Re: possible bug x86 2.4.2 SMP in IP receive stack] Cc: andrewm@uow.edu.au, davem@redhat.com, feldy@myri.com, kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com, pp@evil.netppl.fi Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing => Alexey's assessment is that the culprit might be that he is losing some => of those frags in the backlog. => What kind of hardware is this thing running on? 1.5 G seems a lot on => a regular PC. Very large MTUs? 9KB MTU. Supermicro 370DLE motherboard, P-III 666MHz, using a 64-bit 66MHz PCI slot with a Myrinet card. This board uses the ServerWorks chipset for PCI, and we have measured 400-500 MBytes/sec (3.2 - 4.0 Gigabit/sec) transfer rates across the bus. Our link speed is 2 Gigabit/sec. => Bob, One of the things that could be done is have the driver stop sending => packets after the toplayer gets congested (return is NET_RX_CN_HIGH). => linux/include/linux/netdevice.h has the NET_RX_* which are returned => by netif_rx() I'll look at it, but DaveM's zero-copy patches also solved the packet drop for me. I have not yet tried to put that patch on my fixed 2.4.2 kernels.
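One way to picture the suggestion quoted above, as a user-space sketch rather than driver code: the receive loop hands packets up one at a time and stops as soon as the upper layer reports NET_RX_CN_HIGH. Everything here is hypothetical: fake_netif_rx() is a stand-in for the real netif_rx(), the backlog is a plain counter with an invented capacity and thresholds, and the numeric values of the NET_RX_* names are placeholders (the real ones are in linux/include/linux/netdevice.h).

```c
#include <stdio.h>

/* Names follow the thread; the numeric values are placeholders only. */
enum rx_status {
    NET_RX_SUCCESS,
    NET_RX_CN_LOW,
    NET_RX_CN_MOD,
    NET_RX_CN_HIGH,
    NET_RX_DROP
};

#define BACKLOG_MAX 8            /* invented backlog capacity */

static int backlog_len;          /* packets currently queued */

/* Hypothetical stand-in for netif_rx(): queue one packet and report how
 * congested the backlog is afterwards. */
static enum rx_status fake_netif_rx(void)
{
    if (backlog_len >= BACKLOG_MAX)
        return NET_RX_DROP;
    backlog_len++;
    if (backlog_len >= BACKLOG_MAX * 3 / 4)
        return NET_RX_CN_HIGH;
    if (backlog_len >= BACKLOG_MAX / 2)
        return NET_RX_CN_MOD;
    if (backlog_len >= BACKLOG_MAX / 4)
        return NET_RX_CN_LOW;
    return NET_RX_SUCCESS;
}

/* Receive loop of an imaginary driver: deliver up to `pending` packets,
 * but back off at the first NET_RX_CN_HIGH. Returns packets delivered. */
static int rx_poll(int pending)
{
    int delivered = 0;

    while (delivered < pending) {
        enum rx_status st = fake_netif_rx();

        if (st == NET_RX_DROP)
            break;               /* backlog full, packet was lost */
        delivered++;
        if (st == NET_RX_CN_HIGH)
            break;               /* upper layer congested, stop feeding it */
    }
    return delivered;
}
```

With an empty backlog, rx_poll(100) stops after six packets in this model, leaving headroom instead of running the queue into drops. Whether that helps a bulk gigabit receiver is exactly what the thread goes on to debate.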
From owner-netdev@oss.sgi.com Fri Mar 9 11:37:52 2001 Received: by oss.sgi.com id ; Fri, 9 Mar 2001 11:37:32 -0800 Received: from shell.cyberus.ca ([209.195.95.7]:57992 "EHLO shell.cyberus.ca") by oss.sgi.com with ESMTP id ; Fri, 9 Mar 2001 11:37:17 -0800 Received: from localhost (hadi@localhost) by shell.cyberus.ca (8.9.3/666/Cyberus Online Inc.) with ESMTP id OAA08500; Fri, 9 Mar 2001 14:33:53 -0500 (EST) X-Authentication-Warning: shell.cyberus.ca: hadi owned process doing -bs Date: Fri, 9 Mar 2001 14:33:53 -0500 (EST) From: jamal To: Bob Felderman cc: , , , , , Subject: Re: [Fwd: Re: possible bug x86 2.4.2 SMP in IP receive stack] In-Reply-To: <200103091912.LAA04899@myri.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Fri, 9 Mar 2001, Bob Felderman wrote: > I'll look at it, but DaveM's zero-copy patches also solved > the packet drop for me. I have not tried yet to put that > patch on my 2.4.2 fixed kernels. > Unless your NIC knows how to use those patches, this sounds strange. 
cheers, jamal From owner-netdev@oss.sgi.com Fri Mar 9 11:57:22 2001 Received: by oss.sgi.com id ; Fri, 9 Mar 2001 11:57:02 -0800 Received: from citadel.myri.com ([199.120.212.1]:52200 "EHLO myri.com") by oss.sgi.com with ESMTP id ; Fri, 9 Mar 2001 11:56:40 -0800 Received: from orion.myri.com.myri.com (orion [199.120.212.245]) by myri.com (8.9.3+Sun/8.9.1) with SMTP id LAA05561; Fri, 9 Mar 2001 11:55:20 -0800 (PST) Date: Fri, 9 Mar 2001 11:55:20 -0800 (PST) From: Bob Felderman Message-Id: <200103091955.LAA05561@myri.com> To: feldy@myri.com, hadi@cyberus.ca Subject: Re: [Fwd: Re: possible bug x86 2.4.2 SMP in IP receive stack] Cc: ak@muc.de, andrewm@uow.edu.au, davem@redhat.com, kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com, pp@evil.netppl.fi Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing => On Fri, 9 Mar 2001, Bob Felderman wrote: => => > I'll look at it, but DaveM's zero-copy patches also solved => > the packet drop for me. I have not tried yet to put that => > patch on my 2.4.2 fixed kernels. => > => => Unless your NIC knows how to use those patches, this sounds strange. => => cheers, => jamal Dave had said in some email that NFS re-assembly might be improved. #Finally, regardless of networking card, there should be a measurable #performance boost for NFS clients with this patch due to the delayed #fragment coalescing. KNFSD does not take full advantage of this #facility yet. From owner-netdev@oss.sgi.com Fri Mar 9 12:03:42 2001 Received: by oss.sgi.com id ; Fri, 9 Mar 2001 12:03:23 -0800 Received: from pizda.ninka.net ([216.101.162.242]:7808 "EHLO pizda.ninka.net") by oss.sgi.com with ESMTP id ; Fri, 9 Mar 2001 12:03:08 -0800 Received: (from davem@localhost) by pizda.ninka.net (8.9.3/8.9.3) id MAA01086; Fri, 9 Mar 2001 12:02:52 -0800 From: "David S. 
Miller" MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <15017.14060.373448.198308@pizda.ninka.net> Date: Fri, 9 Mar 2001 12:02:52 -0800 (PST) To: jamal Cc: Bob Felderman , , , , , Subject: Re: [Fwd: Re: possible bug x86 2.4.2 SMP in IP receive stack] In-Reply-To: References: <200103091912.LAA04899@myri.com> X-Mailer: VM 6.75 under 21.1 (patch 13) "Crater Lake" XEmacs Lucid Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing jamal writes: > Unless your NIC knows how to use those patches, this sounds strange. He's using fragmented UDP packets, so the lazy receive defrag stuff (ie. 2 copies turn into 1) work wonders. Later, David S. Miller davem@redhat.com From owner-netdev@oss.sgi.com Fri Mar 9 12:43:13 2001 Received: by oss.sgi.com id ; Fri, 9 Mar 2001 12:42:53 -0800 Received: from minus.inr.ac.ru ([193.233.7.97]:48900 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Fri, 9 Mar 2001 12:42:48 -0800 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id XAA27248; Fri, 9 Mar 2001 23:35:29 +0300 From: kuznet@ms2.inr.ac.ru Message-Id: <200103092035.XAA27248@ms2.inr.ac.ru> Subject: Re: [Fwd: Re: possible bug x86 2.4.2 SMP in IP receive stack] To: davem@redhat.com (David S. Miller) Date: Fri, 9 Mar 2001 23:35:29 +0300 (MSK) Cc: hadi@cyberus.ca, feldy@myri.com, ak@muc.de, andrewm@uow.edu.au, netdev@oss.sgi.com, pp@evil.netppl.fi In-Reply-To: <15017.14060.373448.198308@pizda.ninka.net> from "David S. Miller" at Mar 9, 1 12:02:52 pm X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Content-Length: 243 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hello! > He's using fragmented UDP packets, so the lazy receive defrag stuff > (ie. 2 copies turn into 1) work wonders. _No_ copy on softirq. It is more important. It is the reason why all the tricks with backlog are not essential. 
Alexey From owner-netdev@oss.sgi.com Fri Mar 9 12:47:13 2001 Received: by oss.sgi.com id ; Fri, 9 Mar 2001 12:46:54 -0800 Received: from citadel.myri.com ([199.120.212.1]:29417 "EHLO myri.com") by oss.sgi.com with ESMTP id ; Fri, 9 Mar 2001 12:46:46 -0800 Received: from orion.myri.com.myri.com (orion [199.120.212.245]) by myri.com (8.9.3+Sun/8.9.1) with SMTP id MAA06127; Fri, 9 Mar 2001 12:46:42 -0800 (PST) Received: from localhost by orion.myri.com.myri.com (4.1/SMI-4.1) id AA11202; Fri, 9 Mar 01 12:48:26 PST Date: Fri, 9 Mar 2001 12:48:26 -0800 (PST) From: Bob Felderman To: kuznet@ms2.inr.ac.ru Cc: "David S. Miller" , hadi@cyberus.ca, ak@muc.de, andrewm@uow.edu.au, netdev@oss.sgi.com, Bob Felderman , pp@evil.netppl.fi Subject: Re: [Fwd: Re: possible bug x86 2.4.2 SMP in IP receive stack] In-Reply-To: <200103092035.XAA27248@ms2.inr.ac.ru> Message-Id: Mime-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Fri, 9 Mar 2001 kuznet@ms2.inr.ac.ru wrote: > Hello! > > > He's using fragmented UDP packets, so the lazy receive defrag stuff > > (ie. 2 copies turn into 1) work wonders. > > _No_ copy on softirq. It is more important. > > It is the reason why all the tricks with backlog are not essential. Can you explain this a bit more? Are you saying that a softirq will reduce the number of copies? Should the driver be calling softirq to deliver the packets using netif_rx? From owner-netdev@oss.sgi.com Fri Mar 9 13:25:33 2001 Received: by oss.sgi.com id ; Fri, 9 Mar 2001 13:25:13 -0800 Received: from pizda.ninka.net ([216.101.162.242]:6528 "EHLO pizda.ninka.net") by oss.sgi.com with ESMTP id ; Fri, 9 Mar 2001 13:24:41 -0800 Received: (from davem@localhost) by pizda.ninka.net (8.9.3/8.9.3) id NAA01212; Fri, 9 Mar 2001 13:23:27 -0800 From: "David S. 
Miller" MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <15017.18895.120319.875228@pizda.ninka.net> Date: Fri, 9 Mar 2001 13:23:27 -0800 (PST) To: Bob Felderman Cc: kuznet@ms2.inr.ac.ru, hadi@cyberus.ca, ak@muc.de, andrewm@uow.edu.au, netdev@oss.sgi.com, pp@evil.netppl.fi Subject: Re: [Fwd: Re: possible bug x86 2.4.2 SMP in IP receive stack] In-Reply-To: References: <200103092035.XAA27248@ms2.inr.ac.ru> X-Mailer: VM 6.75 under 21.1 (patch 13) "Crater Lake" XEmacs Lucid Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Bob Felderman writes: > On Fri, 9 Mar 2001 kuznet@ms2.inr.ac.ru wrote: > > _No_ copy on softirq. It is more important. > > > > It is the reason why all the tricks with backlog are not essential. > > Can you explain this a bit more? Are you saying that a softirq > will reduce the number of copies? Should the driver be calling > softirq to deliver the packets using netif_rx? Network input packet processing (i.e. when the kernel first processes input packets after you give them to netif_rx()) occurs in softirq context. This is what Alexey is talking about. Previously, IP input, when seeing the final fragment of a packet, would allocate a new linear SKB to fit the whole thing, then copy each fragment into the new buffer. This all occurred in softirq context. Now, a linked list of the frags is just held onto, and later in _user_ context the individual frags are copied by hand into user space or wherever appropriate. Alexey is saying that moving the copy from network soft interrupt processing into the user context is what is helping, more so than the reduction in the number of copies performed. Later, David S.
Miller davem@redhat.com From owner-netdev@oss.sgi.com Fri Mar 9 13:52:32 2001 Received: by oss.sgi.com id ; Fri, 9 Mar 2001 13:52:12 -0800 Received: from citadel.myri.com ([199.120.212.1]:58602 "EHLO myri.com") by oss.sgi.com with ESMTP id ; Fri, 9 Mar 2001 13:52:02 -0800 Received: from frisbee.myri.com (frisbee.myri.com [199.120.212.209]) by myri.com (8.9.3+Sun/8.9.1) with ESMTP id NAA06922; Fri, 9 Mar 2001 13:51:58 -0800 (PST) Received: (from feldy@localhost) by frisbee.myri.com (8.9.3/8.9.1) id NAA02593; Fri, 9 Mar 2001 13:51:56 -0800 Date: Fri, 9 Mar 2001 13:51:56 -0800 From: Bob Felderman Message-Id: <200103092151.NAA02593@frisbee.myri.com> To: davem@redhat.com, feldy@myri.com Subject: Re: [Fwd: Re: possible bug x86 2.4.2 SMP in IP receive stack] Cc: ak@muc.de, andrewm@uow.edu.au, hadi@cyberus.ca, kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com, pp@evil.netppl.fi Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing OK, I applied DaveM's patch to my working 2.4.2 kernel. When the sender is a dual-processor 666MHz machine and the receiver is a dual-processor 866MHz machine I see only a few drops

Udp:
    7658021 packets received
    0 packets to unknown port received.
    2948 packet receive errors
    18516945 packets sent

and I get great udp bandwidth. I can probably get these numbers (or something close) using single processors.

The only /proc defaults I changed were

echo "1048576" > /proc/sys/net/core/rmem_max
echo "1048576" > /proc/sys/net/core/wmem_max
echo "1048576" > /proc/sys/net/core/wmem_default
echo "1048576" > /proc/sys/net/core/rmem_default
echo "1048576" > /proc/sys/net/core/optmem_max

TCP still doesn't want to get above 1Gigabit/sec and is usually much lower.

TCP STREAM TEST to rcc-t
Recv     Send     Send
Socket   Socket   Message  Elapsed
Size     Size     Size     Time     Throughput
bytes    bytes    bytes    secs.    10^6bits/sec

1048576  1048576  4194304  10.00    688.31
1048576  1048576  2097152  10.01    704.52
1048576  1048576  1048576  10.01    691.78
1048576  1048576  524288   10.00    704.24
1048576  1048576  262144   10.00    687.31
1048576  1048576  131072   10.00    682.91
1048576  1048576  65536    10.00    658.33
1048576  1048576  32768    10.01    668.97

rcc 23% udp_range rcc2-t
UDP UNIDIRECTIONAL SEND TEST to rcc2-t
Socket   Message  Elapsed  Messages
Size     Size     Time     Okay      Errors  Throughput
bytes    bytes    secs     #         #       10^6bits/sec

2097152  32768    9.65     60233     0       1635.59
2097152           9.65     59695             1620.98
2097152  16384    9.00     116904    0       1703.44
2097152           9.00     114426            1667.33
2097152  8192     10.00    215262    0       1411.22
2097152           10.00    215262            1411.22
2097152  4096     6.81     322213    0       1550.98
2097152           6.81     322189            1550.86
2097152  2048     5.54     506954    0       1499.05
2097152           5.54     506033            1496.33
2097152  1024     10.00    606952    0       497.34
2097152           10.00    606601            497.06
2097152  512      9.58     809222    0       345.97
2097152           9.58     706001            301.84
2097152  256      13.83    707528    0       104.78
2097152           13.83    707528            104.78
2097152  128      13.82    1336720   0       99.07
2097152           13.82    712852            52.83
2097152  64       8.99     730505    0       41.58
2097152           8.99     730505            41.58
2097152  32       11.00    1351319   0       31.45
2097152           11.00    740745            17.24
2097152  16       15.47    900296    0       7.45
2097152           15.47    741616            6.14
2097152  8        12.17    1317200   0       6.92
2097152           12.17    755124            3.97
2097152  4        9.00     738768    0       2.63
2097152           9.00     738768            2.63

From owner-netdev@oss.sgi.com Fri Mar 9 14:25:22 2001 Received: by oss.sgi.com id ; Fri, 9 Mar 2001 14:25:02 -0800 Received: from citadel.myri.com ([199.120.212.1]:41451 "EHLO myri.com") by oss.sgi.com with ESMTP id ; Fri, 9 Mar 2001 14:24:38 -0800 Received: from frisbee.myri.com (frisbee.myri.com [199.120.212.209]) by myri.com (8.9.3+Sun/8.9.1) with ESMTP id OAA07242; Fri, 9 Mar 2001 14:24:33 -0800 (PST) Received: (from feldy@localhost) by frisbee.myri.com (8.9.3/8.9.1) id OAA02690; Fri, 9 Mar 2001 14:24:32 -0800 Date: Fri, 9 Mar 2001 14:24:32 -0800 From: Bob Felderman Message-Id: <200103092224.OAA02690@frisbee.myri.com> To: davem@redhat.com, feldy@myri.com,
feldy@myri.com Subject: Re: [Fwd: Re: possible bug x86 2.4.2 SMP in IP receive stack] Cc: ak@muc.de, andrewm@uow.edu.au, hadi@cyberus.ca, kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com, pp@evil.netppl.fi Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Darn - I got an oops running Netpipe (a tcp test) This is with the Alexey patch to 2.4.2 to fix the ip_fragment.c oopses and the DaveM 2.4.2 zerocopy patches. I'm going to back out the davem patch. Bob ksymoops 0.7c on i686 2.4.2-zerocopy. Options used -V (default) -k /proc/ksyms (default) -l /proc/modules (default) -o /lib/modules/2.4.2-zerocopy/ (default) -m /usr/src/linux/System.map (default) Warning: You did not tell me where to find symbol information. I will assume that the log matches the kernel and modules that are running right now and I'll use the default options above for symbol resolution. If the current kernel and/or modules do not match the log, you can get more accurate output by telling me the kernel version and where to find map, modules, ksyms etc. ksymoops -h explains the options. Warning (compare_maps): mismatch on symbol __module_author , gm says d088df40, sbin/gm says d088fbe0. Ignoring sbin/gm entry Warning (compare_maps): mismatch on symbol __module_description , gm says d088df5f, sbin/gm says d088fbff. Ignoring sbin/gm entry Warning (compare_maps): mismatch on symbol __module_parm_gm_net_copy_threshold , gm says d088dfc5, sbin/gm says d088fc65. Ignoring sbin/gm entry Warning (compare_maps): mismatch on symbol __module_parm_gmip_hw_checksum , gm says d088dfad, sbin/gm says d088fc4d. 
Ignoring sbin/gm entry Unable to handle kernel NULL pointer dereference at virtual address 0000000d c0211bc4 *pde = 00000000 Oops: 0000 CPU: 0 EIP: 0010:[] Using defaults from ksymoops -t elf32-i386 -a i386 EFLAGS: 00010202 eax: 00000000 ebx: c8591308 ecx: 00000000 edx: 00000283 esi: 29655d0d edi: 2965596b ebp: c8591308 esp: c8679e00 ds: 0018 es: 0018 ss: 0018 Process NPtcp.intel_lin (pid: 1064, stackpage=c8679000) Stack: c8591308 29655d0d 2965596b c809bc20 c9f3bacc c02168b7 c8591160 c9f3baa0 00000001 c0211f3e c8591160 c809bc20 c8591308 2965596b 29655d0d c9f3bd20 c8591234 c8591160 c8678000 c8591160 c8678000 c8591308 c0211fda c8591160 Call Trace: [] [] [] [] [] [] [] [] [] [] [] [] Code: f6 40 0d 03 75 40 8b 75 78 8b 1d d4 5d 2b c0 85 db 7f 0a f7 >>EIP; c0211bc4 <===== Trace; c02168b7 Trace; c0211f3e Trace; c0211fda Trace; c021217f Trace; c0212d37 Trace; c021a0c4 Trace; c020346f Trace; c02037da Trace; c01fae0e Trace; c01193aa Trace; c01090c1 Trace; c010002b Code; c0211bc4 00000000 <_EIP>: Code; c0211bc4 <===== 0: f6 40 0d 03 testb $0x3,0xd(%eax) <===== Code; c0211bc8 4: 75 40 jne 46 <_EIP+0x46> c0211c0a Code; c0211bca 6: 8b 75 78 mov 0x78(%ebp),%esi Code; c0211bcd 9: 8b 1d d4 5d 2b c0 mov 0xc02b5dd4,%ebx Code; c0211bd3 f: 85 db test %ebx,%ebx Code; c0211bd5 11: 7f 0a jg 1d <_EIP+0x1d> c0211be1 Code; c0211bd7 13: f7 00 00 00 00 00 testl $0x0,(%eax) Kernel panic: Aiee, killing interrupt handler! 5 warnings issued. Results may not be reliable. 
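The fault address in the oops above follows directly from the faulting instruction: testb $0x3,0xd(%eax) reads one byte at offset 0xd from eax, and eax is 00000000, so the access lands at virtual address 0000000d. A small sketch of that arithmetic, using a made-up structure (struct sketch_obj and its field names are hypothetical; the oops does not say which kernel structure was involved):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: the only property that matters is that `flags`
 * sits 13 (0x0d) bytes from the start of the object. */
struct sketch_obj {
    unsigned char pad[13];
    unsigned char flags;
};

/* Address the CPU would touch when reading base->flags, computed with
 * plain arithmetic so that nothing is actually dereferenced. */
static uintptr_t fault_address(uintptr_t base)
{
    return base + offsetof(struct sketch_obj, flags);
}
```

fault_address(0) gives 0x0d, matching the reported address, which is why a small non-zero fault address usually points at a field access through a NULL pointer rather than at a wild pointer.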
From owner-netdev@oss.sgi.com Fri Mar 9 15:32:52 2001 Received: by oss.sgi.com id ; Fri, 9 Mar 2001 15:32:43 -0800 Received: from citadel.myri.com ([199.120.212.1]:64492 "EHLO myri.com") by oss.sgi.com with ESMTP id ; Fri, 9 Mar 2001 15:32:31 -0800 Received: from frisbee.myri.com (frisbee.myri.com [199.120.212.209]) by myri.com (8.9.3+Sun/8.9.1) with ESMTP id PAA08005; Fri, 9 Mar 2001 15:30:27 -0800 (PST) Received: (from feldy@localhost) by frisbee.myri.com (8.9.3/8.9.1) id PAA02820; Fri, 9 Mar 2001 15:30:25 -0800 Date: Fri, 9 Mar 2001 15:30:25 -0800 From: Bob Felderman Message-Id: <200103092330.PAA02820@frisbee.myri.com> To: davem@redhat.com, feldy@myri.com, feldy@myri.com Subject: Re: [Fwd: Re: possible bug x86 2.4.2 SMP in IP receive stack] Cc: ak@muc.de, andrewm@uow.edu.au, hadi@cyberus.ca, kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com, pp@evil.netppl.fi Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing => => I'm going to back out the davem patch. => When I removed the patch, I can't make tcp tests panic the machine. 
-rw-rw-r-- 1 feldy arpa 361878 Mar 9 12:24 zerocopy-2.4.2-1.diff From owner-netdev@oss.sgi.com Fri Mar 9 15:56:42 2001 Received: by oss.sgi.com id ; Fri, 9 Mar 2001 15:56:33 -0800 Received: from netcore.fi ([193.94.160.1]:29458 "EHLO netcore.fi") by oss.sgi.com with ESMTP id ; Fri, 9 Mar 2001 15:56:30 -0800 Received: from localhost (pekkas@localhost) by netcore.fi (8.11.1/8.11.1) with ESMTP id f29NuJq25920; Sat, 10 Mar 2001 01:56:20 +0200 Date: Sat, 10 Mar 2001 01:56:19 +0200 (EET) From: Pekka Savola To: cc: Subject: Re: weird implementation of ipip and sit tunnels In-Reply-To: <200103041833.VAA17757@ms2.inr.ac.ru> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Sun, 4 Mar 2001 kuznet@ms2.inr.ac.ru wrote: > > Or didn't > > anybody really need this so much as to spend time for doing it "right" ? > > The time has been spent several years ago. Keyword is "ip tunnel". It seems you can only add point-to-point tunnels. I might want to add equivalents of /30 too. You can do this (in the limited fashion, as described earlier) with ifconfig. Is there a reason why only P-t-P tunnels can be used with 'ip tunnel' ? -- Pekka Savola "Tell me of difficulties surmounted, Netcore Oy not those you stumble over and fall" Systems. Networks. Security. 
-- Robert Jordan: A Crown of Swords From owner-netdev@oss.sgi.com Sat Mar 10 00:14:05 2001 Received: by oss.sgi.com id ; Sat, 10 Mar 2001 00:13:56 -0800 Received: from mail2.radar.com.ar ([200.45.87.67]:61199 "EHLO mail2.radar.com.ar") by oss.sgi.com with ESMTP id ; Sat, 10 Mar 2001 00:13:36 -0800 Received: from mail2.arnet.com.ar ([200.45.0.5]) by mail2.radar.com.ar with Microsoft SMTPSVC(5.5.1877.507.50); Sat, 10 Mar 2001 04:57:49 -0300 Received: from mail pickup service by mail2.arnet.com.ar with Microsoft SMTPSVC; Sat, 10 Mar 2001 05:04:24 -0300 Received: from recife.arnet.com.ar ([200.45.0.70]) by mail2.arnet.com.ar with Microsoft SMTPSVC(5.5.1877.357.35); Fri, 9 Mar 2001 16:14:27 -0300 Received: (qmail 5410 invoked from network); 9 Mar 2001 19:10:05 -0000 Received: from oss.sgi.com (216.32.174.190) by recife.arnet.com.ar with SMTP; 9 Mar 2001 19:10:05 -0000 Received: by oss.sgi.com id ; Fri, 9 Mar 2001 11:14:13 -0800 Received: from citadel.myri.com ([199.120.212.1]:58855 "EHLO myri.com") by oss.sgi.com with ESMTP id ; Fri, 9 Mar 2001 11:13:53 -0800 Received: from orion.myri.com.myri.com (orion [199.120.212.245]) by myri.com (8.9.3+Sun/8.9.1) with SMTP id LAA04899; Fri, 9 Mar 2001 11:12:48 -0800 (PST) Date: Fri, 9 Mar 2001 11:12:48 -0800 (PST) From: Bob Felderman Message-Id: <200103091912.LAA04899@myri.com> To: ak@muc.de, hadi@cyberus.ca Subject: Re: [Fwd: Re: possible bug x86 2.4.2 SMP in IP receive stack] Cc: andrewm@uow.edu.au, davem@redhat.com, feldy@myri.com, kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com, pp@evil.netppl.fi Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing => Alexey's assesment that the culprit might be he might be loosing some => of those frags in the backlog => What kind of hardware is this thing running on? 1.5 G seems a lot on => a regular PC. very Large MTUs? 9KB MTU Supermicro 370DLE motherboard P-III 666MHz using a pci 64bit 66mhz slot with a Myrinet card. 
This board uses the serverworks chipset for PCI and we have measured 400-500MegaBytes/sec (3.2 - 4.0 Gigabit/sec) transfer rates across the bus. Our link speed is 2Gigabit/sec. => Bob, One of the things that could be done is have the driver stop sending => packets after the toplayer gets congested (return is NET_RX_CN_HIGH). => linux/include/linux/netdevice.h has the NET_RX_* which are returned => by netif_rx() I'll look at it, but DaveM's zero-copy patches also solved the packet drop for me. I have not tried yet to put that patch on my 2.4.2 fixed kernels. From owner-netdev@oss.sgi.com Sat Mar 10 05:24:19 2001 Received: by oss.sgi.com id ; Sat, 10 Mar 2001 05:23:59 -0800 Received: from [63.84.169.221] ([63.84.169.221]:56337 "EHLO topgraphx.com") by oss.sgi.com with ESMTP id ; Sat, 10 Mar 2001 05:23:39 -0800 Received: from wapiti [193.248.251.93] by topgraphx.com with ESMTP (SMTPD32-6.05) id AACC40503AC; Sat, 10 Mar 2001 07:23:24 -0600 From: "Bernard MAUDRY" To: "David S. Miller" Date: Thu, 8 Mar 2001 18:34:28 +0100 MIME-Version: 1.0 Content-type: text/plain; charset=US-ASCII Content-transfer-encoding: 7BIT Subject: Re: PROBLEM: a local TCP socket close does not trigger a poll on the other end CC: netdev@oss.sgi.com Message-ID: <3AA7D0B4.8123.10893F11@localhost> In-reply-to: <15008.52456.168979.593594@pizda.ninka.net> References: <3AA0DA38.23766.573498DB@localhost> X-mailer: Pegasus Mail for Win32 (v3.12c) Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 1413 Lines: 37 > Set the FDs[0].events to some value other than zero, you are > telling the kernel you are interested in "no events". Read the > poll() man page for details. The man pages state : The field events is an input parameter, a bitmask specifying the events the application is interested in. 
The field revents is an output parameter, filled by the kernel with the events that actually occurred, either of the type requested, or of one of the types POLLERR or POLLHUP or POLLNVAL. (These three bits are meaningless in the events field, and will be set in the revents field whenever the corresponding condition is true.) So putting 0 in the events fields is a normal value when the application is only insterested in the error cases. Anyway, I tried to put POLLPRI in the events field to avoid potential trouble (if any) with the 0 value. The behavior is the same: poll does not indicate the hangup of the other end. Best regards. Bernard. +--------------------------------------+ | Bernard MAUDRY | | Top Graph'X Customer Support | | 10, allee de la mare Jacob | | 91290 La Norville | | FRANCE | | Tel: (33) 1 69 26 97 88 | | Fax: (33) 1 69 26 97 89 | | email: support@topgraphx.com | +--------------------------------------+
From owner-netdev@oss.sgi.com Sat Mar 10 15:54:38 2001 Received: by oss.sgi.com id ; Sat, 10 Mar 2001 15:54:17 -0800 Received: from ppp0.ocs.com.au ([203.34.97.3]:26896 "HELO mail.ocs.com.au") by oss.sgi.com with SMTP id ; Sat, 10 Mar 2001 15:54:05 -0800 Received: (qmail 11551 invoked from network); 10 Mar 2001 23:54:00 -0000 Received: from ocs3.ocs-net (192.168.255.3) by mail.ocs.com.au with SMTP; 10 Mar 2001 23:54:00 -0000 X-Mailer: exmh version 2.1.1 10/15/1999 From: Keith Owens To: Bob Felderman cc: netdev@oss.sgi.com Subject: Re: [Fwd: Re: possible bug x86 2.4.2 SMP in IP receive stack] In-reply-to: Your message of "Fri, 09 Mar 2001 14:24:32 -0800." <200103092224.OAA02690@frisbee.myri.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Sun, 11 Mar 2001 10:53:59 +1100 Message-ID: <13439.984268439@ocs3.ocs-net> Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 494 Lines: 10 On Fri, 9 Mar 2001 14:24:32 -0800, Bob Felderman wrote: >ksymoops 0.7c on i686 2.4.2-zerocopy. Options used >Warning (compare_maps): mismatch on symbol __module_author , gm says d088df40, sbin/gm says d088fbe0. Ignoring sbin/gm entry You are running an old ksymoops and an old modutils, you need much more recent versions for a 2.4 kernel. See Documentation/Changes - I wonder what else you are using that is out of date and might be causing problems (wrong compiler?).
From owner-netdev@oss.sgi.com Sat Mar 10 17:10:28 2001 Received: by oss.sgi.com id ; Sat, 10 Mar 2001 17:10:09 -0800 Received: from SLASH.REM.CMU.EDU ([128.2.87.44]:51461 "EHLO SLASH.REM.CMU.EDU") by oss.sgi.com with ESMTP id ; Sat, 10 Mar 2001 17:09:49 -0800 Received: from localhost (mukesh@localhost) by SLASH.REM.CMU.EDU (8.9.3/8.9.3) with ESMTP id VAA31765 for ; Sat, 10 Mar 2001 21:32:19 -0500 X-Authentication-Warning: SLASH.REM.CMU.EDU: mukesh owned process doing -bs Date: Sat, 10 Mar 2001 21:32:19 -0500 (EST) From: mukesh agrawal X-Sender: mukesh@SLASH.REM.CMU.EDU To: netdev@oss.sgi.com Subject: handling of passive open -- syn and ack queues Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 2947 Lines: 66 I'm curious about the way Linux handles passive opens. I've been looking at the code, and running some experiments. It seems like there could be improvements made for high load (actually, overload) conditions. My understanding of how things presently work goes like this: Conceptually, there's a syn queue and an ack queue. The syn queue contains connections for which we've received a syn, but not the ack of our syn+ack. The ack queue is for connections where the three-way handshake is complete. The maximum size of the syn queue is governed by /proc/sys/net/ipv4/tcp_max_syn_backlog. The size of the ack queue is limited by min(backlog parameter to listen call, SOMAXCONN). The syn and ack queues are actually part of the same queue, but the size of each is limited independently of the other. If we receive an ack for one of our syn+acks, and the ack queue is full, we silently drop the ack. The problem that I run into in my experiments is this: I have a single machine running as a web server. There are three machines submitting requests to the web server.
The volume of data to be returned is greater than the link capacity (the network is overloaded). Obviously, requests will queue up at the server. The webserver (Apache) uses one request per process. When all processes are busy, the ack queue in the kernel starts to build up. When the ack queue is filled, the syn queue starts to build up. The problem is that most of the connections in the syn queue remain in the syn queue, because the acks arrive more quickly than the webserver finishes requests [1]. Because we've dropped and ignored the ack, the connection request won't move to the ack queue until the client has retransmitted the ack (at least 3 seconds, because the client has to use the initial RTT estimate). Eventually, the ack queue decreases (because no new requests are making it in), and the number of requests being served decreases as well. Between the time that this happens, and the time that the clients timeout and retransmit their acks, the server is idle. Questions 1. Why the separate limits for the syn and ack queues? In particular, why do we not always move from the syn queue to the ack queue. Is it because of the cost of allocating the full socket? Something else entirely? 2. If we have to maintain separate limits, why not still track that we have received an ack for something in the syn queue, so that we can move it later. This will alleviate the wait for the retransmit from the client. Thanks mukesh [1] In the initial experiments, there was a different problem that also led to the server idling. The outbound syn+acks were getting dropped at the device queue (they were contending with the response data for queue space, and losing). Given that the penalty for losing a syn+ack is much higher than the penalty for losing a data packet, we prioritized syn+acks higher than data packets. 
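The two limits described in the message above can be sketched as a toy model. This is only an illustration of the behaviour as the poster describes it, not kernel code: the structure and every `lm_*` name are hypothetical, and the model ignores timers and retransmissions entirely.

```c
#include <assert.h>

/* Toy model of a listening socket with two independently limited queues.
 * All names are hypothetical; this is not how the kernel stores them. */
struct listen_model {
    int syn_max;       /* cap on half-open requests (the "syn queue") */
    int acc_max;       /* cap on completed, not yet accepted connections */
    int syn_len;       /* current number of half-open requests */
    int acc_len;       /* current number of completed connections */
    int acks_dropped;  /* final ACKs ignored because the ack queue was full */
};

void lm_init(struct listen_model *m, int syn_max, int acc_max)
{
    assert(syn_max > 0 && acc_max > 0);
    m->syn_max = syn_max;
    m->acc_max = acc_max;
    m->syn_len = 0;
    m->acc_len = 0;
    m->acks_dropped = 0;
}

/* A SYN arrives. Returns 1 if the request was queued, 0 if it was dropped. */
int lm_syn(struct listen_model *m)
{
    if (m->syn_len >= m->syn_max)
        return 0;
    m->syn_len++;
    return 1;
}

/* The ACK of our SYN+ACK arrives. Returns 1 if the request moved to the
 * ack queue. When the ack queue is full the ACK is dropped and the request
 * stays in the syn queue until the client retransmits. */
int lm_ack(struct listen_model *m)
{
    if (m->syn_len == 0)
        return 0;               /* no pending request to complete */
    if (m->acc_len >= m->acc_max) {
        m->acks_dropped++;
        return 0;
    }
    m->syn_len--;
    m->acc_len++;
    return 1;
}

/* The server calls accept(). Returns 1 if a connection was handed over. */
int lm_accept(struct listen_model *m)
{
    if (m->acc_len == 0)
        return 0;
    m->acc_len--;
    return 1;
}
```

With `lm_init(&m, 4, 2)`, three SYNs followed by three ACKs leave one request stuck in the syn queue with `acks_dropped == 1`. It only moves after an `lm_accept()` frees a slot and the ACK is seen again, which is the stall the message describes.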
From owner-netdev@oss.sgi.com Sat Mar 10 19:20:29 2001 Received: by oss.sgi.com id ; Sat, 10 Mar 2001 19:20:09 -0800 Received: from panic.ohr.gatech.edu ([130.207.47.194]:32224 "HELO havoc.gtf.org") by oss.sgi.com with SMTP id ; Sat, 10 Mar 2001 19:19:47 -0800 Received: from mandrakesoft.com (adsl-20-73-169.asm.bellsouth.net [66.20.73.169]) by havoc.gtf.org (Postfix) with ESMTP id C4E101F6A; Sat, 10 Mar 2001 22:19:35 -0500 (EST) Message-ID: <3AAAEEC8.9375ED6@mandrakesoft.com> Date: Sat, 10 Mar 2001 22:19:36 -0500 From: Jeff Garzik Organization: MandrakeSoft X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.4.3-pre3 i686) X-Accept-Language: en MIME-Version: 1.0 To: Jes Sorensen Cc: netdev@oss.sgi.com Subject: Re: initial acenic ZC cleanup References: <200103082147.f28LlS301042@itanic.thepuffingroup.com> <15015.65092.349145.143015@pizda.ninka.net> <3AA80487.3C7E26A6@mandrakesoft.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Jes Sorensen wrote: > >>>>> "Jeff" == Jeff Garzik writes: > Jeff> Does ethtool need to be updated for you? > A little, it needs to be taught about the tuning parameters for the > AceNIC. We probably should put a list of NIC specific tuning items in > there so it knows how to grog the different cards' different > parameters. If they are NIC-specific, then you should just use ioctls... It's a silly redirection to add NIC-specific stuff to the ethtool ioctl, when you could just add a private ioctl. -- Jeff Garzik | "You see, in this world there's two kinds of Building 1024 | people, my friend: Those with loaded guns MandrakeSoft | and those who dig. You dig." 
--Blondie From owner-netdev@oss.sgi.com Sat Mar 10 19:26:19 2001 Received: by oss.sgi.com id ; Sat, 10 Mar 2001 19:26:00 -0800 Received: from panic.ohr.gatech.edu ([130.207.47.194]:49632 "HELO havoc.gtf.org") by oss.sgi.com with SMTP id ; Sat, 10 Mar 2001 19:25:49 -0800 Received: from mandrakesoft.com (adsl-20-73-169.asm.bellsouth.net [66.20.73.169]) by havoc.gtf.org (Postfix) with ESMTP id DEFBD1F6A; Sat, 10 Mar 2001 22:25:37 -0500 (EST) Message-ID: <3AAAF032.24AC716D@mandrakesoft.com> Date: Sat, 10 Mar 2001 22:25:38 -0500 From: Jeff Garzik Organization: MandrakeSoft X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.4.3-pre3 i686) X-Accept-Language: en MIME-Version: 1.0 To: Jes Sorensen Cc: Linux Knernel Mailing List , netdev@oss.sgi.com Subject: Re: [PATCH] RFC: fix ethernet device initialization References: <3AA6A570.57FF2D36@mandrakesoft.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Jes Sorensen wrote: > I don't like the way you declare all the code in obscure macros in > there. > > +#define DECLARE_CHG_MTU(suffix,low,high) \ > + static int suffix##_change_mtu(struct net_device *dev, int new_mtu) \ > ...... > > All it does is to make the code harder to read and debug for little/no > gain. I disagree, but you probably knew that when you saw the code :) These macros are not used inside code, they declare entire functions. These functions are 100% duplicated across 2-4 protocols. Duplicated code means bugs in some portions of the code and no others, more difficult to maintain, etc. I even proved this point while developing the patch -- one of the functions was missing an EXPORT_xxx symbol. Using a standard macro automatically fixed this, a small oversight that had been in the kernel probably for over a year. 
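The technique under discussion, a macro that expands to a whole function definition, can be sketched in userspace as follows. The body of the real `DECLARE_CHG_MTU` is elided in the quoted patch, so everything here is an assumption: `struct toy_dev` stands in for `struct net_device`, the macro is renamed `TOY_DECLARE_CHG_MTU`, the range check is only a guess at what such a function might do, and the bounds are illustrative.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for struct net_device; only the field used here. */
struct toy_dev {
    int mtu;
};

/* Expands to a complete function named <suffix>_change_mtu that accepts
 * new_mtu only when it lies in [low, high]. Written once, instantiated
 * per protocol, so a fix to the body reaches every instance. */
#define TOY_DECLARE_CHG_MTU(suffix, low, high)                    \
    int suffix##_change_mtu(struct toy_dev *dev, int new_mtu)     \
    {                                                             \
        assert(dev != 0);                                         \
        if (new_mtu < (low) || new_mtu > (high))                  \
            return -EINVAL;                                       \
        dev->mtu = new_mtu;                                       \
        return 0;                                                 \
    }

/* Two instantiations with illustrative bounds. */
TOY_DECLARE_CHG_MTU(eth, 68, 1500)
TOY_DECLARE_CHG_MTU(jumbo, 68, 9000)
```

The trade-off raised in the thread is visible even at this size: the duplication goes away, but the generated functions `eth_change_mtu` and `jumbo_change_mtu` never appear literally in the source, which makes them harder to find with grep or to step through in a debugger.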
-- Jeff Garzik | "You see, in this world there's two kinds of Building 1024 | people, my friend: Those with loaded guns MandrakeSoft | and those who dig. You dig." --Blondie From owner-netdev@oss.sgi.com Sun Mar 11 09:42:56 2001 Received: by oss.sgi.com id ; Sun, 11 Mar 2001 09:42:36 -0800 Received: from minus.inr.ac.ru ([193.233.7.97]:16645 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Sun, 11 Mar 2001 09:42:28 -0800 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id UAA23882; Sun, 11 Mar 2001 20:41:42 +0300 From: kuznet@ms2.inr.ac.ru Message-Id: <200103111741.UAA23882@ms2.inr.ac.ru> Subject: Re: weird implementation of ipip and sit tunnels To: pekkas@netcore.fi (Pekka Savola) Date: Sun, 11 Mar 2001 20:41:42 +0300 (MSK) Cc: netdev@oss.sgi.com In-Reply-To: from "Pekka Savola" at Mar 11, 1 01:13:57 pm X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Content-Length: 291 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hello! > It seems you can only add point-to-point tunnels. and > In short: with 'ip tunnel', traceroute6 seems to set TTL to IPv4 packet > header too, not only IPv6. Sigh, it is that rare case when I wrote pretty detailed manual, occasionally answering these questions in bold. 
8) Alexey From owner-netdev@oss.sgi.com Sun Mar 11 09:53:26 2001 Received: by oss.sgi.com id ; Sun, 11 Mar 2001 09:53:06 -0800 Received: from minus.inr.ac.ru ([193.233.7.97]:37637 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Sun, 11 Mar 2001 09:52:46 -0800 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id UAA24160; Sun, 11 Mar 2001 20:52:34 +0300 From: kuznet@ms2.inr.ac.ru Message-Id: <200103111752.UAA24160@ms2.inr.ac.ru> Subject: Re: PROBLEM: a local TCP socket close does not trigger a poll on the other end To: support@topgraphx.COM (Bernard MAUDRY) Date: Sun, 11 Mar 2001 20:52:34 +0300 (MSK) Cc: netdev@oss.sgi.com In-Reply-To: <3AA7D0B4.8123.10893F11@localhost> from "Bernard MAUDRY" at Mar 10, 1 04:45:01 pm X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Content-Length: 389 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hello! > So putting 0 in the events fields is a normal value when the application is only > insterested in the error cases. Absolutely right! If poll returns, it will return you all the events. But it will not return until an unmaskable condition will happen. In your case it will never return. > hangup of the other end. There is no hangup. You may write to this socket. 
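The point made above is that a peer's close() is not a hangup on a TCP socket, because the local side may still write. One common way to notice it is to ask poll() for POLLIN and treat a zero-length read as end of stream. The sketch below assumes a Linux-like system with a working loopback interface; `peer_has_closed` and `tcp_loopback_pair` are hypothetical helper names, and using `MSG_PEEK` so that no data is consumed is a choice made here, not something proposed in the thread.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns 1 if the peer has closed (or the connection is dead), 0 if not
 * or if nothing happened within timeout_ms, -1 on error. */
int peer_has_closed(int fd, int timeout_ms)
{
    struct pollfd p;
    char c;
    ssize_t n;
    int r;

    assert(fd >= 0);
    p.fd = fd;
    p.events = POLLIN;          /* the peer's FIN makes the socket readable */
    p.revents = 0;

    r = poll(&p, 1, timeout_ms);
    if (r < 0 || (p.revents & POLLNVAL))
        return -1;
    if (r == 0)
        return 0;               /* timeout: no data, no close */
    if (p.revents & (POLLERR | POLLHUP))
        return 1;

    /* Peek without consuming: 0 bytes means end of stream. */
    n = recv(fd, &c, 1, MSG_PEEK | MSG_DONTWAIT);
    if (n == 0)
        return 1;
    if (n < 0 && errno != EAGAIN && errno != EWOULDBLOCK)
        return -1;
    return 0;                   /* data is pending; the peer may still be open */
}

/* Builds a connected TCP pair over 127.0.0.1: sv[0] is the client end,
 * sv[1] the accepted end. Returns 0 on success, -1 on failure. */
int tcp_loopback_pair(int sv[2])
{
    struct sockaddr_in a;
    socklen_t len = sizeof(a);
    int l = socket(AF_INET, SOCK_STREAM, 0);

    if (l < 0)
        return -1;
    memset(&a, 0, sizeof(a));
    a.sin_family = AF_INET;
    a.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    a.sin_port = 0;             /* let the kernel pick a free port */
    if (bind(l, (struct sockaddr *)&a, sizeof(a)) < 0 ||
        listen(l, 1) < 0 ||
        getsockname(l, (struct sockaddr *)&a, &len) < 0)
        goto fail;
    sv[0] = socket(AF_INET, SOCK_STREAM, 0);
    if (sv[0] < 0)
        goto fail;
    if (connect(sv[0], (struct sockaddr *)&a, sizeof(a)) < 0) {
        close(sv[0]);
        goto fail;
    }
    sv[1] = accept(l, NULL, NULL);
    if (sv[1] < 0) {
        close(sv[0]);
        goto fail;
    }
    close(l);
    return 0;
fail:
    close(l);
    return -1;
}
```

A caller would build a pair with `tcp_loopback_pair(sv)`, close `sv[0]`, and then expect `peer_has_closed(sv[1], 2000)` to return 1. When data is still queued the function returns 0, so a real application would have to drain the data before it could see the end of stream.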
Alexey From owner-netdev@oss.sgi.com Sun Mar 11 09:53:56 2001 Received: by oss.sgi.com id ; Sun, 11 Mar 2001 09:53:36 -0800 Received: from minus.inr.ac.ru ([193.233.7.97]:40453 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Sun, 11 Mar 2001 09:53:21 -0800 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id UAA23954; Sun, 11 Mar 2001 20:45:35 +0300 From: kuznet@ms2.inr.ac.ru Message-Id: <200103111745.UAA23954@ms2.inr.ac.ru> Subject: Re: [Fwd: Re: possible bug x86 2.4.2 SMP in IP receive stack] To: feldy@myri.com (Bob Felderman) Date: Sun, 11 Mar 2001 20:45:35 +0300 (MSK) Cc: davem@redhat.com, feldy@myri.com, ak@muc.de, andrewm@uow.edu.au, hadi@cyberus.ca, netdev@oss.sgi.com, pp@evil.netppl.fi In-Reply-To: <200103092330.PAA02820@frisbee.myri.com> from "Bob Felderman" at Mar 9, 1 03:30:25 pm X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Content-Length: 154 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hello! > When I removed the patch, I can't make tcp tests panic It is not wonderful, the function exists only in the patch. 8) Investigating... Alexey From owner-netdev@oss.sgi.com Sun Mar 11 10:30:26 2001 Received: by oss.sgi.com id ; Sun, 11 Mar 2001 10:30:16 -0800 Received: from minus.inr.ac.ru ([193.233.7.97]:25350 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Sun, 11 Mar 2001 10:30:04 -0800 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id VAA24855; Sun, 11 Mar 2001 21:29:53 +0300 From: kuznet@ms2.inr.ac.ru Message-Id: <200103111829.VAA24855@ms2.inr.ac.ru> Subject: Re: handling of passive open -- syn and ack queues To: agrawal@ais.ORG (mukesh agrawal) Date: Sun, 11 Mar 2001 21:29:52 +0300 (MSK) Cc: netdev@oss.sgi.com In-Reply-To: from "mukesh agrawal" at Mar 11, 1 04:15:01 am X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Content-Length: 663 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hello! 
> because of the cost of allocating the full socket?

Simpler. If server does not hold traffic, it is not difficult to conclude that opposite strategy would result in creating infinite amount of sockets, which server has no chances ever to serve.

Bursts of load controlled by backlog length. Increase it, if it is not enough. If increasing does not result in success, then, as you said:

> syn queue, because the acks arrive more quickly than the webserver
> finishes requests [1].

... which may mean only one thing. Your server does not hold traffic and must be upgraded to faster one (hardware, getting rid of apache, splitting load, etc).

Alexey

From owner-netdev@oss.sgi.com Sun Mar 11 10:52:46 2001 Received: by oss.sgi.com id ; Sun, 11 Mar 2001 10:52:26 -0800 Received: from minus.inr.ac.ru ([193.233.7.97]:40710 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Sun, 11 Mar 2001 10:52:07 -0800 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id VAA25011; Sun, 11 Mar 2001 21:45:13 +0300 From: kuznet@ms2.inr.ac.ru Message-Id: <200103111845.VAA25011@ms2.inr.ac.ru> Subject: Re: [Fwd: Re: possible bug x86 2.4.2 SMP in IP receive stack] To: feldy@myri.com (Bob Felderman) Date: Sun, 11 Mar 2001 21:45:13 +0300 (MSK) Cc: davem@redhat.com, feldy@myri.com, ak@muc.de, andrewm@uow.edu.au, hadi@cyberus.ca, netdev@oss.sgi.com, pp@evil.netppl.fi In-Reply-To: <200103092224.OAA02690@frisbee.myri.com> from "Bob Felderman" at Mar 9, 1 02:24:32 pm X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Content-Length: 519 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing

Hello!

> >>EIP; c0211bc4 <=====

Please, look how function tcp_collapse() looks in your tree. (I am sorry, I do not know what exactly of patches do you use). Does it contain the following?

	for (skb = head; skb != tail; ) {
		/* No new bits? It is possible on ofo queue. */
		if (!before(start, TCP_SKB_CB(skb)->end_seq)) {
			struct sk_buff *next = skb->next;
			__skb_unlink(skb, skb->list);
			__kfree_skb(skb);
			NET_INC_STATS_BH(TCPRcvCollapsed);
			skb = next;
			continue;
		}

Alexey

From owner-netdev@oss.sgi.com Sun Mar 11 10:54:16 2001 Received: by oss.sgi.com id ; Sun, 11 Mar 2001 10:53:56 -0800 Received: from netcore.fi ([193.94.160.1]:19204 "EHLO netcore.fi") by oss.sgi.com with ESMTP id ; Sun, 11 Mar 2001 10:53:48 -0800 Received: from localhost (pekkas@localhost) by netcore.fi (8.11.1/8.11.1) with ESMTP id f2BIrgF11088; Sun, 11 Mar 2001 20:53:42 +0200 Date: Sun, 11 Mar 2001 20:53:42 +0200 (EET) From: Pekka Savola To: cc: Subject: Re: weird implementation of ipip and sit tunnels In-Reply-To: <200103111741.UAA23882@ms2.inr.ac.ru> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing

On Sun, 11 Mar 2001 kuznet@ms2.inr.ac.ru wrote:
> Hello!
>
> > It seems you can only add point-to-point tunnels.
> and
> > In short: with 'ip tunnel', traceroute6 seems to set TTL to IPv4 packet
> > header too, not only IPv6.
>
> Sigh, it is that rare case when I wrote pretty detailed manual,
> occasionally answering these questions in bold. 8)

I'm sorry for this not-so-FAQ. The problem is, man(1) page is the place where people would actually look for, and expect to find, information like this -- I didn't realize .tex documentation even existed before now.

-- Pekka Savola "Tell me of difficulties surmounted, Netcore Oy not those you stumble over and fall" Systems. Networks. Security.
-- Robert Jordan: A Crown of Swords

From owner-netdev@oss.sgi.com Sun Mar 11 11:04:17 2001 Received: by oss.sgi.com id ; Sun, 11 Mar 2001 11:04:07 -0800 Received: from citadel.myri.com ([199.120.212.1]:31104 "EHLO myri.com") by oss.sgi.com with ESMTP id ; Sun, 11 Mar 2001 11:03:44 -0800 Received: from orion.myri.com.myri.com (orion [199.120.212.245]) by myri.com (8.9.3+Sun/8.9.1) with SMTP id LAA20918; Sun, 11 Mar 2001 11:02:25 -0800 (PST) Date: Sun, 11 Mar 2001 11:02:25 -0800 (PST) From: Bob Felderman Message-Id: <200103111902.LAA20918@myri.com> To: feldy@myri.com, kuznet@ms2.inr.ac.ru Subject: Re: [Fwd: Re: possible bug x86 2.4.2 SMP in IP receive stack] Cc: ak@muc.de, andrewm@uow.edu.au, davem@redhat.com, hadi@cyberus.ca, netdev@oss.sgi.com, pp@evil.netppl.fi Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing

361878 Mar 9 12:24 zerocopy-2.4.2-1.diff

-#define TCP_DONT_COLLAPSE (TCP_FLAG_FIN|TCP_FLAG_URG|TCP_FLAG_SYN)
-		    !(tcp_flag_word(skb->h.th)&TCP_DONT_COLLAPSE) &&
-		    !(tcp_flag_word(skb_next->h.th)&TCP_DONT_COLLAPSE)) {
-			/* OK to collapse two skbs to one */
-			memcpy(skb_put(skb, skb_next->len), skb_next->data, skb_next->len);
-			__skb_unlink(skb_next, skb_next->list);
-			scb->end_seq = scb_next->end_seq;
-			__kfree_skb(skb_next);
+	/* First, check that queue is collapsable and find
+	 * the point where collapsing can be useful. */
+	for (skb = head; skb != tail; ) {
+		/* No new bits? It is possible on ofo queue. */
+		if (!before(start, TCP_SKB_CB(skb)->seq)) {
+			struct sk_buff *next = skb->next;
+			__skb_unlink(skb, skb->list);
+			__kfree_skb(skb);
 			NET_INC_STATS_BH(TCPRcvCollapsed);

From owner-netdev@oss.sgi.com Sun Mar 11 11:24:27 2001 Received: by oss.sgi.com id ; Sun, 11 Mar 2001 11:24:17 -0800 Received: from minus.inr.ac.ru ([193.233.7.97]:4359 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Sun, 11 Mar 2001 11:24:10 -0800 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id WAA25328; Sun, 11 Mar 2001 22:16:56 +0300 From: kuznet@ms2.inr.ac.ru Message-Id: <200103111916.WAA25328@ms2.inr.ac.ru> Subject: Re: [Fwd: Re: possible bug x86 2.4.2 SMP in IP receive stack] To: feldy@myri.com (Bob Felderman) Date: Sun, 11 Mar 2001 22:16:56 +0300 (MSK) Cc: feldy@myri.com, ak@muc.de, andrewm@uow.edu.au, davem@redhat.com, hadi@cyberus.ca, netdev@oss.sgi.com, pp@evil.netppl.fi In-Reply-To: <200103111902.LAA20918@myri.com> from "Bob Felderman" at Mar 11, 1 11:02:25 am X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Content-Length: 3400 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing

Hello!

> +	/* First, check that queue is collapsable and find
> +	 * the point where collapsing can be useful. */
> +	for (skb = head; skb != tail; ) {
> +		/* No new bits? It is possible on ofo queue. */
> +		if (!before(start, TCP_SKB_CB(skb)->seq)) {
> +			struct sk_buff *next = skb->next;
> +			__skb_unlink(skb, skb->list);
> +			__kfree_skb(skb);
> 			NET_INC_STATS_BH(TCPRcvCollapsed);

Yep, it is not a right patch. The next ones and previous ones, which do not contain this "if" are right.

Either delete all the if:

> +		/* No new bits? It is possible on ofo queue. */
> +		if (!before(start, TCP_SKB_CB(skb)->seq)) {

or, better, replace tcp_collapse with correct one. (Appended.
I am sorry, I cannot prepare incremental patch)

Alexey

static void
tcp_collapse(struct sock *sk, struct sk_buff *head,
	     struct sk_buff *tail, u32 start, u32 end)
{
	struct sk_buff *skb;

	/* First, check that queue is collapsable and find
	 * the point where collapsing can be useful. */
	for (skb = head; skb != tail; ) {
		/* No new bits? It is possible on ofo queue. */
		if (!before(start, TCP_SKB_CB(skb)->end_seq)) {
			struct sk_buff *next = skb->next;
			__skb_unlink(skb, skb->list);
			__kfree_skb(skb);
			NET_INC_STATS_BH(TCPRcvCollapsed);
			skb = next;
			continue;
		}

		/* The first skb to collapse is:
		 * - not SYN/FIN and
		 * - bloated or contains data before "start" or
		 *   overlaps to the next one.
		 */
		if (!skb->h.th->syn && !skb->h.th->fin &&
		    (tcp_win_from_space(skb->truesize) > skb->len ||
		     before(TCP_SKB_CB(skb)->seq, start) ||
		     (skb->next != tail &&
		      TCP_SKB_CB(skb)->end_seq != TCP_SKB_CB(skb->next)->seq)))
			break;

		/* Decided to skip this, advance start seq. */
		start = TCP_SKB_CB(skb)->end_seq;
		skb = skb->next;
	}
	if (skb == tail || skb->h.th->syn || skb->h.th->fin)
		return;

	while (before(start, end)) {
		struct sk_buff *nskb;
		int header = skb_headroom(skb);
		int copy = (PAGE_SIZE - sizeof(struct sk_buff) -
			    sizeof(struct skb_shared_info) - header - 31)&~15;

		/* Too big header? This can happen with IPv6. */
		if (copy < 0)
			return;
		if (end-start < copy)
			copy = end-start;
		nskb = alloc_skb(copy+header, GFP_ATOMIC);
		if (!nskb)
			return;
		skb_reserve(nskb, header);
		memcpy(nskb->head, skb->head, header);
		nskb->nh.raw = nskb->head + (skb->nh.raw-skb->head);
		nskb->h.raw = nskb->head + (skb->h.raw-skb->head);
		nskb->mac.raw = nskb->head + (skb->mac.raw-skb->head);
		memcpy(nskb->cb, skb->cb, sizeof(skb->cb));
		TCP_SKB_CB(nskb)->seq = TCP_SKB_CB(nskb)->end_seq = start;
		__skb_insert(nskb, skb->prev, skb, skb->list);
		tcp_set_owner_r(nskb, sk);

		/* Copy data, releasing collapsed skbs. */
		while (copy > 0) {
			int offset = start - TCP_SKB_CB(skb)->seq;
			int size = TCP_SKB_CB(skb)->end_seq - start;

			if (offset < 0) BUG();
			if (size > 0) {
				size = min(copy, size);
				if (skb_copy_bits(skb, offset, skb_put(nskb, size), size))
					BUG();
				TCP_SKB_CB(nskb)->end_seq += size;
				copy -= size;
				start += size;
			}
			if (!before(start, TCP_SKB_CB(skb)->end_seq)) {
				struct sk_buff *next = skb->next;
				__skb_unlink(skb, skb->list);
				__kfree_skb(skb);
				NET_INC_STATS_BH(TCPRcvCollapsed);
				skb = next;
				if (skb == tail || skb->h.th->syn || skb->h.th->fin)
					return;
			}
		}
	}
}

From owner-netdev@oss.sgi.com Mon Mar 12 02:26:46 2001 Received: by oss.sgi.com id ; Mon, 12 Mar 2001 02:26:26 -0800 Received: from [63.84.169.221] ([63.84.169.221]:20228 "EHLO topgraphx.com") by oss.sgi.com with ESMTP id ; Mon, 12 Mar 2001 02:25:58 -0800 Received: from wapiti [193.248.252.41] by topgraphx.com with ESMTP (SMTPD32-6.05) id A3E0284E0304; Mon, 12 Mar 2001 04:24:32 -0600 From: "Bernard MAUDRY" To: kuznet@ms2.inr.ac.ru Date: Mon, 12 Mar 2001 11:13:30 +0100 MIME-Version: 1.0 Content-type: text/plain; charset=US-ASCII Content-transfer-encoding: 7BIT Subject: Re: PROBLEM: a local TCP socket close does not trigger a poll on the other end CC: netdev@oss.sgi.com Message-ID: <3AACAF5A.4528.238EFA68@localhost> In-reply-to: <200103111752.UAA24160@ms2.inr.ac.ru> References: <3AA7D0B4.8123.10893F11@localhost> from "Bernard MAUDRY" at Mar 10, 1 04:45:01 pm X-mailer: Pegasus Mail for Win32 (v3.12c) Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing

Hello!

> > So putting 0 in the events fields is a normal value when the application
> > is only insterested in the error cases.
>
> Absolutely right!

It is of no use if the poll is not triggered when such an error occurs.

> If poll returns, it will return you all the events.
> But it will not return until an unmaskable condition will happen.
> In your case it will never return.
If the other end is closed, no more unmaskable conditions will happen. This is not fair at all, because the socket is informed that the other end has closed, so it is possible to trigger the poll. The man pages states: These three bits (POLLERR or POLLHUP or POLLNVAL) are meaningless in the events field, and will be set in the revents field WHENEVER the corresponding condition is true. so an application can expect the poll to be triggered when the other end is closed. I didn't find any other way to detect the close of the other end, but if you have a solution for this problem other than a "read" (because of the complexity of managing it in a multi-threaded application), I am very interested. You can also write (or ask someone who knows) a patch to trigger the poll :-)) > > hangup of the other end. > > There is no hangup. You may write to this socket. Writing successfully to a socket when the other end is closed seems to me a very confusing behavior (this is in fact the situation we discover first, before investigating why the poll was not informed). IMHO, in such a situation, the write should fail, and I am very curious of the reasons of such a behavior. Thanks for your reply. Best regards. 
Bernard +--------------------------------------+ | Bernard MAUDRY | | Top Graph'X Customer Support | | 10, allee de la mare Jacob | | 91290 La Norville | | FRANCE | | Tel: (33) 1 69 26 97 88 | | Fax: (33) 1 69 26 97 89 | | email: support@topgraphx.com | +--------------------------------------+ From owner-netdev@oss.sgi.com Mon Mar 12 06:57:59 2001 Received: by oss.sgi.com id ; Mon, 12 Mar 2001 06:57:39 -0800 Received: from smtp1.cern.ch ([137.138.128.38]:6667 "EHLO smtp1.cern.ch") by oss.sgi.com with ESMTP id ; Mon, 12 Mar 2001 06:57:16 -0800 Received: from lxplus015.cern.ch (IDENT:root@lxplus015.cern.ch [137.138.161.112]) by smtp1.cern.ch (8.9.3/8.9.3) with ESMTP id PAA09182; Mon, 12 Mar 2001 15:57:09 +0100 (MET) Received: (from jes@localhost) by lxplus015.cern.ch (8.9.3/8.9.3) id PAA01667; Mon, 12 Mar 2001 15:57:08 +0100 To: Jeff Garzik Cc: netdev@oss.sgi.com Subject: Re: initial acenic ZC cleanup References: <200103082147.f28LlS301042@itanic.thepuffingroup.com> <15015.65092.349145.143015@pizda.ninka.net> <3AA80487.3C7E26A6@mandrakesoft.com> <3AAAEEC8.9375ED6@mandrakesoft.com> From: Jes Sorensen Date: 12 Mar 2001 15:57:08 +0100 In-Reply-To: Jeff Garzik's message of "Sat, 10 Mar 2001 22:19:36 -0500" Message-ID: Lines: 29 User-Agent: Gnus/5.070096 (Pterodactyl Gnus v0.96) Emacs/20.4 MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing >>>>> "Jeff" == Jeff Garzik writes: Jeff> Jes Sorensen wrote: >> >>>>> "Jeff" == Jeff Garzik writes: Jeff> Does ethtool need to be updated for you? >> A little, it needs to be taught about the tuning parameters for the >> AceNIC. We probably should put a list of NIC specific tuning items >> in there so it knows how to grog the different cards' different >> parameters. Jeff> If they are NIC-specific, then you should just use ioctls... 
Jeff> It's a silly redirection to add NIC-specific stuff to the Jeff> ethtool ioctl, when you could just add a private ioctl. And another tool to access it so we end up with ethtool and acetool and sktool and hmetool and 3c905tool etc etc. all of them basically doing the same thing. It seems silly to have two tools, one for setting the link rate and flow control and one for setting the interrupt coalescing counters. The point is that a lot of the NICs have the same type of tuning variables, sometimes they are identical sometimes they are slightly different. My suggestion is that we either teach ethtool about the different names of tuning parameters or if we are lazy just allow one to set `NIC private parameters 1-16 with it' and define those for the different NIC as different things. Jes From owner-netdev@oss.sgi.com Mon Mar 12 07:09:29 2001 Received: by oss.sgi.com id ; Mon, 12 Mar 2001 07:09:19 -0800 Received: from smtp1.cern.ch ([137.138.128.38]:59140 "EHLO smtp1.cern.ch") by oss.sgi.com with ESMTP id ; Mon, 12 Mar 2001 07:08:58 -0800 Received: from lxplus015.cern.ch (IDENT:root@lxplus015.cern.ch [137.138.161.112]) by smtp1.cern.ch (8.9.3/8.9.3) with ESMTP id QAA06588; Mon, 12 Mar 2001 16:08:48 +0100 (MET) Received: (from jes@localhost) by lxplus015.cern.ch (8.9.3/8.9.3) id QAA05314; Mon, 12 Mar 2001 16:08:47 +0100 To: Werner Almesberger Cc: Jeff Garzik , netdev@oss.sgi.com, Linux Knernel Mailing List Subject: Re: New net features for added performance References: <3A9842DC.B42ECD7A@mandrakesoft.com> <20010225132249.J18271@almesberger.net> From: Jes Sorensen Date: 12 Mar 2001 16:08:47 +0100 In-Reply-To: Werner Almesberger's message of "Sun, 25 Feb 2001 13:22:49 +0100" Message-ID: Lines: 19 User-Agent: Gnus/5.070096 (Pterodactyl Gnus v0.96) Emacs/20.4 MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing >>>>> "Werner" == Werner Almesberger writes: Werner> Jeff 
Garzik wrote: >> 3) Slabbier packet allocation. Werner> Hmm, this may actually be worse during bursts: if you burst Werner> exceeds the preallocated size, you have to perform more Werner> expensive/slower operations (e.g. running a tasklet) to refill Werner> your cache. You may want to look at how I did this in the acenic driver. If the water mark goes below a certain level I schedule the tasklet, if it gets below an urgent watermark I do the allocation in the interrupt handler itself. This is of course mainly useful for cards which give you deep queues. Jes From owner-netdev@oss.sgi.com Mon Mar 12 07:25:29 2001 Received: by oss.sgi.com id ; Mon, 12 Mar 2001 07:25:09 -0800 Received: from shiva.jussieu.fr ([134.157.0.129]:61965 "EHLO shiva.jussieu.fr") by oss.sgi.com with ESMTP id ; Mon, 12 Mar 2001 07:24:42 -0800 Received: from cedre.inetsrv (qosmos.pepiniere.jussieu.fr [134.157.171.9]) by shiva.jussieu.fr (8.11.3/jtpda-5.3.3) with ESMTP id f2CFOdL70640 for ; Mon, 12 Mar 2001 16:24:39 +0100 (CET) Received: from qosmos.net (IDENT:harmel@olivier.foret [192.168.2.8]) by cedre.inetsrv (8.9.3/8.9.3) with ESMTP id QAA16942 for ; Mon, 12 Mar 2001 16:24:38 +0100 Message-ID: <3AACEBDB.4FBCFAE5@qosmos.net> Date: Mon, 12 Mar 2001 16:31:39 +0100 From: Gautier Harmel Reply-To: Gautier.Harmel@qosmos.net Organization: QOSMOS.NET X-Mailer: Mozilla 4.76 [fr] (X11; U; Linux 2.4.2 i686) X-Accept-Language: en MIME-Version: 1.0 To: netdev@oss.sgi.com Subject: network tunning pb under kernel 2.4 ? Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hello, I need to flood a network with a little program which basically sends UDP packets in an infinite loop. Under the kernel 2.2.x there is no pb, I can flood my network by sending my packets at 12 Mbs. Under kernel 2.4.2 we can't flood it faster than 4.6 Mbs ! 
Since the CPU is not heavily loaded, I guess that there is a soft limit in the kernel, but I can't find it in the sources. I've tried modifying some of the /proc/sys/net/ settings without success.

Thank you very much for your help,
Gautier

From owner-netdev@oss.sgi.com Mon Mar 12 09:41:40 2001
Received: by oss.sgi.com id ; Mon, 12 Mar 2001 09:41:30 -0800
Received: from minus.inr.ac.ru ([193.233.7.97]:12560 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Mon, 12 Mar 2001 09:41:05 -0800
Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id UAA08751; Mon, 12 Mar 2001 20:40:52 +0300
From: kuznet@ms2.inr.ac.ru
Message-Id: <200103121740.UAA08751@ms2.inr.ac.ru>
Subject: Re: PROBLEM: a local TCP socket close does not trigger a poll on the other end
To: support@topgraphx.com (Bernard MAUDRY)
Date: Mon, 12 Mar 2001 20:40:52 +0300 (MSK)
Cc: netdev@oss.sgi.com
In-Reply-To: <3AACAF5A.4528.238EFA68@localhost> from "Bernard MAUDRY" at Mar 12, 1 11:13:30 am
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Content-Length: 687
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing

Hello!

> This is not fair at all, because the socket is informed that the other end has
> closed, so it is possible to trigger the poll.

It is triggered. You masked the event: POLLIN. It is a normal _maskable_ event, not an error and not out-of-band data.

> Writing successfully to a socket when the other end is closed

Write will fail in that case, if the other end happens to be fully closed.

It seems you think that tcp is a psychic. Alas, this poor guy is not a psychic and can distinguish close from half close only actively, or if it is implied by the application-level protocol. See? You _must_ write something to detect the death of the other end, or you must close actively yourself.
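
A minimal sketch of the read-side half of this point, assuming a Linux-like sockets API: the peer's close arrives as an ordinary readable event, so POLLIN has to be requested in `events`, and a zero-length read then signals end of file. The helper names `tcp_loopback_pair` and `peer_sent_eof` are hypothetical and do not come from the thread. Note that this only shows that the peer stopped sending; as explained above, it cannot tell a full close from a half close.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper (not from the thread): connect a TCP pair over
 * loopback. fds[0] is the connecting end, fds[1] the accepted end.
 * Returns 0 on success, -1 on failure. */
static int tcp_loopback_pair(int fds[2])
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int lsn = socket(AF_INET, SOCK_STREAM, 0);

    if (lsn < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                  /* let the kernel pick a free port */
    fds[0] = socket(AF_INET, SOCK_STREAM, 0);
    if (fds[0] < 0 ||
        bind(lsn, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(lsn, 1) < 0 ||
        getsockname(lsn, (struct sockaddr *)&addr, &len) < 0 ||
        connect(fds[0], (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        (fds[1] = accept(lsn, NULL, NULL)) < 0) {
        if (fds[0] >= 0)
            close(fds[0]);
        close(lsn);
        return -1;
    }
    close(lsn);
    return 0;
}

/* Returns 1 if the peer has closed its sending direction (EOF seen),
 * 0 if nothing happened within timeout_ms or real data is pending,
 * -1 on error. */
static int peer_sent_eof(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN }; /* must ask for POLLIN */
    char byte;
    ssize_t r;
    int n = poll(&pfd, 1, timeout_ms);

    if (n < 0)
        return -1;
    if (n == 0)
        return 0;                       /* timeout: no event at all */
    if (pfd.revents & (POLLERR | POLLNVAL))
        return -1;
    /* MSG_PEEK leaves real data queued for the application to read. */
    r = recv(fd, &byte, 1, MSG_PEEK | MSG_DONTWAIT);
    if (r == 0)
        return 1;                       /* zero-length read: end of file */
    if (r > 0 || errno == EAGAIN || errno == EWOULDBLOCK)
        return 0;
    return -1;
}
```

A caller would invoke `peer_sent_eof(fd, timeout_ms)` from its poll loop; with `events` set to 0 instead of POLLIN, the same call would simply time out in the situation described in this thread.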
Alexey From owner-netdev@oss.sgi.com Tue Mar 13 06:37:34 2001 Received: by oss.sgi.com id ; Tue, 13 Mar 2001 06:37:15 -0800 Received: from [63.84.169.221] ([63.84.169.221]:16656 "EHLO topgraphx.com") by oss.sgi.com with ESMTP id ; Tue, 13 Mar 2001 06:37:05 -0800 Received: from wapiti [193.248.252.67] by topgraphx.com with ESMTP (SMTPD32-6.05) id A0604561012E; Tue, 13 Mar 2001 08:36:16 -0600 From: "Bernard MAUDRY" To: kuznet@ms2.inr.ac.ru Date: Tue, 13 Mar 2001 15:37:51 +0100 MIME-Version: 1.0 Content-type: text/plain; charset=US-ASCII Content-transfer-encoding: 7BIT Subject: Re: PROBLEM: a local TCP socket close does not trigger a poll on the other end CC: netdev@oss.sgi.com Message-ID: <3AAE3ECF.28273.29A75A51@localhost> In-reply-to: <200103121740.UAA08751@ms2.inr.ac.ru> References: <3AACAF5A.4528.238EFA68@localhost> from "Bernard MAUDRY" at Mar 12, 1 11:13:30 am X-mailer: Pegasus Mail for Win32 (v3.12c) Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hello! Thanks for your reply. > > This is not fair at all, because the socket is informed that the other > > end has closed, so it is possible to trigger the poll. > > It is triggered. You masked the event. POLLIN. > It is normal _maskable_ event, not an error and not an out of band data. But POLLIN indicates available input, and a close of the other end does not mean available input, POLLHUP should be a reasonable choice for this purpose but, unfortunately, it is not maskable (why ???). The use of POLLIN is very misleading. I will try to find a way to deal with it as it appears that I have no other choice. > > Writing successfully to a socket when the other end is closed > > Write will fail in the case, if other end happened to be fully closed. Write succeeds in this case, I tested it, and this is the original symptom of my close detection problem. > Seems, you think that tcp is a psychic. 
Alas, this poor guy is not a > psychic and can distingusih close and half close only actively or if it is > implied by application level protocol. See? You _must_ write something to > detect death of other end or you must close actively yourself. The call I do to poll was intended to be an active detection of the close of the other end, and it seems that socket library does not provide any direct mean to achieve this goal. I don't think that tcp is a psychic, just that the socket library hides the tcp driver knowledge of the socket state. I wonder which system call can be used to get the precise state of the socket. Thanks for your help. Best regards. Bernard. +--------------------------------------+ | Bernard MAUDRY | | Top Graph'X Customer Support | | 10, allee de la mare Jacob | | 91290 La Norville | | FRANCE | | Tel: (33) 1 69 26 97 88 | | Fax: (33) 1 69 26 97 89 | | email: support@topgraphx.com | +--------------------------------------+ From owner-netdev@oss.sgi.com Tue Mar 13 07:52:55 2001 Received: by oss.sgi.com id ; Tue, 13 Mar 2001 07:52:35 -0800 Received: from igw3.watson.ibm.com ([198.81.209.18]:39113 "EHLO igw3.watson.ibm.com") by oss.sgi.com with ESMTP id ; Tue, 13 Mar 2001 07:52:09 -0800 Received: from sp1n189at0.watson.ibm.com (sp1n189at0.watson.ibm.com [9.2.104.62]) by igw3.watson.ibm.com (8.11.2/8.11.2) with ESMTP id f2DFq2604390 for ; Tue, 13 Mar 2001 10:52:02 -0500 Received: from kitch0.watson.ibm.com (kitch0.watson.ibm.com [9.2.251.57]) by sp1n189at0.watson.ibm.com (8.9.3/Feb-20-98) with ESMTP id KAA22798 for ; Tue, 13 Mar 2001 10:52:02 -0500 Received: from slug.watson.ibm.com (slug.watson.ibm.com [9.2.234.187]) by kitch0.watson.ibm.com (AIX4.3/8.9.3/8.9.3/01-10-2000) with SMTP id KAA260062 for ; Tue, 13 Mar 2001 10:52:01 -0500 From: Michal Ostrowski MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <15022.16934.52573.377722@slug.watson.ibm.com> Date: Tue, 13 Mar 2001 10:52:06 -0500 (EST) 
To: netdev@oss.sgi.com
Subject: TCP sockets not flagged as writable?
X-Mailer: VM 6.72 under 21.1 (patch 10) "Capitol Reef" XEmacs Lucid
Reply-To: mostrows@speakeasy.net
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing

When polling the state of a TCP socket, the criterion used to determine whether or not we can write to the socket is:

    if (tcp_wspace(sk) >= tcp_min_write_space(sk)) {

This translates to:

    sk->sndbuf - sk->wmem_queued >= sk->wmem_queued/2

or

    sk->sndbuf >= 3/2 * sk->wmem_queued

In tcp_sendmsg() a different criterion is used. Here the test is done with tcp_memory_free() and is equivalent to:

    sk->wmem_queued < sk->sndbuf

The condition in tcp_sendmsg() is less demanding than the condition in tcp_poll(), and so it appears possible for poll() or select() to return without flagging a socket as writable when in fact a write operation to the socket could complete without blocking.

Is this wrong or is this just my imagination?

Please cc me directly on any replies.

Michal Ostrowski
mostrows@speakeasy.net

From owner-netdev@oss.sgi.com Tue Mar 13 10:17:20 2001
Received: by oss.sgi.com id ; Tue, 13 Mar 2001 10:17:11 -0800
Received: from minus.inr.ac.ru ([193.233.7.97]:60434 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Tue, 13 Mar 2001 10:17:01 -0800
Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id VAA29024; Tue, 13 Mar 2001 21:16:42 +0300
From: kuznet@ms2.inr.ac.ru
Message-Id: <200103131816.VAA29024@ms2.inr.ac.ru>
Subject: Re: TCP sockets not flagged as writable?
To: mostrows@speakeasy.net
Date: Tue, 13 Mar 2001 21:16:42 +0300 (MSK)
Cc: netdev@oss.sgi.com
In-Reply-To: <15022.16934.52573.377722@slug.watson.ibm.com> from "Michal Ostrowski" at Mar 13, 1 07:15:02 pm
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Content-Length: 416
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing

Hello!

> Is this wrong or is this just my imagination?
Not this and not this, the third alternative is "simply right". 8) Wakeup happens as soon as there is _enough_ space to wake up. 1 byte is not an enough. What's about exact form of wakeup predicate, criterium can be different. Current one (1/3 of space is free) was selected in 2.2 after several other variants were tried. It is not very essential. Alexey From owner-netdev@oss.sgi.com Tue Mar 13 10:23:31 2001 Received: by oss.sgi.com id ; Tue, 13 Mar 2001 10:23:21 -0800 Received: from igw3.watson.ibm.com ([198.81.209.18]:23755 "EHLO igw3.watson.ibm.com") by oss.sgi.com with ESMTP id ; Tue, 13 Mar 2001 10:23:03 -0800 Received: from sp1n189at0.watson.ibm.com (sp1n189at0.watson.ibm.com [9.2.104.62]) by igw3.watson.ibm.com (8.11.2/8.11.2) with ESMTP id f2DIMu609950; Tue, 13 Mar 2001 13:22:56 -0500 Received: from kitch0.watson.ibm.com (kitch0.watson.ibm.com [9.2.251.57]) by sp1n189at0.watson.ibm.com (8.9.3/Feb-20-98) with ESMTP id NAA36488; Tue, 13 Mar 2001 13:22:56 -0500 Received: from slug.watson.ibm.com (slug.watson.ibm.com [9.2.234.187]) by kitch0.watson.ibm.com (AIX4.3/8.9.3/8.9.3/01-10-2000) with SMTP id NAA149060; Tue, 13 Mar 2001 13:22:55 -0500 From: Michal Ostrowski MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <15022.25987.855027.503534@slug.watson.ibm.com> Date: Tue, 13 Mar 2001 13:22:59 -0500 (EST) To: kuznet@ms2.inr.ac.ru Cc: netdev@oss.sgi.com Subject: Re: TCP sockets not flagged as writable? In-Reply-To: <200103131816.VAA29024@ms2.inr.ac.ru> References: <15022.16934.52573.377722@slug.watson.ibm.com> <200103131816.VAA29024@ms2.inr.ac.ru> X-Mailer: VM 6.72 under 21.1 (patch 10) "Capitol Reef" XEmacs Lucid Reply-To: mostrows@speakeasy.net Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing kuznet@ms2.inr.ac.ru writes: > Hello! > > > Is this wrong or is this just my imagination? > > Not this and not this, the third alternative is "simply right". 
8)
>
> Wakeup happens as soon as there is _enough_ space to wake up.
> 1 byte is not an enough.

But my concern is that when you have just a little bit of space available, a write operation may succeed without blocking, even though poll() says it won't.

I understand why the criteria for poll() are as strict as they are; I'm just puzzled as to why the criteria are relaxed in tcp_sendmsg().

Should tcp_sendmsg() not also rely on the 1/3 free requirement?

Michal Ostrowski
mostrows@speakeasy.net

From owner-netdev@oss.sgi.com Tue Mar 13 10:55:21 2001
Received: by oss.sgi.com id ; Tue, 13 Mar 2001 10:55:11 -0800
Received: from minus.inr.ac.ru ([193.233.7.97]:9491 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Tue, 13 Mar 2001 10:54:51 -0800
Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id VAA29227; Tue, 13 Mar 2001 21:54:41 +0300
From: kuznet@ms2.inr.ac.ru
Message-Id: <200103131854.VAA29227@ms2.inr.ac.ru>
Subject: Re: TCP sockets not flagged as writable?
To: mostrows@speakeasy.net
Date: Tue, 13 Mar 2001 21:54:41 +0300 (MSK)
Cc: netdev@oss.sgi.com
In-Reply-To: <15022.25987.855027.503534@slug.watson.ibm.com> from "Michal Ostrowski" at Mar 13, 1 01:22:59 pm
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Content-Length: 411
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing

Hello!

> But my concern is that when you have just a little bit of space
> available a write operation may succeed without blocking, even though
> poll() says it won't.

Moreover, write() can succeed even if there is _really_ no space.

> Should tcp_sendmsg() not also rely on the 1/3 free requirement?

Which is fully equivalent to decreasing sndbuf by 1/3. 8)
The wakeup is delayed precisely in order to batch work.
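
The gap between the two tests discussed in this thread can be sketched with a simplified model. This is not kernel code: `struct sock_model` and the function names are hypothetical stand-ins, and only the two inequalities quoted in the thread are reproduced.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for the two socket fields named in the thread. */
struct sock_model {
    int sndbuf;        /* send buffer limit */
    int wmem_queued;   /* bytes currently queued for sending */
};

/* poll() side: writable only when free space >= half of what is queued,
 * i.e. roughly when at least 1/3 of sndbuf is free. */
static bool poll_says_writable(const struct sock_model *sk)
{
    int wspace = sk->sndbuf - sk->wmem_queued;
    int min_write_space = sk->wmem_queued / 2;

    return wspace >= min_write_space;
}

/* sendmsg() side: may queue more data as long as any room is left. */
static bool sendmsg_may_queue(const struct sock_model *sk)
{
    return sk->wmem_queued < sk->sndbuf;
}
```

With `sndbuf = 60000`, a socket holding 50000 queued bytes falls in the window under discussion: the sendmsg-style test passes while the poll-style test does not. Once the queue drains to 40000 bytes (two thirds of `sndbuf`), both tests agree, which is the batching effect described above.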
Alexey From owner-netdev@oss.sgi.com Tue Mar 13 11:08:11 2001 Received: by oss.sgi.com id ; Tue, 13 Mar 2001 11:08:01 -0800 Received: from [63.116.109.10] ([63.116.109.10]:32755 "EHLO akamai.com") by oss.sgi.com with ESMTP id ; Tue, 13 Mar 2001 11:07:37 -0800 Received: from akamai.com (vwall3.sanmateo.akamai.com [172.23.1.73]) by akamai.com (8.11.1/8.10.1) with ESMTP id f2DJ7UG02524 for ; Tue, 13 Mar 2001 11:07:30 -0800 (PST) Received: from akamai.com (telly.sanmateo.akamai.com [172.23.1.17]) by akamai.com (8.11.1/8.10.1) with ESMTP id f2DJ7TF02512; Tue, 13 Mar 2001 11:07:29 -0800 (PST) Received: from akamai.com (dhcp-36-8.sanmateo.akamai.com [172.23.4.250]) by akamai.com (8.11.1/8.11.1) with ESMTP id f2DJ7k329470; Tue, 13 Mar 2001 11:07:47 -0800 (PST) Message-ID: <3AAE6FEF.DB21003@akamai.com> Date: Tue, 13 Mar 2001 11:07:27 -0800 From: Dancer X-Mailer: Mozilla 4.61 [en] (X11; I; Linux 2.2.12-20smp i686) X-Accept-Language: en MIME-Version: 1.0 To: kuznet@ms2.inr.ac.ru CC: mostrows@speakeasy.net, netdev@oss.sgi.com Subject: Re: TCP sockets not flagged as writable? References: <200103131854.VAA29227@ms2.inr.ac.ru> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing kuznet@ms2.inr.ac.ru wrote: > Hello! > > > But my concern is that when you have just a little bit of space > > available a write operation may succeed without blocking, even though > > poll() says it won't. > > Moreover, write() can succeed even if there is no space _really_. Under what circumstances, out of curiosity? 
D From owner-netdev@oss.sgi.com Tue Mar 13 11:12:41 2001 Received: by oss.sgi.com id ; Tue, 13 Mar 2001 11:12:21 -0800 Received: from minus.inr.ac.ru ([193.233.7.97]:21779 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Tue, 13 Mar 2001 11:12:11 -0800 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id WAA29388; Tue, 13 Mar 2001 22:12:00 +0300 From: kuznet@ms2.inr.ac.ru Message-Id: <200103131912.WAA29388@ms2.inr.ac.ru> Subject: Re: PROBLEM: a local TCP socket close does not trigger a poll on the other end To: support@topgraphx.com (Bernard MAUDRY) Date: Tue, 13 Mar 2001 22:12:00 +0300 (MSK) Cc: netdev@oss.sgi.com In-Reply-To: <3AAE3ECF.28273.29A75A51@localhost> from "Bernard MAUDRY" at Mar 13, 1 03:37:51 pm X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Content-Length: 823 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hello! > purpose but, unfortunately, it is not maskable (why ???). Standards require this. Apparently, SVR4 poll() was not designed to understand bidirectional close. It was mainly for terminal io and does not fit even for wouldbe generic STREAMS io. It can be modified to make _maskable_ POLLINEOF to get close in read direction and _maskable_ POLLOUTEOF to invalidate write direction. And reserving unmaskable POLLHUP for the case, when no io events are expected in future. But even existing SVR4 interface is still enough for any practical purposes yet. > Write succeeds in this case, I tested it, It succeeds locally by the same reason, why this condition is not set in poll. We simply do not know this at the moment of write()/poll(). But after rtt the connection will be reset and POLLERR will be set. 
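
A minimal sketch of the active detection described here, assuming Linux behaviour on a loopback TCP connection: the first write to a fully closed peer succeeds locally, the peer answers with a reset, and the unmaskable POLLERR/POLLHUP bits then show up even with an empty `events` mask. The helper names `tcp_loopback_pair` and `probe_by_writing` are hypothetical. The probe byte really is delivered to a live peer, so this only fits protocols that tolerate such a byte.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper (not from the thread): connect a TCP pair over
 * loopback. fds[0] is the connecting end, fds[1] the accepted end.
 * Returns 0 on success, -1 on failure. */
static int tcp_loopback_pair(int fds[2])
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int lsn = socket(AF_INET, SOCK_STREAM, 0);

    if (lsn < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                  /* let the kernel pick a free port */
    fds[0] = socket(AF_INET, SOCK_STREAM, 0);
    if (fds[0] < 0 ||
        bind(lsn, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(lsn, 1) < 0 ||
        getsockname(lsn, (struct sockaddr *)&addr, &len) < 0 ||
        connect(fds[0], (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        (fds[1] = accept(lsn, NULL, NULL)) < 0) {
        if (fds[0] >= 0)
            close(fds[0]);
        close(lsn);
        return -1;
    }
    close(lsn);
    return 0;
}

/* Write one probe byte, then wait for the unmaskable error events.
 * Returns 1 if the connection turned out to be dead, 0 if no error
 * showed up within timeout_ms, -1 on an unrelated local error. */
static int probe_by_writing(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = 0 };  /* unmaskable bits only */
    char probe = 0;
    int n;

    /* MSG_NOSIGNAL: report EPIPE as an error instead of raising SIGPIPE. */
    if (send(fd, &probe, 1, MSG_NOSIGNAL) < 0)
        return (errno == EPIPE || errno == ECONNRESET) ? 1 : -1;
    n = poll(&pfd, 1, timeout_ms);
    if (n < 0)
        return -1;
    return (n > 0 && (pfd.revents & (POLLERR | POLLHUP))) ? 1 : 0;
}
```

The timeout has to cover at least one round trip, since the reset can only arrive after the probe has reached the other end.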
Alexey

From owner-netdev@oss.sgi.com Tue Mar 13 11:14:10 2001
Received: by oss.sgi.com id ; Tue, 13 Mar 2001 11:14:01 -0800
Received: from minus.inr.ac.ru ([193.233.7.97]:24083 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Tue, 13 Mar 2001 11:13:56 -0800
Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id WAA29416; Tue, 13 Mar 2001 22:13:42 +0300
From: kuznet@ms2.inr.ac.ru
Message-Id: <200103131913.WAA29416@ms2.inr.ac.ru>
Subject: Re: TCP sockets not flagged as writable?
To: dvesperm@akamai.com (Dancer)
Date: Tue, 13 Mar 2001 22:13:42 +0300 (MSK)
Cc: mostrows@speakeasy.net, netdev@oss.sgi.com
In-Reply-To: <3AAE6FEF.DB21003@akamai.com> from "Dancer" at Mar 13, 1 11:07:27 am
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Content-Length: 175
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing

Hello!

> Under what circumstances, out of curiosity?

When space becomes available immediately after or during poll(). E.g. on gigabit Ethernet it is the rule rather than the exception.

Alexey

From owner-netdev@oss.sgi.com Wed Mar 14 00:12:45 2001
Received: by oss.sgi.com id ; Wed, 14 Mar 2001 00:12:26 -0800
Received: from netcore.fi ([193.94.160.1]:54023 "EHLO netcore.fi") by oss.sgi.com with ESMTP id ; Wed, 14 Mar 2001 00:11:56 -0800
Received: from localhost (pekkas@localhost) by netcore.fi (8.11.1/8.11.1) with ESMTP id f2E8BgR28019 for ; Wed, 14 Mar 2001 10:11:43 +0200
Date: Wed, 14 Mar 2001 10:11:42 +0200 (EET)
From: Pekka Savola
To:
Subject: Bind on the same port for both IPv4 and IPv6
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing

Hello all,

Current code does not seem to allow binding to "0.0.0.0" and "::" ("INADDR_ANY") on the same port for both IPv4 and IPv6 simultaneously.
This can be a bad thing for people who want to test out IPv4 and IPv6 on the same system, because you can't easily provide services for both. Perhaps the old restrictions are based on the old design where you were supposed to use IPv4 MAPPED addresses to get "dual-stack" behaviour? These have since been deprecated. This behaviour has been tested with sendmail, bind9 and ftpd-bsd. If you use a limited address to bind to, e.g. with the following Bind9 config: listen-on { 1.2.3.4; }; listen-on-v6 { any; }; This will work. But it this is not a global solution for a problem, and this kind of facility is not available for too many network daemons. After applying USAGI patch and enabling CONFIG_IPV6_DOUBLE_BIND, this appears to work fine (ie: how it works with KAME IPv6 stack on BSD). Are there reasons why this has not been changed? Are there workarounds for the problem on stock kernels? -- Pekka Savola "Tell me of difficulties surmounted, Netcore Oy not those you stumble over and fall" Systems. Networks. Security. -- Robert Jordan: A Crown of Swords From owner-netdev@oss.sgi.com Wed Mar 14 02:47:55 2001 Received: by oss.sgi.com id ; Wed, 14 Mar 2001 02:47:46 -0800 Received: from slip139-92-103-15.tur.it.prserv.net ([139.92.103.15]:57354 "EHLO mozart") by oss.sgi.com with ESMTP id ; Wed, 14 Mar 2001 02:47:29 -0800 Received: from rustcorp.com.au (really [127.0.0.1]) by rustcorp.com.au via in.smtpd with esmtp id (Debian Smail3.2.0.111) for ; Wed, 14 Mar 2001 21:50:26 +1100 (EST) Message-Id: From: Rusty Russell To: netdev@oss.sgi.com Subject: [PATCH] Minor bug. Date: Wed, 14 Mar 2001 21:50:23 +1100 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing We use a tailer on skbs, so accessing a few bytes over the end of the skb will not crash the machine, but maybe one day it will. Truncated ICMP packets being forwarded which evoke ICMP errors will do this, if my reading is correct. 
This patch still returns an ICMP error on such truncated packets. The other choice would be to assume the worst, and return.

Just reading the code while on holidays...
Rusty.
--
Premature optmztion is rt of all evl. --DK

--- working-2.4.2-conntrack-fix/net/ipv4/icmp.c.~1~	Sat Aug 5 11:18:49 2000
+++ working-2.4.2-conntrack-fix/net/ipv4/icmp.c	Wed Mar 14 18:24:32 2001
@@ -588,7 +588,8 @@
 	/*
 	 *	We are an error, check if we are replying to an ICMP error
 	 */
-	if (iph->protocol==IPPROTO_ICMP) {
+	if (iph->protocol==IPPROTO_ICMP
+	    && skb_in->tail-(u8*)iph >= sizeof(struct icmphdr)) {
 		icmph = (struct icmphdr *)((char *)iph + (iph->ihl<<2));
 		/*
 		 *	Assume any unknown ICMP type is an error. This isn't

From owner-netdev@oss.sgi.com Wed Mar 14 04:11:26 2001
Received: by oss.sgi.com id ; Wed, 14 Mar 2001 04:11:17 -0800
Received: from colin.muc.de ([193.149.48.1]:62733 "HELO colin.muc.de") by oss.sgi.com with SMTP id ; Wed, 14 Mar 2001 04:10:58 -0800
Received: by colin.muc.de id <140609-3>; Wed, 14 Mar 2001 13:10:49 +0100
Message-ID: <20010314131048.56577@colin.muc.de>
Date: Wed, 14 Mar 2001 13:10:48 +0100
From: Andi Kleen
To: mostrows@speakeasy.net
Cc: netdev@oss.sgi.com
Subject: Re: TCP sockets not flagged as writable?
References: <15022.16934.52573.377722@slug.watson.ibm.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 0.88e In-Reply-To: <15022.16934.52573.377722@slug.watson.ibm.com>; from Michal Ostrowski on Tue, Mar 13, 2001 at 04:52:06PM +0100 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Tue, Mar 13, 2001 at 04:52:06PM +0100, Michal Ostrowski wrote: > > When poling the state of a TCP socket, the criteria used to determine > whether or not we can write to the socket are: > > if (tcp_wspace(sk) >= tcp_min_write_space(sk)) { > > This translates to: > > sk->sndbuf - sk->wmem_queued >= sk->wmem_queued/2 > > or > > sk->sndbuf >= 3/2 * sk->wmem_queued > > > In tcp_sendmsg a different set of criteria is used. Here the test is > done with tcp_memory_free() and is equivalent to: > > sk->wmem_queued < sk->sndbuf > > The condition in tcp_sendmsg() is less demanding than the condition in > tcp_poll() and so it appears possible for poll() or select() to return > without flagging a socket as being writable when in fact a write > operation to the socket could complete without blocking. > > Is this wrong or is this just my imagination? It is done intentionally for performance reasons. poll -> context switch -> send is relatively expensive and for good bandwidth shouldn't be done that often. TCP sendmsg has to work always though. -Andi From owner-netdev@oss.sgi.com Wed Mar 14 10:27:49 2001 Received: by oss.sgi.com id ; Wed, 14 Mar 2001 10:27:28 -0800 Received: from pizda.ninka.net ([216.101.162.242]:18573 "EHLO pizda.ninka.net") by oss.sgi.com with ESMTP id ; Wed, 14 Mar 2001 10:27:27 -0800 Received: (from davem@localhost) by pizda.ninka.net (8.9.3/8.9.3) id KAA18679; Wed, 14 Mar 2001 10:27:11 -0800 From: "David S. 
Miller" MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <15023.47103.676096.126678@pizda.ninka.net> Date: Wed, 14 Mar 2001 10:27:11 -0800 (PST) To: Rusty Russell Cc: netdev@oss.sgi.com Subject: Re: [PATCH] Minor bug. In-Reply-To: References: X-Mailer: VM 6.75 under 21.1 (patch 13) "Crater Lake" XEmacs Lucid Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Rusty Russell writes: > This patch still returns an ICMP error on such truncated packets. The > other choice would be to assume the worst, and return. I think the zerocopy version of this code gets it right. Please verify. Later, David S. Miller davem@redhat.com From owner-netdev@oss.sgi.com Wed Mar 14 10:41:58 2001 Received: by oss.sgi.com id ; Wed, 14 Mar 2001 10:41:48 -0800 Received: from sj-msg-core-1.cisco.com ([171.71.163.11]:8101 "EHLO sj-msg-core-1.cisco.com") by oss.sgi.com with ESMTP id ; Wed, 14 Mar 2001 10:41:38 -0800 Received: from kaspit.cisco.com (kaspit.cisco.com [144.254.91.49]) by sj-msg-core-1.cisco.com (8.9.3/8.9.1) with ESMTP id KAA09492; Wed, 14 Mar 2001 10:41:34 -0800 (PST) Received: from drgoldstw2k (dhcp-64-103-121-200.cisco.com [64.103.121.200]) by kaspit.cisco.com (Mirapoint) with SMTP id ACI13646; Wed, 14 Mar 2001 20:41:28 +0200 (GMT-2) From: "Dror Goldstein" To: Cc: "Dror Goldstein \(E-mail\)" Subject: Problem forwarding fragmented multicast packets. 
Date: Wed, 14 Mar 2001 20:43:54 +0200
Message-ID: <001d01c0acb6$b92048e0$c8796740@drgoldstw2k>
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
X-Priority: 3 (Normal)
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook CWS, Build 9.0.2416 (9.0.2910.0)
Importance: Normal
X-MimeOLE: Produced By Microsoft MimeOLE V5.50.4133.2400
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing

Hi,

When a multicast data packet has to be fragmented, there is a bug and the packets are not forwarded. Following is a description of the problem as I found it in the kernel code, and after that a patch that solved it.

In the function ip_fragment() (ip_output.c), when a datagram is fragmented, the flags field (held in the skb->cb area) is not copied from the old skb to the new skb (skb2). Later on, ip_mc_output() (ip_output.c) checks this flag in order to determine whether the packet should be looped back.

Note:
- I checked the fix only for multicast traffic when the flag was IPSKB_FORWARDED. I didn't check it for the IPSKB_TRANSLATED and IPSKB_MASQUERADED flags.
- I compared the ip_output.c of kernel 2.4.0-test9 and kernel 2.4.2.

--- ../../../../linux-2.4.2/linux/net/ipv4/ip_output.c	Fri Oct 27 20:03:14 2000
+++ ip_output.c	Wed Mar 14 18:42:21 2001
@@ -5,7 +5,7 @@
  *
  *		The Internet Protocol (IP) output module.
  *
- * Version:	$Id: ip_output.c,v 1.87 2000/10/25 20:07:22 davem Exp $
+ * Version:	$Id: ip_output.c,v 1.85 2000/08/31 23:39:12 davem Exp $
  *
  * Authors:	Ross Biro,
  *		Fred N. van Kempen,
@@ -37,6 +37,8 @@
  *					and more readibility.
  *		Marc Boucher	:	When call_out_firewall returns FW_QUEUE,
  *					silently drop skb instead of failing with -EPERM.
+ *		Dror Goldstein:	Copy the the flags (of inet_skb_parm structure)
+ *					to each IP packet fragment.
  */
 
 #include
@@ -828,6 +830,10 @@
 		 */
 		if (offset == 0)
 			ip_options_fragment(skb);
+
+		/* Dror g.:Copy the flags to each fragment */
+		IPCB(skb2)->flags = IPCB(skb)->flags;
+
 
 		/*
 		 *	Added AC : If we are fragmenting a fragment that's not the

From owner-netdev@oss.sgi.com Wed Mar 14 11:41:10 2001
Received: by oss.sgi.com id ; Wed, 14 Mar 2001 11:40:51 -0800
Received: from minus.inr.ac.ru ([193.233.7.97]:59147 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Wed, 14 Mar 2001 11:40:42 -0800
Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id WAA14316; Wed, 14 Mar 2001 22:40:14 +0300
From: kuznet@ms2.inr.ac.ru
Message-Id: <200103141940.WAA14316@ms2.inr.ac.ru>
Subject: Re: [PATCH] Minor bug.
To: rusty@rustcorp.COM.AU (Rusty Russell)
Date: Wed, 14 Mar 2001 22:40:14 +0300 (MSK)
Cc: netdev@oss.sgi.com
In-Reply-To: from "Rusty Russell" at Mar 14, 1 02:15:01 pm
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Content-Length: 445
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing

Hello!

> skb will not crash the machine, but maybe one day it will.

It will not. Safe access to the tail octets is critical for parsing and lets us avoid lots of useless tests (such as this one), so we are not going to lose this property.

> Just reading the code while on holidays...

BTW you read the wrong code. This place no longer exists in this form. The new variant is more sensitive (and, hence, more paranoid) about such things.

Alexey

From owner-netdev@oss.sgi.com Thu Mar 15 00:10:26 2001
Received: by oss.sgi.com id ; Thu, 15 Mar 2001 00:10:07 -0800
Received: from pizda.ninka.net ([216.101.162.242]:64897 "EHLO pizda.ninka.net") by oss.sgi.com with ESMTP id ; Thu, 15 Mar 2001 00:09:51 -0800
Received: (from davem@localhost) by pizda.ninka.net (8.9.3/8.9.3) id AAA19567; Thu, 15 Mar 2001 00:08:48 -0800
Date: Thu, 15 Mar 2001 00:08:48 -0800
Message-Id: <200103150808.AAA19567@pizda.ninka.net>
From: "David S.
Miller"
To: linux-kernel@vger.kernel.org
CC: linux-net@vger.kernel.org, netdev@oss.sgi.com
Subject: [UPDATE] Zerocopy against 2.4.3-pre4
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing

Available at:

ftp://ftp.kernel.org/pub/linux/kernel/people/davem/zerocopy-2.4.3p3-1.diff.gz

This is basically identical to the networking in Alan's current
patches. It is only provided for people who want the zerocopy stuff
but for some reason don't feel like getting all the other changes in
Alan's tree :-)))

Please report any regressions found vs. 2.4.3-pre4 vanilla, thanks.

Later,
David S. Miller
davem@redhat.com

From owner-netdev@oss.sgi.com Thu Mar 15 04:57:45 2001
Received: by oss.sgi.com id ; Thu, 15 Mar 2001 04:57:25 -0800
Received: from perninha.conectiva.com.br ([200.250.58.156]:28936 "HELO postfix.conectiva.com.br") by oss.sgi.com with SMTP id ; Thu, 15 Mar 2001 04:57:02 -0800
Received: from burns.conectiva (burns.conectiva [10.0.0.4]) by postfix.conectiva.com.br (Postfix) with SMTP id D224816B18 for ; Thu, 15 Mar 2001 09:56:48 -0300 (EST)
Received: (qmail 20190 invoked by uid 0); 15 Mar 2001 12:56:07 -0000
Received: from dial15.ras.conectiva (HELO imladris.rielhome.conectiva) (root@10.0.8.15) by burns.conectiva with SMTP; 15 Mar 2001 12:56:07 -0000
Received: from localhost (IDENT:riel@localhost [127.0.0.1]) by imladris.rielhome.conectiva (8.11.1/8.11.1) with ESMTP id f2FCWM604842; Thu, 15 Mar 2001 09:32:22 -0300
Date: Thu, 15 Mar 2001 09:32:22 -0300 (BRST)
From: Rik van Riel
X-Sender: riel@imladris.rielhome.conectiva
To: =?ISO-8859-1?Q?M=E5rten_Wikstr=F6m?=
Cc: "'linux-kernel@vger.kernel.org'" , netdev@oss.sgi.com
Subject: Re: How to optimize routing performance
In-Reply-To:
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=X-UNKNOWN
Content-Transfer-Encoding: 8BIT
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing

On Thu, 15 Mar 2001, [ISO-8859-1] Mårten Wikström
wrote: > I've performed a test on the routing capacity of a Linux 2.4.2 box > versus a FreeBSD 4.2 box. I used two Pentium Pro 200Mhz computers with > 64Mb memory, and two DEC 100Mbit ethernet cards. I used a Smartbits > test-tool to measure the packet throughput and the packet size was set > to 64 bytes. Linux dropped no packets up to about 27000 packets/s, but > then it started to drop packets at higher rates. Worse yet, the output > rate actually decreased, so at the input rate of 40000 packets/s > almost no packets got through. The behaviour of FreeBSD was different, > it showed a steadily increased output rate up to about 70000 packets/s > before the output rate decreased. (Then the output rate was apprx. > 40000 packets/s). > So, my question is: are these figures true, or is it possible to > optimize the kernel somehow? The only changes I have made to the > kernel config was to disable advanced routing. There are some flow control options in the kernel which should help. From your description, it looks like they aren't enabled by default ... At the NordU/USENIX conference in Stockholm (this february) I saw a nice presentation on the flow control code in the Linux networking code and how it improved networking performance. I'm pretty convinced that flow control _should_ be saving your system in this case. OTOH, if they _are_ enabled, the networking people seem to have a new item for their TODO list. ;) regards, Rik -- Virtual memory is like a game you can't win; However, without VM there's truly nothing to lose... 
http://www.surriel.com/
http://www.conectiva.com/	http://distro.conectiva.com.br/

From owner-netdev@oss.sgi.com Thu Mar 15 06:20:15 2001
Received: by oss.sgi.com id ; Thu, 15 Mar 2001 06:19:56 -0800
Received: from robur.slu.se ([130.238.98.12]:17681 "EHLO robur.slu.se") convert rfc822-to-8bit by oss.sgi.com with ESMTP id ; Thu, 15 Mar 2001 06:19:31 -0800
Received: (from robert@localhost) by robur.slu.se (8.8.7/8.8.7) id PAA25702; Thu, 15 Mar 2001 15:19:23 +0100
From: Robert Olsson
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: 8BIT
Message-ID: <15024.53099.41814.716733@robur.slu.se>
Date: Thu, 15 Mar 2001 15:19:23 +0100 (CET)
To: =?ISO-8859-1?Q?M=E5rten=5FWikstr=F6m?=
Cc: Rik van Riel , "'linux-kernel@vger.kernel.org'" , netdev@oss.sgi.com
Subject: Re: How to optimize routing performance
X-Mailer: VM 6.75 under Emacs 19.34.1
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing

Rik van Riel writes:
> On Thu, 15 Mar 2001, [ISO-8859-1] Mårten Wikström wrote:
>
> > I've performed a test on the routing capacity of a Linux 2.4.2 box
> > versus a FreeBSD 4.2 box. I used two Pentium Pro 200Mhz computers with
> > 64Mb memory, and two DEC 100Mbit ethernet cards. I used a Smartbits
> > test-tool to measure the packet throughput and the packet size was set
> > to 64 bytes. Linux dropped no packets up to about 27000 packets/s, but
> > then it started to drop packets at higher rates. Worse yet, the output
> > rate actually decreased, so at the input rate of 40000 packets/s

It is a known problem, yes. And just as Rik says, it was first addressed in
2.1.x by Alexey.

> > almost no packets got through. The behaviour of FreeBSD was different,
> > it showed a steadily increased output rate up to about 70000 packets/s
> > before the output rate decreased. (Then the output rate was apprx.
> > 40000 packets/s).
> > > So, my question is: are these figures true, or is it possible to > > optimize the kernel somehow? The only changes I have made to the > > kernel config was to disable advanced routing. > > There are some flow control options in the kernel which should > help. From your description, it looks like they aren't enabled > by default ... CONFIG_NET_HW_FLOWCONTROL enables kernel code for it. But device drivers has to have support for it. But unfortunely very few drivers has support for it. Also we done experiments were we move the device RX processing to SoftIRQ rather than IRQ. With this RX is in better balance with other kernel tasks and TX. Under very high load and under DoS attacks the system is now manageable. It's in practical use already. > At the NordU/USENIX conference in Stockholm (this february) I > saw a nice presentation on the flow control code in the Linux > networking code and how it improved networking performance. > I'm pretty convinced that flow control _should_ be saving your > system in this case. Thanks Rik. This is work/experiments by Jamal and me with support from Gurus. :-) Jamal did this presentation at OLS 2000. At NordU/USENIX I gave an updated presentation of it. The presentation is not yet available form the usenix webb I think. It can ftp from robur.slu.se: /pub/Linux/tmp/FF-NordUSENIX.pdf or .ps In summary Linux is very decent router. Wire speed small packets @ 100 Mbps and capable of Gigabit routing (1440 pkts tested) we used. Also if people are interested we have done profiling on a Linux production router with full BGP at pretty loaded site. This to give us costs for route lookup, skb malloc/free, interrupts etc. http://Linux/net-development/experiments/010313 I'm on netdev but not the kernel list. Cheers. 
--ro

From owner-netdev@oss.sgi.com Thu Mar 15 08:29:06 2001
Received: by oss.sgi.com id ; Thu, 15 Mar 2001 08:28:46 -0800
Received: from tux.rsn.hk-r.se ([194.47.143.135]:30092 "EHLO tux.rsn.bth.se") convert rfc822-to-8bit by oss.sgi.com with ESMTP id ; Thu, 15 Mar 2001 08:28:21 -0800
Received: from localhost (gandalf@localhost) by tux.rsn.bth.se (8.11.3/8.11.2/Debian 8.11.2-1) with ESMTP id f2FGKdU28092; Thu, 15 Mar 2001 17:20:39 +0100
Date: Thu, 15 Mar 2001 17:20:18 +0100 (CET)
From: Martin Josefsson
X-Sender: gandalf@tux.rsn.bth.se
To: Rik van Riel
cc: =?ISO-8859-1?Q?M=E5rten_Wikstr=F6m?= , "'linux-kernel@vger.kernel.org'" , netdev@oss.sgi.com
Subject: Re: How to optimize routing performance
In-Reply-To:
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=iso-8859-1
Content-Transfer-Encoding: 8BIT
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing

On Thu, 15 Mar 2001, Rik van Riel wrote:

> On Thu, 15 Mar 2001, [ISO-8859-1] Mårten Wikström wrote:
>
> > I've performed a test on the routing capacity of a Linux 2.4.2 box
> > versus a FreeBSD 4.2 box. I used two Pentium Pro 200Mhz computers with
> > 64Mb memory, and two DEC 100Mbit ethernet cards. I used a Smartbits
> > test-tool to measure the packet throughput and the packet size was set
> > to 64 bytes. Linux dropped no packets up to about 27000 packets/s, but
> > then it started to drop packets at higher rates. Worse yet, the output
> > rate actually decreased, so at the input rate of 40000 packets/s
> > almost no packets got through. The behaviour of FreeBSD was different,
> > it showed a steadily increased output rate up to about 70000 packets/s
> > before the output rate decreased. (Then the output rate was apprx.
> > 40000 packets/s).
>
> > So, my question is: are these figures true, or is it possible to
> > optimize the kernel somehow? The only changes I have made to the
> > kernel config was to disable advanced routing.
> > There are some flow control options in the kernel which should > help. From your description, it looks like they aren't enabled > by default ... You want to have CONFIG_NET_HW_FLOWCONTROL enabled. If you don't the kernel gets _alot_ of interrupts from the NIC and dosn't have any cycles left to do anything. So you want to turn this on! > At the NordU/USENIX conference in Stockholm (this february) I > saw a nice presentation on the flow control code in the Linux > networking code and how it improved networking performance. > I'm pretty convinced that flow control _should_ be saving your > system in this case. That was probably Jamal Hadi and Robert Olsson. They have been optimizing the tulip driver. These optimizations havn't been integrated with the "vanilla" driver yet, but I hope the can integrate it soon. They have one version that is very optimized and then they have one version that have even more optimizations, ie. it uses polling at high interruptload. you will find these drivers here: ftp://robur.slu.se/pub/Linux/net-development/ The latest versions are: tulip-ss010111.tar.gz and tulip-ss010116-poll.tar.gz > OTOH, if they _are_ enabled, the networking people seem to have > a new item for their TODO list. ;) Yup. You can take a look here too: http://robur.slu.se/Linux/net-development/jamal/FF-html/ This is the presentation they gave at OLS (IIRC) And this is the final result: http://robur.slu.se/Linux/net-development/jamal/FF-html/img26.htm As you can see the throughput is a _lot_ higher with this driver. One final note: The makefile in at least tulip-ss010111.tar.gz is in the old format (not the new as 2.4.0-testX introduced), but you can copy the makefile from the "vanilla" driver and It'lll work like a charm. Please redo your tests with this driver and report the results to me and this list. I really want to know how it compares against FreeBSD. 
/Martin From owner-netdev@oss.sgi.com Thu Mar 15 09:25:56 2001 Received: by oss.sgi.com id ; Thu, 15 Mar 2001 09:25:47 -0800 Received: from perninha.conectiva.com.br ([200.250.58.156]:58642 "HELO postfix.conectiva.com.br") by oss.sgi.com with SMTP id ; Thu, 15 Mar 2001 09:25:29 -0800 Received: from burns.conectiva (burns.conectiva [10.0.0.4]) by postfix.conectiva.com.br (Postfix) with SMTP id DC8DC16B16 for ; Thu, 15 Mar 2001 14:25:08 -0300 (EST) Received: (qmail 27519 invoked by uid 0); 15 Mar 2001 17:24:31 -0000 Received: from duckman.distro.conectiva (HELO brutus.conectiva.com.br) (10.0.17.2) by burns.conectiva with SMTP; 15 Mar 2001 17:24:31 -0000 Received: from localhost (riel@localhost) by brutus.conectiva.com.br (8.11.2/8.11.2) with ESMTP id f2G0cDK03006; Thu, 15 Mar 2001 21:38:15 -0300 X-Authentication-Warning: duckman.distro.conectiva: riel owned process doing -bs Date: Thu, 15 Mar 2001 21:38:13 -0300 (BRST) From: Rik van Riel X-X-Sender: To: Robert Olsson Cc: =?ISO-8859-1?Q?M=E5rten=5FWikstr=F6m?= , "'linux-kernel@vger.kernel.org'" , Subject: Re: How to optimize routing performance In-Reply-To: <15024.53099.41814.716733@robur.slu.se> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Thu, 15 Mar 2001, Robert Olsson wrote: > CONFIG_NET_HW_FLOWCONTROL enables kernel code for it. But device > drivers has to have support for it. But unfortunely very few drivers > has support for it. Isn't it possible to put something like this in the layer just above the driver ? It probably won't work as well as putting it directly in the driver, but it'll at least keep Linux from collapsing under really heavy loads ... regards, Rik -- Linux MM bugzilla: http://linux-mm.org/bugzilla.shtml Virtual memory is like a game you can't win; However, without VM there's truly nothing to lose... 
http://www.surriel.com/ http://www.conectiva.com/ http://distro.conectiva.com/ From owner-netdev@oss.sgi.com Thu Mar 15 10:15:07 2001 Received: by oss.sgi.com id ; Thu, 15 Mar 2001 10:14:57 -0800 Received: from minus.inr.ac.ru ([193.233.7.97]:30213 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Thu, 15 Mar 2001 10:14:36 -0800 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id VAA29476; Thu, 15 Mar 2001 21:14:03 +0300 From: kuznet@ms2.inr.ac.ru Message-Id: <200103151814.VAA29476@ms2.inr.ac.ru> Subject: Re: How to optimize routing performance To: riel@conectiva.COM.BR (Rik van Riel) Date: Thu, 15 Mar 2001 21:14:03 +0300 (MSK) Cc: netdev@oss.sgi.com In-Reply-To: from "Rik van Riel" at Mar 15, 1 08:45:01 pm X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Content-Length: 455 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hello! > Isn't it possible to put something like this in the layer just > above the driver ? "Something like this" but above driver is the scheme which used in linux for ages and well known not only to be not working, but even responsible for livelocks. 8) You will have time to discuss this with Jamal, the first stage of the work is in 2.4. Polling is planned for 2.5. However all the measures to avoid livelocks require remake of drivers. 
Alexey From owner-netdev@oss.sgi.com Thu Mar 15 10:24:36 2001 Received: by oss.sgi.com id ; Thu, 15 Mar 2001 10:24:26 -0800 Received: from perninha.conectiva.com.br ([200.250.58.156]:49425 "HELO postfix.conectiva.com.br") by oss.sgi.com with SMTP id ; Thu, 15 Mar 2001 10:24:14 -0800 Received: from burns.conectiva (burns.conectiva [10.0.0.4]) by postfix.conectiva.com.br (Postfix) with SMTP id 2E1DD16B1A for ; Thu, 15 Mar 2001 15:24:07 -0300 (EST) Received: (qmail 14813 invoked by uid 0); 15 Mar 2001 18:23:29 -0000 Received: from duckman.distro.conectiva (HELO brutus.conectiva.com.br) (10.0.17.2) by burns.conectiva with SMTP; 15 Mar 2001 18:23:29 -0000 Received: from localhost (riel@localhost) by brutus.conectiva.com.br (8.11.2/8.11.2) with ESMTP id f2G1bGl04360; Thu, 15 Mar 2001 22:37:16 -0300 X-Authentication-Warning: duckman.distro.conectiva: riel owned process doing -bs Date: Thu, 15 Mar 2001 22:37:16 -0300 (BRST) From: Rik van Riel X-X-Sender: To: Cc: Subject: Re: How to optimize routing performance In-Reply-To: <200103151814.VAA29476@ms2.inr.ac.ru> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Thu, 15 Mar 2001 kuznet@ms2.inr.ac.ru wrote: > > Isn't it possible to put something like this in the layer just > > above the driver ? > > "Something like this" but above driver is the scheme which used > in linux for ages and well known not only to be not working, > but even responsible for livelocks. 8) So basically Linux is still easily DoSable ? > You will have time to discuss this with Jamal, the first stage > of the work is in 2.4. Polling is planned for 2.5. > > However all the measures to avoid livelocks require remake of > drivers. Since not having these changes makes Linux die under heavy network traffic, I guess this is something we might want to fix during the 2.4 series, at least for the more often used drivers ... 
regards,

Rik
--
Linux MM bugzilla: http://linux-mm.org/bugzilla.shtml

Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/
http://www.conectiva.com/	http://distro.conectiva.com/

From owner-netdev@oss.sgi.com Thu Mar 15 10:34:37 2001
Received: by oss.sgi.com id ; Thu, 15 Mar 2001 10:34:26 -0800
Received: from minus.inr.ac.ru ([193.233.7.97]:53253 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Thu, 15 Mar 2001 10:34:06 -0800
Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id VAA29873; Thu, 15 Mar 2001 21:33:48 +0300
From: kuznet@ms2.inr.ac.ru
Message-Id: <200103151833.VAA29873@ms2.inr.ac.ru>
Subject: Re: How to optimize routing performance
To: riel@conectiva.com.br (Rik van Riel)
Date: Thu, 15 Mar 2001 21:33:48 +0300 (MSK)
Cc: netdev@oss.sgi.com
In-Reply-To: from "Rik van Riel" at Mar 15, 1 10:37:16 pm
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Content-Length: 415
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing

Hello!

> So basically Linux is still easily DoSable ?

Like any OS attached to a fast link with dumb logic. This problem is older
than Linux is. Any irq-driven system is easily DoSable. In fact, by giving
an intruder access to a 100Mbit link you give him an easy way to flood you
with an irq every 7 usecs. 8)8)

> fix during the 2.4 series, at least for the more often used
> drivers ...

This could be fixed in 2.1.
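
The "irq every 7 usecs" figure matches what minimum-size Ethernet frames allow on a 100 Mbit/s link. The sketch below redoes that arithmetic. It assumes one interrupt per received frame, 64-byte frames (the size used in the test earlier in this thread), and the standard 8 bytes of preamble plus 12 byte times of inter-frame gap. The function and macro names are made up for illustration and are not kernel symbols.

```c
#include <assert.h>

/* Per-frame overhead on the wire that is not part of the frame itself. */
#define ETH_PREAMBLE_BYTES 8   /* 7-byte preamble plus start-of-frame delimiter */
#define ETH_IFG_BYTES      12  /* minimum inter-frame gap, in byte times */

/* Bits one frame occupies on the wire; frame_bytes includes the 4-byte FCS. */
static unsigned long long wire_bits(unsigned frame_bytes)
{
    return 8ULL * (frame_bytes + ETH_PREAMBLE_BYTES + ETH_IFG_BYTES);
}

/* Spacing between back-to-back frames, in nanoseconds. */
static unsigned long long frame_interval_ns(unsigned frame_bytes,
                                            unsigned long long link_bps)
{
    assert(link_bps > 0);
    return wire_bits(frame_bytes) * 1000000000ULL / link_bps;
}

/* Worst-case frame rate (and so interrupt rate), in frames per second. */
static unsigned long long max_frame_rate(unsigned frame_bytes,
                                         unsigned long long link_bps)
{
    assert(link_bps > 0);
    return link_bps / wire_bits(frame_bytes);
}
```

For 64-byte frames at 100000000 bit/s this gives 6720 ns between frames, or 148809 frames per second, which rounds to the 7 usecs quoted above.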
Alexey

From owner-netdev@oss.sgi.com Thu Mar 15 10:46:47 2001
Received: by oss.sgi.com id ; Thu, 15 Mar 2001 10:46:37 -0800
Received: from robur.slu.se ([130.238.98.12]:29201 "EHLO robur.slu.se") by oss.sgi.com with ESMTP id ; Thu, 15 Mar 2001 10:46:20 -0800
Received: (from robert@localhost) by robur.slu.se (8.8.7/8.8.7) id TAA30223; Thu, 15 Mar 2001 19:45:53 +0100
From: Robert Olsson
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Message-ID: <15025.3553.176799.382488@robur.slu.se>
Date: Thu, 15 Mar 2001 19:45:53 +0100 (CET)
To: Rik van Riel
Cc: Robert Olsson , =?ISO-8859-1?Q?M=E5rten=5FWikstr=F6m?= , "'linux-kernel@vger.kernel.org'" ,
Subject: Re: How to optimize routing performance
In-Reply-To:
References: <15024.53099.41814.716733@robur.slu.se>
X-Mailer: VM 6.75 under Emacs 19.34.1
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing

[Sorry for the length]

Rik van Riel writes:
> On Thu, 15 Mar 2001, Robert Olsson wrote:
>
> > CONFIG_NET_HW_FLOWCONTROL enables kernel code for it. But device
> > drivers has to have support for it. But unfortunely very few drivers
> > has support for it.
>
> Isn't it possible to put something like this in the layer just
> above the driver ?

There is a dropping point in netif_rx. The problem is that knowledge of
congestion has to be pushed back to the devices that are causing it.

Alexey added netdev_dropping for drivers to check, and via netdev_wakeup()
the driver's xon_metod can be called when the backlog falls below a certain
threshold. From there the driver has to do the work: it should not invest
any resources or interrupts in packets we will still have to drop.
Investing them anyway is what happens at very high load, a kind of
livelock. On routers, routing protocols will time out and we lose
connectivity. But I would say it is important for all apps.

In 2.4.0-test10 Jamal added sampling of the backlog queue so device drivers
get the current congestion level. This opens new possibilities.

> It probably won't work as well as putting it directly in the
> driver, but it'll at least keep Linux from collapsing under
> really heavy loads ...

We have also done experiments with controlling interrupts and running the
RX at "lower" priority. The idea is to take the RX interrupt and
immediately postpone the RX processing to a tasklet. The tasklet re-enables
RX interrupts when it is done. This way dropping now occurs outside the
box, and dropping becomes very undramatic.

As a little example of this, I monitored a DoS attack on a Linux router
equipped with this RX-tasklet driver:

  Admin up           6 day(s) 13 hour(s) 37 min 54 sec
  Last input         NOW
  Last output        NOW
  5min RX bit/s      22.4 M
  5min TX bit/s      1.3 M
  5min RX pkts/s     44079    <====
  5min TX pkts/s     877
  5min TX errors     0
  5min RX errors     0
  5min RX dropped    49913    <====

  Fb: no 3127894088 low 154133938 mod 6 high 0 drp 0    <==== Congestion levels
  Polling: ON starts/pkts/tasklet_count 96545881/2768574948/1850259980
  HW_flowcontrol xon's 0

A bit of explanation: the above is output from the tulip driver. We are
forwarding 44079 and we are dropping 49913 packets per second! This box has
full BGP. The DoS attack went on for about 30 minutes; BGP survived and the
box stayed manageable. Under a heavy attack it still performs well.

Cheers.
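
The backlog feedback described in this message (drop at the queue, tell the device to stop, resume once the backlog has drained below a threshold) can be modelled in user space as a queue with two water marks. This is only an illustrative sketch and not the kernel's netif_rx code. The struct, the function names and the thresholds are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a receive backlog with xoff/xon hysteresis. */
struct backlog {
    unsigned qlen;         /* packets currently queued */
    unsigned max_qlen;     /* high-water mark: queue is full here */
    unsigned wake_qlen;    /* low-water mark: resume at or below this */
    bool throttled;        /* true while the device should stay quiet */
    unsigned long dropped; /* packets dropped at the queue */
};

static void backlog_init(struct backlog *b, unsigned max_qlen, unsigned wake_qlen)
{
    assert(wake_qlen < max_qlen);
    b->qlen = 0;
    b->max_qlen = max_qlen;
    b->wake_qlen = wake_qlen;
    b->throttled = false;
    b->dropped = 0;
}

/* Producer side (models the receive path). Returns true if the packet was queued. */
static bool backlog_enqueue(struct backlog *b)
{
    if (b->throttled || b->qlen >= b->max_qlen) {
        b->throttled = true;  /* congestion: a real driver would stop raising irqs */
        b->dropped++;
        return false;
    }
    b->qlen++;
    return true;
}

/* Consumer side (models the softirq). Handles up to budget packets. */
static unsigned backlog_process(struct backlog *b, unsigned budget)
{
    unsigned done = budget < b->qlen ? budget : b->qlen;

    b->qlen -= done;
    if (b->throttled && b->qlen <= b->wake_qlen)
        b->throttled = false; /* the point where a driver's xon hook would run */
    return done;
}
```

The two separate thresholds are the design point: the device is not woken again as soon as one slot frees up, only once the consumer has made real progress. In a driver that honours the throttled state, the drops counted here would instead happen in the NIC, before any interrupt or buffer is spent on them.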
--ro From owner-netdev@oss.sgi.com Thu Mar 15 23:22:20 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f2G7MKg28934 for netdev-outgoing; Thu, 15 Mar 2001 23:22:20 -0800 Received: from list.framfab.se (IDENT:root@list.framfab.se [195.54.96.202]) by oss.sgi.com (8.11.3/8.11.3) with ESMTP id f2G7MIM28931 for ; Thu, 15 Mar 2001 23:22:19 -0800 Received: from stoent001.framfab.se (mail.sto.framfab.se [172.16.200.241]) by list.framfab.se (8.9.3/8.9.3) with ESMTP id JAA06629; Fri, 16 Mar 2001 09:14:59 +0100 Received: by STOENT001 with Internet Mail Service (5.5.2653.19) id ; Fri, 16 Mar 2001 08:21:09 +0100 Message-ID: From: =?iso-8859-1?Q?M=E5rten_Wikstr=F6m?= To: "'Martin Josefsson'" Cc: Rik van Riel , "'linux-kernel@vger.kernel.org'" , netdev@oss.sgi.com Subject: RE: How to optimize routing performance Date: Fri, 16 Mar 2001 08:21:08 +0100 MIME-Version: 1.0 X-Mailer: Internet Mail Service (5.5.2653.19) Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by oss.sgi.com id f2G7MKM28932 Sender: owner-netdev@oss.sgi.com Precedence: bulk > > You want to have CONFIG_NET_HW_FLOWCONTROL enabled. If you don't the > kernel gets _alot_ of interrupts from the NIC and dosn't have > any cycles > left to do anything. So you want to turn this on! > > > At the NordU/USENIX conference in Stockholm (this february) I > > saw a nice presentation on the flow control code in the Linux > > networking code and how it improved networking performance. > > I'm pretty convinced that flow control _should_ be saving your > > system in this case. > > That was probably Jamal Hadi and Robert Olsson. They have > been optimizing > the tulip driver. These optimizations havn't been integrated with the > "vanilla" driver yet, but I hope the can integrate it soon. > > They have one version that is very optimized and then they have one > version that have even more optimizations, ie. 
it uses polling at high > interruptload. > > you will find these drivers here: > ftp://robur.slu.se/pub/Linux/net-development/ > The latest versions are: > tulip-ss010111.tar.gz > and > tulip-ss010116-poll.tar.gz > > > OTOH, if they _are_ enabled, the networking people seem to have > > a new item for their TODO list. ;) > > Yup. > > You can take a look here too: > > http://robur.slu.se/Linux/net-development/jamal/FF-html/ > > This is the presentation they gave at OLS (IIRC) > > And this is the final result: > > http://robur.slu.se/Linux/net-development/jamal/FF-html/img26.htm > > As you can see the throughput is a _lot_ higher with this driver. > > One final note: The makefile in at least > tulip-ss010111.tar.gz is in the > old format (not the new as 2.4.0-testX introduced), but you > can copy the > makefile from the "vanilla" driver and It'lll work like a charm. > > Please redo your tests with this driver and report the > results to me and > this list. I really want to know how it compares against FreeBSD. > > /Martin Thanks! I'll try that out. How can I tell if the driver supports CONFIG_NET_HW_FLOWCONTROL? I'm not sure, but I think the cards are tulip-based, can I then use Robert & Jamal's optimised drivers? It'll probably take some time before I can do further testing. (My employer thinks I've spent too much time on it already...). FYI, Linux had _much_ better delay variation characteristics than FreeBSD. Typically no packet was delayed more than 100usec, whereas FreeBSD had some packets delayed about 2-3 msec. 
/Mårten

From owner-netdev@oss.sgi.com Fri Mar 16 00:09:41 2001
Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f2G89fh30074 for netdev-outgoing; Fri, 16 Mar 2001 00:09:41 -0800
Received: from tux.rsn.bth.se (root@tux.rsn.hk-r.se [194.47.143.135]) by oss.sgi.com (8.11.3/8.11.3) with ESMTP id f2G89eM30071 for ; Fri, 16 Mar 2001 00:09:40 -0800
Received: from localhost (gandalf@localhost) by tux.rsn.bth.se (8.11.3/8.11.2/Debian 8.11.2-1) with ESMTP id f2G887F32108; Fri, 16 Mar 2001 09:08:07 +0100
Date: Fri, 16 Mar 2001 09:08:06 +0100 (CET)
From: Martin Josefsson
X-Sender: gandalf@tux.rsn.bth.se
To: =?iso-8859-1?Q?M=E5rten_Wikstr=F6m?=
cc: Rik van Riel , "'linux-kernel@vger.kernel.org'" , netdev@oss.sgi.com
Subject: RE: How to optimize routing performance
In-Reply-To:
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=iso-8859-1
Content-Transfer-Encoding: 8bit
X-MIME-Autoconverted: from QUOTED-PRINTABLE to 8bit by oss.sgi.com id f2G89fM30072
Sender: owner-netdev@oss.sgi.com
Precedence: bulk

On Fri, 16 Mar 2001, Mårten Wikström wrote:

[much text]

> Thanks! I'll try that out. How can I tell if the driver supports
> CONFIG_NET_HW_FLOWCONTROL? I'm not sure, but I think the cards are
> tulip-based, can I then use Robert & Jamal's optimised drivers?
> It'll probably take some time before I can do further testing. (My employer
> thinks I've spent too much time on it already...).

I don't really know how to tell except
'grep CONFIG_NET_HW_FLOWCONTROL driverfiles'

You said that the cards were 100Mbit DEC cards; I assumed you meant that
the cards use DECchip 21143 or similar chips. If that's true you can use
Robert & Jamal's optimised drivers.

Sorry to hear that your employer doesn't see the importance of such a
test :)

> FYI, Linux had _much_ better delay variation characteristics than FreeBSD.
> Typically no packet was delayed more than 100usec, whereas FreeBSD had some
> packets delayed about 2-3 msec.

This sounds promising.
So Linux had nice variations until it broke down completely and stopped routing because of all the interrupts. I can almost guarantee that with the optimised driver and CONFIG_NET_HW_FLOWCONTROL you'll see a _big_ improvement in routingperformance. /Martin From owner-netdev@oss.sgi.com Fri Mar 16 06:18:36 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f2GEIak05823 for netdev-outgoing; Fri, 16 Mar 2001 06:18:36 -0800 Received: from smtp1.arnet.com.ar (host000012.arnet.net.ar [200.45.0.12] (may be forged)) by oss.sgi.com (8.11.3/8.11.3) with SMTP id f2GEIYM05820 for ; Fri, 16 Mar 2001 06:18:35 -0800 Received: (qmail 7473 invoked from network); 16 Mar 2001 14:18:22 -0000 Received: AntiBombing Version 0.08 by GCM Received: ThePolice Version 0.02 by GCM Received: from host000005.arnet.net.ar (HELO mail2.arnet.com.ar) (200.45.0.5) by host000012.arnet.net.ar with SMTP; 16 Mar 2001 14:18:22 -0000 Received: from mail pickup service by mail2.arnet.com.ar with Microsoft SMTPSVC; Fri, 16 Mar 2001 11:18:07 -0300 Received: from recife.arnet.com.ar ([200.45.0.70]) by mail2.arnet.com.ar with Microsoft SMTPSVC(5.5.1877.357.35); Tue, 13 Mar 2001 16:29:52 -0300 Received: (qmail 8551 invoked from network); 13 Mar 2001 18:12:59 -0000 Received: from oss.sgi.com (216.32.174.190) by recife.arnet.com.ar with SMTP; 13 Mar 2001 18:12:59 -0000 Received: by oss.sgi.com id ; Tue, 13 Mar 2001 10:17:11 -0800 Received: from minus.inr.ac.ru ([193.233.7.97]:60434 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Tue, 13 Mar 2001 10:17:01 -0800 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id VAA29024; Tue, 13 Mar 2001 21:16:42 +0300 From: kuznet@ms2.inr.ac.ru Message-Id: <200103131816.VAA29024@ms2.inr.ac.ru> Subject: Re: TCP sockets not flagged as writable? 
To: mostrows@speakeasy.net Date: Tue, 13 Mar 2001 21:16:42 +0300 (MSK) Cc: netdev@oss.sgi.com In-Reply-To: <15022.16934.52573.377722@slug.watson.ibm.com> from "Michal Ostrowski" at Mar 13, 1 07:15:02 pm X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 X-Orcpt: rfc822;netdev@oss.sgi.com Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 416 Lines: 14 Hello! > Is this wrong or is this just my imagination? Not this and not this, the third alternative is "simply right". 8) Wakeup happens as soon as there is _enough_ space to wake up. 1 byte is not an enough. What's about exact form of wakeup predicate, criterium can be different. Current one (1/3 of space is free) was selected in 2.2 after several other variants were tried. It is not very essential. Alexey From owner-netdev@oss.sgi.com Sat Mar 17 12:05:58 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f2HK5wN31658 for netdev-outgoing; Sat, 17 Mar 2001 12:05:58 -0800 Received: from ms2.inr.ac.ru (minus.inr.ac.ru [193.233.7.97]) by oss.sgi.com (8.11.3/8.11.3) with SMTP id f2HK5tM31650 for ; Sat, 17 Mar 2001 12:05:56 -0800 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id XAA03892; Sat, 17 Mar 2001 23:05:34 +0300 From: kuznet@ms2.inr.ac.ru Message-Id: <200103172005.XAA03892@ms2.inr.ac.ru> Subject: Re: initial acenic ZC cleanup To: jes@linuxcare.COM (Jes Sorensen) Date: Sat, 17 Mar 2001 23:05:34 +0300 (MSK) Cc: netdev@oss.sgi.com, davem@redhat.com (Dave Miller) In-Reply-To: from "Jes Sorensen" at Mar 9, 1 05:45:00 am X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 428 Lines: 15 Hello! > A little, it needs to be taught about the tuning parameters for the > AceNIC. Actually, rx and tx timeouts should be generic enough to exist in common structure. BTW by some strange reason ETHTOOL_SSET does not allow to change these parameters. All of them except for TxBufRat are changable at run time. 
Jes, please, invent something to read verbose acenic statistics. It is too useful thing to be lost. 8) Alexey From owner-netdev@oss.sgi.com Sun Mar 18 03:30:08 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f2IBU8l10315 for netdev-outgoing; Sun, 18 Mar 2001 03:30:08 -0800 Received: from havoc.gtf.org (IDENT:postfix@panic.ohr.gatech.edu [130.207.47.194]) by oss.sgi.com (8.11.3/8.11.3) with ESMTP id f2IBU6M10312 for ; Sun, 18 Mar 2001 03:30:06 -0800 Received: from mandrakesoft.com (adsl-20-73-169.asm.bellsouth.net [66.20.73.169]) by havoc.gtf.org (Postfix) with ESMTP id C92021F70; Sun, 18 Mar 2001 06:29:53 -0500 (EST) Message-ID: <3AB49C2E.4792071B@mandrakesoft.com> Date: Sun, 18 Mar 2001 06:29:50 -0500 From: Jeff Garzik Organization: MandrakeSoft X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.4.3-pre4 i686) X-Accept-Language: en MIME-Version: 1.0 To: Junfeng Yang Cc: linux-kernel@vger.kernel.org, mc@cs.stanford.edu, Andrew Morton , netdev@oss.sgi.com Subject: Re: [CHECKER] 120 potential dereference to invalid pointers errors forlinux 2.4.1 References: Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 8191 Lines: 192 Junfeng Yang wrote: > [BUG] dev_alloc_skb can return NULL > /u2/acc/oses/linux/2.4.1/drivers/net/3c505.c:619:receive_packet: ERROR:NULL:598:619: Using NULL ptr "skb" illegally! set by 'dev_alloc_skb':598 Fixed. > [BUG] init_etherdev could return NULL > /u2/acc/oses/linux/2.4.1/drivers/net/3c515.c:604:corkscrew_found_device: ERROR:NULL:603:604: Using unknown ptr "dev" illegally! set by 'init_etherdev':603 > > Start ---> > dev = init_etherdev(dev, sizeof(struct corkscrew_private)); > Error ---> > dev->base_addr = ioaddr; > dev->irq = irq; init_etherdev is a special case -- It can conditionally take NULL as its first argument. If that is the case, when an allocation is performed, and the return val needed to be checked for NULL. 
If init_etherdev's first arg is guaranteed to be non-NULL, you do not need to check its return value. 3c515 is one such case. > [BUG] init_etherdev can return NULL > /u2/acc/oses/linux/2.4.1/drivers/net/aironet4500_card.c:537:awc4500_isa_probe: ERROR:NULL:535:537: Using unknown ptr "dev" illegally! set by 'init_etherdev':535 Fixed. > [BUG] > /u2/acc/oses/linux/2.4.1/drivers/net/aironet4500_card.c:375:awc4500_pnp_probe: ERROR:NULL:373:375: Using unknown ptr "dev" illegally! set by 'init_etherdev':373 Fixed. > [BUG] dev_alloc_skb can return NULL > /u2/acc/oses/linux/2.4.1/drivers/net/defxx.c:2719:dfx_rcv_init: ERROR:NULL:2712:2719: Using unknown ptr "newskb" illegally! set by 'dev_alloc_skb':2712 Seems to be fixed already in my 2.4.3-pre4-based tree. > [BUG] kmalloc can return NULL > /u2/acc/oses/linux/2.4.1/drivers/net/dgrs.c:1258:dgrs_found_device: ERROR:NULL:1256:1258: Using unknown ptr "dev" illegally! set by 'kmalloc':1256 Seems to be fixed already in my 2.4.3-pre4-based tree. > [BUG] kmalloc can return NULL > /u2/acc/oses/linux/2.4.1/drivers/net/dgrs.c:1297:dgrs_found_device: ERROR:NULL:1294:1297: Using unknown ptr "devN" illegally! set by 'kmalloc':1294 Seems to be fixed already in my 2.4.3-pre4-based tree. > [BUG] kmalloc can return NULL > /u2/acc/oses/linux/2.4.1/drivers/net/pcmcia/aironet4500_cs.c:181:awc_attach: ERROR:NULL:179:181: Using unknown ptr "link" illegally! set by 'kmalloc':179 > > Start ---> > link = kmalloc(sizeof(struct dev_link_t), GFP_KERNEL); > memset(link, 0, sizeof(struct dev_link_t)); > Error ---> > link->dev = kmalloc(sizeof(struct dev_node_t), GFP_KERNEL); > memset(link->dev, 0, sizeof(struct dev_node_t)); Fixed. Your checker missed two other problems of the same sort in the same function... one of the two missed is the link->dev kmalloc you show in your example here. > [BUG] kmalloc can return NULL > /u2/acc/oses/linux/2.4.1/drivers/net/pcmcia/wavelan_cs.c:4463:wavelan_attach: ERROR:NULL:4458:4463: Using unknown ptr "dev" illegally! 
set by 'kmalloc':4458 Seems to be fixed already in my 2.4.3-pre4-based tree. > [BUG] kmalloc can return NULL > /u2/acc/oses/linux/2.4.1/drivers/net/pcmcia/wavelan_cs.c:4430:wavelan_attach: ERROR:NULL:4426:4430: Using unknown ptr "link" illegally! set by 'kmalloc':4426 Seems to be fixed already in my 2.4.3-pre4-based tree. > [BUG] dev could be NULL, then init_etherdev -> init_netdev will alloc a new device -- it could fail. > /u2/acc/oses/linux/2.4.1/drivers/net/pcmcia/xircom_tulip_cb.c:559:tulip_probe1: ERROR:NULL:522:559: Using unknown ptr "dev" illegally! set by 'init_etherdev':522 Fixed, although this driver is going away when Arjan's Xircom driver matures. > [BUG] init_etherdev > /u2/acc/oses/linux/2.4.1/drivers/net/pcmcia/xircom_tulip_cb.c:577:tulip_probe1: ERROR:NULL:522:577: Using unknown ptr "dev" illegally! set by 'init_etherdev':522 > [BUG] init_etherdev > /u2/acc/oses/linux/2.4.1/drivers/net/pcmcia/xircom_tulip_cb.c:607:tulip_probe1: ERROR:NULL:522:607: Using unknown ptr "dev" illegally! set by 'init_etherdev':522 > [BUG] init_etherdev > /u2/acc/oses/linux/2.4.1/drivers/net/pcmcia/xircom_tulip_cb.c:636:tulip_probe1: ERROR:NULL:522:636: Using unknown ptr "dev" illegally! set by 'init_etherdev':522 > [BUG] init_etherdev > /u2/acc/oses/linux/2.4.1/drivers/net/pcmcia/xircom_tulip_cb.c:642:tulip_probe1: ERROR:NULL:522:642: Using unknown ptr "dev" illegally! set by 'init_etherdev':522 Fixed by the above fix. Is this a checker bug... or does the checker spit out each incorrect de-ref? > [BUG] function doesn't exit if skb == NULL. just printk > /u2/acc/oses/linux/2.4.1/drivers/net/smc9194.c:1356:smc_rcv: ERROR:NULL:1341:1356: Using NULL ptr "skb" illegally! set by 'dev_alloc_skb':1341 Seems to be fixed already in my 2.4.3-pre4-based tree. > [BUG] init_etherdev can return NULL if dev is NULL > /u2/acc/oses/linux/2.4.1/drivers/net/sunhme.c:2838:happy_meal_pci_init: ERROR:NULL:2806:2838: Using unknown ptr "dev" illegally! set by 'init_etherdev':2806 Fixed. 
> [BUG] dev could be NULL, then init_trdev will call init_netdev to allocate a new device. > /u2/acc/oses/linux/2.4.1/drivers/net/tokenring/ibmtr.c:405:ibmtr_probe1: ERROR:NULL:304:405: Using unknown ptr "dev" illegally! set by 'init_trdev':304 > > Start ---> > dev = init_trdev(dev,0); As with 3c515, this is a false positive. 'dev' is never NULL when passed to init_trdev, so the call always succeeds. > [BUG] dev_alloc_skb can return NULL > /u2/acc/oses/linux/2.4.1/drivers/net/tokenring/lanstreamer.c:1429:streamer_arb_cmd: ERROR:NULL:1386:1429: Using unknown ptr "mac_frame" illegally! set by 'dev_alloc_skb':1386 Seems to be fixed already in my 2.4.3-pre4 tree. > [BUG] dev_alloc_skb can return NULL > /u2/acc/oses/linux/2.4.1/drivers/net/tokenring/olympic.c:1276:olympic_arb_cmd: ERROR:NULL:1258:1276: Using unknown ptr "mac_frame" illegally! set by 'dev_alloc_skb':1258 Fixed. > [BUG] init_trdev can return NULL > /u2/acc/oses/linux/2.4.1/drivers/net/tokenring/olympic.c:219:olympic_scan: ERROR:NULL:217:219: Using unknown ptr "dev" illegally! set by 'init_trdev':217 Fixed. > [BUG] kmalloc can return NULL > /u2/acc/oses/linux/2.4.1/drivers/net/tokenring/olympic.c:226:olympic_scan: ERROR:NULL:212:226: Using unknown ptr "olympic_priv" illegally! set by 'kmalloc':212 Seems to be fixed already in my 2.4.3-pre4 tree. > [BUG] dev_alloc_skb can return NULL > /u2/acc/oses/linux/2.4.1/drivers/net/tokenring/smctr.c:3956:smctr_process_rx_packet: ERROR:NULL:3955:3956: Using unknown ptr "skb" illegally! set by 'dev_alloc_skb':3955 Seems to be fixed already in my 2.4.3-pre4 tree. > [BUG] dev_alloc_skb can return NULL > /u2/acc/oses/linux/2.4.1/drivers/net/tokenring/smctr.c:4633:smctr_rx_frame: ERROR:NULL:4630:4633: Using unknown ptr "skb" illegally! set by 'dev_alloc_skb':4630 Fixed. > [BUG] dev_alloc_skb can return NULL > /u2/acc/oses/linux/2.4.1/drivers/net/tokenring/tms380tr.c:2167:tms380tr_rcv_status_irq: ERROR:NULL:2149:2167: Using NULL ptr "skb" illegally! 
set by 'dev_alloc_skb':2149 Seems to be fixed already in my 2.4.3-pre4 tree. > [BUG] dev_alloc_skb can return NULL > /u2/acc/oses/linux/2.4.1/drivers/net/tokenring/tms380tr.c:2172:tms380tr_rcv_status_irq: ERROR:NULL:2149:2172: Using NULL ptr "skb" illegally! set by 'dev_alloc_skb':2149 Fixed. > [BUG] kmalloc can return NULL > /u2/acc/oses/linux/2.4.1/drivers/pci/setup-res.c:166:pdev_sort_resources: ERROR:NULL:165:166: Using unknown ptr "tmp" illegally! set by 'kmalloc':165 > > Start ---> > tmp = kmalloc(sizeof(*tmp), GFP_KERNEL); > Error ---> > tmp->next = ln; > tmp->res = r; > --------------------------------------------------------- > [BUG] kmalloc can return NULL > /u2/acc/oses/linux/2.4.1/drivers/pcmcia/bulkmem.c:231:setup_erase_request: ERROR:NULL:230:231: Using unknown ptr "busy" illegally! set by 'kmalloc':230 > > Start ---> > busy = kmalloc(sizeof(erase_busy_t), GFP_KERNEL); > Error ---> This sizeof() construct may be a special case for your checker, but it's a common one for the kernel... It definitely doesn't de-reference a pointer. -- Jeff Garzik | May you have warm words on a cold evening, Building 1024 | a full mooon on a dark night, MandrakeSoft | and a smooth road all the way to your door. 
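The allocation pattern behind the aironet4500_cs report above (two allocations in a row, each of which must be checked before it is used) can be sketched in userspace C. This is only a minimal sketch under stated assumptions, not the fix that went into the driver: malloc() stands in for kmalloc(), and struct dev_link, struct dev_node, link_attach(), link_detach() and link_demo() are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the driver's dev_node_t and dev_link_t. */
struct dev_node {
    char name[8];
};

struct dev_link {
    struct dev_node *dev;
    int state;
};

/*
 * Allocate a link and its node, checking each allocation before use.
 * Returns NULL on failure and leaves nothing allocated behind.
 */
struct dev_link *link_attach(void)
{
    struct dev_link *link = malloc(sizeof(*link));
    if (link == NULL)
        return NULL;
    memset(link, 0, sizeof(*link));

    link->dev = malloc(sizeof(*link->dev));
    if (link->dev == NULL) {
        free(link);             /* undo the first allocation */
        return NULL;
    }
    memset(link->dev, 0, sizeof(*link->dev));
    return link;
}

/* Release a link created by link_attach(); NULL is accepted. */
void link_detach(struct dev_link *link)
{
    if (link == NULL)
        return;
    free(link->dev);
    free(link);
}

/* Example caller: the result is checked before any dereference. */
int link_demo(void)
{
    struct dev_link *link = link_attach();
    if (link == NULL)
        return -1;              /* allocation failed; nothing to clean up */
    assert(link->dev != NULL);  /* guaranteed by link_attach() on success */
    link->state = 1;
    link_detach(link);
    return 0;
}
```

Using sizeof(*link) rather than a type name keeps the size tied to the variable, so the allocation stays correct if the variable's type later changes.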
From owner-netdev@oss.sgi.com Sun Mar 18 04:16:25 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f2ICGPi11324 for netdev-outgoing; Sun, 18 Mar 2001 04:16:25 -0800 Received: from mail.ocs.com.au (ppp0.ocs.com.au [203.34.97.3]) by oss.sgi.com (8.11.3/8.11.3) with SMTP id f2ICGNM11321 for ; Sun, 18 Mar 2001 04:16:23 -0800 Received: (qmail 10460 invoked from network); 18 Mar 2001 12:16:18 -0000 Received: from ocs3.ocs-net (192.168.255.3) by mail.ocs.com.au with SMTP; 18 Mar 2001 12:16:18 -0000 X-Mailer: exmh version 2.1.1 10/15/1999 From: Keith Owens To: Jeff Garzik cc: Junfeng Yang , linux-kernel@vger.kernel.org, mc@cs.stanford.edu, Andrew Morton , netdev@oss.sgi.com Subject: Re: [CHECKER] 120 potential dereference to invalid pointers errors forlinux 2.4.1 In-reply-to: Your message of "Sun, 18 Mar 2001 06:29:50 CDT." <3AB49C2E.4792071B@mandrakesoft.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Sun, 18 Mar 2001 23:16:16 +1100 Message-ID: <24784.984917776@ocs3.ocs-net> Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1100 Lines: 30 On Sun, 18 Mar 2001 06:29:50 -0500, Jeff Garzik wrote: >Junfeng Yang wrote: >> Start ---> >> busy = kmalloc(sizeof(erase_busy_t), GFP_KERNEL); >> Error ---> > >This sizeof() construct may be a special case for your checker, but it's >a common one for the kernel... It definitely doesn't de-reference a >pointer. IMHO the above line is a bad construct. If the type of the variable changes it is extremely easy to miss the fact that *alloc is now returning the wrong size. I always do busy = kmalloc(sizeof(*busy), GFP_KERNEL); and let the compiler insert the correct type. For the checker, you can also have typeof(). kdb has this line typeof (*ef) local_ef; The type definition of ef is kdb_eframe_t which is "pointer to some arch dependent type" and local_ef is in arch independent code, much easier to do this than use multiple #ifdef. 
Of course it would have been even easier if kdb had separate types for the struct and the pointer to the struct, then I would not need typeof(). OTOH I am sure that somebody will find a use for typeof(). From owner-netdev@oss.sgi.com Mon Mar 19 09:54:28 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f2JHsSo04311 for netdev-outgoing; Mon, 19 Mar 2001 09:54:28 -0800 Received: from webber.adilger.int (h24-65-193-28.cg.shawcable.net [24.65.193.28]) by oss.sgi.com (8.11.3/8.11.3) with ESMTP id f2JHsSM04306 for ; Mon, 19 Mar 2001 09:54:28 -0800 Received: (from adilger@localhost) by webber.adilger.int (8.11.2/8.11.1/Debian 8.11.0-6) id f2JGOmo16559; Mon, 19 Mar 2001 09:24:48 -0700 From: Andreas Dilger Message-Id: <200103191624.f2JGOmo16559@webber.adilger.int> Subject: Re: [CHECKER] 120 potential dereference to invalid pointers errors forlinux 2.4.1 In-Reply-To: <3AB49C2E.4792071B@mandrakesoft.com> from Jeff Garzik at "Mar 18, 2001 06:29:50 am" To: Jeff Garzik Date: Mon, 19 Mar 2001 09:24:48 -0700 (MST) CC: Junfeng Yang , linux-kernel@vger.kernel.org, mc@cs.stanford.edu, Andrew Morton , netdev@oss.sgi.com, dahinds@users.sourceforge.net X-Mailer: ELM [version 2.4ME+ PL66 (25)] Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1996 Lines: 52 Jeff Garzik writes: > > [BUG] init_etherdev could return NULL > > /u2/acc/oses/linux/2.4.1/drivers/net/3c515.c:604:corkscrew_found_device: ERROR:NULL:603:604: Using unknown ptr "dev" illegally! set by 'init_etherdev':603 > > > > Start ---> > > dev = init_etherdev(dev, sizeof(struct corkscrew_private)); > > Error ---> > > dev->base_addr = ioaddr; > > dev->irq = irq; > > init_etherdev is a special case -- It can conditionally take NULL as its > first argument. If that is the case, when an allocation is performed, > and the return val needed to be checked for NULL. If init_etherdev's > first arg is guaranteed to be non-NULL, you do not need to check its > return value. 3c515 is one such case.
If this is the case, why not change it to look like: init_etherdev(dev, sizeof(struct corkscrew_private)); so it doesn't appear that you are setting "dev" again? > > dev = init_trdev(dev,0); Ditto, don't make it look like "dev" is getting set on the return value, when it is already set when calling the function. > > /u2/acc/oses/linux/2.4.1/drivers/pcmcia/bulkmem.c:231:setup_erase_request: ERROR:NULL:230:231: Using unknown ptr "busy" illegally! set by 'kmalloc':230 > > > > Start ---> > > busy = kmalloc(sizeof(erase_busy_t), GFP_KERNEL); > > Error ---> > > This sizeof() construct may be a special case for your checker, but it's > a common one for the kernel... It definitely doesn't de-reference a > pointer. It is the "busy" pointer that appears to be dereferenced, not the sizeof. We need something like (ERASE_BAD_KMALLOC doesn't yet exist): else if ((busy = kmalloc(sizeof(erase_busy_t), GFP_KERNEL)) == NULL) erase->State = ERASE_BAD_KMALLOC; else { erase->State = 1; ... } Cheers, Andreas -- Andreas Dilger \ "If a man ate a pound of pasta and a pound of antipasto, \ would they cancel out, leaving him still hungry?" 
http://www-mddsp.enel.ucalgary.ca/People/adilger/ -- Dogbert From owner-netdev@oss.sgi.com Mon Mar 19 13:53:06 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f2JLr6n10371 for netdev-outgoing; Mon, 19 Mar 2001 13:53:06 -0800 Received: from ikar.t17.ds.pwr.wroc.pl (postfix@ikar.t17.ds.pwr.wroc.pl [156.17.210.253]) by oss.sgi.com (8.11.3/8.11.3) with ESMTP id f2JLr5M10368 for ; Mon, 19 Mar 2001 13:53:05 -0800 Received: by ikar.t17.ds.pwr.wroc.pl (Postfix+IPv6, from userid 1002) id 894DBC8357; Mon, 19 Mar 2001 22:52:30 +0100 (CET) Date: Mon, 19 Mar 2001 22:52:30 +0100 From: Arkadiusz Miskiewicz To: netdev@oss.sgi.com Subject: 8139too driver Message-ID: <20010319225230.A1507@ikar.t17.ds.pwr.wroc.pl> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-2 Content-Disposition: inline Content-Transfer-Encoding: 8bit User-Agent: Mutt/1.2.5i X-URL: http://www.t17.ds.pwr.wroc.pl/~misiek/ipv6/ X-Operating-System: Linux dark 4.0.20 #119 Tue Jan 16 12:21:53 MET 2001 i986 pld Organization: Polish(ed) Linux Distribution Team Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 3221 Lines: 69 Hi, I have trouble with 8139too driver on i686/FIC KL6011 (LX) mainboard and 00:0a.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139 (rev 10) In 2.4.2 kernel when MAC address was set via ip link set DEV address xyz then ethernet card wasn't working at all! Setting promisc mode caused it to receive packets and it was working until promisc was turned off. In 2.4.3-pre4 8139too driver was working fine even after changing MAC address. 
Unfortunately now it locks quite often and it's only enabled by netwatchdog: Mar 19 22:28:32 arm kernel: NETDEV WATCHDOG: eth0: transmit timed out Mar 19 22:28:32 arm kernel: eth0: Setting half-duplex based on auto-negotiated partner ability 0000.Mar 19 22:29:02 arm kernel: NETDEV WATCHDOG: eth0: transmit timed out Mar 19 22:29:02 arm kernel: eth0: Setting half-duplex based on auto-negotiated partner ability 0000.Mar 19 22:29:38 arm kernel: NETDEV WATCHDOG: eth0: transmit timed out alias eth0 8139too options 8139too media=0x0 full_duplex=0x0 (10baseT/half duplex) Any ideas what's going on? Also why it auto-negotiates partner ability while it has specified full_duplex=0x0 (half duplex)? [root@arm misiek]# rtl8139-diag -mmmaaavvveef rtl8139-diag.c:v2.01 1/8/2001 Donald Becker (becker@scyld.com) http://www.scyld.com/diag/index.html Index #1: Found a RealTek RTL8139 adapter at 0xe400. RealTek chip registers at 0xe400 0x000: 0b526000 000081ff a0000000 80004000 802ca5ea 802ca5ea 802ca5a2 802ca5ea 0x020: 00d18000 00d18600 00d18c00 00d19200 00560000 0d0a0000 c7e8c7d8 0000c07f 0x040: 78000600 0000d68e 24f2c16f 00000000 002f1000 00000000 0080c108 00100000 0x060: 0000f00f 05e1780d 00000000 00000000 00000000 000f7400 58fab388 a438d843. No interrupt sources are pending. The chip configuration is 0x10 0x2f, MII half-duplex mode. EEPROM size test returned 6, 0x204a4 / 0x2. Parsing the EEPROM of a RealTek chip: PCI IDs -- Vendor 0x10ec, Device 0x8139, Subsystem 0x10ec. PCI timer settings -- minimum grant 32, maximum latency 64. General purpose pins -- direction 0xe1 value 0x10. Station Address 00:60:52:0B:FF:81. Configuration register 0/1 -- 0x8d / 0xc2. EEPROM active region checksum is 09fa. 
EEPROM contents: 8129 10ec 8139 10ec 8139 4020 e110 6000 0b52 81ff 8d10 f7c2 8001 b388 58fa 0708 d843 a438 d843 a438 d843 a438 d843 a438 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 The word-wide EEPROM checksum is 0xbe3d. [root@arm misiek]# BTW, I also have a large number of collisions, but this shouldn't cause an eth lockup. [root@arm misiek]# ip -s link show eth0 33: eth0: mtu 1500 qdisc pfifo_fast qlen 100 link/ether 00:60:52:0b:ff:81 brd ff:ff:ff:ff:ff:ff RX: bytes packets errors dropped overrun mcast 2614099699 2931077 437 0 0 0 TX: bytes packets errors dropped carrier collsns 2926863755 2307203 179 109 4 1993408 CONFIG_X86_GOOD_APIC=y -- Arkadiusz Miśkiewicz, AM2-6BONE [ PLD GNU/Linux IPv6 ] http://www.t17.ds.pwr.wroc.pl/~misiek/ipv6/ [ enabled ] From owner-netdev@oss.sgi.com Tue Mar 20 08:57:49 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f2KGvnm30635 for netdev-outgoing; Tue, 20 Mar 2001 08:57:49 -0800 Received: from ikar.t17.ds.pwr.wroc.pl (postfix@ikar.t17.ds.pwr.wroc.pl [156.17.210.253]) by oss.sgi.com (8.11.3/8.11.3) with ESMTP id f2KGvlM30632 for ; Tue, 20 Mar 2001 08:57:47 -0800 Received: by ikar.t17.ds.pwr.wroc.pl (Postfix+IPv6, from userid 1002) id BA158C8002; Tue, 20 Mar 2001 17:57:30 +0100 (CET) Date: Tue, 20 Mar 2001 17:57:30 +0100 From: Arkadiusz Miskiewicz To: netdev@oss.sgi.com Cc: Jeff Garzik Subject: Re: 8139too driver Message-ID: <20010320175730.A504@ikar.t17.ds.pwr.wroc.pl> References: <20010319225230.A1507@ikar.t17.ds.pwr.wroc.pl> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-2 Content-Disposition: inline Content-Transfer-Encoding: 8bit User-Agent: Mutt/1.2.5i In-Reply-To: <20010319225230.A1507@ikar.t17.ds.pwr.wroc.pl>; from misiek@pld.org.pl on Mon, Mar 19, 2001 at 10:52:30PM +0100 X-URL: http://www.t17.ds.pwr.wroc.pl/~misiek/ipv6/
X-Operating-System: Linux dark 4.0.20 #119 Tue Jan 16 12:21:53 MET 2001 i986 pld Organization: Polish(ed) Linux Distribution Team Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 2065 Lines: 36 On/Dnia Mon, Mar 19, 2001 at 10:52:30PM +0100, Arkadiusz Miskiewicz wrote/napisał(a) > 00:0a.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139 (rev 10) > > Mar 19 22:29:02 arm kernel: eth0: Setting half-duplex based on auto-negotiated partner ability 0000.Mar 19 22:29:38 arm kernel: NETDEV WATCHDOG: eth0: transmit timed out > > (10baseT/half duplex) Here is more: Mar 19 16:33:54 arm kernel: eth1: bogus packet size: 65504, status=0x0 nxpg=0x0. Mar 19 16:33:54 arm kernel: eth1: bogus packet size: 65504, status=0x0 nxpg=0x0. Mar 19 16:33:54 arm kernel: eth1: bogus packet size: 65504, status=0x0 nxpg=0x0. Mar 19 16:33:54 arm kernel: eth1: bogus packet size: 65504, status=0x0 nxpg=0x0. Mar 19 16:33:54 arm kernel: eth1: bogus packet size: 65504, status=0x0 nxpg=0x0. ... Mar 19 17:55:46 arm kernel: eth0: Tx queue start entry 1077660 dirty entry 1077656. Mar 19 17:55:46 arm kernel: eth0: Tx descriptor 0 is 00002000. (queue head) Mar 19 17:55:46 arm kernel: eth0: Tx descriptor 1 is 00002000. Mar 19 17:55:46 arm kernel: eth0: Tx descriptor 2 is 00002000. Mar 19 17:55:46 arm kernel: eth0: Tx descriptor 3 is 00002000. Mar 19 17:58:58 arm kernel: eth0: Tx queue start entry 47845 dirty entry 47841. Mar 19 17:58:58 arm kernel: eth0: Tx descriptor 0 is 00002000. Mar 19 17:58:58 arm kernel: eth0: Tx descriptor 1 is 00002000. (queue head) Mar 19 17:58:58 arm kernel: eth0: Tx descriptor 2 is 00002000. Mar 19 17:58:58 arm kernel: eth0: Tx descriptor 3 is 00002000. Mar 19 17:59:16 arm kernel: eth0: Tx queue start entry 1665 dirty entry 1661. Mar 19 17:59:16 arm kernel: eth0: Tx descriptor 0 is 00002000. Mar 19 17:59:16 arm kernel: eth0: Tx descriptor 1 is 00002000. (queue head) Mar 19 17:59:16 arm kernel: eth0: Tx descriptor 2 is 00002000.
Mar 19 17:59:16 arm kernel: eth0: Tx descriptor 3 is 00002000. ... and the log is growing fast. Unfortunately, the 8139too driver is almost unusable at the moment. -- Arkadiusz Miśkiewicz, AM2-6BONE [ PLD GNU/Linux IPv6 ] http://www.t17.ds.pwr.wroc.pl/~misiek/ipv6/ [ enabled ] From owner-netdev@oss.sgi.com Wed Mar 21 04:07:28 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f2LC7Sq23740 for netdev-outgoing; Wed, 21 Mar 2001 04:07:28 -0800 Received: from havoc.gtf.org (IDENT:postfix@panic.ohr.gatech.edu [130.207.47.194]) by oss.sgi.com (8.11.3/8.11.3) with ESMTP id f2LC7RM23736 for ; Wed, 21 Mar 2001 04:07:27 -0800 Received: from mandrakesoft.com (adsl-20-73-169.asm.bellsouth.net [66.20.73.169]) by havoc.gtf.org (Postfix) with ESMTP id 2ED2A1F6A; Wed, 21 Mar 2001 07:07:25 -0500 (EST) Message-ID: <3AB8997D.A715CD35@mandrakesoft.com> Date: Wed, 21 Mar 2001 07:07:25 -0500 From: Jeff Garzik Organization: MandrakeSoft X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.4.3-pre6 i686) X-Accept-Language: en MIME-Version: 1.0 To: Jes Sorensen Cc: netdev@oss.sgi.com Subject: ethtool and MII (was Re: initial acenic ZC cleanup) References: <200103082147.f28LlS301042@itanic.thepuffingroup.com> <15015.65092.349145.143015@pizda.ninka.net> <3AA80487.3C7E26A6@mandrakesoft.com> <3AAAEEC8.9375ED6@mandrakesoft.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 2466 Lines: 53 Jes Sorensen wrote: > >>>>> "Jeff" == Jeff Garzik writes: > Jeff> If they are NIC-specific, then you should just use ioctls... > Jeff> It's a silly redirection to add NIC-specific stuff to the > Jeff> ethtool ioctl, when you could just add a private ioctl. > And another tool to access it so we end up with ethtool and acetool > and sktool and hmetool and 3c905tool etc etc. all of them basically > doing the same thing.
It seems silly to have two tools, one for > setting the link rate and flow control and one for setting the > interrupt coalescing counters. Point. > The point is that a lot of the NICs have the same type of tuning > variables, sometimes they are identical sometimes they are slightly > different. My suggestion is that we either teach ethtool about the > different names of tuning parameters or if we are lazy just allow one > to set `NIC private parameters 1-16 with it' and define those for the > different NIC as different things. I stick to my assertion that creating driver-private ioctls is the way to go. Given your good point however, my suggestion would be to modify userland ethtool to support the acenic/tulip/foo-specific ioctls. Sort of like how mount(8) supports fs-specific options, along with general options. Many of the Becker-derived drivers export an MDIO interface, but do not support ethtool.h. I already plan to modify userland ethtool to support these driver-private ioctls, until they are modified for full ethtool support in-kernel. This immediately makes ethtool useful for many more users, and it should eliminate the need for the /sbin/mii-tool that comes with pcmcia-cs. So far I don't yet know who is maintaining userland ethtool. I'm guessing DaveM or Jakub Jelinek (sp?), since they appear to have originally written/packaged it. I need to figure that one out before I start making too many changes to ethtool.c... Jeff P.S. For 2.5, I would like to write an mii-phy module. Then all these mdio-alike drivers can pass their mdio_read and mdio_write functions to a generic layer, which properly handles ethtool and the older dev->if_port form of media selection. This will also allow us to identify and work around bugs in specific MII PHYs that might affect more than one type of NIC. -- Jeff Garzik | May you have warm words on a cold evening, Building 1024 | a full mooon on a dark night, MandrakeSoft | and a smooth road all the way to your door. 
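The P.S. above, where drivers hand their mdio_read and mdio_write functions to a generic layer, might look roughly like the following userspace C sketch. It is an illustration under stated assumptions, not the proposed mii-phy module: struct mii_ops, mii_generic_link_ok() and the fake_* backend are hypothetical names, and the function-pointer signatures are guesses. Only the register number and bit (BMSR is MII register 1, link status is bit 0x0004 and latches low) come from the standard MII register set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Standard MII management registers and bits. */
#define MII_BMSR        0x01    /* basic mode status register */
#define BMSR_LSTATUS    0x0004  /* link status bit */

/* Hypothetical ops table a driver would hand to the generic layer. */
struct mii_ops {
    int  (*mdio_read)(void *priv, int phy_id, int reg);
    void (*mdio_write)(void *priv, int phy_id, int reg, int val);
    void *priv;     /* driver-private context */
    int   phy_id;   /* PHY address on the MDIO bus */
};

/* Generic helper: returns 1 if the PHY reports link up, else 0. */
int mii_generic_link_ok(const struct mii_ops *ops)
{
    int bmsr;

    assert(ops != NULL && ops->mdio_read != NULL);

    /* Link status latches low, so read twice to get the current state. */
    ops->mdio_read(ops->priv, ops->phy_id, MII_BMSR);
    bmsr = ops->mdio_read(ops->priv, ops->phy_id, MII_BMSR);
    return (bmsr & BMSR_LSTATUS) ? 1 : 0;
}

/* Fake driver backend, used only to exercise the helper. */
struct fake_phy {
    uint16_t regs[32];
};

int fake_mdio_read(void *priv, int phy_id, int reg)
{
    struct fake_phy *phy = priv;
    (void)phy_id;
    return phy->regs[reg & 31];
}

void fake_mdio_write(void *priv, int phy_id, int reg, int val)
{
    struct fake_phy *phy = priv;
    (void)phy_id;
    phy->regs[reg & 31] = (uint16_t)val;
}
```

The point of the indirection is that PHY-specific workarounds could live in one helper like mii_generic_link_ok() instead of being copied into every driver that talks to the same PHY.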
From owner-netdev@oss.sgi.com Wed Mar 21 04:13:57 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f2LCDvH24135 for netdev-outgoing; Wed, 21 Mar 2001 04:13:57 -0800 Received: from havoc.gtf.org (IDENT:postfix@panic.ohr.gatech.edu [130.207.47.194]) by oss.sgi.com (8.11.3/8.11.3) with ESMTP id f2LCDuM24131 for ; Wed, 21 Mar 2001 04:13:56 -0800 Received: from mandrakesoft.com (adsl-20-73-169.asm.bellsouth.net [66.20.73.169]) by havoc.gtf.org (Postfix) with ESMTP id 3D01B1F6C; Wed, 21 Mar 2001 07:13:55 -0500 (EST) Message-ID: <3AB89B04.3034A8E4@mandrakesoft.com> Date: Wed, 21 Mar 2001 07:13:56 -0500 From: Jeff Garzik Organization: MandrakeSoft X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.4.3-pre6 i686) X-Accept-Language: en MIME-Version: 1.0 To: Jes Sorensen Cc: netdev@oss.sgi.com Subject: Re: ethtool and MII (was Re: initial acenic ZC cleanup) References: <200103082147.f28LlS301042@itanic.thepuffingroup.com> <15015.65092.349145.143015@pizda.ninka.net> <3AA80487.3C7E26A6@mandrakesoft.com> <3AAAEEC8.9375ED6@mandrakesoft.com> <3AB8997D.A715CD35@mandrakesoft.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 816 Lines: 24 My apologies, I should have added to that last message: However, if there are tuning parameters which are used across many drivers, go ahead and support them via ethtool, which is the common kernel place for such things now. Note that you'll probably want to create new ETHTOOL_xxx commands, if you need more than the four reserved dwords we have left in struct ethtool_cmd. Further, if these new commands are not specific mainly to ethernet drivers (ie. if TR or arcnet drivers could use your new feature), you should probably create a new sock ioctl, instead of messing around with the ethtool interface at all... WDYT? 
Jeff -- Jeff Garzik | May you have warm words on a cold evening, Building 1024 | a full mooon on a dark night, MandrakeSoft | and a smooth road all the way to your door. From owner-netdev@oss.sgi.com Wed Mar 21 04:54:53 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f2LCsr725219 for netdev-outgoing; Wed, 21 Mar 2001 04:54:53 -0800 Received: from havoc.gtf.org (IDENT:postfix@panic.ohr.gatech.edu [130.207.47.194]) by oss.sgi.com (8.11.3/8.11.3) with ESMTP id f2LCsqM25216 for ; Wed, 21 Mar 2001 04:54:52 -0800 Received: from mandrakesoft.com (adsl-20-73-169.asm.bellsouth.net [66.20.73.169]) by havoc.gtf.org (Postfix) with ESMTP id A3F951F67; Wed, 21 Mar 2001 07:54:50 -0500 (EST) Message-ID: <3AB8A49B.372063D7@mandrakesoft.com> Date: Wed, 21 Mar 2001 07:54:51 -0500 From: Jeff Garzik Organization: MandrakeSoft X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.4.3-pre6 i686) X-Accept-Language: en MIME-Version: 1.0 To: netdev@oss.sgi.com Cc: Linux Kernel Mailing List , dhinds@sonic.net Subject: [PATCH] RFC: Network driver info to userspace Content-Type: multipart/mixed; boundary="------------F91653DF53532B4243FBCC90" Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 2604 Lines: 66 This is a multi-part message in MIME format. --------------F91653DF53532B4243FBCC90 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit In various scenarios where userland needs to interact with hardware, userland often needs to know exactly what driver (and driver version) is currently running on a given interface. Hotplugging and other applications could -really- use the ability to find out bus information for a given interface. Firmware So I propose the attached addition to the ethtool interface. It adds a new structure with several ASCII text fields, which are filled in at the driver's discretion. Userland then interprets these fields for their own evil designs. 
Currently (AFAIK) for all kernel interfaces, userland has no reliable way to associate a hardware device with a kernel interface, or a driver with a kernel interface[1]. Since we have no generic register_driver() interface, solving this problem means implementing a domain-specific solution like I have done here... -- Jeff Garzik | May you have warm words on a cold evening, Building 1024 | a full mooon on a dark night, MandrakeSoft | and a smooth road all the way to your door. --------------F91653DF53532B4243FBCC90 Content-Type: text/plain; charset=us-ascii; name="ethtool.patch" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="ethtool.patch" Index: include/linux/ethtool.h =================================================================== RCS file: /cvsroot/gkernel/linux_2_4/include/linux/Attic/ethtool.h,v retrieving revision 1.1.1.2 diff -u -r1.1.1.2 ethtool.h --- include/linux/ethtool.h 2000/11/14 22:01:49 1.1.1.2 +++ include/linux/ethtool.h 2001/03/21 12:42:15 @@ -24,10 +24,21 @@ u32 reserved[4]; }; +/* these strings are set to whatever the driver author decides... */ +struct ethtool_drvinfo { + char driver[32]; /* driver short name, "tulip", "eepro100" */ + char version[32]; /* driver version string */ + char fw_version[32]; /* firmware version string, if applicable */ + char bus_info[32]; /* Bus info for this interface. For PCI + * devices, use pci_dev->slot_name. */ + char reserved1[32]; + char reserved2[32]; +}; /* CMDs currently supported */ -#define ETHTOOL_GSET 0x00000001 /* Get settings, non-privileged. */ +#define ETHTOOL_GSET 0x00000001 /* Get settings. */ #define ETHTOOL_SSET 0x00000002 /* Set settings, privileged. */ +#define ETHTOOL_GDRVINFO 0x00000003 /* Get driver info. 
*/ /* compatibility with older code */ #define SPARC_ETH_GSET ETHTOOL_GSET --------------F91653DF53532B4243FBCC90-- From owner-netdev@oss.sgi.com Wed Mar 21 06:26:53 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f2LEQrW27361 for netdev-outgoing; Wed, 21 Mar 2001 06:26:53 -0800 Received: from havoc.gtf.org (IDENT:postfix@panic.ohr.gatech.edu [130.207.47.194]) by oss.sgi.com (8.11.3/8.11.3) with ESMTP id f2LEQoM27356 for ; Wed, 21 Mar 2001 06:26:50 -0800 Received: from mandrakesoft.com (adsl-20-73-169.asm.bellsouth.net [66.20.73.169]) by havoc.gtf.org (Postfix) with ESMTP id 7EBC21F6D; Wed, 21 Mar 2001 09:26:30 -0500 (EST) Message-ID: <3AB8BA16.A25C0929@mandrakesoft.com> Date: Wed, 21 Mar 2001 09:26:30 -0500 From: Jeff Garzik Organization: MandrakeSoft X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.4.3-pre6 i686) X-Accept-Language: en MIME-Version: 1.0 To: Linux Kernel Mailing List , netdev@oss.sgi.com Cc: Andrew Morton , Linus Torvalds , "David S. Miller" Subject: PATCH 2.4.3.6: fix netdevice initialization Content-Type: multipart/mixed; boundary="------------98FC4B3C4E3DBB7B168B218A" Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 19993 Lines: 643 This is a multi-part message in MIME format. --------------98FC4B3C4E3DBB7B168B218A Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit (Linus, please do not apply this. After some patch review, I'll send this to you along with driver updates to use it) Attached is a patch against 2.4.3-pre6, which adds alloc_etherdev and variants to the driver API. This API addition should allow us to close the gaping race between init_etherdev call time, and when the network device is actually ready. This version removes the DECLARE_xxx at the request of the crowd. Hooray for cut-n-paste code... please check for errors. 
Driver API changes for this stable kernel series are: * Six functions added, * Four prototypes moved from netdevice.h to foodevice.h, and * No behavior changes, no code changes requiring immediate driver updates Main change in this patch: * New functions alloc_etherdev, alloc_fddidev, alloc_hippi_dev, alloc_fcdev, alloc_trdev Cleanup changes in this patch: * Move prototypes from netdevice.h to foodevice.h: [un]register_fcdev, [un]register_trdev * Add inline source docs for init_xxxdev * Move EXPORT_SYMBOL for public functions from net/netsyms.c to net_init.c * New function register_hipdev, for API completeness * Remove duplicate code from unregister_hipdev, [un]register_trdev, [un]register_fcdev * tr_setup was exported but did nothing. Rename tr_configure to tr_setup, remove old no-op tr_setup. -- Jeff Garzik | More novel than War and Peace Building 1024 | More tongue-in-cheek than a lesbian orgy MandrakeSoft | Sneakin' up like celery, yeah I'm stalkin' --------------98FC4B3C4E3DBB7B168B218A Content-Type: text/plain; charset=us-ascii; name="net-fix.patch" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="net-fix.patch" Index: drivers/net/net_init.c =================================================================== RCS file: /cvsroot/gkernel/linux_2_4/drivers/net/net_init.c,v retrieving revision 1.1.1.8 retrieving revision 1.1.1.8.34.3 diff -u -r1.1.1.8 -r1.1.1.8.34.3 --- drivers/net/net_init.c 2001/02/27 03:03:50 1.1.1.8 +++ drivers/net/net_init.c 2001/03/21 14:10:50 1.1.1.8.34.3 @@ -28,10 +28,12 @@ up. We now share common code and have regularised name allocation setups. Abolished the 16 card limits. 
03/19/2000 - jgarzik and Urban Widmark: init_etherdev 32-byte align + 03/21/2001 - jgarzik: alloc_etherdev and friends */ #include +#include #include #include #include @@ -68,6 +70,33 @@ */ +static struct net_device *alloc_netdev(int sizeof_priv, const char *mask, + void (*setup)(struct net_device *)) +{ + struct net_device *dev; + int alloc_size; + + /* ensure 32-byte alignment of the private area */ + alloc_size = sizeof (*dev) + sizeof_priv + 31; + + dev = (struct net_device *) kmalloc (alloc_size, GFP_KERNEL); + if (dev == NULL) + { + printk(KERN_ERR "alloc_dev: Unable to allocate device memory.\n"); + return NULL; + } + + memset(dev, 0, alloc_size); + + if (sizeof_priv) + dev->priv = (void *) (((long)(dev + 1) + 31) & ~31); + + setup(dev); + strcpy(dev->name, mask); + + return dev; +} + static struct net_device *init_alloc_dev(int sizeof_priv) { struct net_device *dev; @@ -142,6 +171,17 @@ return dev; } +static int __register_netdev(struct net_device *dev) +{ + dev_init_buffers(dev); + + if (dev->init && dev->init(dev) != 0) { + unregister_netdev(dev); + return -EIO; + } + return 0; +} + /** * init_etherdev - Register ethernet device * @dev: An ethernet device structure to be filled in, or %NULL if a new @@ -164,6 +204,25 @@ return init_netdev(dev, sizeof_priv, "eth%d", ether_setup); } +/** + * alloc_etherdev - Register ethernet device + * @sizeof_priv: Size of additional driver-private structure to be allocated + * for this ethernet device + * + * Fill in the fields of the device structure with ethernet-generic values. + * + * Constructs a new net device, complete with a private data area of + * size @sizeof_priv. A 32-byte (not bit) alignment is enforced for + * this private data area. 
+ */ + +struct net_device *alloc_etherdev(int sizeof_priv) +{ + return alloc_netdev(sizeof_priv, "eth%d", ether_setup); +} + +EXPORT_SYMBOL(init_etherdev); +EXPORT_SYMBOL(alloc_etherdev); static int eth_mac_addr(struct net_device *dev, void *p) { @@ -184,11 +243,48 @@ #ifdef CONFIG_FDDI +/** + * init_fddidev - Register FDDI device + * @dev: A FDDI device structure to be filled in, or %NULL if a new + * struct should be allocated. + * @sizeof_priv: Size of additional driver-private structure to be allocated + * for this ethernet device + * + * Fill in the fields of the device structure with FDDI-generic values. + * + * If no device structure is passed, a new one is constructed, complete with + * a private data area of size @sizeof_priv. A 32-byte (not bit) + * alignment is enforced for this private data area. + * + * If an empty string area is passed as dev->name, or a new structure is made, + * a new name string is constructed. + */ + struct net_device *init_fddidev(struct net_device *dev, int sizeof_priv) { return init_netdev(dev, sizeof_priv, "fddi%d", fddi_setup); } +/** + * alloc_fddidev - Register FDDI device + * @sizeof_priv: Size of additional driver-private structure to be allocated + * for this FDDI device + * + * Fill in the fields of the device structure with FDDI-generic values. + * + * Constructs a new net device, complete with a private data area of + * size @sizeof_priv. A 32-byte (not bit) alignment is enforced for + * this private data area. + */ + +struct net_device *alloc_fddidev(int sizeof_priv) +{ + return alloc_netdev(sizeof_priv, "fddi%d", fddi_setup); +} + +EXPORT_SYMBOL(init_fddidev); +EXPORT_SYMBOL(alloc_fddidev); + static int fddi_change_mtu(struct net_device *dev, int new_mtu) { if ((new_mtu < FDDI_K_SNAP_HLEN) || (new_mtu > FDDI_K_SNAP_DLEN)) @@ -227,19 +323,59 @@ } +/** + * init_hippi_dev - Register HIPPI device + * @dev: A HIPPI device structure to be filled in, or %NULL if a new + * struct should be allocated. 
+ * @sizeof_priv: Size of additional driver-private structure to be allocated + * for this ethernet device + * + * Fill in the fields of the device structure with HIPPI-generic values. + * + * If no device structure is passed, a new one is constructed, complete with + * a private data area of size @sizeof_priv. A 32-byte (not bit) + * alignment is enforced for this private data area. + * + * If an empty string area is passed as dev->name, or a new structure is made, + * a new name string is constructed. + */ + struct net_device *init_hippi_dev(struct net_device *dev, int sizeof_priv) { return init_netdev(dev, sizeof_priv, "hip%d", hippi_setup); } +/** + * alloc_hippi_dev - Register HIPPI device + * @sizeof_priv: Size of additional driver-private structure to be allocated + * for this HIPPI device + * + * Fill in the fields of the device structure with HIPPI-generic values. + * + * Constructs a new net device, complete with a private data area of + * size @sizeof_priv. A 32-byte (not bit) alignment is enforced for + * this private data area. 
+ */ +struct net_device *alloc_hippi_dev(int sizeof_priv) +{ + return alloc_netdev(sizeof_priv, "hip%d", hippi_setup); +} + +int register_hipdev(struct net_device *dev) +{ + return __register_netdev(dev); +} + void unregister_hipdev(struct net_device *dev) { - rtnl_lock(); - unregister_netdevice(dev); - rtnl_unlock(); + unregister_netdev(dev); } +EXPORT_SYMBOL(init_hippi_dev); +EXPORT_SYMBOL(alloc_hippi_dev); +EXPORT_SYMBOL(register_hipdev); +EXPORT_SYMBOL(unregister_hipdev); static int hippi_neigh_setup_dev(struct net_device *dev, struct neigh_parms *p) { @@ -283,6 +419,7 @@ dev_init_buffers(dev); } +EXPORT_SYMBOL(ether_setup); #ifdef CONFIG_FDDI @@ -312,6 +449,7 @@ return; } +EXPORT_SYMBOL(fddi_setup); #endif /* CONFIG_FDDI */ @@ -349,6 +487,7 @@ dev_init_buffers(dev); } +EXPORT_SYMBOL(hippi_setup); #endif /* CONFIG_HIPPI */ #if defined(CONFIG_ATALK) || defined(CONFIG_ATALK_MODULE) @@ -387,6 +526,7 @@ dev_init_buffers(dev); } +EXPORT_SYMBOL(ltalk_setup); #endif /* CONFIG_ATALK || CONFIG_ATALK_MODULE */ @@ -438,10 +578,12 @@ rtnl_unlock(); } +EXPORT_SYMBOL(register_netdev); +EXPORT_SYMBOL(unregister_netdev); #ifdef CONFIG_TR -static void tr_configure(struct net_device *dev) +void tr_setup(struct net_device *dev) { /* * Configure and register @@ -462,32 +604,61 @@ dev->flags = IFF_BROADCAST | IFF_MULTICAST ; } +/** + * init_trdev - Register token ring device + * @dev: A token ring device structure to be filled in, or %NULL if a new + * struct should be allocated. + * @sizeof_priv: Size of additional driver-private structure to be allocated + * for this ethernet device + * + * Fill in the fields of the device structure with token ring-generic values. + * + * If no device structure is passed, a new one is constructed, complete with + * a private data area of size @sizeof_priv. A 32-byte (not bit) + * alignment is enforced for this private data area. 
+ * + * If an empty string area is passed as dev->name, or a new structure is made, + * a new name string is constructed. + */ + struct net_device *init_trdev(struct net_device *dev, int sizeof_priv) { - return init_netdev(dev, sizeof_priv, "tr%d", tr_configure); + return init_netdev(dev, sizeof_priv, "tr%d", tr_setup); } -void tr_setup(struct net_device *dev) +/** + * alloc_trdev - Register token ring device + * @sizeof_priv: Size of additional driver-private structure to be allocated + * for this token ring device + * + * Fill in the fields of the device structure with token ring-generic values. + * + * Constructs a new net device, complete with a private data area of + * size @sizeof_priv. A 32-byte (not bit) alignment is enforced for + * this private data area. + */ + +struct net_device *alloc_trdev(int sizeof_priv) { + return alloc_netdev(sizeof_priv, "tr%d", tr_setup); } int register_trdev(struct net_device *dev) { - dev_init_buffers(dev); - - if (dev->init && dev->init(dev) != 0) { - unregister_trdev(dev); - return -EIO; - } - return 0; + return __register_netdev(dev); } void unregister_trdev(struct net_device *dev) { - rtnl_lock(); - unregister_netdevice(dev); - rtnl_unlock(); + unregister_netdev(dev); } + +EXPORT_SYMBOL(tr_setup); +EXPORT_SYMBOL(init_trdev); +EXPORT_SYMBOL(alloc_trdev); +EXPORT_SYMBOL(register_trdev); +EXPORT_SYMBOL(unregister_trdev); + #endif /* CONFIG_TR */ @@ -509,31 +680,62 @@ /* New-style flags. */ dev->flags = IFF_BROADCAST; dev_init_buffers(dev); - return; } +/** + * init_fcdev - Register fibre channel device + * @dev: A fibre channel device structure to be filled in, or %NULL if a new + * struct should be allocated. + * @sizeof_priv: Size of additional driver-private structure to be allocated + * for this ethernet device + * + * Fill in the fields of the device structure with fibre channel-generic values. + * + * If no device structure is passed, a new one is constructed, complete with + * a private data area of size @sizeof_priv. 
A 32-byte (not bit) + * alignment is enforced for this private data area. + * + * If an empty string area is passed as dev->name, or a new structure is made, + * a new name string is constructed. + */ struct net_device *init_fcdev(struct net_device *dev, int sizeof_priv) { return init_netdev(dev, sizeof_priv, "fc%d", fc_setup); } +/** + * alloc_fcdev - Register fibre channel device + * @sizeof_priv: Size of additional driver-private structure to be allocated + * for this fibre channel device + * + * Fill in the fields of the device structure with fibre channel-generic values. + * + * Constructs a new net device, complete with a private data area of + * size @sizeof_priv. A 32-byte (not bit) alignment is enforced for + * this private data area. + */ + +struct net_device *alloc_fcdev(int sizeof_priv) +{ + return alloc_netdev(sizeof_priv, "fc%d", fc_setup); +} + int register_fcdev(struct net_device *dev) { - dev_init_buffers(dev); - if (dev->init && dev->init(dev) != 0) { - unregister_fcdev(dev); - return -EIO; - } - return 0; + return __register_netdev(dev); } void unregister_fcdev(struct net_device *dev) { - rtnl_lock(); - unregister_netdevice(dev); - rtnl_unlock(); + unregister_netdev(dev); } + +EXPORT_SYMBOL(fc_setup); +EXPORT_SYMBOL(init_fcdev); +EXPORT_SYMBOL(alloc_fcdev); +EXPORT_SYMBOL(register_fcdev); +EXPORT_SYMBOL(unregister_fcdev); #endif /* CONFIG_NET_FC */ Index: net/netsyms.c =================================================================== RCS file: /cvsroot/gkernel/linux_2_4/net/netsyms.c,v retrieving revision 1.1.1.23 retrieving revision 1.1.1.23.4.1 diff -u -r1.1.1.23 -r1.1.1.23.4.1 --- net/netsyms.c 2001/03/20 12:56:42 1.1.1.23 +++ net/netsyms.c 2001/03/21 02:19:50 1.1.1.23.4.1 @@ -432,33 +432,19 @@ #endif /* CONFIG_INET */ #ifdef CONFIG_TR -EXPORT_SYMBOL(tr_setup); EXPORT_SYMBOL(tr_type_trans); -EXPORT_SYMBOL(register_trdev); -EXPORT_SYMBOL(unregister_trdev); -EXPORT_SYMBOL(init_trdev); #endif -#ifdef CONFIG_NET_FC 
-EXPORT_SYMBOL(register_fcdev); -EXPORT_SYMBOL(unregister_fcdev); -EXPORT_SYMBOL(init_fcdev); -#endif - /* Device callback registration */ EXPORT_SYMBOL(register_netdevice_notifier); EXPORT_SYMBOL(unregister_netdevice_notifier); /* support for loadable net drivers */ #ifdef CONFIG_NET -EXPORT_SYMBOL(init_etherdev); EXPORT_SYMBOL(loopback_dev); EXPORT_SYMBOL(register_netdevice); EXPORT_SYMBOL(unregister_netdevice); -EXPORT_SYMBOL(register_netdev); -EXPORT_SYMBOL(unregister_netdev); EXPORT_SYMBOL(netdev_state_change); -EXPORT_SYMBOL(ether_setup); EXPORT_SYMBOL(dev_new_index); EXPORT_SYMBOL(dev_get_by_index); EXPORT_SYMBOL(__dev_get_by_index); @@ -469,8 +455,6 @@ EXPORT_SYMBOL(eth_type_trans); #ifdef CONFIG_FDDI EXPORT_SYMBOL(fddi_type_trans); -EXPORT_SYMBOL(fddi_setup); -EXPORT_SYMBOL(init_fddidev); #endif /* CONFIG_FDDI */ #if 0 EXPORT_SYMBOL(eth_copy_and_sum); @@ -511,8 +495,6 @@ #ifdef CONFIG_HIPPI EXPORT_SYMBOL(hippi_type_trans); -EXPORT_SYMBOL(init_hippi_dev); -EXPORT_SYMBOL(unregister_hipdev); #endif #ifdef CONFIG_SYSCTL @@ -522,12 +504,6 @@ EXPORT_SYMBOL(sysctl_ip_default_ttl); #endif #endif - -#if defined(CONFIG_ATALK) || defined(CONFIG_ATALK_MODULE) -#include -EXPORT_SYMBOL(ltalk_setup); -#endif - /* Packet scheduler modules want these. 
*/ EXPORT_SYMBOL(qdisc_destroy); Index: include/linux/etherdevice.h =================================================================== RCS file: /cvsroot/gkernel/linux_2_4/include/linux/etherdevice.h,v retrieving revision 1.1.1.14 retrieving revision 1.1.1.14.4.2 diff -u -r1.1.1.14 -r1.1.1.14.4.2 --- include/linux/etherdevice.h 2001/03/20 12:54:47 1.1.1.14 +++ include/linux/etherdevice.h 2001/03/21 14:10:50 1.1.1.14.4.2 @@ -38,7 +38,8 @@ struct hh_cache *hh); extern int eth_header_parse(struct sk_buff *skb, unsigned char *haddr); -extern struct net_device * init_etherdev(struct net_device *, int); +extern struct net_device *init_etherdev(struct net_device *dev, int sizeof_priv); +extern struct net_device *alloc_etherdev(int sizeof_priv); static __inline__ void eth_copy_and_sum (struct sk_buff *dest, unsigned char *src, int len, int base) { Index: include/linux/fcdevice.h =================================================================== RCS file: /cvsroot/gkernel/linux_2_4/include/linux/fcdevice.h,v retrieving revision 1.1.1.1 retrieving revision 1.1.1.1.178.2 diff -u -r1.1.1.1 -r1.1.1.1.178.2 --- include/linux/fcdevice.h 2000/10/22 19:36:14 1.1.1.1 +++ include/linux/fcdevice.h 2001/03/21 14:10:50 1.1.1.1.178.2 @@ -33,7 +33,10 @@ extern int fc_rebuild_header(struct sk_buff *skb); //extern unsigned short fc_type_trans(struct sk_buff *skb, struct net_device *dev); -extern struct net_device * init_fcdev(struct net_device *, int); +extern struct net_device *init_fcdev(struct net_device *dev, int sizeof_priv); +extern struct net_device *alloc_fcdev(int sizeof_priv); +extern int register_fcdev(struct net_device *dev); +extern void unregister_fcdev(struct net_device *dev); #endif Index: include/linux/fddidevice.h =================================================================== RCS file: /cvsroot/gkernel/linux_2_4/include/linux/fddidevice.h,v retrieving revision 1.1.1.2 retrieving revision 1.1.1.2.170.2 diff -u -r1.1.1.2 -r1.1.1.2.170.2 --- include/linux/fddidevice.h 
2000/10/22 20:44:24 1.1.1.2 +++ include/linux/fddidevice.h 2001/03/21 14:10:50 1.1.1.2.170.2 @@ -34,7 +34,8 @@ extern int fddi_rebuild_header(struct sk_buff *skb); extern unsigned short fddi_type_trans(struct sk_buff *skb, struct net_device *dev); -extern struct net_device * init_fddidev(struct net_device *, int); +extern struct net_device *init_fddidev(struct net_device *dev, int sizeof_priv); +extern struct net_device *alloc_fddidev(int sizeof_priv); #endif #endif /* _LINUX_FDDIDEVICE_H */ Index: include/linux/hippidevice.h =================================================================== RCS file: /cvsroot/gkernel/linux_2_4/include/linux/hippidevice.h,v retrieving revision 1.1.1.1 retrieving revision 1.1.1.1.178.2 diff -u -r1.1.1.1 -r1.1.1.1.178.2 --- include/linux/hippidevice.h 2000/10/22 19:36:13 1.1.1.1 +++ include/linux/hippidevice.h 2001/03/21 14:10:50 1.1.1.1.178.2 @@ -51,7 +51,9 @@ extern void hippi_net_init(void); void hippi_setup(struct net_device *dev); -extern struct net_device *init_hippi_dev(struct net_device *, int); +extern struct net_device *init_hippi_dev(struct net_device *dev, int sizeof_priv); +extern struct net_device *alloc_hippi_dev(int sizeof_priv); +extern int register_hipdev(struct net_device *dev); extern void unregister_hipdev(struct net_device *dev); #endif Index: include/linux/netdevice.h =================================================================== RCS file: /cvsroot/gkernel/linux_2_4/include/linux/netdevice.h,v retrieving revision 1.1.1.21 retrieving revision 1.1.1.21.4.1 diff -u -r1.1.1.21 -r1.1.1.21.4.1 --- include/linux/netdevice.h 2001/03/20 12:54:47 1.1.1.21 +++ include/linux/netdevice.h 2001/03/21 14:10:50 1.1.1.21.4.1 @@ -633,10 +633,6 @@ /* Support for loadable net-drivers */ extern int register_netdev(struct net_device *dev); extern void unregister_netdev(struct net_device *dev); -extern int register_trdev(struct net_device *dev); -extern void unregister_trdev(struct net_device *dev); -extern int 
register_fcdev(struct net_device *dev); -extern void unregister_fcdev(struct net_device *dev); /* Functions used for multicast support */ extern void dev_mc_upload(struct net_device *dev); extern int dev_mc_delete(struct net_device *dev, void *addr, int alen, int all); Index: include/linux/trdevice.h =================================================================== RCS file: /cvsroot/gkernel/linux_2_4/include/linux/trdevice.h,v retrieving revision 1.1.1.1 retrieving revision 1.1.1.1.178.2 diff -u -r1.1.1.1 -r1.1.1.1.178.2 --- include/linux/trdevice.h 2000/10/22 19:36:03 1.1.1.1 +++ include/linux/trdevice.h 2001/03/21 14:10:50 1.1.1.1.178.2 @@ -33,7 +33,10 @@ void *saddr, unsigned len); extern int tr_rebuild_header(struct sk_buff *skb); extern unsigned short tr_type_trans(struct sk_buff *skb, struct net_device *dev); -extern struct net_device * init_trdev(struct net_device *, int); +extern struct net_device *init_trdev(struct net_device *dev, int sizeof_priv); +extern struct net_device *alloc_trdev(int sizeof_priv); +extern int register_trdev(struct net_device *dev); +extern void unregister_trdev(struct net_device *dev); #endif --------------98FC4B3C4E3DBB7B168B218A-- From owner-netdev@oss.sgi.com Wed Mar 21 06:33:47 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f2LEXlx27957 for netdev-outgoing; Wed, 21 Mar 2001 06:33:47 -0800 Received: from havoc.gtf.org (IDENT:postfix@panic.ohr.gatech.edu [130.207.47.194]) by oss.sgi.com (8.11.3/8.11.3) with ESMTP id f2LEXjM27950 for ; Wed, 21 Mar 2001 06:33:45 -0800 Received: from mandrakesoft.com (adsl-20-73-169.asm.bellsouth.net [66.20.73.169]) by havoc.gtf.org (Postfix) with ESMTP id E4FED1F6A; Wed, 21 Mar 2001 09:33:40 -0500 (EST) Message-ID: <3AB8BBC5.61A7F65F@mandrakesoft.com> Date: Wed, 21 Mar 2001 09:33:41 -0500 From: Jeff Garzik Organization: MandrakeSoft X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.4.3-pre6 i686) X-Accept-Language: en MIME-Version: 1.0 To: Linux Kernel Mailing List , 
netdev@oss.sgi.com Cc: Andrew Morton , Linus Torvalds , "David S. Miller" Subject: Re: PATCH 2.4.3.6: fix netdevice initialization References: <3AB8BA16.A25C0929@mandrakesoft.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 391 Lines: 10 I have this bad habit of thinking of things after I click . One other change that accompanies this -- define a feature macro. The following should go into linux/netdevice.h: #define HAVE_ALLOC_NETDEV -- Jeff Garzik | May you have warm words on a cold evening, Building 1024 | a full moon on a dark night, MandrakeSoft | and a smooth road all the way to your door. From owner-netdev@oss.sgi.com Wed Mar 21 07:20:20 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f2LFKKT29842 for netdev-outgoing; Wed, 21 Mar 2001 07:20:20 -0800 Received: from horus.its.uow.edu.au (horus.its.uow.edu.au [130.130.68.25]) by oss.sgi.com (8.11.3/8.11.3) with ESMTP id f2LFKIM29839 for ; Wed, 21 Mar 2001 07:20:18 -0800 Received: from uow.edu.au (wumpus.its.uow.edu.au [130.130.68.12]) by horus.its.uow.edu.au (8.9.3/8.9.3) with ESMTP id CAA15333; Thu, 22 Mar 2001 02:19:24 +1100 (EST) Message-ID: <3AB8C6EE.5EF6BA90@uow.edu.au> Date: Thu, 22 Mar 2001 02:21:18 +1100 From: Andrew Morton X-Mailer: Mozilla 4.7 [en] (X11; I; Linux 2.4.3-pre3 i586) X-Accept-Language: en MIME-Version: 1.0 To: Jeff Garzik CC: Linux Kernel Mailing List , netdev@oss.sgi.com, Linus Torvalds , "David S. Miller" Subject: Re: PATCH 2.4.3.6: fix netdevice initialization References: <3AB8BA16.A25C0929@mandrakesoft.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 3065 Lines: 143 Jeff Garzik wrote: > > Attached is a patch against 2.4.3-pre6, which adds alloc_etherdev and > variants to the driver API. Looks good.
Few minor things: In register_netdev(): if (strchr(dev->name, '%')) { err = -EBUSY; if(dev_alloc_name(dev, dev->name)<0) goto out; } We should propagate dev_alloc_name's return value: err = dev_alloc_name(dev, dev->name); if (err < 0) goto out; And register_netdevice's, so err = -EIO; if (register_netdevice(dev)) goto out; err = 0; out: becomes simply err = register_netdevice(dev); out: More significantly, the driver probe functions now become: xxx_probe() { dev = alloc_etherdev(sizeof(xxx_private)); ... printk(KERN_INFO "%s: stuff\n", dev->name); ... ret = register_netdev(dev); if (ret < 0) kfree(dev); return ret; } yes? And the printk() will say "eth%d: stuff", so we'll need to change the messages: - printk(KERN_INFO "%s: stuff\n", dev->name); + printk(KERN_INFO "xxx: stuff\n"); Correct? My quibble with this is things like wait_for_completion(), which are called from both the probe() function and the mainline driver code. These also print dev->name, and there's no obvious fix for that. For this reason I think I'd prefer it if alloc_etherdev() was passed some probe-time identifier: alloc_etherdev(sizeof(xxx_private), "xxx"); which goes into dev->name, and gets overwritten by register_netdev(). So, against your current patch: --- drivers/net/net_init.c.orig Thu Mar 22 02:11:47 2001 +++ drivers/net/net_init.c Thu Mar 22 02:17:46 2001 @@ -71,7 +71,7 @@ static struct net_device *alloc_netdev(int sizeof_priv, const char *mask, - void (*setup)(struct net_device *)) + const char *driver_name, void (*setup)(struct net_device *)) { struct net_device *dev; int alloc_size; @@ -92,7 +92,8 @@ dev->priv = (void *) (((long)(dev + 1) + 31) & ~31); setup(dev); - strcpy(dev->name, mask); + strcpy(dev->ifname, mask); + strcpy(dev->name, driver_name); return dev; } @@ -216,9 +217,9 @@ * this private data area.
*/ -struct net_device *alloc_etherdev(int sizeof_priv) +struct net_device *alloc_etherdev(int sizeof_priv, const char *driver_name) { - return alloc_netdev(sizeof_priv, "eth%d", ether_setup); + return alloc_netdev(sizeof_priv, "eth%d", driver_name, ether_setup); } EXPORT_SYMBOL(init_etherdev); @@ -532,7 +533,7 @@ int register_netdev(struct net_device *dev) { - int err; + int err = 0; rtnl_lock(); @@ -540,13 +541,14 @@ * If the name is a format string the caller wants us to * do a name allocation */ - - if (strchr(dev->name, '%')) - { - err = -EBUSY; - if(dev_alloc_name(dev, dev->name)<0) - goto out; - } + + /* Insert comment here */ + if (strchr(dev->ifname, '%')) + err = dev_alloc_name(dev, dev->ifname); + else if (strchr(dev->name, '%')) + err = dev_alloc_name(dev, dev->name); + if (err < 0) + goto out; /* * Back compatibility hook. Kill this one in 2.5 From owner-netdev@oss.sgi.com Wed Mar 21 07:41:05 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f2LFf5F30619 for netdev-outgoing; Wed, 21 Mar 2001 07:41:05 -0800 Received: from havoc.gtf.org (IDENT:postfix@panic.ohr.gatech.edu [130.207.47.194]) by oss.sgi.com (8.11.3/8.11.3) with ESMTP id f2LFf5M30616 for ; Wed, 21 Mar 2001 07:41:05 -0800 Received: from mandrakesoft.com (adsl-20-73-169.asm.bellsouth.net [66.20.73.169]) by havoc.gtf.org (Postfix) with ESMTP id 276181F6C; Wed, 21 Mar 2001 10:41:00 -0500 (EST) Message-ID: <3AB8CB8C.D055FC6E@mandrakesoft.com> Date: Wed, 21 Mar 2001 10:41:00 -0500 From: Jeff Garzik Organization: MandrakeSoft X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.4.3-pre6 i686) X-Accept-Language: en MIME-Version: 1.0 To: Andrew Morton Cc: Linux Kernel Mailing List , netdev@oss.sgi.com, Linus Torvalds , "David S. 
Miller" Subject: Re: PATCH 2.4.3.6: fix netdevice initialization References: <3AB8BA16.A25C0929@mandrakesoft.com> <3AB8C6EE.5EF6BA90@uow.edu.au> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1490 Lines: 54 Andrew Morton wrote: > We should propagate dev_alloc_name's return value: > And register_netdevice's, so ok > More significantly, the driver probe functions now become: > > xxx_probe() > { > dev = alloc_etherdev(sizeof(xxx_private)); > ... > printk(KERN_INFO "%s: stuff\n", dev->name); > ... > ret = register_netdev(dev); > if (ret < 0) > kfree(dev); > return ret; > } > > yes? correct. > And the printk() will say "eth%d: stuff", so we'll need to > change the messages: > > - printk(KERN_INFO "%s: stuff\n", dev->name); > + printk(KERN_INFO "xxx: stuff\n"); > > Correct? correct. For PCI drivers, change to something like printk(KERN_INFO "tulip(%s): stuff\n", pci_dev->slot_name); > My quibble with this is things like wait_for_completion(), > which are called from both the probe() function and > the mainline driver code. These also print dev->name, > and there's no obvious fix for that. hrm. I'm not sure it's necessary to pass driver_name to alloc_etherdev (to set dev->name), just so the driver can reference it during the probe phase. Further, there is a dev->name size limit you will run into with "myverylongdrivernameskimbosh." Just pass a 'name' arg to wait_for_completion ;-) -- Jeff Garzik | May you have warm words on a cold evening, Building 1024 | a full moon on a dark night, MandrakeSoft | and a smooth road all the way to your door.
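For readers following the naming exchange above, here is a minimal userspace sketch (not kernel code) of the probe ordering under discussion: the device name still holds the "eth%d" template until registration, so probe-time messages take an explicit label from the caller. Everything named fake_* below, as well as report_status and the "xxx" label, is a hypothetical stand-in invented for illustration; none of it is part of the patch or the kernel API.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define FAKE_IFNAMSIZ 16

/* Hypothetical stand-in for struct net_device; only the name matters here. */
struct fake_netdev {
    char name[FAKE_IFNAMSIZ];
};

/* Mimics the allocation step: the name is left as the unexpanded template. */
static struct fake_netdev *fake_alloc_etherdev(void)
{
    struct fake_netdev *dev = calloc(1, sizeof(*dev));

    if (dev != NULL)
        strcpy(dev->name, "eth%d");
    return dev;
}

/* Mimics the registration step: replaces a trailing "%d" with the unit
 * number and propagates a negative error (unit < 0 simulates a failure). */
static int fake_register_netdev(struct fake_netdev *dev, int unit)
{
    char *pct;

    if (unit < 0)
        return -1;
    pct = strchr(dev->name, '%');
    if (pct != NULL) {
        size_t room = sizeof(dev->name) - (size_t)(pct - dev->name);
        snprintf(pct, room, "%d", unit);
    }
    return 0;
}

/* Helper usable from both probe and runtime paths: the caller passes the
 * label to print instead of the helper reading dev->name itself. */
static int report_status(char *out, size_t outlen,
                         const char *label, const char *msg)
{
    return snprintf(out, outlen, "%s: %s", label, msg);
}

/* Probe flow: allocate, log under a driver label, register, and free the
 * device (not the error code) if registration fails. */
static int fake_probe(int unit, char *log, size_t loglen,
                      struct fake_netdev **out)
{
    struct fake_netdev *dev = fake_alloc_etherdev();
    int ret;

    *out = NULL;
    if (dev == NULL)
        return -1;
    report_status(log, loglen, "xxx", "probing"); /* name is still "eth%d" */
    ret = fake_register_netdev(dev, unit);
    if (ret < 0) {
        free(dev);
        return ret;
    }
    *out = dev;
    return 0;
}
```

With these stand-ins, a successful fake_probe(0, ...) leaves the device named eth0 while the probe-time message reads "xxx: probing"; once registration has succeeded, the same helper can simply be handed dev->name as its label.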
From owner-netdev@oss.sgi.com Thu Mar 22 16:48:30 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f2N0mUr15913 for netdev-outgoing; Thu, 22 Mar 2001 16:48:30 -0800 Received: from mailb.telia.com (root@mailb.telia.com [194.22.194.6]) by oss.sgi.com (8.11.3/8.11.3) with ESMTP id f2N0mSM15910 for ; Thu, 22 Mar 2001 16:48:29 -0800 Received: from kongeroa.home (t6o98p143.telia.com [62.20.230.143]) by mailb.telia.com (8.9.3/8.9.3) with ESMTP id BAA07415; Fri, 23 Mar 2001 01:48:24 +0100 (CET) Received: from bhlap.home (IDENT:root@bhlap.home [192.168.2.20]) by kongeroa.home (8.9.3/8.9.3) with ESMTP id BAA24954; Fri, 23 Mar 2001 01:47:58 +0100 Received: from signal.uu.se (IDENT:bh@localhost [127.0.0.1]) by bhlap.home (8.9.3/8.9.3) with ESMTP id BAA17985; Fri, 23 Mar 2001 01:46:36 +0100 Message-ID: <3ABA9CEC.446D0E99@signal.uu.se> Date: Fri, 23 Mar 2001 01:46:36 +0100 From: Bjorn Hammarberg Reply-To: Bjorn.Hammarberg@signal.uu.se Organization: Uppsala University, Sweden X-Mailer: Mozilla 4.75 [en] (X11; U; Linux 2.2.16-3 i586) X-Accept-Language: en MIME-Version: 1.0 To: netdev@oss.sgi.com CC: Bjorn Hammarberg Subject: How to rewrite source addr from diald to prevent martians Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by oss.sgi.com id f2N0mTM15911 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 3288 Lines: 85 Hi, Is there a simple way (i.e. through some kernel functionality) to rewrite the source address of a TCP/IP packet? I (think I) need that to improve my diald setup, since buffered packets generated on the router/gateway are either dropped or do not work. I have read all sorts of documentation, including the code, without finding a proper solution that does not involve patching the kernel. Background: Diald sets up a SLIP interface to analyze and buffer the outbound traffic.
Whenever a packet that should bring the link up arrives, diald starts pppd and changes the default route to the ppp interface. Then all buffered packets are bounced back to the kernel through the SLIP interface, where they are forwarded again and masqueraded accordingly. For packets arriving from the internal LAN, this setup works perfectly, and connections come up really fast without waiting for retransmission by the TCP/IP protocol. For packets generated locally, however, the source address of the SLIP interface leads to all sorts of problems, such as martians, non-working masquerading, or local addresses being sent out on the ppp interface. In short, the buffering is useless for local packets. In my case this is a problem because I need to run a name server on the router. All connections are therefore initiated through the name server (unless the address is cached), which has to wait for retransmissions: since the diald forwarding does not work, we have to wait until named sends a packet that gets the correct source address. This leads to slow connects and even failures. Could all of this be fixed by changing the source address of locally generated packets when bouncing them back to the kernel? Please CC any answers to me, since I am not a member of this list. Alternatively, you could send your answer directly to me, and I will post a write-up and (hopefully) a success story to this list.
Cheers,
/Björn

The setup consists of:

the internal computer (RH-6.2)
  ethernet interface (eth0)    192.168.c.d2

the router (RH-6.2 upgraded to 2.2.18)
  ethernet interface (eth0)    192.168.c.d1
  diald snoop iface (sl0)      192.168.s.t
  diald ppp interface (ppp0)   A.B.C.D
  named (8.2.2-P5)             forwarded to 10.0.0.1

               | ppp0
           .---+---,
           |   :   |
 router/   |   x...<----------------,
 gateway   |   :   | sl0  .------,  |
           |   x...>------> FIFO >--'
           |   :   |      `------'
           `---+---'
               | eth0
               |
    +-------+--+--------+--------+
    |       |           |        |
       eth0 |
         .--+--,
         |     |
         |     |
         `-----'
    Internal Computer

----------------------------------------------------------------------
Bjorn Hammarberg, PhD student in Neurophysiological Signal Processing

Dep. of Neuroscience                            Signals and Systems
Clinical Neurophysiology ¨¨¨¨¨¨¨|+|o|¨¨¨¨¨¨¨¨¨¨ Uppsala University
University Hospital Uppsala     |-+-|           PO Box 528
SE-751 85 Uppsala, SWEDEN       |o|+|           SE-751 20 Uppsala, SWEDEN
http://www.neurofys.uu.se       `---'           http://www.signal.uu.se

From owner-netdev@oss.sgi.com Thu Mar 22 16:54:11 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f2N0sBp16264 for netdev-outgoing; Thu, 22 Mar 2001 16:54:11 -0800 Received: from mailbox3.ucsd.edu (mailbox3.ucsd.edu [132.239.1.55]) by oss.sgi.com (8.11.3/8.11.3) with ESMTP id f2N0sAM16258 for ; Thu, 22 Mar 2001 16:54:10 -0800 Received: from smtp.ucsd.edu (smtp.ucsd.edu [132.239.1.49]) by mailbox3.ucsd.edu (8.11.0/8.11.0) with ESMTP id f2N0rR017462; Thu, 22 Mar 2001 16:53:27 -0800 (PST) Received: from cs.ucsd.edu (cse-air-dhcp-189.ucsd.edu [132.239.10.189]) by smtp.ucsd.edu (8.9.3/8.9.3) with ESMTP id QAA05076; Thu, 22 Mar 2001 16:53:26 -0800 (PST) Message-ID: <3ABA9E98.81D1413E@cs.ucsd.edu> Date: Thu, 22 Mar 2001 16:53:44 -0800 From: Federico David Sacerdoti Organization: University of California San Diego X-Mailer: Mozilla 4.7 [en] (X11; I; Linux 2.2.14 i586) X-Accept-Language: en MIME-Version: 1.0 To: davem@redhat.com, ak@muc.de, kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: A TCP
monitoring /proc/net file Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 2357 Lines: 62 Hi, For a graduate network class at UCSD I implemented some TCP performance monitors in the Linux TCP stack (ipv4). I have added a file to the proc filesystem (/proc/net/tcphealth) that monitors the "health" of all TCP connections on a machine. The tcphealth file tracks smoothed round-trip times, duplicate ACKs, and duplicate incoming packets for each established TCP connection. I believe that a lot of good monitoring information can be gleaned from this file. It is designed to help answer the question "Why is my network connection so slow?", and to work on all TCP connections without the cooperation of the remote server. I also wrote a module for the sweet GKrellM monitor that shows sender and receiver retransmissions (the receiver-side metric is 3 consecutive dupAcks, i.e. a retransmission request), and the average SRTT over all open connections. In the code I have taken care not to disrupt the fast path in tcp_rcv_established(), and generally have tried to step lightly. I have patched kernel versions 2.2.14 and 2.2.16, and tested the patch on an ix86, a SUN, and a PPC. If there is any interest, I will submit the patch to the appropriate maintainer. Sincerely, Federico David Sacerdoti PS. This is my first time communicating with the kernel developers, so please forgive any breaches of protocol. My team wrote a short paper on these Network Health Monitors and ran some interesting experiments using them. If you are interested, the link is: http://www-cse.ucsd.edu/classes/wi01/cse222/projects/reports/net-health-14.pdf

A sample output (fairly healthy connections, 11Mbps wireless eth):

[fds@sandpiper fds]$ cat /proc/net/tcphealth
TCP Health Monitoring
-Duplicate ACKs are normal and indicate lost/reordered packets.
-Duplicate Packets are Bad and show an inefficient connection.
id  Local Address         Remote Address        RttEst  AcksSent  DupAcksSent  PktsRecv  DupPktsRecv
1:  132.239.10.189:1150   207.25.71.146:80      80      3         0            3         0
6:  132.239.10.189:1145   207.25.71.146:80      66      29        13           37        0
7:  132.239.10.189:1141   208.48.26.229:80      112     8         0            12        0
8:  132.239.10.189:1131   208.48.26.226:80      77      12        0            23        0
9:  132.239.10.189:1077   132.239.55.100:993    28      9         0            20        0
[fds@sandpiper fds]$

From owner-netdev@oss.sgi.com Thu Mar 22 16:58:05 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f2N0w5g16692 for netdev-outgoing; Thu, 22 Mar 2001 16:58:05 -0800 Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.11.3/8.11.3) with ESMTP id f2N0w5M16689 for ; Thu, 22 Mar 2001 16:58:05 -0800 Received: (from davem@localhost) by pizda.ninka.net (8.9.3/8.9.3) id QAA04380; Thu, 22 Mar 2001 16:57:12 -0800 From: "David S. Miller" MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <15034.40807.977418.671551@pizda.ninka.net> Date: Thu, 22 Mar 2001 16:57:11 -0800 (PST) To: Federico David Sacerdoti Cc: ak@muc.de, kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: A TCP monitoring /proc/net file In-Reply-To: <3ABA9E98.81D1413E@cs.ucsd.edu> References: <3ABA9E98.81D1413E@cs.ucsd.edu> X-Mailer: VM 6.75 under 21.1 (patch 13) "Crater Lake" XEmacs Lucid Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 91 Lines: 6

See the TCP_INFO socket option we added to 2.4.x

Later, David S.
Miller davem@redhat.com From owner-netdev@oss.sgi.com Thu Mar 22 23:33:47 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f2N7Xl923885 for netdev-outgoing; Thu, 22 Mar 2001 23:33:47 -0800 Received: from titan.bieringer.de (mail.bieringer.de [195.226.187.51]) by oss.sgi.com (8.11.3/8.11.3) with SMTP id f2N7XkM23882 for ; Thu, 22 Mar 2001 23:33:46 -0800 Received: (qmail 22566 invoked from network); 23 Mar 2001 07:33:33 -0000 Received: from pd9502481.dip.t-dialin.net (HELO worker.bieringer.de) (217.80.36.129) by mail.bieringer.de with SMTP; 23 Mar 2001 07:33:33 -0000 Message-Id: <5.0.2.1.0.20010323083110.02df5850@mail.bieringer.de> X-Sender: list4peter@mail.bieringer.de X-Mailer: QUALCOMM Windows Eudora Version 5.0.2 Date: Fri, 23 Mar 2001 08:35:32 +0100 To: Bjorn.Hammarberg@signal.uu.se, netdev@oss.sgi.com From: Peter Bieringer Subject: Re: How to rewrite source addr from diald to prevent martians In-Reply-To: <3ABA9CEC.446D0E99@signal.uu.se> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; format=flowed Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 713 Lines: 20 At 01:46 23.03.2001, Bjorn Hammarberg wrote: >Hi, > >Is there a simple way (i.e. through some kernel functionality) to >rewrite the source address of a tcp/ip packet? I (think I) need that to >improve my diald setup since buffered packets generated on the >router/gateway are either dropped or does not work. I have read all >sorts of documentation including the code without finding a proper >solution that does not involve patching the kernel. Ever tested /proc/sys/net/ipv4/ip_dynaddr ? See /usr/src/linux/Documentation/networking/ip_dynaddr.txt for more. > named (8.2.2-P5) forwarded to 10.0.0.1 Oh, remote root access via exploit possible. You should update your "bind" RPM asap. 
Peter From owner-netdev@oss.sgi.com Fri Mar 23 00:39:36 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f2N8daj25632 for netdev-outgoing; Fri, 23 Mar 2001 00:39:36 -0800 Received: from smtp01.netkracker.com (smtp01.netkracker.com [202.177.128.2]) by oss.sgi.com (8.11.3/8.11.3) with ESMTP id f2N8dWM25629 for ; Fri, 23 Mar 2001 00:39:33 -0800 Received: from netkracker.com ([210.18.2.16]) by smtp01.netkracker.com (Netscape Messaging Server 4.15) with ESMTP id GAN6UW00.HGQ for ; Fri, 23 Mar 2001 14:12:32 +0530 Message-ID: <3ABBB4A7.90005@netkracker.com> Date: Sat, 24 Mar 2001 02:10:07 +0530 From: "Sharath Udupa" User-Agent: Mozilla/5.0 (X11; U; Linux 2.2.17-21mdk i686; en-US; 0.7) Gecko/20010105 X-Accept-Language: en MIME-Version: 1.0 To: netdev@oss.sgi.com Subject: reuqest for advice Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 610 Lines: 20 Hello, I am a Computer Science student studying in India. I am very interested in working on TCP/IP. I have gone through the implementation of TCP/IP in 4.4BSD (Net/3). But I am interested in learning how this has been implemented in Linux. I have tried to understand by just looking at the source code, but finding it very difficult to properly understand it. I request you to tell me if there are any documentation on the implementation. I am also interested in doing a mini project in TCP/IP. Could please suggest some good topics for me to start off. Please do reply to this mail. 
Thank you, Sharath From owner-netdev@oss.sgi.com Fri Mar 23 00:48:37 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f2N8mbS26182 for netdev-outgoing; Fri, 23 Mar 2001 00:48:37 -0800 Received: from laurin.munich.netsurf.de (laurin.munich.netsurf.de [194.64.166.1]) by oss.sgi.com (8.11.3/8.11.3) with ESMTP id f2N8maM26179 for ; Fri, 23 Mar 2001 00:48:36 -0800 Received: from fred.muc.de (noidentity@ns1115.munich.netsurf.de [195.180.235.115]) by laurin.munich.netsurf.de (8.9.3/8.9.3) with ESMTP id JAA25736; Fri, 23 Mar 2001 09:48:25 +0100 (MET) Received: by fred.muc.de (Postfix, from userid 500) id 5A285E3911; Fri, 23 Mar 2001 09:39:18 +0100 (CET) Date: Fri, 23 Mar 2001 09:39:18 +0100 From: Andi Kleen To: "David S. Miller" Cc: Federico David Sacerdoti , ak@muc.de, kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: A TCP monitoring /proc/net file Message-ID: <20010323093918.A1476@fred.local> References: <3ABA9E98.81D1413E@cs.ucsd.edu> <15034.40807.977418.671551@pizda.ninka.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0.1i In-Reply-To: <15034.40807.977418.671551@pizda.ninka.net>; from davem@redhat.com on Fri, Mar 23, 2001 at 01:57:11AM +0100 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 288 Lines: 10 On Fri, Mar 23, 2001 at 01:57:11AM +0100, David S. 
Miller wrote: > > See the TCP_INFO socket option we added to 2.4.x Sadly TCP_INFO can not be used for external monitoring currently (at least not without very bad and racy hacks to allow /proc to open sockets in /proc/pid/fd) -Andi From owner-netdev@oss.sgi.com Fri Mar 23 05:35:09 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f2NDZ9J31705 for netdev-outgoing; Fri, 23 Mar 2001 05:35:09 -0800 Received: from mgw1.ul.ie (mgw1.ul.ie [136.201.1.117]) by oss.sgi.com (8.11.3/8.11.3) with ESMTP id f2NDZ7M31702 for ; Fri, 23 Mar 2001 05:35:08 -0800 Received: from gabriel.ul.ie ([136.201.1.101]) by ul.ie (PMDF V5.2-32 #41948) with ESMTP id <0GAN00ICPK5EPZ@ul.ie> for netdev@oss.sgi.com; Fri, 23 Mar 2001 13:29:38 +0000 (GMT) Received: by gabriel.ul.ie with Internet Mail Service (5.5.2653.19) id ; Fri, 23 Mar 2001 13:38:36 +0000 Content-return: allowed Date: Fri, 23 Mar 2001 13:38:34 +0000 From: EOIN RYAN <9726179@student.ul.ie> Subject: Kernel Network Implementation help. To: "'netdev@oss.sgi.com'" Message-id: <992C0C12C388D411B264009027AA3418698F31@gabriel.ul.ie> MIME-version: 1.0 X-Mailer: Internet Mail Service (5.5.2653.19) Content-type: text/plain; charset="iso-8859-1" Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1077 Lines: 23 Hi, I'm a final year student in the University of Limerick, Ireland. Part of my final year project requires me to develop a kernel module which provides an network interface for other modules on the system. I have found getting exact information quite difficult, with my main source being the socket.c file. I'm having major problems with the communication and my deadline is coming agonisingly close. I'm running the 2.2.16 kernel on an intel and at the moment I'm trying to get a server to accept a connection and receive a few characters from it. The server accepts a telnet connection and gets the name of the connecting session however when I try to do a sock->ops->recvmsg i get a efault. 
I was led to believe by someone that I may be able to get some help from here. Is there any sample code that you can give me, or places I can go to get examples, that you know of? I don't really know who to turn to for help on this one, there's no one in my college that has enough knowledge of the network implementation in the kernel to help me. Sincerely, Eoin Ryan. From owner-netdev@oss.sgi.com Fri Mar 23 07:39:30 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f2NFdUJ02326 for netdev-outgoing; Fri, 23 Mar 2001 07:39:30 -0800 Received: from colin.muc.de (root@colin.muc.de [193.149.48.1]) by oss.sgi.com (8.11.3/8.11.3) with SMTP id f2NFdTM02323 for ; Fri, 23 Mar 2001 07:39:29 -0800 Received: by colin.muc.de id <140650-3>; Fri, 23 Mar 2001 16:39:17 +0100 Message-ID: <20010323163908.50941@colin.muc.de> Date: Fri, 23 Mar 2001 16:39:08 +0100 From: Andi Kleen To: EOIN RYAN <9726179@student.ul.ie> Cc: "'netdev@oss.sgi.com'" Subject: Re: Kernel Network Implementation help. References: <992C0C12C388D411B264009027AA3418698F31@gabriel.ul.ie> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 0.88e In-Reply-To: <992C0C12C388D411B264009027AA3418698F31@gabriel.ul.ie>; from EOIN RYAN on Fri, Mar 23, 2001 at 02:38:34PM +0100 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 384 Lines: 18 On Fri, Mar 23, 2001 at 02:38:34PM +0100, EOIN RYAN wrote: > get a server to accept a connection and receive a few characters from it. > The server accepts a telnet connection and gets the name of the connecting > session however when I try to do a sock->ops->recvmsg i get a efault. Wrap it with mm_segment_t oldfs = get_fs(); set_fs(KERNEL_DS); ... 
set_fs(oldfs); -Andi From owner-netdev@oss.sgi.com Fri Mar 23 08:47:59 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f2NGlxM03873 for netdev-outgoing; Fri, 23 Mar 2001 08:47:59 -0800 Received: from zmailer.org (mail.zmailer.org [194.252.70.162]) by oss.sgi.com (8.11.3/8.11.3) with ESMTP id f2NGlwM03870 for ; Fri, 23 Mar 2001 08:47:58 -0800 Received: (from localhost user: 'mea' uid#500 fake: STDIN (mea@zmailer.org)) by mail.zmailer.org id ; Fri, 23 Mar 2001 18:47:22 +0200 Date: Fri, 23 Mar 2001 18:47:22 +0200 From: Matti Aarnio To: EOIN RYAN <9726179@student.ul.ie> Cc: netdev@oss.sgi.com Subject: Re: Kernel Network Implementation help. Message-ID: <20010323184722.S23336@mea-ext.zmailer.org> References: <992C0C12C388D411B264009027AA3418698F31@gabriel.ul.ie> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <992C0C12C388D411B264009027AA3418698F31@gabriel.ul.ie>; from 9726179@student.ul.ie on Fri, Mar 23, 2001 at 01:38:34PM +0000 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1413 Lines: 32 On Fri, Mar 23, 2001 at 01:38:34PM +0000, EOIN RYAN wrote: > Hi, > > I'm a final year student in the University of Limerick, Ireland. Part of my > final year project requires me to develop a kernel module which provides an > network interface for other modules on the system. I have found getting > exact information quite difficult, with my main source being the socket.c > file. I'm having major problems with the communication and my deadline is > coming agonisingly close. > > I'm running the 2.2.16 kernel on an intel and at the moment I'm trying to > get a server to accept a connection and receive a few characters from it. > The server accepts a telnet connection and gets the name of the connecting > session however when I try to do a sock->ops->recvmsg i get a efault. The recvmsg() does copy to userspace, and your reception buffer is in kernel space ? No panic. 
See for example net/ipv6/addrconf.c file for calls of get_fs(), and set_fs() (with KERNEL_DS). (Many files have KERNEL_DS parameter for set_fs() call, those all do it for same reason.) This is generic issue in fact, most kernel functions refer to data residing in user space, and thus you need to temporarily switch your FS (originally some intel F-Segment-Register ?) to kernel side from its normal (USER_DS) value. > Sincerely, > Eoin Ryan. /Matti Aarnio -- not a Guru per se, just "oldtimer" hacker... From owner-netdev@oss.sgi.com Fri Mar 23 12:19:39 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f2NKJdM08550 for netdev-outgoing; Fri, 23 Mar 2001 12:19:39 -0800 Received: from mailbox2.ucsd.edu (mailbox2.ucsd.edu [132.239.1.54]) by oss.sgi.com (8.11.3/8.11.3) with ESMTP id f2NKJdM08547 for ; Fri, 23 Mar 2001 12:19:39 -0800 Received: from smtp.ucsd.edu (smtp.ucsd.edu [132.239.1.49]) by mailbox2.ucsd.edu (8.11.0/8.11.0) with ESMTP id f2NKKNR09761; Fri, 23 Mar 2001 12:20:23 -0800 (PST) Received: from cs.ucsd.edu (cse-air-dhcp-189.ucsd.edu [132.239.10.189]) by smtp.ucsd.edu (8.9.3/8.9.3) with ESMTP id MAA22531; Fri, 23 Mar 2001 12:18:55 -0800 (PST) Message-ID: <3ABBAFC2.6991DBA2@cs.ucsd.edu> Date: Fri, 23 Mar 2001 12:19:14 -0800 From: Federico David Sacerdoti Organization: University of California San Diego X-Mailer: Mozilla 4.7 [en] (X11; I; Linux 2.2.14 i586) X-Accept-Language: en MIME-Version: 1.0 To: Andi Kleen CC: "David S. 
Miller" , kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: A TCP monitoring /proc/net file References: <3ABA9E98.81D1413E@cs.ucsd.edu> <15034.40807.977418.671551@pizda.ninka.net> <20010323093918.A1476@fred.local> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 793 Lines: 24 The external monitoring made possible by the /proc/net/tcphealth is interesting because the SRTT is proportional to the speed of one's network connection, and duplicate acks indicate that packets are being lost (or reordered, less likely) somewhere in the network. These are things you might want to know about a connection you are trying to communicate on - its individual latency and how often packets are being lost over it. Would a patch for 2.4.2 be helpful? Dave Andi Kleen wrote: > On Fri, Mar 23, 2001 at 01:57:11AM +0100, David S. Miller wrote: > > > > See the TCP_INFO socket option we added to 2.4.x > > Sadly TCP_INFO can not be used for external monitoring currently > (at least not without very bad and racy hacks to allow /proc to open sockets > in /proc/pid/fd) > > -Andi From owner-netdev@oss.sgi.com Fri Mar 23 23:20:54 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f2O7Kss21420 for netdev-outgoing; Fri, 23 Mar 2001 23:20:54 -0800 Received: from isis.its.uow.edu.au (isis.its.uow.edu.au [130.130.68.21]) by oss.sgi.com (8.11.3/8.11.3) with ESMTP id f2O7KpM21417 for ; Fri, 23 Mar 2001 23:20:52 -0800 Received: from uow.edu.au (wumpus.its.uow.edu.au [130.130.68.12]) by isis.its.uow.edu.au (8.9.3/8.9.3) with ESMTP id SAA05996; Sat, 24 Mar 2001 18:20:31 +1100 (EST) Message-ID: <3ABC4B33.F1833B5A@uow.edu.au> Date: Sat, 24 Mar 2001 18:22:27 +1100 From: Andrew Morton X-Mailer: Mozilla 4.7 [en] (X11; I; Linux 2.4.3-pre3 i586) X-Accept-Language: en MIME-Version: 1.0 To: Jan Harkes CC: Andi Kleen , "netdev@oss.sgi.com" Subject: Re: Adding just a pinch 
of icache/dcache pressure... References: <20010323015358Z129164-406+3041@vger.kernel.org> <20010323122815.A6428@win.tue.nl> <3ABB6833.183E9188@mandrakesoft.com> <20010323111056.A9332@cs.cmu.edu> <20010323171716.28420@colin.muc.de>, <20010323171716.28420@colin.muc.de>; from ak@muc.de on Fri, Mar 23, 2001 at 05:17:16PM +0100 <20010323115123.A12720@cs.cmu.edu> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 2068 Lines: 73 [ switched lists ] Jan Harkes wrote: > > On Fri, Mar 23, 2001 at 05:17:16PM +0100, Andi Kleen wrote: > > On Fri, Mar 23, 2001 at 05:10:56PM +0100, Jan Harkes wrote: > > > btw. There definitely is a network receive buffer leak somewhere in > > > either the 3c905C path or higher up in the network layers (2.4.0 or > > > 2.4.1). The normal path does not leak anything. > > > > What do you mean with "normal path" ? > > > > And are you sure it was a leak? TCP can buffer quite a bit of skbs, but it > > should be bounded based on the number of sockets. > > > > -Andi > > No corrupted packets. I was pretty sure it was a leak once I noticed > that most of my memory got allocated here: > > Top 10 of the not yet freed allocations taken from /proc/memleak in an > IKD-patched 2.4.2 kernel a couple of weeks ago: > > memleak/01-02-27__15:44:19 > 74603 buffer.c:1234 > 42956 3c59x.c:2232 > 13025 dcache.c:598 > 12392 inode.c:665 > 5921 dcache.c:603 > 4480 ll_rw_blk.c:397 > 2304 raid5.c:154 > 2105 mmap.c:276 > 2064 af_unix.c:1340 > 1312 file_table.c:62 > I tried to reproduce this memory leak and failed. 
Added some code to netif_rx() to corrupt incoming packets:

--- linux-2.4.2-ac24/net/core/dev.c  Sat Mar 24 14:28:25 2001
+++ ac/net/core/dev.c  Sat Mar 24 17:53:18 2001
@@ -1194,6 +1194,21 @@
  struct softnet_data *queue;
  unsigned long flags;
+{
+ extern int akpm;
+ if (akpm) {
+  static int stomper;
+
+  stomper++;
+  if (stomper >= akpm) {
+   stomper = 0;
+   if (skb->len > 100) {
+    skb->ip_summed = CHECKSUM_NONE;
+    skb->data[88]++;
+   }
+  }
+ }
+}
  if (skb->stamp.tv_sec == 0)
   get_fast_time(&skb->stamp);

`akpm' is a sysctl. I set it to 17. TCP and NFS throughput were of course quite horrid, but no sign in /proc/slabinfo of any memory leaks. Tested both 2.4.3-pre7 and 2.4.2-ac24.

Can you suggest any other way of reproducing this? Have I missed something?

Do you think your broken ethernet switch was corrupting data at layer2 (ethernet checksum will catch it) or at layer 3 (IP checksums)?

From owner-netdev@oss.sgi.com Sat Mar 24 05:09:33 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f2OD9Xq26111 for netdev-outgoing; Sat, 24 Mar 2001 05:09:33 -0800 Received: from laurin.munich.netsurf.de (laurin.munich.netsurf.de [194.64.166.1]) by oss.sgi.com (8.11.3/8.11.3) with ESMTP id f2OD9VM26108 for ; Sat, 24 Mar 2001 05:09:31 -0800 Received: from fred.muc.de (noidentity@ns1232.munich.netsurf.de [195.180.235.232]) by laurin.munich.netsurf.de (8.9.3/8.9.3) with ESMTP id OAA13834; Sat, 24 Mar 2001 14:09:07 +0100 (MET) Received: by fred.muc.de (Postfix, from userid 500) id CED44E3C87; Sat, 24 Mar 2001 13:34:24 +0100 (CET) Date: Sat, 24 Mar 2001 13:34:24 +0100 From: Andi Kleen To: Federico David Sacerdoti Cc: Andi Kleen , "David S.
Miller" , kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com, linux-net@vger.kernel.org Subject: Re: A TCP monitoring /proc/net file Message-ID: <20010324133424.A779@fred.local> References: <3ABA9E98.81D1413E@cs.ucsd.edu> <15034.40807.977418.671551@pizda.ninka.net> <20010323093918.A1476@fred.local> <3ABBAFC2.6991DBA2@cs.ucsd.edu> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0.1i In-Reply-To: <3ABBAFC2.6991DBA2@cs.ucsd.edu>; from fds@cs.ucsd.edu on Fri, Mar 23, 2001 at 09:19:14PM +0100 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 855 Lines: 18 On Fri, Mar 23, 2001 at 09:19:14PM +0100, Federico David Sacerdoti wrote: > The external monitoring made possible by the /proc/net/tcphealth is > interesting because the SRTT is proportional to the speed of one's > network connection, and duplicate acks indicate that packets are being > lost (or reordered, less likely) somewhere in the network. 2.4 has a special state machine to detect reordering when the connection supports timestamps. I guess some long term statistics (currently TCP_INFO only dumps current state) would be useful too, but it's David's call if he want to put in the few cycles that'll cost (probably only in slow paths anyways) I guess it would be better if you would put it into the existing TCP_INFO framework, perhaps with an additional /proc frontend to TCP_INFO. Having two ways to do a similar thing is not good. 
-Andi From owner-netdev@oss.sgi.com Sat Mar 24 08:08:52 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f2OG8qP28999 for netdev-outgoing; Sat, 24 Mar 2001 08:08:52 -0800 Received: from ravel.coda.cs.cmu.edu (mail@RAVEL.CODA.CS.CMU.EDU [128.2.222.215]) by oss.sgi.com (8.11.3/8.11.3) with ESMTP id f2OG8pM28996 for ; Sat, 24 Mar 2001 08:08:51 -0800 Received: from jaharkes by ravel.coda.cs.cmu.edu with local (Exim 3.22 #1 (Debian)) id 14gqaU-00067l-00; Sat, 24 Mar 2001 11:08:22 -0500 Date: Sat, 24 Mar 2001 11:08:22 -0500 To: Andrew Morton Cc: Andi Kleen , "netdev@oss.sgi.com" Subject: Re: Adding just a pinch of icache/dcache pressure... Message-ID: <20010324110822.A23436@cs.cmu.edu> References: <20010323015358Z129164-406+3041@vger.kernel.org> <20010323122815.A6428@win.tue.nl> <3ABB6833.183E9188@mandrakesoft.com> <20010323111056.A9332@cs.cmu.edu> <20010323171716.28420@colin.muc.de>, <20010323171716.28420@colin.muc.de>; <20010323115123.A12720@cs.cmu.edu> <3ABC4B33.F1833B5A@uow.edu.au> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.3.15i In-Reply-To: <3ABC4B33.F1833B5A@uow.edu.au>; from andrewm@uow.edu.au on Sat, Mar 24, 2001 at 06:22:27PM +1100 From: Jan Harkes Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1336 Lines: 30 On Sat, Mar 24, 2001 at 06:22:27PM +1100, Andrew Morton wrote: > I tried to reproduce this memory leak and failed. Added some > code to netif_rx() to corrupt incoming packets: ... > Can you suggest any other way of reproducing this? Have I missed > something? > > Do you think your broken ethernet switch was corrupting > data at layer2 (ethernet checksum will catch it) or at > layer 3 (IP checksums)? I have no idea, I didn't notice anything at first, except for a network slowdown which the others on my floor were seeing as well and everyone was assuming simple network congestion. 
My machine went OOM about twice in that week, which triggered me to started looking around. At first I was suspecting some leak in Coda (my neck of the woods), but noticed in slabinfo that a lot of size-2048 slabs were allocated. Patched ikd into my system and ran it for a day to find out when it happened and where. I looked at the 3c59x driver and could not find anything wrong with it. The next day everything was back to normal. I do have a SMP system, so it could be an obscure race. The thing that might have been triggering it were the backups that consisted of about 121 tcp connection with a total datatransfer of about 1GB. But it could also be UDP fragments, Coda tends to send 4KB UDP packets when revalidating it's caches. Jan From owner-netdev@oss.sgi.com Sat Mar 24 09:27:45 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f2OHRjw30731 for netdev-outgoing; Sat, 24 Mar 2001 09:27:45 -0800 Received: from isis.its.uow.edu.au (isis.its.uow.edu.au [130.130.68.21]) by oss.sgi.com (8.11.3/8.11.3) with ESMTP id f2OHRhM30727 for ; Sat, 24 Mar 2001 09:27:43 -0800 Received: from uow.edu.au (wumpus.its.uow.edu.au [130.130.68.12]) by isis.its.uow.edu.au (8.9.3/8.9.3) with ESMTP id DAA05144; Sun, 25 Mar 2001 03:27:26 +1000 (EST) Message-ID: <3ABCD973.FC568447@uow.edu.au> Date: Sun, 25 Mar 2001 03:29:23 +1000 From: Andrew Morton X-Mailer: Mozilla 4.7 [en] (X11; I; Linux 2.4.3-pre3 i586) X-Accept-Language: en MIME-Version: 1.0 To: Jan Harkes CC: Andi Kleen , "netdev@oss.sgi.com" Subject: Re: Adding just a pinch of icache/dcache pressure... 
References: <20010323015358Z129164-406+3041@vger.kernel.org> <20010323122815.A6428@win.tue.nl> <3ABB6833.183E9188@mandrakesoft.com> <20010323111056.A9332@cs.cmu.edu> <20010323171716.28420@colin.muc.de>, <20010323171716.28420@colin.muc.de>; <20010323115123.A12720@cs.cmu.edu> <3ABC4B33.F1833B5A@uow.edu.au>, <3ABC4B33.F1833B5A@uow.edu.au>; from andrewm@uow.edu.au on Sat, Mar 24, 2001 at 06:22:27PM +1100 <20010324110822.A23436@cs.cmu.edu> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 474 Lines: 11 Jan Harkes wrote: > > I do have a SMP system, so it could be an obscure race. The thing that > might have been triggering it were the backups that consisted of about > 121 tcp connection with a total datatransfer of about 1GB. But it could > also be UDP fragments, Coda tends to send 4KB UDP packets when > revalidating it's caches. > OK, thanks. I'll test UDP with large packets and random corruption. I guess testing with NFS client didn't cover that path adequately. 
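
The every-Nth-packet corruption hook described earlier in this thread can be exercised outside the kernel to check the counter logic before patching netif_rx(). What follows is only a minimal user-space sketch, not the code that was actually run: `struct fake_pkt`, `struct stomper`, and `maybe_stomp()` are hypothetical names introduced for illustration, and `checksum_verified` merely stands in for setting `skb->ip_summed = CHECKSUM_NONE`. The threshold (length over 100) and the byte offset (88) are taken from the posted diff.

```c
#include <stddef.h>

/* Hypothetical stand-in for the few sk_buff fields the test hook touches. */
struct fake_pkt {
    unsigned char *data;       /* packet bytes */
    size_t len;                /* packet length in bytes */
    int checksum_verified;     /* cleared to force software checksum checks */
};

/* Hypothetical holder for the 'akpm' sysctl value and the static counter. */
struct stomper {
    int period;                /* corrupt every period-th packet; 0 disables */
    int count;                 /* packets seen since the last reset */
};

/*
 * Mirrors the logic of the posted hook: count packets, and on every
 * period-th one reset the counter and, if the packet is longer than
 * 100 bytes, flip byte 88 and mark the checksum as unverified.
 * Returns 1 if the packet was corrupted, 0 otherwise.
 */
int maybe_stomp(struct stomper *s, struct fake_pkt *p)
{
    if (s->period == 0)
        return 0;

    s->count++;
    if (s->count < s->period)
        return 0;

    s->count = 0;
    if (p->len <= 100)
        return 0;              /* short packets consume the slot untouched */

    p->checksum_verified = 0;
    p->data[88]++;
    return 1;
}
```

Note that, as in the posted diff, a short packet that lands on the Nth slot still resets the counter, so with mostly small packets the effective corruption rate is lower than 1 in N.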
From owner-netdev@oss.sgi.com Sun Mar 25 11:37:36 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f2PJbar22474 for netdev-outgoing; Sun, 25 Mar 2001 11:37:36 -0800 Received: from smtp01.netkracker.com (smtp01.netkracker.com [202.177.128.2]) by oss.sgi.com (8.11.3/8.11.3) with ESMTP id f2PJbWM22471 for ; Sun, 25 Mar 2001 11:37:33 -0800 Received: from sharath ([210.18.2.115]) by smtp01.netkracker.com (Netscape Messaging Server 4.15) with SMTP id GARQNL00.P96 for ; Mon, 26 Mar 2001 01:10:33 +0530 Message-ID: <000201c0b5c7$b88ad700$730212d2@sharath> From: "Ramesh Udupa" To: Subject: request for advice Date: Fri, 23 Mar 2001 09:52:47 +0530 MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="----=_NextPart_000_0009_01C0B37F.0447B280" X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 5.50.4522.1200 X-MimeOLE: Produced By Microsoft MimeOLE V5.50.4522.1200 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1519 Lines: 46 This is a multi-part message in MIME format. ------=_NextPart_000_0009_01C0B37F.0447B280 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable

Hello,

I am a computer science engineering student and am very interested in working on Linux, more specifically on TCP/IP in Linux. Could you suggest some projects and any ongoing work to which I could contribute (programming)?

Please do reply to this mail.

Thank you,
Sharath K Udupa

------=_NextPart_000_0009_01C0B37F.0447B280 Content-Type: text/html; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable
------=_NextPart_000_0009_01C0B37F.0447B280-- From owner-netdev@oss.sgi.com Sun Mar 25 14:41:47 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f2PMflt26484 for netdev-outgoing; Sun, 25 Mar 2001 14:41:47 -0800 Received: from mandrakesoft.mandrakesoft.com (IDENT:jgarzik@mandrakesoft.mandrakesoft.com [216.71.84.35]) by oss.sgi.com (8.11.3/8.11.3) with ESMTP id f2PMfkM26481 for ; Sun, 25 Mar 2001 14:41:46 -0800 Received: (from jgarzik@localhost) by mandrakesoft.mandrakesoft.com (8.8.5/8.8.5) id QAA29089; Sun, 25 Mar 2001 16:41:43 -0600 Date: Sun, 25 Mar 2001 16:41:43 -0600 Message-Id: <200103252241.QAA29089@mandrakesoft.mandrakesoft.com> From: Jeff Garzik To: Linus Torvalds CC: Andrew Morton , davem@redhat.com, netdev@oss.sgi.com Subject: PATCH 2.4.3.7: ethtool interface addition Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1788 Lines: 44 RFC sent out to list, with no objections. Adds driver info exportable via ioctl. Exporting bus info allows userspace utilities to make the critical association between registered interface and bus address. Exporting driver name and version info is done in various hack-y ways, this puts such code in one location. Further, this allows userspace utilities which support more than one driver type to determine which set of driver-private ioctls to use. diff -u linux_2_4/include/linux/ethtool.h:1.1.1.2 linux_2_4/include/linux/ethtool.h:1.1.1.2.140.1 --- linux_2_4/include/linux/ethtool.h:1.1.1.2 Tue Nov 14 14:01:49 2000 +++ linux_2_4/include/linux/ethtool.h Fri Mar 23 20:04:23 2001 @@ -1,4 +1,4 @@ -/* $Id: ethtool.h,v 1.1.1.2 2000/11/14 22:01:49 jgarzik Exp $ +/* $Id: ethtool.h,v 1.1.1.2.140.1 2001/03/24 04:04:23 jgarzik Exp $ * ethtool.h: Defines for Linux ethtool. * * Copyright (C) 1998 David S. Miller (davem@redhat.com) @@ -24,10 +24,22 @@ u32 reserved[4]; }; +/* these strings are set to whatever the driver author decides... 
*/ +struct ethtool_drvinfo { + u32 cmd; + char driver[32]; /* driver short name, "tulip", "eepro100" */ + char version[32]; /* driver version string */ + char fw_version[32]; /* firmware version string, if applicable */ + char bus_info[32]; /* Bus info for this interface. For PCI + * devices, use pci_dev->slot_name. */ + char reserved1[32]; + char reserved2[32]; +}; /* CMDs currently supported */ -#define ETHTOOL_GSET 0x00000001 /* Get settings, non-privileged. */ +#define ETHTOOL_GSET 0x00000001 /* Get settings. */ #define ETHTOOL_SSET 0x00000002 /* Set settings, privileged. */ +#define ETHTOOL_GDRVINFO 0x00000003 /* Get driver info. */ /* compatibility with older code */ #define SPARC_ETH_GSET ETHTOOL_GSET From owner-netdev@oss.sgi.com Sun Mar 25 14:42:44 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f2PMgim26655 for netdev-outgoing; Sun, 25 Mar 2001 14:42:44 -0800 Received: from mandrakesoft.mandrakesoft.com (IDENT:jgarzik@mandrakesoft.mandrakesoft.com [216.71.84.35]) by oss.sgi.com (8.11.3/8.11.3) with ESMTP id f2PMggM26646 for ; Sun, 25 Mar 2001 14:42:42 -0800 Received: (from jgarzik@localhost) by mandrakesoft.mandrakesoft.com (8.8.5/8.8.5) id QAA29192; Sun, 25 Mar 2001 16:41:58 -0600 Date: Sun, 25 Mar 2001 16:41:58 -0600 Message-Id: <200103252241.QAA29192@mandrakesoft.mandrakesoft.com> From: Jeff Garzik To: Linus Torvalds CC: Andrew Morton , alan@lxorguk.ukuu.org.uk, netdev@oss.sgi.com, linux-kernel@vger.kernel.org Subject: PATCH 2.4.3.7: net drvr probe fix Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 19551 Lines: 641 The attached patch provides a solution for the problem where an interface is not completely ready by the time /sbin/hotplug is called, from init_etherdev. The patch also includes semi-related cleanups and fixes found along the way. 
Changes: * Add alloc_etherdev, alloc_fddidev, alloc_hippi_dev, alloc_trdev, alloc_fcdev * Add register_hipdev for API completeness * Add inline source docs for init_fddidev, init_hippi_dev, init_trdev, init_fcdev * Move EXPORT_SYMBOL for public functions from net/netsyms.c to drivers/net/net_init.c. * Remove duplicate code by making unregister_foo functions simply call unregister_netdev() * Remove duplicate code by making register_foo functions simply call new static function __register_netdev() * Propagate returned error codes in register_netdev() * Rename private tr_configure() to public tr_setup(), and remove no-op tr_setup() function. diff -u linux_2_4/drivers/net/net_init.c:1.1.1.8 linux_2_4/drivers/net/net_init.c:1.1.1.8.38.2 --- linux_2_4/drivers/net/net_init.c:1.1.1.8 Mon Feb 26 19:03:50 2001 +++ linux_2_4/drivers/net/net_init.c Sun Mar 25 12:43:06 2001 @@ -28,10 +28,12 @@ up. We now share common code and have regularised name allocation setups. Abolished the 16 card limits. 03/19/2000 - jgarzik and Urban Widmark: init_etherdev 32-byte align + 03/21/2001 - jgarzik: alloc_etherdev and friends */ #include +#include #include #include #include @@ -68,6 +70,33 @@ */ +static struct net_device *alloc_netdev(int sizeof_priv, const char *mask, + void (*setup)(struct net_device *)) +{ + struct net_device *dev; + int alloc_size; + + /* ensure 32-byte alignment of the private area */ + alloc_size = sizeof (*dev) + sizeof_priv + 31; + + dev = (struct net_device *) kmalloc (alloc_size, GFP_KERNEL); + if (dev == NULL) + { + printk(KERN_ERR "alloc_dev: Unable to allocate device memory.\n"); + return NULL; + } + + memset(dev, 0, alloc_size); + + if (sizeof_priv) + dev->priv = (void *) (((long)(dev + 1) + 31) & ~31); + + setup(dev); + strcpy(dev->name, mask); + + return dev; +} + static struct net_device *init_alloc_dev(int sizeof_priv) { struct net_device *dev; @@ -142,6 +171,17 @@ return dev; } +static int __register_netdev(struct net_device *dev) +{ + 
dev_init_buffers(dev); + + if (dev->init && dev->init(dev) != 0) { + unregister_netdev(dev); + return -EIO; + } + return 0; +} + /** * init_etherdev - Register ethernet device * @dev: An ethernet device structure to be filled in, or %NULL if a new @@ -164,6 +204,25 @@ return init_netdev(dev, sizeof_priv, "eth%d", ether_setup); } +/** + * alloc_etherdev - Register ethernet device + * @sizeof_priv: Size of additional driver-private structure to be allocated + * for this ethernet device + * + * Fill in the fields of the device structure with ethernet-generic values. + * + * Constructs a new net device, complete with a private data area of + * size @sizeof_priv. A 32-byte (not bit) alignment is enforced for + * this private data area. + */ + +struct net_device *alloc_etherdev(int sizeof_priv) +{ + return alloc_netdev(sizeof_priv, "eth%d", ether_setup); +} + +EXPORT_SYMBOL(init_etherdev); +EXPORT_SYMBOL(alloc_etherdev); static int eth_mac_addr(struct net_device *dev, void *p) { @@ -184,11 +243,48 @@ #ifdef CONFIG_FDDI +/** + * init_fddidev - Register FDDI device + * @dev: A FDDI device structure to be filled in, or %NULL if a new + * struct should be allocated. + * @sizeof_priv: Size of additional driver-private structure to be allocated + * for this ethernet device + * + * Fill in the fields of the device structure with FDDI-generic values. + * + * If no device structure is passed, a new one is constructed, complete with + * a private data area of size @sizeof_priv. A 32-byte (not bit) + * alignment is enforced for this private data area. + * + * If an empty string area is passed as dev->name, or a new structure is made, + * a new name string is constructed. 
+ */ + struct net_device *init_fddidev(struct net_device *dev, int sizeof_priv) { return init_netdev(dev, sizeof_priv, "fddi%d", fddi_setup); } +/** + * alloc_fddidev - Register FDDI device + * @sizeof_priv: Size of additional driver-private structure to be allocated + * for this FDDI device + * + * Fill in the fields of the device structure with FDDI-generic values. + * + * Constructs a new net device, complete with a private data area of + * size @sizeof_priv. A 32-byte (not bit) alignment is enforced for + * this private data area. + */ + +struct net_device *alloc_fddidev(int sizeof_priv) +{ + return alloc_netdev(sizeof_priv, "fddi%d", fddi_setup); +} + +EXPORT_SYMBOL(init_fddidev); +EXPORT_SYMBOL(alloc_fddidev); + static int fddi_change_mtu(struct net_device *dev, int new_mtu) { if ((new_mtu < FDDI_K_SNAP_HLEN) || (new_mtu > FDDI_K_SNAP_DLEN)) @@ -227,19 +323,59 @@ } +/** + * init_hippi_dev - Register HIPPI device + * @dev: A HIPPI device structure to be filled in, or %NULL if a new + * struct should be allocated. + * @sizeof_priv: Size of additional driver-private structure to be allocated + * for this ethernet device + * + * Fill in the fields of the device structure with HIPPI-generic values. + * + * If no device structure is passed, a new one is constructed, complete with + * a private data area of size @sizeof_priv. A 32-byte (not bit) + * alignment is enforced for this private data area. + * + * If an empty string area is passed as dev->name, or a new structure is made, + * a new name string is constructed. + */ + struct net_device *init_hippi_dev(struct net_device *dev, int sizeof_priv) { return init_netdev(dev, sizeof_priv, "hip%d", hippi_setup); } +/** + * alloc_hippi_dev - Register HIPPI device + * @sizeof_priv: Size of additional driver-private structure to be allocated + * for this HIPPI device + * + * Fill in the fields of the device structure with HIPPI-generic values. 
+ * + * Constructs a new net device, complete with a private data area of + * size @sizeof_priv. A 32-byte (not bit) alignment is enforced for + * this private data area. + */ +struct net_device *alloc_hippi_dev(int sizeof_priv) +{ + return alloc_netdev(sizeof_priv, "hip%d", hippi_setup); +} + +int register_hipdev(struct net_device *dev) +{ + return __register_netdev(dev); +} + void unregister_hipdev(struct net_device *dev) { - rtnl_lock(); - unregister_netdevice(dev); - rtnl_unlock(); + unregister_netdev(dev); } +EXPORT_SYMBOL(init_hippi_dev); +EXPORT_SYMBOL(alloc_hippi_dev); +EXPORT_SYMBOL(register_hipdev); +EXPORT_SYMBOL(unregister_hipdev); static int hippi_neigh_setup_dev(struct net_device *dev, struct neigh_parms *p) { @@ -283,6 +419,7 @@ dev_init_buffers(dev); } +EXPORT_SYMBOL(ether_setup); #ifdef CONFIG_FDDI @@ -312,6 +449,7 @@ return; } +EXPORT_SYMBOL(fddi_setup); #endif /* CONFIG_FDDI */ @@ -349,6 +487,7 @@ dev_init_buffers(dev); } +EXPORT_SYMBOL(hippi_setup); #endif /* CONFIG_HIPPI */ #if defined(CONFIG_ATALK) || defined(CONFIG_ATALK_MODULE) @@ -387,6 +526,7 @@ dev_init_buffers(dev); } +EXPORT_SYMBOL(ltalk_setup); #endif /* CONFIG_ATALK || CONFIG_ATALK_MODULE */ @@ -403,8 +543,8 @@ if (strchr(dev->name, '%')) { - err = -EBUSY; - if(dev_alloc_name(dev, dev->name)<0) + err = dev_alloc_name(dev, dev->name); + if (err < 0) goto out; } @@ -414,17 +554,12 @@ if (dev->name[0]==0 || dev->name[0]==' ') { - err = -EBUSY; - if(dev_alloc_name(dev, "eth%d")<0) + err = dev_alloc_name(dev, "eth%d"); + if (err < 0) goto out; } - - - err = -EIO; - if (register_netdevice(dev)) - goto out; - err = 0; + err = register_netdevice(dev); out: rtnl_unlock(); @@ -438,10 +573,12 @@ rtnl_unlock(); } +EXPORT_SYMBOL(register_netdev); +EXPORT_SYMBOL(unregister_netdev); #ifdef CONFIG_TR -static void tr_configure(struct net_device *dev) +void tr_setup(struct net_device *dev) { /* * Configure and register @@ -462,32 +599,61 @@ dev->flags = IFF_BROADCAST | IFF_MULTICAST ; } +/** + * 
init_trdev - Register token ring device + * @dev: A token ring device structure to be filled in, or %NULL if a new + * struct should be allocated. + * @sizeof_priv: Size of additional driver-private structure to be allocated + * for this ethernet device + * + * Fill in the fields of the device structure with token ring-generic values. + * + * If no device structure is passed, a new one is constructed, complete with + * a private data area of size @sizeof_priv. A 32-byte (not bit) + * alignment is enforced for this private data area. + * + * If an empty string area is passed as dev->name, or a new structure is made, + * a new name string is constructed. + */ + struct net_device *init_trdev(struct net_device *dev, int sizeof_priv) { - return init_netdev(dev, sizeof_priv, "tr%d", tr_configure); + return init_netdev(dev, sizeof_priv, "tr%d", tr_setup); } -void tr_setup(struct net_device *dev) +/** + * alloc_trdev - Register token ring device + * @sizeof_priv: Size of additional driver-private structure to be allocated + * for this token ring device + * + * Fill in the fields of the device structure with token ring-generic values. + * + * Constructs a new net device, complete with a private data area of + * size @sizeof_priv. A 32-byte (not bit) alignment is enforced for + * this private data area. + */ + +struct net_device *alloc_trdev(int sizeof_priv) { + return alloc_netdev(sizeof_priv, "tr%d", tr_setup); } int register_trdev(struct net_device *dev) { - dev_init_buffers(dev); - - if (dev->init && dev->init(dev) != 0) { - unregister_trdev(dev); - return -EIO; - } - return 0; + return __register_netdev(dev); } void unregister_trdev(struct net_device *dev) { - rtnl_lock(); - unregister_netdevice(dev); - rtnl_unlock(); + unregister_netdev(dev); } + +EXPORT_SYMBOL(tr_setup); +EXPORT_SYMBOL(init_trdev); +EXPORT_SYMBOL(alloc_trdev); +EXPORT_SYMBOL(register_trdev); +EXPORT_SYMBOL(unregister_trdev); + #endif /* CONFIG_TR */ @@ -509,31 +675,62 @@ /* New-style flags. 
*/ dev->flags = IFF_BROADCAST; dev_init_buffers(dev); - return; } +/** + * init_fcdev - Register fibre channel device + * @dev: A fibre channel device structure to be filled in, or %NULL if a new + * struct should be allocated. + * @sizeof_priv: Size of additional driver-private structure to be allocated + * for this ethernet device + * + * Fill in the fields of the device structure with fibre channel-generic values. + * + * If no device structure is passed, a new one is constructed, complete with + * a private data area of size @sizeof_priv. A 32-byte (not bit) + * alignment is enforced for this private data area. + * + * If an empty string area is passed as dev->name, or a new structure is made, + * a new name string is constructed. + */ struct net_device *init_fcdev(struct net_device *dev, int sizeof_priv) { return init_netdev(dev, sizeof_priv, "fc%d", fc_setup); } +/** + * alloc_fcdev - Register fibre channel device + * @sizeof_priv: Size of additional driver-private structure to be allocated + * for this fibre channel device + * + * Fill in the fields of the device structure with fibre channel-generic values. + * + * Constructs a new net device, complete with a private data area of + * size @sizeof_priv. A 32-byte (not bit) alignment is enforced for + * this private data area. 
+ */ + +struct net_device *alloc_fcdev(int sizeof_priv) +{ + return alloc_netdev(sizeof_priv, "fc%d", fc_setup); +} + int register_fcdev(struct net_device *dev) { - dev_init_buffers(dev); - if (dev->init && dev->init(dev) != 0) { - unregister_fcdev(dev); - return -EIO; - } - return 0; + return __register_netdev(dev); } void unregister_fcdev(struct net_device *dev) { - rtnl_lock(); - unregister_netdevice(dev); - rtnl_unlock(); + unregister_netdev(dev); } + +EXPORT_SYMBOL(fc_setup); +EXPORT_SYMBOL(init_fcdev); +EXPORT_SYMBOL(alloc_fcdev); +EXPORT_SYMBOL(register_fcdev); +EXPORT_SYMBOL(unregister_fcdev); #endif /* CONFIG_NET_FC */ diff -u linux_2_4/include/linux/etherdevice.h:1.1.1.15 linux_2_4/include/linux/etherdevice.h:1.1.1.15.4.1 --- linux_2_4/include/linux/etherdevice.h:1.1.1.15 Fri Mar 23 08:14:53 2001 +++ linux_2_4/include/linux/etherdevice.h Fri Mar 23 20:04:23 2001 @@ -38,19 +38,28 @@ struct hh_cache *hh); extern int eth_header_parse(struct sk_buff *skb, unsigned char *haddr); -extern struct net_device * init_etherdev(struct net_device *, int); +extern struct net_device *init_etherdev(struct net_device *dev, int sizeof_priv); +extern struct net_device *alloc_etherdev(int sizeof_priv); -static __inline__ void eth_copy_and_sum (struct sk_buff *dest, unsigned char *src, int len, int base) +static inline void eth_copy_and_sum (struct sk_buff *dest, unsigned char *src, int len, int base) { memcpy (dest->data, src, len); } -/* Check that the ethernet address (MAC) is not 00:00:00:00:00:00 and is not - * a multicast address. Return true if the address is valid. +/** + * is_valid_ether_addr - Determine if the given Ethernet address is valid + * @addr: Pointer to a six-byte array containing the Ethernet address + * + * Check that the Ethernet address (MAC) is not 00:00:00:00:00:00, is not + * a multicast address, and is not FF:FF:FF:FF:FF:FF. + * + * Return true if the address is valid. 
*/ -static __inline__ int is_valid_ether_addr( u8 *addr ) +static inline int is_valid_ether_addr( u8 *addr ) { - return !(addr[0]&1) && memcmp( addr, "\0\0\0\0\0\0", 6); + const char zaddr[6] = {0,}; + + return !(addr[0]&1) && memcmp( addr, zaddr, 6); } #endif diff -u linux_2_4/include/linux/fcdevice.h:1.1.1.1 linux_2_4/include/linux/fcdevice.h:1.1.1.1.182.1 --- linux_2_4/include/linux/fcdevice.h:1.1.1.1 Sun Oct 22 12:36:14 2000 +++ linux_2_4/include/linux/fcdevice.h Fri Mar 23 20:04:23 2001 @@ -33,7 +33,10 @@ extern int fc_rebuild_header(struct sk_buff *skb); //extern unsigned short fc_type_trans(struct sk_buff *skb, struct net_device *dev); -extern struct net_device * init_fcdev(struct net_device *, int); +extern struct net_device *init_fcdev(struct net_device *dev, int sizeof_priv); +extern struct net_device *alloc_fcdev(int sizeof_priv); +extern int register_fcdev(struct net_device *dev); +extern void unregister_fcdev(struct net_device *dev); #endif diff -u linux_2_4/include/linux/fddidevice.h:1.1.1.2 linux_2_4/include/linux/fddidevice.h:1.1.1.2.174.1 --- linux_2_4/include/linux/fddidevice.h:1.1.1.2 Sun Oct 22 13:44:24 2000 +++ linux_2_4/include/linux/fddidevice.h Fri Mar 23 20:04:23 2001 @@ -34,7 +34,8 @@ extern int fddi_rebuild_header(struct sk_buff *skb); extern unsigned short fddi_type_trans(struct sk_buff *skb, struct net_device *dev); -extern struct net_device * init_fddidev(struct net_device *, int); +extern struct net_device *init_fddidev(struct net_device *dev, int sizeof_priv); +extern struct net_device *alloc_fddidev(int sizeof_priv); #endif #endif /* _LINUX_FDDIDEVICE_H */ diff -u linux_2_4/include/linux/hippidevice.h:1.1.1.1 linux_2_4/include/linux/hippidevice.h:1.1.1.1.182.1 --- linux_2_4/include/linux/hippidevice.h:1.1.1.1 Sun Oct 22 12:36:13 2000 +++ linux_2_4/include/linux/hippidevice.h Fri Mar 23 20:04:23 2001 @@ -51,7 +51,9 @@ extern void hippi_net_init(void); void hippi_setup(struct net_device *dev); -extern struct net_device 
*init_hippi_dev(struct net_device *, int); +extern struct net_device *init_hippi_dev(struct net_device *dev, int sizeof_priv); +extern struct net_device *alloc_hippi_dev(int sizeof_priv); +extern int register_hipdev(struct net_device *dev); extern void unregister_hipdev(struct net_device *dev); #endif diff -u linux_2_4/include/linux/netdevice.h:1.1.1.23 linux_2_4/include/linux/netdevice.h:1.1.1.23.2.1 --- linux_2_4/include/linux/netdevice.h:1.1.1.23 Fri Mar 23 19:21:18 2001 +++ linux_2_4/include/linux/netdevice.h Fri Mar 23 20:04:23 2001 @@ -41,6 +41,9 @@ struct divert_blk; +#define HAVE_ALLOC_NETDEV /* feature macro: alloc_xxxdev + functions are available. */ + #define NET_XMIT_SUCCESS 0 #define NET_XMIT_DROP 1 /* skb dropped */ #define NET_XMIT_CN 2 /* congestion notification */ @@ -633,10 +636,6 @@ /* Support for loadable net-drivers */ extern int register_netdev(struct net_device *dev); extern void unregister_netdev(struct net_device *dev); -extern int register_trdev(struct net_device *dev); -extern void unregister_trdev(struct net_device *dev); -extern int register_fcdev(struct net_device *dev); -extern void unregister_fcdev(struct net_device *dev); /* Functions used for multicast support */ extern void dev_mc_upload(struct net_device *dev); extern int dev_mc_delete(struct net_device *dev, void *addr, int alen, int all); diff -u linux_2_4/include/linux/trdevice.h:1.1.1.1 linux_2_4/include/linux/trdevice.h:1.1.1.1.182.1 --- linux_2_4/include/linux/trdevice.h:1.1.1.1 Sun Oct 22 12:36:03 2000 +++ linux_2_4/include/linux/trdevice.h Fri Mar 23 20:04:23 2001 @@ -33,7 +33,10 @@ void *saddr, unsigned len); extern int tr_rebuild_header(struct sk_buff *skb); extern unsigned short tr_type_trans(struct sk_buff *skb, struct net_device *dev); -extern struct net_device * init_trdev(struct net_device *, int); +extern struct net_device *init_trdev(struct net_device *dev, int sizeof_priv); +extern struct net_device *alloc_trdev(int sizeof_priv); +extern int 
register_trdev(struct net_device *dev); +extern void unregister_trdev(struct net_device *dev); #endif diff -u linux_2_4/net/netsyms.c:1.1.1.25 linux_2_4/net/netsyms.c:1.1.1.25.2.1 --- linux_2_4/net/netsyms.c:1.1.1.25 Fri Mar 23 19:23:31 2001 +++ linux_2_4/net/netsyms.c Fri Mar 23 20:04:23 2001 @@ -432,33 +432,19 @@ #endif /* CONFIG_INET */ #ifdef CONFIG_TR -EXPORT_SYMBOL(tr_setup); EXPORT_SYMBOL(tr_type_trans); -EXPORT_SYMBOL(register_trdev); -EXPORT_SYMBOL(unregister_trdev); -EXPORT_SYMBOL(init_trdev); #endif -#ifdef CONFIG_NET_FC -EXPORT_SYMBOL(register_fcdev); -EXPORT_SYMBOL(unregister_fcdev); -EXPORT_SYMBOL(init_fcdev); -#endif - /* Device callback registration */ EXPORT_SYMBOL(register_netdevice_notifier); EXPORT_SYMBOL(unregister_netdevice_notifier); /* support for loadable net drivers */ #ifdef CONFIG_NET -EXPORT_SYMBOL(init_etherdev); EXPORT_SYMBOL(loopback_dev); EXPORT_SYMBOL(register_netdevice); EXPORT_SYMBOL(unregister_netdevice); -EXPORT_SYMBOL(register_netdev); -EXPORT_SYMBOL(unregister_netdev); EXPORT_SYMBOL(netdev_state_change); -EXPORT_SYMBOL(ether_setup); EXPORT_SYMBOL(dev_new_index); EXPORT_SYMBOL(dev_get_by_index); EXPORT_SYMBOL(__dev_get_by_index); @@ -469,8 +455,6 @@ EXPORT_SYMBOL(eth_type_trans); #ifdef CONFIG_FDDI EXPORT_SYMBOL(fddi_type_trans); -EXPORT_SYMBOL(fddi_setup); -EXPORT_SYMBOL(init_fddidev); #endif /* CONFIG_FDDI */ #if 0 EXPORT_SYMBOL(eth_copy_and_sum); @@ -511,8 +495,6 @@ #ifdef CONFIG_HIPPI EXPORT_SYMBOL(hippi_type_trans); -EXPORT_SYMBOL(init_hippi_dev); -EXPORT_SYMBOL(unregister_hipdev); #endif #ifdef CONFIG_SYSCTL @@ -522,12 +504,6 @@ EXPORT_SYMBOL(sysctl_ip_default_ttl); #endif #endif - -#if defined(CONFIG_ATALK) || defined(CONFIG_ATALK_MODULE) -#include -EXPORT_SYMBOL(ltalk_setup); -#endif - /* Packet scheduler modules want these. 
*/ EXPORT_SYMBOL(qdisc_destroy); From owner-netdev@oss.sgi.com Mon Mar 26 03:59:32 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f2QBxWZ18312 for netdev-outgoing; Mon, 26 Mar 2001 03:59:32 -0800 Received: from horus.its.uow.edu.au (horus.its.uow.edu.au [130.130.68.25]) by oss.sgi.com (8.11.3/8.11.3) with ESMTP id f2QBxUM18309 for ; Mon, 26 Mar 2001 03:59:30 -0800 Received: from uow.edu.au (wumpus.its.uow.edu.au [130.130.68.12]) by horus.its.uow.edu.au (8.9.3/8.9.3) with ESMTP id VAA21835; Mon, 26 Mar 2001 21:59:21 +1000 (EST) Message-ID: <3ABF2F8E.B212A96B@uow.edu.au> Date: Mon, 26 Mar 2001 22:01:18 +1000 From: Andrew Morton X-Mailer: Mozilla 4.7 [en] (X11; I; Linux 2.4.3-pre3 i586) X-Accept-Language: en MIME-Version: 1.0 To: Jeff Garzik CC: "netdev@oss.sgi.com" Subject: Compile-time versus run-time Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 687 Lines: 20 CONFIG_8139TOO_TUNE_TWISTER This implements a function which might come in handy in case you are using low quality on long cabling. It tries to match the transceiver to the cable characteristics. This is experimental since hardly documented by the manufacturer. If unsure, say N. Jeff, don't you think this sort of thing should be a module option, and not a compile-time option? It's OK for you and me - we compile kernels occasionally. But for most people, unless the distributor turns this on, they simply won't be able to access it. No? (And wouldn't it be nice to be able to get the same functionality which module options give us when using a statically linked driver?) 
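The distinction being drawn here can be sketched in plain user-space C. Every name below (`TUNE_TWISTER_BUILTIN`, `tune_twister`, the two helper functions) is hypothetical and is not taken from 8139too.c; the sketch only illustrates why a run-time switch stays reachable on a kernel the user did not build, while a compile-time switch does not.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a CONFIG_ option: decided when the code is built. */
#ifdef TUNE_TWISTER_BUILTIN
#define TWISTER_COMPILED_IN true
#else
#define TWISTER_COMPILED_IN false
#endif

/* Hypothetical stand-in for a module option: can be set when the driver loads. */
static bool tune_twister = false;

/* Compile-time gating: the answer is frozen into the binary. */
static bool twister_enabled_compile_time(void)
{
    return TWISTER_COMPILED_IN;
}

/* Run-time gating: the code is always present and a flag decides. */
static bool twister_enabled_run_time(void)
{
    return tune_twister;
}
```

With the run-time variant, a user of a distributor kernel can flip `tune_twister` without rebuilding; with the compile-time variant, only whoever built the binary chose the value.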
From owner-netdev@oss.sgi.com Mon Mar 26 07:19:46 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f2QFJkR22973 for netdev-outgoing; Mon, 26 Mar 2001 07:19:46 -0800 Received: from havoc.gtf.org (IDENT:postfix@panic.ohr.gatech.edu [130.207.47.194]) by oss.sgi.com (8.11.3/8.11.3) with ESMTP id f2QFJkM22970 for ; Mon, 26 Mar 2001 07:19:46 -0800 Received: from mandrakesoft.com (adsl-20-73-169.asm.bellsouth.net [66.20.73.169]) by havoc.gtf.org (Postfix) with ESMTP id 3586B1F6D; Mon, 26 Mar 2001 10:19:44 -0500 (EST) Message-ID: <3ABF5E0F.85E19335@mandrakesoft.com> Date: Mon, 26 Mar 2001 10:19:43 -0500 From: Jeff Garzik Organization: MandrakeSoft X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.4.3-pre8 i686) X-Accept-Language: en MIME-Version: 1.0 To: Andrew Morton Cc: "netdev@oss.sgi.com" Subject: Re: Compile-time versus run-time References: <3ABF2F8E.B212A96B@uow.edu.au> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1346 Lines: 38 Andrew Morton wrote: > > CONFIG_8139TOO_TUNE_TWISTER > > This implements a function which might come in handy in case you > are using low quality on long cabling. It tries to match the > transceiver to the cable characteristics. This is experimental > since hardly documented by the manufacturer. If unsure, say N. > > Jeff, > > don't you think this sort of thing should be a module option, and > not a compile-time option? > > It's OK for you and me - we compile kernels occasionally. But > for most people, unless the distributor turns this on, they simply > won't be able to access it. > > No? > > (And wouldn't it be nice to be able to get the same functionality > which module options give us when using a statically linked driver?) hmm.. I guess we could make this something enabled via ioctl. I agree with your general point, but this option is marked experimental and defaults to no. 
Potentially the twister tuning varies based on chip (or even silicon or board revision), so while I want to provide the capability to those who can experiment with it, I'm not sure I want to provide it for general use.. Jeff -- Jeff Garzik | May you have warm words on a cold evening, Building 1024 | a full moon on a dark night, MandrakeSoft | and a smooth road all the way to your door. From owner-netdev@oss.sgi.com Mon Mar 26 07:33:02 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f2QFX2q23899 for netdev-outgoing; Mon, 26 Mar 2001 07:33:02 -0800 Received: from mail.networkone.net (mail.networkone.net [209.144.112.246]) by oss.sgi.com (8.11.3/8.11.3) with SMTP id f2QFX1M23896 for ; Mon, 26 Mar 2001 07:33:01 -0800 Received: (qmail 28764 invoked from network); 26 Mar 2001 15:33:01 -0000 Received: from unknown (HELO lazerus.networkone.net) (209.144.112.28) by mail.networkone.net with SMTP; 26 Mar 2001 15:33:01 -0000 Date: Mon, 26 Mar 2001 07:33:01 -0800 (PST) From: Mark Peugeot To: Andrew Morton cc: Jeff Garzik , "netdev@oss.sgi.com" Subject: Re: Compile-time versus run-time In-Reply-To: <3ABF2F8E.B212A96B@uow.edu.au> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1014 Lines: 33 Due to its experimental nature, do you want novices playing with this as a module parameter, or would you rather that someone who at least knew how to compile a kernel played with it... I would think the latter would be more appropriate. Mark On Mon, 26 Mar 2001, Andrew Morton wrote: > CONFIG_8139TOO_TUNE_TWISTER > > This implements a function which might come in handy in case you > are using low quality on long cabling. It tries to match the > transceiver to the cable characteristics. This is experimental > since hardly documented by the manufacturer. If unsure, say N. 
> > Jeff, > > don't you think this sort of thing should be a module option, and > not a compile-time option? > > It's OK for you and me - we compile kernels occasionally. But > for most people, unless the distributor turns this on, they simply > won't be able to access it. > > No? > > (And wouldn't it be nice to be able to get the same functionality > which module options give us when using a statically linked driver?) > From owner-netdev@oss.sgi.com Tue Mar 27 05:16:26 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f2RDGQp28309 for netdev-outgoing; Tue, 27 Mar 2001 05:16:26 -0800 Received: from mail.ocs.com.au (ppp0.ocs.com.au [203.34.97.3]) by oss.sgi.com (8.11.3/8.11.3) with SMTP id f2RDGNM28306 for ; Tue, 27 Mar 2001 05:16:24 -0800 Received: (qmail 15414 invoked from network); 27 Mar 2001 13:16:19 -0000 Received: from ocs3.ocs-net (192.168.255.3) by mail.ocs.com.au with SMTP; 27 Mar 2001 13:16:19 -0000 X-Mailer: exmh version 2.1.1 10/15/1999 From: Keith Owens To: Andrew Morton cc: "netdev@oss.sgi.com" Subject: Re: Compile-time versus run-time References: <3ABF2F8E.B212A96B@uow.edu.au> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Tue, 27 Mar 2001 23:16:19 +1000 Message-ID: <9186.985698979@ocs3.ocs-net> Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 484 Lines: 12 Andrew Morton wrote: > CONFIG_8139TOO_TUNE_TWISTER > (And wouldn't it be nice to be able to get the same functionality > which module options give us when using a statically linked driver?) On my todo list for 2.5. MODULE_PARM will be promoted to module_name.parm when the object is built in. insmod foo debug=1 or boot with foo.debug=1. It needs a mapping of source to module which is not easy to get for multi object modules in 2.4, my 2.5 makefile rewrite will make it easy. 
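The `module_name.parm=value` form described above implies splitting a boot argument into three parts. A minimal user-space sketch of that split follows; `split_boot_parm` is a hypothetical helper written for illustration, not the code planned for 2.5 and not any existing kernel function.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: split "module.parm=value" into its three parts.
 * Each output buffer has room for n bytes including the terminator.
 * Returns 0 on success, -1 if '.' or '=' is missing, if the module or
 * parameter name is empty, or if any part does not fit.
 */
static int split_boot_parm(const char *arg, char *mod, char *parm,
                           char *val, size_t n)
{
    const char *dot = strchr(arg, '.');
    const char *eq = dot ? strchr(dot + 1, '=') : NULL;

    if (!dot || !eq || dot == arg || eq == dot + 1)
        return -1;
    if ((size_t)(dot - arg) >= n || (size_t)(eq - dot - 1) >= n ||
        strlen(eq + 1) >= n)
        return -1;

    memcpy(mod, arg, (size_t)(dot - arg));          /* text before '.' */
    mod[dot - arg] = '\0';
    memcpy(parm, dot + 1, (size_t)(eq - dot - 1));  /* text between '.' and '=' */
    parm[eq - dot - 1] = '\0';
    strcpy(val, eq + 1);                            /* text after '=' */
    return 0;
}
```

Given `foo.debug=1`, this yields module `foo`, parameter `debug`, and value `1`, which corresponds to `insmod foo debug=1` in the modular case. A bare `debug=1` is rejected because it names no module.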
From owner-netdev@oss.sgi.com Tue Mar 27 12:14:46 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f2RKEkR08667 for netdev-outgoing; Tue, 27 Mar 2001 12:14:46 -0800 Received: from mgw1.ul.ie (mgw1.ul.ie [136.201.1.117]) by oss.sgi.com (8.11.3/8.11.3) with ESMTP id f2RKEjM08664 for ; Tue, 27 Mar 2001 12:14:45 -0800 Received: from gabriel.ul.ie ([136.201.1.101]) by ul.ie (PMDF V5.2-32 #41948) with ESMTP id <0GAV0014KHBHIZ@ul.ie> for netdev@oss.sgi.com; Tue, 27 Mar 2001 21:09:17 +0100 (BST) Received: by gabriel.ul.ie with Internet Mail Service (5.5.2653.19) id ; Tue, 27 Mar 2001 21:18:26 +0100 Content-return: allowed Date: Tue, 27 Mar 2001 21:18:26 +0100 From: EOIN RYAN <9726179@student.ul.ie> Subject: Kernel Network Implementation help. To: "'netdev@oss.sgi.com'" Message-id: <992C0C12C388D411B264009027AA3418698F3A@gabriel.ul.ie> MIME-version: 1.0 X-Mailer: Internet Mail Service (5.5.2653.19) Content-type: text/plain; charset="iso-8859-1" Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1070 Lines: 25 Hi all, I'm coming close to the deadline for my final year project in UL, Ireland. I'm implementing a kernel module to provide a communication interface to other modules. Thanks to some help I got from here last week, I got the server working OK. Now I'm having some mysterious problems with the client. I'm running the 2.2.16 kernel. The socket is created OK. When I try to do a connect I get an EAFNOSUPPORT error. I traced this back to the file /ipv4/tcp_ipv4.c where it checks if(usin->sin_family != AF_INET). The check fails and it returns EAFNOSUPPORT. I have definitely set up the destination address correctly. I get the same problem when I leave the socket to autobind as I do if I bind it myself. But one question about the binding: how do I set the source address? With the server I set sa->sin_addr.s_addr = INADDR_ANY. How do I tell it to put in my own address? I've tried a few different things but I'm not sure which is correct. 
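The check quoted above rejects any address whose `sin_family` is not `AF_INET`, so the structure handed to connect or bind has to carry that value. The following is a user-space sketch only, using the POSIX call `inet_pton`; `fill_ipv4_addr` is a hypothetical helper, and code running inside a 2.2 kernel module would need the kernel's own equivalents rather than these libc headers.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>

/*
 * Hypothetical helper: fill *sa with an IPv4 address and port.
 * Zeroing first avoids stale bytes in sin_family or padding.
 * Returns 0 on success, -1 if ip is not a valid dotted quad.
 */
static int fill_ipv4_addr(struct sockaddr_in *sa, const char *ip,
                          unsigned short port)
{
    memset(sa, 0, sizeof(*sa));
    sa->sin_family = AF_INET;      /* the field the quoted check tests */
    sa->sin_port = htons(port);    /* network byte order */
    return inet_pton(AF_INET, ip, &sa->sin_addr) == 1 ? 0 : -1;
}
```

Passing a structure filled this way to bind selects a specific local address instead of INADDR_ANY; the same helper can fill the destination for connect.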
One more question. How do I resolve hostnames to ip addresses without functions like gethostbyname? Thanks in advance, Eoin. From owner-netdev@oss.sgi.com Wed Mar 28 12:16:52 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f2SKGqx14789 for netdev-outgoing; Wed, 28 Mar 2001 12:16:52 -0800 Received: from thalia.fm.intel.com (fmfdns02.fm.intel.com [132.233.247.11]) by oss.sgi.com (8.11.3/8.11.3) with ESMTP id f2SKGpM14786 for ; Wed, 28 Mar 2001 12:16:51 -0800 Received: from SMTP (fmsmsxvs02-1.fm.intel.com [132.233.42.202]) by thalia.fm.intel.com (8.9.1a+p1/8.9.1/d: relay.m4,v 1.35 2001/02/12 09:03:45 smothers Exp $) with SMTP id UAA09708 for ; Wed, 28 Mar 2001 20:16:46 GMT Received: from fmsmsx29.FM.INTEL.COM ([132.233.48.29]) by 132.233.48.202 (Norton AntiVirus for Internet Email Gateways 1.0) ; Wed, 28 Mar 2001 20:16:45 0000 (GMT) Received: by fmsmsx29.fm.intel.com with Internet Mail Service (5.5.2653.19) id ; Wed, 28 Mar 2001 12:16:44 -0800 Message-ID: <9319DDF797C4D211AC4700A0C96B7C9404AC1DE2@orsmsx42.jf.intel.com> From: "Raj, Ashok" To: "'Netdev'" Subject: FW: destructor use in skb Date: Wed, 28 Mar 2001 12:16:41 -0800 MIME-Version: 1.0 X-Mailer: Internet Mail Service (5.5.2653.19) Content-Type: 
text/plain; charset="iso-8859-1" Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1026 Lines: 36 Hello. I am looking for a way for a networking device driver to get a buffer back: after the driver hands the sk_buff up the protocol stack (netif_rx), at the point where the buffer would go to the free pool, the reference counts are checked and the destructor function is called with the skb pointer. Most users don't seem to use the destructor this way; they just use it as a means to allocate another skb (using some private fields from here) and disregard this buffer. Alan seemed to indicate there is some possible work going on here, and this recycling of skbs back to the network device driver would be a big plus for us. Any pointers here would be greatly appreciated. ashokr -----Original Message----- From: Alan Cox [mailto:alan@lxorguk.ukuu.org.uk] Sent: Wednesday, March 28, 2001 12:00 PM To: ashok.raj@intel.com Cc: alan@lxorguk.ukuu.org.uk Subject: Re: destructor use in skb Currently it's there for AF_UNIX garbage collecting. I know Dave & Ingo experimented with per device recycling. You might want to ask the netdev@oss.sgi.com list. From owner-netdev@oss.sgi.com Thu Mar 29 07:40:08 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f2TFe8403229 for netdev-outgoing; Thu, 29 Mar 2001 07:40:08 -0800 Received: from prserv.net ([32.97.166.31]) by oss.sgi.com (8.11.3/8.11.3) with ESMTP id f2TFe8M03226 for ; Thu, 29 Mar 2001 07:40:08 -0800 Received: from mozart (slip139-92-58-247.mil.it.prserv.net[139.92.58.247]) by prserv.net (out1) with ESMTP id <2001032915400520102i06vce>; Thu, 29 Mar 2001 15:40:05 +0000 Received: from rustcorp.com.au (really [127.0.0.1]) by rustcorp.com.au via in.smtpd with esmtp id (Debian Smail3.2.0.111) for ; Sat, 24 Mar 2001 20:03:16 +1100 (EST) Message-Id: From: Rusty Russell To: Thierry Coutelier Cc: netdev@oss.sgi.com Subject: Re: Rate limit in kbits/s In-reply-to: Your message of "Wed, 21 Mar 2001 13:06:53 BST." 
<3AB8995D.40BE3E99@linux.lu> Date: Sat, 24 Mar 2001 20:03:16 +1100 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 621 Lines: 19 In message <3AB8995D.40BE3E99@linux.lu> you write: > > I'd like to have rate limit based on bits/s instead of packets. > When the limit is reached i want to decrease the window size (cwnd) > instead > of dropping packets. Do not do this! It is illegal to reduce window size; we are having fun with this at the moment. Also, it makes much more sense to use tc, not iptables, to do this. And you're better off setting the ECN bit on ECN-enabled packets, and dropping non-ECN packets. Hope that helps, Rusty. PS. netfilter@lists.samba.org is the mailing list address... -- Premature optmztion is rt of all evl. --DK From owner-netdev@oss.sgi.com Thu Mar 29 07:40:56 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f2TFeui03271 for netdev-outgoing; Thu, 29 Mar 2001 07:40:56 -0800 Received: from prserv.net ([32.97.166.34]) by oss.sgi.com (8.11.3/8.11.3) with ESMTP id f2TFetM03268 for ; Thu, 29 Mar 2001 07:40:55 -0800 Received: from mozart (slip139-92-58-247.mil.it.prserv.net[139.92.58.247]) by prserv.net (out4) with ESMTP id <2001032915405320400j5u62e>; Thu, 29 Mar 2001 15:40:53 +0000 Received: from rustcorp.com.au (really [127.0.0.1]) by rustcorp.com.au via in.smtpd with esmtp id (Debian Smail3.2.0.111) for ; Sat, 24 Mar 2001 20:09:57 +1100 (EST) Message-Id: From: Rusty Russell To: Michael Schoen Cc: netdev@oss.sgi.com Subject: Re: Kernel Network Design Problem.. In-reply-to: Your message of "Wed, 21 Mar 2001 16:02:15 BST." 
<20010321160215.A14520@anduras.de> Date: Sat, 24 Mar 2001 20:09:57 +1100 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 930 Lines: 23

In message <20010321160215.A14520@anduras.de> you write:
> I used the following IPs:
> CLIENT: eth0: 192.168.1.2
> SERVER: eth0: 192.168.1.1
>         eth1: 10.10.10.1
>
> CLIENT AND SERVER are physically connected by a Cross-Cable, there's
> no cable connected to eth1 at all.
>
> routing-entry on       | routing on             | ifconfig eth1 | SERVER responding a
> CLIENT                 | SERVER                 | on SERVER     | ping* 10.10.10.1
>                        |                        |               | from the CLIENT
> ----------------------------------------------------------------------------------
> 192.168.1.0/24 (eth0)  | 192.168.1.0/24 (eth0)  | down          | yes

I think you need to flush the route cache when you down eth1:

	echo 1 > /proc/sys/net/ipv4/route/flush

If this is true, I'm surprised that ifconfig doesn't do it itself.

Rusty.
--
Premature optmztion is rt of all evl. --DK

From owner-netdev@oss.sgi.com Thu Mar 29 10:10:40 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f2TIAe407649 for netdev-outgoing; Thu, 29 Mar 2001 10:10:40 -0800 Received: from mgw-x1.nokia.com (mgw-x1.nokia.com [131.228.20.21]) by oss.sgi.com (8.11.3/8.11.3) with ESMTP id f2TIAdM07646 for ; Thu, 29 Mar 2001 10:10:39 -0800 Received: from esvir01nok.ntc.nokia.com (esvir01nokt.ntc.nokia.com [172.21.143.33]) by mgw-x1.nokia.com (Switch-2.1.0/Switch-2.1.0) with ESMTP id f2TIAL214783 for ; Thu, 29 Mar 2001 21:10:21 +0300 (EET DST) Received: from esebh25nok.ntc.nokia.com (unverified) by esvir01nok.ntc.nokia.com (Content Technologies SMTPRS 4.2.1) with ESMTP id for ; Thu, 29 Mar 2001 21:10:37 +0300 Received: by esebh25nok with Internet Mail Service (5.5.2652.78) id ; Thu, 29 Mar 2001 21:10:37 +0300 Message-ID: <2D6CADE9B0C6D411A27500508BB3CBD063CEB2@eseis15nok> From: Imran.Patel@nokia.com To: netdev@oss.sgi.com Subject: bug in ipv6 extension header parsing??
Date: Thu, 29 Mar 2001 21:10:36 +0300 MIME-Version: 1.0 X-Mailer: Internet Mail Service (5.5.2652.78) Content-Type: text/plain; charset="iso-8859-1" Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1505 Lines: 59

hello,

i was just going through the linux ipv6 code and i came across something like this in the ipv6_rcv routine:

	hdr = skb->nh.ipv6h;
	.....
	....
	if (hdr->nexthdr == NEXTHDR_HOP) {
		skb->h.raw = (u8*)(hdr+1);
		if (!ipv6_parse_hopopts(skb, &hdr->nexthdr)) {
			ipv6_statistics.Ip6InHdrErrors++;
			return 0;
		}
	}

note that here skb->h.raw points to the beginning of the hop-by-hop ext header. now ipv6_parse_hopopts is called with an argument that is a pointer to the nexthdr field of the ipv6 header.

	ipv6_parse_hopopts(struct sk_buff *skb, u8 *nhptr)
	{
		((struct inet6_skb_parm*)skb->cb)->hop = sizeof(struct ipv6hdr);
		if (ip6_parse_tlv(tlvprochopopt_lst, skb, nhptr))
			return nhptr+((nhptr[1]+1)<<3);
		return NULL;
	}

in the ipv6_parse_hopopts routine it is written:

	return nhptr+((nhptr[1]+1)<<3);

now this should return a pointer to the next header after the hop-by-hop ext header. but it seems it will almost always point to the wrong place, since nhptr is not pointing to the hop-by-hop header. it can point to the right place only if nhptr is pointing to the beginning of the hop-by-hop header.

and also, in the routine ip6_parse_tlv, nhptr is passed as an argument but never used (at least i can't see it :)

PS: I am not a kernel guru so be patient if there is something blatantly foolish or wrong in my observation!
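The pointer arithmetic questioned above can be checked in user space. What follows is only a minimal sketch, not kernel code: it assumes a raw packet buffer whose first 40 bytes are the fixed IPv6 header (Next Header at byte 6, Hop Limit at byte 7) immediately followed by a hop-by-hop header whose second byte is its length in 8-octet units, not counting the first 8 octets. The function names and the IPV6_HDR_LEN macro are made up for illustration. Offsets are returned instead of pointers so that the "wrong" result can be computed without stepping outside the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IPV6_HDR_LEN 40	/* size of the fixed IPv6 header (illustrative name) */

/* Length in bytes of an extension header that starts at exthdr:
 * byte 1 counts 8-octet units beyond the first 8 octets. */
size_t exthdr_len(const uint8_t *exthdr)
{
	assert(exthdr != NULL);
	return ((size_t)exthdr[1] + 1) << 3;
}

/* Offset of whatever follows the hop-by-hop header, measured from the
 * start of the hop-by-hop header itself (the intended result). */
size_t next_after_hbh(const uint8_t *pkt)
{
	assert(pkt != NULL);
	return IPV6_HDR_LEN + exthdr_len(pkt + IPV6_HDR_LEN);
}

/* Offset produced by nhptr + ((nhptr[1] + 1) << 3) when nhptr is the
 * address of the Next Header field of the fixed header (byte 6):
 * nhptr[1] is then the Hop Limit byte, not a header length. */
size_t next_as_quoted(const uint8_t *pkt)
{
	assert(pkt != NULL);
	const size_t nh_off = 6;
	return nh_off + (((size_t)pkt[nh_off + 1] + 1) << 3);
}
```

For a packet with Hop Limit 64 and a minimal 8-byte hop-by-hop header, next_after_hbh() gives offset 48, while next_as_quoted() gives 6 + 65*8 = 526, which depends on the hop limit and has nothing to do with the extension header.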
imran From owner-netdev@oss.sgi.com Thu Mar 29 12:02:05 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f2TK25f10222 for netdev-outgoing; Thu, 29 Mar 2001 12:02:05 -0800 Received: from auemail2.firewall.lucent.com (auemail2.lucent.com [192.11.223.163]) by oss.sgi.com (8.11.3/8.11.3) with ESMTP id f2TK24M10219 for ; Thu, 29 Mar 2001 12:02:04 -0800 Received: from auemail2.firewall.lucent.com (localhost [127.0.0.1]) by auemail2.firewall.lucent.com (Pro-8.9.3/8.9.3) with ESMTP id PAA03033 for ; Thu, 29 Mar 2001 15:02:03 -0500 (EST) Received: from nc8220exchange.ral.lucent.com (h135-92-100-21.lucent.com [135.92.100.21]) by auemail2.firewall.lucent.com (Pro-8.9.3/8.9.3) with ESMTP id PAA03026 for ; Thu, 29 Mar 2001 15:02:03 -0500 (EST) content-class: urn:content-classes:message Subject: Linux Hardware/Software IP Stack Integration for a TOE MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Date: Thu, 29 Mar 2001 15:02:02 -0500 X-MimeOLE: Produced By Microsoft Exchange V6.0.4417.0 Message-ID: X-MS-Has-Attach: X-MS-TNEF-Correlator: Thread-Topic: Linux Hardware/Software IP Stack Integration for a TOE Thread-Index: AcC4ix6lOUkN/tU8SNKH6uWkDYYATQ== From: "Gregory Parrott" To: Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by oss.sgi.com id f2TK24M10220 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 2538 Lines: 53 Hello, I have been a lurker on this netdev list for the last 6 months and have learned a lot by simply watching the e-mails go by. I am quite impressed with the knowledge floating around out there. Now I need to ask for some advice on how to approach my current problem. I am developing a device driver that supports an adapter that has its own TCP/UDP/ICMP/IP stack onboard in silicon (some may refer to this as a transport offload engine). 
My challenge is to have the hardware stack(s) co-exist with the Linux software stack which will have to be used to support "conventional" NICs. Since we want this to be transparent to users at the socket layer (the interface to the chip is actual socket calls - socket, bind, listen, etc. along with support functions and methods for transmitting raw packets), I believe this to be a non-trivial endeavour. Creating my own protocol family is the easy way out, but not very useful to the tons of existing socket software. I have done some research into the 2.2.14 kernel some time back and now need to move forward with hooking in to the kernel to support my "hardware" stack. The driver work that I have done so far in supporting "conventional" NIC operation has been completed for 2.2.14. You are probably asking why I have not graduated to 2.4 kernels.... my responses would be 1) lack of time in trying to get something to work as it is and 2) sticking with commercially available distributions when I started this project. With this background information, my questions are as follows: 1) Has anyone looked into the problem of supporting multiple stacks in Linux (the existing software stack with one or more hardware stacks)? If so, are the research results available? 2) Is now the time to switch from 2.2.14 to 2.4.x to simplify my life? This will involve converting the existing framework that I have for supporting "conventional" mode. 3) Where is the best place to hook in? I could intercept sys_* calls or I could hook in at the specific protocol (tcp, udp, raw). My feeling is it will be a combination of the two. Multiple stacks introduce interesting problems. When a user app opens a socket, a socket has to be opened on all stacks and simultaneous ops have to be done to each until the socket is bound. Once bound, the others need to be cleaned up. I am sure this is just the tip of the iceberg. Any comments and suggestions would be greatly appreciated. 
Greg Parrott Optical Area Networking Lucent Technologies 919-838-6095 http://www.lucent-optical.com/oan
From owner-netdev@oss.sgi.com Thu Mar 29 20:07:09 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f2U479F19627 for netdev-outgoing; Thu, 29 Mar 2001 20:07:09 -0800 Received: from cs.bu.edu (root@CS.BU.EDU [128.197.10.2]) by oss.sgi.com (8.11.3/8.11.3) with ESMTP id f2U477M19624 for ; Thu, 29 Mar 2001 20:07:07 -0800 Received: from csa.bu.edu (dhiman@csa [128.197.12.3]) by cs.bu.edu (8.10.1/8.10.1) with ESMTP id f2U46wa22441 for ; Thu, 29 Mar 2001 23:06:58 -0500 (EST) Received: (from dhiman@localhost) by csa.bu.edu (8.10.1/8.10.1) id f2U46rD17648 for netdev@oss.sgi.com; Thu, 29 Mar 2001 23:06:54 -0500 (EST) Date: Thu, 29 Mar 2001 23:06:53 -0500 From: Dhiman Barman To: netdev@oss.sgi.com Subject: help Message-ID: <20010329230653.A17313@cs.bu.edu> Reply-To: dhiman@cs.bu.edu Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1418 Lines: 52 Hi, I am trying to write this module for the Linux kernel, but when I compile it I get all kinds of silly errors. What is wrong with the header files? I am just stuck here while writing another big module. Any help will be appreciated.
Dhiman

/************************/
#define MODULE
#define MAX_UNSIGNED_SHORT 65535

#include
#include
#include

int packet_dropper(struct sk_buff *skb)
{
}

int init_module()
{
}

void cleanup_module(void)
{
}
/***********************/

Errors are :

In file included from /usr/include/linux/sched.h:15,
                 from /usr/include/linux/skbuff.h:19,
                 from test.c:6:
/usr/include/linux/timex.h:171: field `time' has incomplete type
In file included from /usr/include/asm/smp.h:15,
                 from /usr/include/linux/smp.h:14,
                 from /usr/include/linux/sched.h:23,
                 from /usr/include/linux/skbuff.h:19,
                 from test.c:6:
/usr/include/asm/fixmap.h:72: parse error before `pgprot_t'
In file included from /usr/include/linux/sched.h:23,
                 from /usr/include/linux/skbuff.h:19,
                 from test.c:6:
/usr/include/linux/smp.h:30: parse error before `('
In file included from /usr/include/linux/sched.h:80,
                 from /usr/include/linux/skbuff.h:19,
                 from test.c:6:
/usr/include/linux/timer.h:21: field `list' has incomplete type

From owner-netdev@oss.sgi.com Thu Mar 29 22:19:19 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f2U6JJR21899 for netdev-outgoing; Thu, 29 Mar 2001 22:19:19 -0800 Received: from zebra.uas.se (zebra.uas.se [195.252.0.52]) by oss.sgi.com (8.11.3/8.11.3) with ESMTP id f2U6JHM21896 for ; Thu, 29 Mar 2001 22:19:18 -0800 Received: from bhlap.uas.se (uas-16-54.uas.se [195.252.16.54]) by zebra.uas.se (PMDF V6.0-24 #39208) with ESMTP id <01K1SUM19TNQ00LKM0@zebra.uas.se> for netdev@oss.sgi.com; Fri, 30 Mar 2001 08:18:36 +0200 Received: from signal.uu.se (IDENT:bh@localhost [127.0.0.1]) by bhlap.uas.se (8.9.3/8.9.3) with ESMTP id IAA25399; Fri, 30 Mar 2001 08:18:34 +0200 Date: Fri, 30 Mar 2001 08:18:33 +0200 From: Bjorn Hammarberg Subject: HELP: Why are redirected packets dropped?
To: linux-net@vger.kernel.org, netdev@oss.sgi.com Reply-to: Bjorn.Hammarberg@signal.uu.se Message-id: <3AC42539.CC7445D5@signal.uu.se> Organization: Uppsala University, Sweden MIME-version: 1.0 X-Mailer: Mozilla 4.75 [en] (X11; U; Linux 2.2.16-3 i586) Content-type: text/plain; charset=iso-8859-1 X-Accept-Language: en Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by oss.sgi.com id f2U6JIM21897 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 2648 Lines: 70 Hi! Could someone please help me out here. I have no clue at all, so it must be really simple... ;-) Short version: What kind of mechanism drops IP packets and what is the difference between an accepted packet and an unaccepted packet? Why does host A forward packets from host B when host A drops packets from host A that have its source address rewritten to host B (with recalculated checksum)? In other words: how can host A discriminate between true host B packets and fake host B (and even fake host C) packets? My kernel is 2.2.18 on a 486. Longer version: I am trying to implement some kind of rerouter that receives a packet, changes its source address and sends it out again (after checksumming of course). The problem is that these packets are silently dropped somewhere in the forward chain. Host A initiates a tcp connection and a packet is sent through sl0, gets its saddr rewritten, checksummed, and resent through sl0 to host A. This can be seen by either using tcpdump or the firewall logging. The firewall then masquerades this packet according to its logging, but that's it! No trace of it, neither in tcpdump nor the firewall logging. I have tried both slip links and ethertap links with the same result. If host B, with host A as its gateway, tries to send packets through the sl0 interface (no saddr rewriting though) the packets don't get dropped. In tcpdump and the firewall log I see this tcpdump: ... sl0 < hostC.1152 internet.telnet ... ... 
Packet log: input ACCEPT tap0 PROTO=6 hostC:1152 internet:23 ...
... Packet log: forw MASQ ppp0 PROTO=6 hostC:1152 internet:23 ...
# here it ends!!!

tcpdump: ... sl0 < hostB.3917 internet.telnet ...
... Packet log: input ACCEPT tap0 PROTO=6 hostB:3917 internet:23 ...
... Packet log: forw MASQ ppp0 PROTO=6 hostB:3917 internet:23 ...
... Packet log: output ACCEPT ppp0 PROTO=6 hostB:3917 internet:23 ...
tcpdump: ... ppp0 > hostA.61726 internet.telnet ...
# perfect! But why is it different from above???!?

hostA is the gateway
hostB is on the LAN
hostC is the fake host (tried all sorts of addresses both LAN and slip)

Any help is *VERY* welcome!

Cheers,
Bjorn
----------------------------------------------------------------------
Bjorn Hammarberg, PhD student in Neurophysiological Signal Processing
Dep. of Neuroscience                            Signals and Systems
Clinical Neurophysiology ¨¨¨¨¨¨¨|+|o|¨¨¨¨¨¨¨¨¨¨ Uppsala University
University Hospital Uppsala     |-+-|           PO Box 528
SE-751 85 Uppsala, SWEDEN       |o|+|           SE-751 20 Uppsala, SWEDEN
http://www.neurofys.uu.se       `---'           http://www.signal.uu.se

From owner-netdev@oss.sgi.com Fri Mar 30 00:11:13 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f2U8BDD24489 for netdev-outgoing; Fri, 30 Mar 2001 00:11:13 -0800 Received: from se1.cogenit.fr (IDENT:root@se1.cogenit.fr [195.68.53.173]) by oss.sgi.com (8.11.3/8.11.3) with ESMTP id f2U8BBM24486 for ; Fri, 30 Mar 2001 00:11:11 -0800 Received: (from romieu@localhost) by se1.cogenit.fr (8.11.1/8.11.1) id f2U8AtP15532 for netdev@oss.sgi.com; Fri, 30 Mar 2001 10:10:55 +0200 Date: Fri, 30 Mar 2001 10:10:54 +0200 From: Francois Romieu To: netdev@oss.sgi.com Subject: Re: Linux Hardware/Software IP Stack Integration for a TOE Message-ID: <20010330101054.A15228@se1.cogenit.fr> References: Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit User-Agent: Mutt/1.2.5i In-Reply-To: ; from gparrott@lucent.com on Thu, Mar 29, 2001 at
03:02:02PM -0500 X-Organisation: Marie's fan club - I Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 509 Lines: 13 Gregory Parrott wrote: [...] > 3) Where is the best place to hook in? I could intercept sys_* calls or > I could hook in at the specific protocol (tcp, udp, raw). My feeling is > it will be a combination of the two. IMVVVHO, hook in inet_create (net/ipv4/af_inet.c): you register your own proto fields and you may decide to fall back to the normal ones (if your connect operation doesn't succeed, for example). That way the existing stack shouldn't need any modification. -- Ueimor From owner-netdev@oss.sgi.com Fri Mar 30 17:28:56 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f2V1SuN14248 for netdev-outgoing; Fri, 30 Mar 2001 17:28:56 -0800 Received: from colin.muc.de (root@colin.muc.de [193.149.48.1]) by oss.sgi.com (8.11.3/8.11.3) with SMTP id f2V1StM14245 for ; Fri, 30 Mar 2001 17:28:55 -0800 Received: by colin.muc.de id <140556-3>; Sat, 31 Mar 2001 03:28:42 +0200 Message-ID: <20010331032831.09161@colin.muc.de> Date: Sat, 31 Mar 2001 03:28:31 +0200 From: Andi Kleen To: Imran.Patel@nokia.com Cc: netdev@oss.sgi.com Subject: Re: bug in ipv6 extension header parsing?? References: <2D6CADE9B0C6D411A27500508BB3CBD063CEB2@eseis15nok> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 0.88e In-Reply-To: <2D6CADE9B0C6D411A27500508BB3CBD063CEB2@eseis15nok>; from Imran.Patel@nokia.com on Thu, Mar 29, 2001 at 08:10:36PM +0200 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 565 Lines: 14 On Thu, Mar 29, 2001 at 08:10:36PM +0200, Imran.Patel@nokia.com wrote: > hop-by-hop ext header. but it seems it will point to almost at the wrong > place since nhptr is not pointing to the hop-by-hop header. it can point to > the right place only if nhptr is pointing to the beginning of hop-by-hop > header.
> > and also, in the routine ip6_parse_tlv nhptr is passed as an argument but > never used (at least i can't see it:) Yes, it's bogus code, but currently it doesn't matter because the return value is never used. Feel free to send a patch to fix it. -Andi From owner-netdev@oss.sgi.com Sat Mar 31 07:45:13 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f2VFjDj25540 for netdev-outgoing; Sat, 31 Mar 2001 07:45:13 -0800 Received: from ms2.inr.ac.ru (minus.inr.ac.ru [193.233.7.97]) by oss.sgi.com (8.11.3/8.11.3) with SMTP id f2VFjAM25537 for ; Sat, 31 Mar 2001 07:45:11 -0800 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id TAA20928; Sat, 31 Mar 2001 19:44:51 +0400 From: kuznet@ms2.inr.ac.ru Message-Id: <200103311544.TAA20928@ms2.inr.ac.ru> Subject: Re: bug in ipv6 extension header parsing?? To: ak@muc.DE (Andi Kleen) Date: Sat, 31 Mar 2001 19:44:51 +0400 (MSK DST) Cc: netdev@oss.sgi.com In-Reply-To: <20010331032831.09161@colin.muc.de> from "Andi Kleen" at Mar 31, 1 05:45:00 am X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 186 Lines: 10 Hello! > > and also, in the routine ip6_parse_tlv nhptr is passed as an argument but > > never used (at least i can't see it:) > > Yes, it's bogus code, It is already removed.
Alexey From owner-netdev@oss.sgi.com Sat Mar 31 08:33:05 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f2VGX5g26654 for netdev-outgoing; Sat, 31 Mar 2001 08:33:05 -0800 Received: from ms2.inr.ac.ru (minus.inr.ac.ru [193.233.7.97]) by oss.sgi.com (8.11.3/8.11.3) with SMTP id f2VGX3M26651 for ; Sat, 31 Mar 2001 08:33:03 -0800 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id UAA21284; Sat, 31 Mar 2001 20:32:33 +0400 From: kuznet@ms2.inr.ac.ru Message-Id: <200103311632.UAA21284@ms2.inr.ac.ru> Subject: Re: A TCP monitoring /proc/net file To: fds@cs.ucsd.edu (Federico David Sacerdoti) Date: Sat, 31 Mar 2001 20:32:33 +0400 (MSK DST) Cc: ak@muc.de, davem@redhat.com, netdev@oss.sgi.com, linux-net@vger.kernel.org In-Reply-To: <3ABBAFC2.6991DBA2@cs.ucsd.edu> from "Federico David Sacerdoti" at Mar 23, 1 12:19:14 pm X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 126 Lines: 7 Hello! > Would a patch for 2.4.2 be helpful? Yes, of course. The tool is useful regardless of the circumstances. Alexey