From owner-netdev@oss.sgi.com Mon May 1 07:35:27 2000
Message-ID: <20000501103620.A7015@doit.wisc.edu>
Date: Mon, 1 May 2000 10:36:20 -0500
From: "James R. Leu"
To: netdev@oss.sgi.com
Subject: rt_cache_flush()
Reply-To: jleu@mindspring.com

I am fiddling with entries in the fib table and each time I modify a
fib entry I want all entries in the route cache that used this fib entry
to be flushed.

rt_cache_flush() seems to want to flush the whole cache. For some reason
it doesn't seem to be working. Is this the correct way to flush the entire
route cache? What about entries that still have a ref_cnt > 0?

Jim
--
James R.
Leu

From owner-netdev@oss.sgi.com Mon May 1 08:22:17 2000
Message-ID: <390DA08D.8979B5B7@uow.edu.au>
Date: Tue, 02 May 2000 01:19:41 +1000
From: Andrew Morton
To: "netdev@oss.sgi.com"
Subject: tx_timeout and timer serialisation

In 3c59x we have a timer routine for media selection. It is set up with
add_timer(). It is called from a BH (or whatever we call BH's in 2.3)

We also have the tx_timeout routine which is called from the netdev
layer. Another add_timer function.

We also have hard_start_xmit() which appears to be called from all sorts
of contexts.

These three functions can potentially tread upon each others toes and
hence need serialisation.

In 2.3, is the driver provided any serialisation guarantees, or do we go
it alone?

In 2.2: same question.

Am I correct in believing that in 2.2, BH handlers were serialised wrt
SMP, but that in 2.3 they are not?

Thanks, Alexey :)

--
-akpm-

From owner-netdev@oss.sgi.com Mon May 1 08:46:37 2000
Date: Mon, 1 May 2000 11:46:08 -0400 (EDT)
From: jamal
To: Andrew Morton
cc: "netdev@oss.sgi.com"
Subject: Re: tx_timeout and timer serialisation
In-Reply-To: <390DA08D.8979B5B7@uow.edu.au>

Andrew,
Hacking away those last minutes? ;->
Not Alexey, but i can give you some answers.

On Tue, 2 May 2000, Andrew Morton wrote:

> In 3c59x we have a timer routine for media selection. It is set up with
> add_timer(). It is called from a BH (or whatever we call BH's in 2.3)
>
> We also have the tx_timeout routine which is called from the netdev
> layer. Another add_timer function.
>
> We also have hard_start_xmit() which appears to be called from all sorts
> of contexts.
>

you mean the dev->hard_start_xmit() ?
This is called only from the non interrupt context; the only exception i
can think of is the case of fast routing.

> These three functions can potentially tread upon each others toes and
> hence need serialisation.
>
>
> In 2.3, is the driver provided any serialisation guarantees, or do we go
> it alone?
>

transmit is serialized from the toplevel by the device xmit_lock

>
> In 2.2: same question.
>

BH Lock + dev->tbusy should protect the transmit.

For both 2.2 and 2.3:

- The timer routine for media selection should not trample on the
  transmit or anything else, and using tp->medialock should protect
  things in both 2.2 and 2.3;
- The tx_timeout() as well is called only when the xmit_lock is grabbed.

> Am I correct in believing that in 2.2, BH handlers were serialised wrt
> SMP, but that in 2.3 they are not?
>

It is just as serialized but more finely grained in 2.3;

I think the rules are simply:

- use the tp->medialock for media selection timer
- Don't worry about the tx_timeout(); it is serialized with respect to
  your driver's hard_start_xmit()

cheers,
jamal

From owner-netdev@oss.sgi.com Mon May 1 09:33:18 2000
Message-ID: <20000501123409.B7015@doit.wisc.edu>
Date: Mon, 1 May 2000 12:34:09 -0500
From: "James R. Leu"
To: netdev@oss.sgi.com
Subject: dev_hold() and skb->dev
Reply-To: jleu@mindspring.com

Is it necessary to do a dev_hold() when placing the outgoing dev in an
sk_buff? It looks like ethernet drivers in general do not, but skb->rx_dev
is held (in netif_rx).

Jim
--
James R.
Leu

From owner-netdev@oss.sgi.com Mon May 1 10:14:47 2000
From: kuznet@ms2.inr.ac.ru
Message-Id: <200005011714.VAA05174@ms2.inr.ac.ru>
Subject: Re: tx_timeout and timer serialisation
To: andrewm@uow.EDU.AU (Andrew Morton)
Date: Mon, 1 May 2000 21:14:13 +0400 (MSK DST)
Cc: netdev@oss.sgi.com
In-Reply-To: <390DA08D.8979B5B7@uow.edu.au> from "Andrew Morton" at May 1, 0 08:13:07 pm

Hello!

> These three functions can potentially tread upon each others toes and
> hence need serialisation.

tx_timeout and hard_start_xmit are serialized by core and never overlap.

The way, how driver protects itself of its own internal events,
(media selection timer in this case) is its own internal problem.
No super-locks exist. If media selection does not affect RX logic,
you may use dev->xmit_lock (that one which serializes tx_timeout
and hard_start_xmit). If some dependencies between RX and TX exist,
then core can do nothing to help you and you have to use an internal lock.

> In 2.2: same question.

In 2.2 super-lock exists, which serializes _all_ the networking, timers
etc. etc. etc. hard_start_xmit is called only under this super-lock,
so that there is nothing to be bothered about. Well, except for
exiting this dangerous part of code so fastly as it is possible
to allow OS to proceed. 8)

> Am I correct in believing that in 2.2, BH handlers were serialised wrt
> SMP, but that in 2.3 they are not?

Actually, "BHs" are serialized exactly as they were in 2.2.
Only networking does not use these BHs any more, exactly because they
are serialized and we do not want this.

Timers are still globally serialized and do not overlap.
But you can honestly assume that they are serialized only locally to cpu
too, the life will be only simpler. It is from experience. 8)
At least, the rest of networking does not assume any global
synchronization.

Alexey

From owner-netdev@oss.sgi.com Mon May 1 11:20:18 2000
From: kuznet@ms2.inr.ac.ru
Message-Id: <200005011819.WAA06017@ms2.inr.ac.ru>
Subject: Re: dev_hold() and skb->dev
To: jleu@mindspring.COM
Date: Mon, 1 May 2000 22:19:42 +0400 (MSK DST)
Cc: netdev@oss.sgi.com
In-Reply-To: <20000501123409.B7015@doit.wisc.edu> from "James R. Leu" at May 1, 0 09:13:04 pm

Hello!

> Is it necessary to do a dev_hold() when placing the outgoing dev in an
> sk_buff?

It is not only not necessary, it would be wrong. skb->dev is randomly
rewritten and used only as a scratch variable.

Devices are held by different logic in output direction. When a part
of code uses device, it is already referenced and skb finishes its life
either inside device or in device queue, which also holds it and
destroys skbs before device is destroyed.

You may hold device only if it is recorded in skb->rx_dev,
but it is almost never necessary, and it is not recommended.
rx_dev is for core only.

> but skb->rx_dev
> is held (in netif_rx).

It is one case when holding is required, packet waits in backlog
without any other references to device.
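
A small user-space sketch of the one case described above where a hold is
needed, under stated assumptions: net_device_sim, sk_buff_sim,
backlog_enqueue() and backlog_deliver() are hypothetical stand-ins, not
kernel structures or functions. It only models the idea that a packet
parked in the backlog takes a reference on its receiving device and drops
it when it leaves the queue, so the device cannot go away in between.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for net_device and sk_buff. */
struct net_device_sim {
    int refcnt;                      /* number of outstanding holds */
};

struct sk_buff_sim {
    struct net_device_sim *rx_dev;   /* device the packet arrived on */
    struct sk_buff_sim *next;        /* backlog linkage */
};

static struct sk_buff_sim *backlog_head, *backlog_tail;

/* Models dev_hold(): take one reference. */
static void dev_hold_sim(struct net_device_sim *dev)
{
    dev->refcnt++;
}

/* Models dev_put(): drop one reference. */
static void dev_put_sim(struct net_device_sim *dev)
{
    assert(dev->refcnt > 0);
    dev->refcnt--;
}

/* Receive path: the packet waits in the backlog with no other reference
 * to the device, so the queue itself must hold one. */
void backlog_enqueue(struct sk_buff_sim *skb, struct net_device_sim *dev)
{
    dev_hold_sim(dev);
    skb->rx_dev = dev;
    skb->next = NULL;
    if (backlog_tail)
        backlog_tail->next = skb;
    else
        backlog_head = skb;
    backlog_tail = skb;
}

/* Take the oldest packet off the backlog and release its hold. */
struct sk_buff_sim *backlog_deliver(void)
{
    struct sk_buff_sim *skb = backlog_head;

    if (!skb)
        return NULL;
    backlog_head = skb->next;
    if (!backlog_head)
        backlog_tail = NULL;
    dev_put_sim(skb->rx_dev);
    skb->rx_dev = NULL;
    return skb;
}

/* The device may only be destroyed once nobody holds it. */
int dev_can_be_destroyed(const struct net_device_sim *dev)
{
    return dev->refcnt == 0;
}
```

The output direction is deliberately absent from the sketch: as the reply
says, the device is already referenced by other logic there, so no extra
hold is taken for skb->dev.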

Alexey

From owner-netdev@oss.sgi.com Mon May 1 11:38:38 2000
From: kuznet@ms2.inr.ac.ru
Message-Id: <200005011837.WAA06084@ms2.inr.ac.ru>
Subject: Re: PATCH 2.2.14 net/core/dev.c
To: dlr@collab.net
Date: Mon, 1 May 2000 22:37:58 +0400 (MSK DST)
Cc: netdev@oss.sgi.com
In-Reply-To: <390C998A.42873E21@finemaltcoding.com> from "Daniel L. Rall" at May 1, 0 01:13:12 am

Hello!

> I rewrote the dev_alloc_name() function in the 2.2.14 Linux kernel's
> net/core/dev.c module. It had an apparently artificial limitation on
> number of network devices of the same type allowed (100),

It is not artificial limitation but absolutely required fool-proof
protection. Before creating more devices, you have to think about
accesses to device list (which cannot be list after this, certainly).

To be honest, for now any attempt to make a module potentially
creating large number of devices is pure bug, because their designers
_know_ that accesses to device list are poorly programmed in core
and _can_ organize code more friendly to existing infrastructure.
VLAN is the best example of wrong approach, all these innumerous
pseudo-devices do nothing useful but eating resources,
one multipoint device is more than enough as rule.

Alexey

From owner-netdev@oss.sgi.com Mon May 1 12:08:39 2000
Date: Mon, 1 May 2000 21:11:45 +0200
From: Andi Kleen
To: "James R. Leu"
Cc: netdev@oss.sgi.com
Subject: Re: rt_cache_flush()
Message-ID: <20000501211145.A10174@fred.muc.de>
References: <20000501103620.A7015@doit.wisc.edu>
In-Reply-To: <20000501103620.A7015@doit.wisc.edu>; from James R. Leu on Mon, May 01, 2000 at 04:36:56PM +0200

On Mon, May 01, 2000 at 04:36:56PM +0200, James R. Leu wrote:
> I am fiddling with entries in the fib table and each time I modify a
> fib entry I want all entries in the route cache that used this fib entry
> to be flushed.
>
> rt_cache_flush() seems to want to flush the whole cache. For some reason
> it doesn't seem to be working. Is this the correct way to flush the entire
> route cache? What about entries that still have a ref_cnt > 0?

You set rt->u.dst.obsolete = 1 and expect the clients to relookup when they
can.

-Andi

--
This is like TV. I don't like TV.
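
The "mark obsolete, let clients relookup" idea from the reply above can be
sketched in plain user-space C. Everything here is an assumption made for
illustration: fib_node_sim, cache_entry_sim, fib_set(), cache_lookup() and
cache_check() are hypothetical names, not the kernel's routing API, and
the tables are tiny fixed arrays. The point is only that an entry which is
still referenced is not freed; it is flagged, and whoever holds it notices
the flag and looks the route up again.

```c
#include <assert.h>
#include <stddef.h>

#define NSLOTS 8

/* Hypothetical FIB entry: destination key plus some attached info. */
struct fib_node_sim {
    unsigned int dst;
    int label;
    int used;
};

/* Hypothetical cached route built from a FIB entry. */
struct cache_entry_sim {
    unsigned int dst;
    int label;
    int obsolete;   /* set when the FIB entry it came from changes */
    int used;
};

static struct fib_node_sim fib[NSLOTS];
static struct cache_entry_sim cache[NSLOTS];

static struct fib_node_sim *fib_find(unsigned int dst)
{
    for (int i = 0; i < NSLOTS; i++)
        if (fib[i].used && fib[i].dst == dst)
            return &fib[i];
    return NULL;
}

/* Add or modify a FIB entry and flag every cache entry derived from it. */
void fib_set(unsigned int dst, int label)
{
    struct fib_node_sim *f = fib_find(dst);

    for (int i = 0; !f && i < NSLOTS; i++)
        if (!fib[i].used)
            f = &fib[i];
    assert(f != NULL);              /* sketch: table never overflows */
    f->dst = dst;
    f->label = label;
    f->used = 1;

    for (int i = 0; i < NSLOTS; i++)
        if (cache[i].used && cache[i].dst == dst)
            cache[i].obsolete = 1;  /* holders will relookup later */
}

/* Return a valid cache entry for dst, building one from the FIB if needed. */
struct cache_entry_sim *cache_lookup(unsigned int dst)
{
    struct fib_node_sim *f;

    for (int i = 0; i < NSLOTS; i++)
        if (cache[i].used && !cache[i].obsolete && cache[i].dst == dst)
            return &cache[i];

    f = fib_find(dst);
    if (!f)
        return NULL;
    for (int i = 0; i < NSLOTS; i++) {
        if (!cache[i].used) {
            cache[i].dst = dst;
            cache[i].label = f->label;
            cache[i].obsolete = 0;
            cache[i].used = 1;
            return &cache[i];
        }
    }
    return NULL;                    /* sketch: no eviction implemented */
}

/* What a client does before using an entry it has been holding. */
struct cache_entry_sim *cache_check(struct cache_entry_sim *e)
{
    assert(e != NULL);
    return e->obsolete ? cache_lookup(e->dst) : e;
}
```

A holder that calls cache_check() before each use picks up the modified
entry on its next packet, without waiting for a whole-cache flush.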

From owner-netdev@oss.sgi.com Mon May 1 12:35:29 2000
Message-ID: <20000501153601.D7152@doit.wisc.edu>
Date: Mon, 1 May 2000 15:36:01 -0500
From: "James R. Leu"
To: Andi Kleen, "James R. Leu"
Cc: netdev@oss.sgi.com
Subject: Re: rt_cache_flush()
Reply-To: jleu@mindspring.com
References: <20000501103620.A7015@doit.wisc.edu> <20000501211145.A10174@fred.muc.de>
In-Reply-To: <20000501211145.A10174@fred.muc.de>; from Andi Kleen on Mon, May 01, 2000 at 09:11:45PM +0200

On Mon, May 01, 2000 at 09:11:45PM +0200, Andi Kleen wrote:
> On Mon, May 01, 2000 at 04:36:56PM +0200, James R. Leu wrote:
> > I am fiddling with entries in the fib table and each time I modify a
> > fib entry I want all entries in the route cache that used this fib entry
> > to be flushed.
> >
> > rt_cache_flush() seems to want to flush the whole cache. For some reason
> > it doesn't seem to be working. Is this the correct way to flush the entire
> > route cache? What about entries that still have a ref_cnt > 0?
>
> You set rt->u.dst.obsolete = 1 and expect the clients to relookup when they
> can.

I'm not sure I have that available to me at the level I'm at. Here is what
I'm doing:

-I wrote a function called fn_hash_lookup_exact(). This finds an exact
 match in the fib (not longest, EXACT).
-I add a snippet of info to it (an outgoing MPLS label to be exact)
-I then do a rt_cache_flush(-1)

Ideally I would like for only the route cache entries that matched the
fib_node I found to be yanked from the cache (immediately). I'll settle
for the entire cache being flushed.

It looks like the cache is being flushed, but it takes a small amount of
time. In that time, some packets go by and continue to use the old entry.
If I stop all packets, modify the fib_node (resulting in a
rt_cache_flush(-1)) and wait a couple of seconds, then send packets,
things seem to work as expected.

Is there any way to get my ideal scenario above? If not can I speed up
the cache flush?

Thanks,
Jim
--
James R. Leu

From owner-netdev@oss.sgi.com Mon May 1 16:38:30 2000
Date: Mon, 1 May 2000 19:37:43 -0400 (EDT)
From: "Brendan M. Coffey"
To: netdev@oss.sgi.com
Subject: linux tcp/ip rfc compliance

dear network team:

I am planning a brief investigation of the Linux 2.2 kernel's TCP/IP
implementation, as regards RFC compliance. I am given to understand that
the results of this research would be helpful, or at least informative,
to you in your efforts. If there is anyone in particular to whom I should
address correspondence, please let me know.

Also, I understand that Mike Shaver undertook a similar study of a 2.0
series kernel.
If you have any idea as to an accessible location of the results of his
research, I would appreciate that information as well.

Thanks!
brendan coffey

From owner-netdev@oss.sgi.com Mon May 1 17:18:40 2000
Date: Mon, 1 May 2000 20:18:14 -0400 (EDT)
From: Liang Han
To: netdev@oss.sgi.com
Subject: Re: Linux2.2.12 and 2.0.36 TCP differences?
In-Reply-To: <20000429012414.A2137@fred.muc.de>

On Sat, 29 Apr 2000, Andi Kleen wrote:

> On Sat, Apr 29, 2000 at 12:20:23AM +0200, Liang Han wrote:
> > 1. The routing should be resolved at IP level. My experience on
> > linux2.2.12 seems conflict with the principle. So what's the reason.
>
> Most likely the TOS setting in your redirects. The TOS in the Ip Header
> in the redirect has to match the TOS of the TCP connection. ftp and
> telnet both set TOS. Linux 2.2 has per TOS routing, so it uses
> TOS dependent routes. Checking TOS for redirects is arguably
> very pedantic, but it is like it is.

I have changed the TOS field of the redirect message. It works, and
Telnet and Ftp sessions go through the new route now. But I also need to
send another redirect message with TOS set to 0x0 to make traceroute
work.

My conclusion is that under the scheme of per TOS routing, we cannot
totally rely on the traceroute, ping and route programs to determine the
actual route.

Liang

From owner-netdev@oss.sgi.com Mon May 1 17:54:00 2000
Message-ID: <390E2EB1.44EEE20@candelatech.com>
Date: Mon, 01 May 2000 18:26:09 -0700
From: Ben Greear
Organization: Candela Technologies
To: kuznet@ms2.inr.ac.ru
CC: dlr@collab.net, netdev@oss.sgi.com
Subject: Re: PATCH 2.2.14 net/core/dev.c
References: <200005011837.WAA06084@ms2.inr.ac.ru>

kuznet@ms2.inr.ac.ru wrote:
>
> Hello!
>
> > I rewrote the dev_alloc_name() function in the 2.2.14 Linux kernel's
> > net/core/dev.c module. It had an apparently artificial limitation on
> > number of network devices of the same type allowed (100),
>
> It is not artificial limitation but absolutely required fool-proof
> protection. Before creating more devices, you have to think about
> accesses to device list (which cannot be list after this, certainly).

Surely there are no critical paths that access the list linearly???
If that is the problem, then those should be fixed, based on the
device index or some other hash.

> To be honest, for now any attempt to make a module potentially
> creating large number of devices is pure bug, because their designers
> _know_ that accesses to device list are poorly programmed in core
> and _can_ organize code more friendly to existing infrastructure.
> VLAN is the best example of wrong approach, all these innumerous
> pseudo-devices do nothing useful but eating resources,
> one multipoint device is more than enough as rule.

What resources, other than memory, would the VLANs consume? I'm willing
to pay memory for constant-time lookups. If having many interfaces hurts
performance in some other way, then that is something I'd be interested
in fixing...

Although the VLAN code I've written will support an extremely large
number of VLAN devices, the common user will have between 1 and 10 I
believe, and most users will only have 1. I don't see any reason to make
the code less flexible just because _I_ don't think they need that many:
They may have reasons I've never considered... (low speed
DSL/cable-modem router/firewall?)

Still, if there's a better way, I'd like to look at it. Can you offer an
example of something that has many virtual interfaces, such as VLAN does,
but does not use many virtual net_devices? I think virtual-IP interfaces
use pseudo-devices, what about Frame-Relay PVCs? ATM PVCs?

Thanks,
Ben

>
> Alexey

--
Ben Greear (greearb@candelatech.com)  http://www.candelatech.com
Author of ScryMUD:  scry.wanfear.com 4444  (Released under GPL)
http://scry.wanfear.com  http://scry.wanfear.com/~greear

From owner-netdev@oss.sgi.com Mon May 1 18:37:51 2000
Message-ID: <390E3143.5CF7D4AD@uow.edu.au>
Date: Tue, 02 May 2000 11:37:07 +1000
From: Andrew Morton
To: "netdev@oss.sgi.com"
Subject: Re: tx_timeout and timer serialisation
References: <390DA08D.8979B5B7@uow.edu.au>

jamal wrote:
>
> Andrew,
> Hacking away those last minutes? ;->

Had a little problem with the corporate Amex. 24 hrs delay..

> Not Alexey, but i can give you some answers.

You're less entertaining :)

[ stuff ]

OK, thanks, guys.

1: hard_start_xmit and tx_timeout are serialised wrt each other.

2: timer functions, ioctl() and get_stats() have no guarantees.

3: The ISR can be called during get_stats(), start_xmit(),
   ioctl(), media_timer(), tx_timeout(), etc.

4: We have a lot of racy drivers :(

Let's pick some random samples:

eepro100
--------

speedo_timer does mdio_read()s. speedo_tx_timeout() does mdio_read()s
and mdio_write()'s. mdio functions are stateful. Race.

3c515.c
-------

WTF? cli()? I stopped reading there.

epic100.c
---------

epic_timer() and mii_ioctl() do mdio_read()s

8139too.c
---------

Appears to get it right. Way to go, Jeff!

3c59x.c
-------

vortex_timer() and vortex_ioctl() do mdio stuff.

sis900.c
--------

ioctl() and timer() do mdio functions.

acenic.c
--------

Well ace_timer is nice and safe, because the driver carefully constructs
the timer and then fails to add it to the kernel's timer table!

And that's just looking at the mdio functions. I suspect that there is
more reentrable and stateful hardware bit twiddling going on here.

I suggest that we should be adding spinlocks to the ioctl(),
media_timer(), hard_start_xmit() and get_stats() functions ***As a
general rule*** and only remove them if the driver has been reviewed and
that non-raciness is demonstrable.

What fun.

Another issue: del_timer_sync(). It deletes a timer, but if that timer
happens to be running, del_timer_sync() blocks until the timer's handler
returns. Things like the call to del_timer() in
acenic.c:ace_start_xmit() need to be carefully reviewed.
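
A minimal sketch of the "put a lock around the stateful mdio access" rule
suggested above, assuming a made-up device model: nic_sim, its two-step
address/data access and the *_locked() helpers are hypothetical, and a C11
atomic_flag stands in for a kernel spinlock (in a driver this would be
something like a spinlock taken with spin_lock_bh() or
spin_lock_irqsave(), depending on which contexts touch the registers).

```c
#include <assert.h>
#include <stdatomic.h>
#include <string.h>

#define PHY_NREGS 32

/* Hypothetical NIC whose PHY access is stateful: first latch a register
 * address, then move data. Two unserialised callers could interleave
 * between those steps and read or write the wrong register. */
struct nic_sim {
    atomic_flag mdio_lock;      /* stand-in for a spinlock */
    int mdio_addr;              /* latched register address (the state) */
    int phy_regs[PHY_NREGS];
};

void nic_init(struct nic_sim *n)
{
    atomic_flag_clear(&n->mdio_lock);
    n->mdio_addr = 0;
    memset(n->phy_regs, 0, sizeof(n->phy_regs));
}

static void mdio_lock_acquire(struct nic_sim *n)
{
    /* Busy-wait until the flag was previously clear. */
    while (atomic_flag_test_and_set_explicit(&n->mdio_lock,
                                             memory_order_acquire))
        ;
}

static void mdio_lock_release(struct nic_sim *n)
{
    atomic_flag_clear_explicit(&n->mdio_lock, memory_order_release);
}

/* Both steps of the access happen under the lock, so a timer, a
 * tx_timeout handler and an ioctl calling these cannot interleave. */
int mdio_read_locked(struct nic_sim *n, int reg)
{
    int val;

    assert(reg >= 0 && reg < PHY_NREGS);
    mdio_lock_acquire(n);
    n->mdio_addr = reg;                 /* step 1: latch address */
    val = n->phy_regs[n->mdio_addr];    /* step 2: read data */
    mdio_lock_release(n);
    return val;
}

void mdio_write_locked(struct nic_sim *n, int reg, int val)
{
    assert(reg >= 0 && reg < PHY_NREGS);
    mdio_lock_acquire(n);
    n->mdio_addr = reg;                 /* step 1: latch address */
    n->phy_regs[n->mdio_addr] = val;    /* step 2: write data */
    mdio_lock_release(n);
}
```

The lock covers only the mdio sequence, not the whole timer or ioctl
routine, which keeps the critical section short even though the real
accesses are slow.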

--
-akpm-

From owner-netdev@oss.sgi.com Mon May 1 21:29:42 2000
Message-ID: <390E598C.8DE626F1@uow.edu.au>
Date: Tue, 02 May 2000 04:29:00 +0000
From: Andrew Morton
To: "netdev@oss.sgi.com"
Subject: Re: tx_timeout and timer serialisation
References: <390DA08D.8979B5B7@uow.edu.au> <390E3143.5CF7D4AD@uow.edu.au>

Andrew Morton wrote:
>
> ...
> Things like the call to del_timer() in
> acenic.c:ace_start_xmit() need to be carefully reviewed.

That email sounded whiny. I'm not whining - I will sit down and work
through these drivers starting in 1-2 weeks time, if we agree it would
be useful.

But I'd be interested in hearing from more experienced people whether
these are indeed problems, and if there are other common bloopers to
watch out for.

--
-akpm-

From owner-netdev@oss.sgi.com Tue May 2 06:35:10 2000
From: kuznet@ms2.inr.ac.ru
Message-Id: <200005021334.RAA14118@ms2.inr.ac.ru>
Subject: Re: tx_timeout and timer serialisation
To: andrewm@uow.EDU.AU (Andrew Morton)
Date: Tue, 2 May 2000 17:34:33 +0400 (MSK DST)
Cc: netdev@oss.sgi.com
In-Reply-To: <390E3143.5CF7D4AD@uow.edu.au> from "Andrew Morton" at May 2, 0 06:13:23 am

Hello!

> 1: hard_start_xmit and tx_timeout are serialised wrt each other.

Add set_multicast_list() to this list.

> 2: timer functions, ioctl() and get_stats() have no guarantees.

Media timer is separate item, it is invisible from outside at all.

ioctl() is allowed to sleep, hence top level cannot do anything
to serialize it wrt class 1. But it serializes it wrt itself and open(),
close(), which is not so easy by the way. 8)

get_stats() is very special thing. The problem with this is that it is
broken by design, returning pointer to some static structure to code
which knows nothing about its protection. It is pretty clear that if
get_stats() does not modify counters, it needs no protection.
But if it _does_ modify (== touch hardware registers), as most of
ethernet drivers do, top level cannot do anything with this either
in 2.2 or in 2.3. Protection wrt IRQs is out of area of its expertise.

I would advise to avoid touching statistics from get_stats, when it is
possible and to modify counters only in context, where they are naturally
serialized. F.e. tulip get_stats() changes only rx_missed_errors.
Does it really deserve mud with irq safe locks around?

> 3: The ISR can be called during get_stats(), start_xmit(),
> ioctl(), media_timer(), tx_timeout(), etc.

Well, nothing is protected wrt irqs.

> speedo_timer does mdio_read()s. speedo_tx_timeout() does mdio_read()s
> and mdio_write()'s. mdio functions are stateful. Race.

Are they touched in normal rx/tx path and/or irq? If they are not,
it is easy to repair with _separate_ mdio bh protected spinlock.

The problem can be with control registers, which are reprogrammed
at IRQ level.

> Another issue: del_timer_sync(). It deletes a timer, but if that timer
> happens to be running, del_timer_sync() blocks until the timer's handler
> returns.

Do some drivers really use this? Beware, this function is smart,
but it is not so easy to use. First, timer handler must do timer_exit()
on exit. Second, del_timer_sync() cannot be called under spinlock, which
could be grabbed inside timer handler, it will be deadlock.

The best thing is to call it only in context _free_ of spinlocks:
i.e. open(), close(), ioctl(). See? That's why open()/close()/ioctl()
are called by top level without any locks held, though there are
not so much of drivers really sleeping there. Lots of operations
like to be done in normal process context. [ Including cli(), which
does not synchronize to timers, when called from under spinlock.
]

Alexey

From owner-netdev@oss.sgi.com Tue May 2 07:33:11 2000
Message-ID: <390EE5BB.2AF2F1CD@uow.edu.au>
Date: Wed, 03 May 2000 00:27:07 +1000
From: Andrew Morton
To: kuznet@ms2.inr.ac.ru
CC: netdev@oss.sgi.com
Subject: Re: tx_timeout and timer serialisation
References: <390E3143.5CF7D4AD@uow.edu.au> from "Andrew Morton" at May 2, 0 06:13:23 am <200005021334.RAA14118@ms2.inr.ac.ru>

kuznet@ms2.inr.ac.ru wrote:
>
> ...
>
> > speedo_timer does mdio_read()s. speedo_tx_timeout() does mdio_read()s
> > and mdio_write()'s. mdio functions are stateful. Race.
>
> Are they touched in normal rx/tx path and/or irq? If they are not,
> it is easy to repair with _separate_ mdio bh protected spinlock.

I believe you are correct.
This is a good approach, because the mdio functions, although rarely called, are slow. 3Com's GPL'ed driver for the 3c90x series is interesting. They use four spinlocks. - One for setting multicast mode - One for the start_xmit path - One for the interrupt - One for misc "recv mode/close/timer". I haven't studied it closely; it's a _very_ differently structured driver from the norm. I would guess that it has been ported from another OS. They ifdef all the spinlocks out of existence if !__SMP__. Interesting... > The problem can be with control registers, which are reprogrammed > at IRQ level. > > > Another issue: del_timer_sync(). It deletes a timer, but if that timer > > happens to be running, del_timer_sync() blocks until the timer's handler > > returns. > > Do some drivers really use this? A few of them use it. With the exception of eepro100, they use it for killing the media timer in the close() routine, which seems sensible - we don't want to release resources underneath the timer handler's feet. a2065.c forgets to use timer_exit(). But eepro100 uses del_timer_sync() in tx_timeout() (not under any spinlock) and does not use timer_exit(). I don't see why eepro100's tx_timeout() doesn't simply lock up in SMP, actually. timer->running will be set in run_timer_list() and appears to never be cleared, and del_timer_sync() will wedge in timer_synchronize(). Wanna take a look? I must have missed something. > Beware, this function is smart, > but it is not so easy to use. First, timer handler must do timer_exit() > on exit. Second, del_timer_sync() cannot be called under spinlock, which > could be grabbed inside timer handler, it will be deadlock. It would be rather nice to document these requirements in kernel/timer.c! Many drivers call del_timer() in their close() methods. It is conceivable (but rather unlikely) that the timer routine could get confused trying to manage the media interface of a device which has been fully or partially closed.
> The best thing is to call it only in context _free_ of spinlocks: > i.e. open(), close(), ioctl(). See? Is this what you mean?

    timer_func()
    {
        spin_lock(some_lock);               <<<< Deadlock
        ...                                    |
        spin_unlock(some_lock);                |
    }                                          |
                                               |
    some_func()                                |
    {                                          |
        spin_lock(some_lock);                  |
        del_timer_sync(some_timer);     <<<<<<<+
    }

> That's why open()/close()/ioctl() > are called by top level without any locks held, though there are > not so much of drivers really sleeping there. Lots of operations > like to be done in normal process context. [ Including cli(), which > does not synchronize to timers, when called from under spinlock. ] I don't understand your point about cli(). Do you mean that timer handlers can run on other CPUs (presumably via ret_from_syscall()) while the global IRQ lock is held??? -- -akpm- From owner-netdev@oss.sgi.com Tue May 2 07:49:00 2000 Received: by oss.sgi.com id ; Tue, 2 May 2000 07:48:40 -0700 Received: from minus.inr.ac.ru ([193.233.7.97]:31244 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Tue, 2 May 2000 07:48:16 -0700 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id SAA14593; Tue, 2 May 2000 18:48:01 +0400 From: kuznet@ms2.inr.ac.ru Message-Id: <200005021448.SAA14593@ms2.inr.ac.ru> Subject: Re: PATCH 2.2.14 net/core/dev.c To: greearb@candelatech.com (Ben Greear) Date: Tue, 2 May 2000 18:48:01 +0400 (MSK DST) Cc: dlr@collab.net, netdev@oss.sgi.com In-Reply-To: <390E2EB1.44EEE20@candelatech.com> from "Ben Greear" at May 1, 0 06:26:09 pm X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Content-Length: 1563 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hello! > Surely there are no critical paths that access the list linearly??? All the places where result of device lookup is not cached. IP tries to cache it, but not everywhere. I am sorry, when list of thousands elements is scanned and strcmp()ed, it is visible even in not very critical paths.
> If that is the problem, then those should be fixed, based on the > device index or some other hash. Exactly. Only it is more clever to avoid clogging system with devices, when they are not necessary really. List is ideal in this case. > What resources, other than memory, would the VLANs consume? I'm willing > to pay memory for constant-time lookups. 8)8)8) You lose both in this case. So, either you tell that list of devices must fit to single screen, or you hide them skillfully, so that nobody ever saw them occasionally and user doing "ifconfig" always saw some finite list. > interfaces use pseudo-devices, what about Frame-Relay PVCs? ATM PVCs? I remember, we have already talked about this. Do you remember? (NBMA 8)) Look, someone could find some merits in assigning special "eth666" to each MAC address on the wire. People do not make this usually. Why? Because it is inconvenient! When some amount exceeds dozen, it must be hidden and some classification scheme more reasonable than "xxxNNNNN" (f.e. given by dev_alloc_name()) must be used. That's why dev_alloc_name() is limited to two digits. And the maximum is not 99, as people extending it to 999 think, but 15. 
(When output of "ip link ls" still fits to one screen 8)8)) Alexey From owner-netdev@oss.sgi.com Tue May 2 07:49:00 2000 Received: by oss.sgi.com id ; Tue, 2 May 2000 07:48:41 -0700 Received: from colin.muc.de ([193.149.48.1]:65297 "HELO colin.muc.de") by oss.sgi.com with SMTP id ; Tue, 2 May 2000 07:48:25 -0700 Received: by colin.muc.de id <140556-2>; Tue, 2 May 2000 16:47:38 +0200 Message-ID: <20000502164735.52502@colin.muc.de> From: Andi Kleen To: Andrew Morton Cc: kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com Subject: Re: tx_timeout and timer serialisation References: <390EE5BB.2AF2F1CD@uow.edu.au> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 0.88e In-Reply-To: <390EE5BB.2AF2F1CD@uow.edu.au>; from Andrew Morton on Tue, May 02, 2000 at 04:34:46PM +0200 Date: Tue, 2 May 2000 16:47:36 +0200 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Tue, May 02, 2000 at 04:34:46PM +0200, Andrew Morton wrote: > I believe you are correct. This is a good approach, because the mdio > functions, although rarely called, are slow. > > 3Com's GPL'ed driver for the 3c90x series is interesting. They use four > spinlocks. > > - One for setting multicast mode > - One for the start_xmit path > - One for the interrupt > - One for misc "recv mode/close/timer". + Inherent deadlock with disable_irq() [the driver likes to lock up under high load on SMP] > > I haven't studied it closely; it's a _very_ differently structured > driver from the norm. I would guess that it has been ported from > another OS. They ifdef all the spinlocks out of existence if !__SMP__. > Interesting... The Intel e100 driver does that too. 
It is probably wrong (interrupt can race with TX) -Andi From owner-netdev@oss.sgi.com Tue May 2 08:49:31 2000 Received: by oss.sgi.com id ; Tue, 2 May 2000 08:49:21 -0700 Received: from minus.inr.ac.ru ([193.233.7.97]:3854 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Tue, 2 May 2000 08:49:14 -0700 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id TAA16319; Tue, 2 May 2000 19:49:00 +0400 From: kuznet@ms2.inr.ac.ru Message-Id: <200005021549.TAA16319@ms2.inr.ac.ru> Subject: Re: tx_timeout and timer serialisation To: andrewm@uow.edu.au (Andrew Morton) Date: Tue, 2 May 2000 19:49:00 +0400 (MSK DST) Cc: netdev@oss.sgi.com In-Reply-To: <390EE5BB.2AF2F1CD@uow.edu.au> from "Andrew Morton" at May 3, 0 00:27:07 am X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Content-Length: 2897 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hello! > I haven't studied it closely; it's a _very_ differently structured > driver from the norm. I would guess that it has been ported from > another OS. They ifdef all the spinlocks out of existence if !__SMP__. > Interesting... Indeed 8) Where can I find it? > But eepro100 uses del_timer_sync() in tx_timeout() (not under any > spinlock) Do not forget, it is under dev->xmit_lock! If media timer will grab it too -> deadlock. > never be cleared, and del_timer_sync() will wedge in > timer_synchronize(). Yes. > It would be rather nice to document these requirements in > kernel/timer.c! Well, if this function were _correct_... 8) Essentially, it is replacement for combination: del_timer(); synchronize_bh(); used in 2.2. I.e. delete and wait for all pending BHs to complete. del_timer_sync() deletes and waits only for _this_ timer to complete. It can be used from any context, provided user is sure that there are no deadlocks. Alas, it has fatal bug. Namely, timer handler _code_ can be released in between timer_exit() and return from handler. 
It is utterly unlikely, but the bug is fatal. 8) I do not know how to repair this without refcounts. > Many drivers call del_timer() in their close() methods. It is > conceivable (but rather unlikely) that the timer routine could get > confused trying to manage the media interface of a device which has been > fully or partially closed. It is not so unlikely, actually. If someone will reopen device persistently, it will be hit very soon. At least, the first version of softnetted TCP died during hour on almost idle machine due to similar race condition. 8) > Is this what you mean? Exactly. > > That's why open()/close()/ioctl() > > are called by top level without any locks held, though there are > > not so much of drivers really sleeping there. Lots of operations > > like to be done in normal process context. [ Including cli(), which > > does not synchronize to timers, when called from under spinlock. ] > > I don't understand your point about cli(). Do you mean that timer > handlers can run on other CPUs (presumably via ret_from_syscall()) while > the global IRQ lock is held??? I mean that if you make cli() inside hard_start_xmit(), it does not synchronize to BHs. And if at this time some timer runs on another CPU, it will run in parallel with cli()ed code. See? cli() synchronizes to irqs only if it is not inside an IRQ handler and it synchronizes to BHs only if it is not inside BH. Otherwise, you would get deadlock without any possibility to avoid it, because these locks are global. This ingenious trick worked in 2.2 superbly, callers never ever noticed that cli() is not so mighty. They would notice this e.g. if they tried to protect from irq 3 from handler for irq 4, and this was never necessary, fortunately. Well, now we have the same situation with networking and it is common.
Alexey From owner-netdev@oss.sgi.com Tue May 2 09:20:51 2000 Received: by oss.sgi.com id ; Tue, 2 May 2000 09:20:41 -0700 Received: from minus.inr.ac.ru ([193.233.7.97]:59150 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Tue, 2 May 2000 09:20:25 -0700 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id UAA16502; Tue, 2 May 2000 20:20:19 +0400 From: kuznet@ms2.inr.ac.ru Message-Id: <200005021620.UAA16502@ms2.inr.ac.ru> Subject: Re: rt_cache_flush() To: jleu@mindspring.COM Date: Tue, 2 May 2000 20:20:19 +0400 (MSK DST) Cc: netdev@oss.sgi.com In-Reply-To: <20000501153601.D7152@doit.wisc.edu> from "James R. Leu" at May 2, 0 00:13:18 am X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Content-Length: 836 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hello! > Ideally I would like for only the route cache entries that matched All of them are matched. Dependencies are unknown. The only hint: do not flush, if nobody looked at changed entry. > It looks like the cache is being flushed, but it take a small amount > of time. In that time, some packet go by and continue to use the old entry. > If I stop all packets, modify the fib_node (resulting in a rt_cache_flush(-1)) > and wait a couple of seconds, then send packets, things seem to work as > expected. rt_cache_flush(-1) flushes cache after net/ipv4/route/min_delay. > Is there any way to get my ideal scenario above? If not can I speed up the > cache flush? rt_cache_flush(0), I guess. Only I am not sure that you really want this. Default 2 second delay is even less than it is necessary, Ciscos use 3 seconds. 
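A minimal sketch of the delay argument described above, written as a userspace model rather than kernel code. The names model_flush_delay and MODEL_MIN_DELAY are hypothetical; the 2-second value is the default quoted in this message, and the model counts whole seconds instead of jiffies.

```c
/* Hypothetical userspace model of the rt_cache_flush() delay argument.
 * Assumption: time is counted in seconds, not jiffies. */
#define MODEL_MIN_DELAY 2   /* stands in for net/ipv4/route/min_delay (default 2 s) */

/* Returns how long the flush is deferred for a given argument. */
static int model_flush_delay(int delay)
{
    if (delay < 0)              /* -1: fall back to the configured minimum delay */
        return MODEL_MIN_DELAY;
    return delay;               /* 0: flush at once; >0: caller-chosen delay */
}
```

With this model, an argument of -1 leaves a window of MODEL_MIN_DELAY seconds in which packets can still hit old cache entries, which matches the behaviour reported in the quoted question; an argument of 0 closes that window at the cost of flushing more eagerly.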
Alexey From owner-netdev@oss.sgi.com Tue May 2 09:37:11 2000 Received: by oss.sgi.com id ; Tue, 2 May 2000 09:37:02 -0700 Received: from minus.inr.ac.ru ([193.233.7.97]:783 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Tue, 2 May 2000 09:36:41 -0700 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id UAA16606; Tue, 2 May 2000 20:36:34 +0400 From: kuznet@ms2.inr.ac.ru Message-Id: <200005021636.UAA16606@ms2.inr.ac.ru> Subject: Re: Linux2.2.12 and 2.0.36 TCP differences? To: lhan@unity.ncsu.EDU (Liang Han) Date: Tue, 2 May 2000 20:36:34 +0400 (MSK DST) Cc: netdev@oss.sgi.com In-Reply-To: from "Liang Han" at May 2, 0 05:13:15 am X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Content-Length: 877 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hello! > I have changed the TOS field of the redirect message. It works and > Telnet and Ftp sessions go through the new route now. Until you enable routing by ports. 8) > But I also need to > send another redirect message with Tos set to 0x0 to make Traceroute > work. You send redirect not for address (or tos or anything else). You send redirect for particular flow, after it is clear that this flow needs redirection. Unsolicited redirects are complete nonsense. Use gated or some other routing daemon to distribute routes actively. > My conclusion is under the scheme of per TOS routing, we can not totally > rely on traceroute, ping, route and ping programs to determine the > actual route. Certainly. When I traceroute you, packets go through satellite link, when I telnet to the same address, they go through ground link. Alas.
8) Alexey From owner-netdev@oss.sgi.com Tue May 2 12:56:53 2000 Received: by oss.sgi.com id ; Tue, 2 May 2000 12:56:43 -0700 Received: from mailhost.uni-koblenz.de ([141.26.64.1]:15047 "EHLO mailhost.uni-koblenz.de") by oss.sgi.com with ESMTP id ; Tue, 2 May 2000 12:56:35 -0700 Received: from cacc-30.uni-koblenz.de (cacc-30.uni-koblenz.de [141.26.131.30]) by mailhost.uni-koblenz.de (8.9.3/8.9.3) with ESMTP id VAA04222; Tue, 2 May 2000 21:56:30 +0200 (MET DST) Received: by lappi.waldorf-gmbh.de id ; Tue, 2 May 2000 15:46:03 +0200 Date: Tue, 2 May 2000 15:46:03 +0200 From: Ralf Baechle To: Liang Han Cc: netdev@oss.sgi.com Subject: Re: Linux2.2.12 and 2.0.36 TCP differences? Message-ID: <20000502154603.A1470@uni-koblenz.de> References: <20000429012414.A2137@fred.muc.de> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: 8bit X-Mailer: Mutt 1.0.1i In-Reply-To: ; from lhan@unity.ncsu.edu on Mon, May 01, 2000 at 08:18:14PM -0400 X-Accept-Language: de,en,fr Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Mon, May 01, 2000 at 08:18:14PM -0400, Liang Han wrote: > Telnet and Ftp sessions go through the new route now. But I also need to > send another redirect message with Tos set to 0x0 to make Traceroute > work. > My conclusion is under the scheme of per TOS routing, we can not totally > rely on traceroute, ping, route and ping programs to determine the > actual route. At least traceroute has a -t option: [...] -t Set the type-of-service in probe packets to the following value (default zero). The value must be a decimal integer in the range 0 to 255. This option can be used to see if different types-of-service result in different paths. (If you are not running 4.4bsd, this may be academic since the normal network services like telnet and ftp don't let you control the TOS). Not all values of TOS are legal or meaningful - see the IP spec for definitions.
Useful values are probably `-t 16' (low delay) and `-t 8' (high throughput). [...] Ralf From owner-netdev@oss.sgi.com Tue May 2 18:49:32 2000 Received: by oss.sgi.com id ; Tue, 2 May 2000 18:49:12 -0700 Received: from cx97923-a.phnx3.az.home.com ([24.9.112.194]:778 "EHLO grok.yi.org") by oss.sgi.com with ESMTP id ; Tue, 2 May 2000 18:48:55 -0700 Received: from candelatech.com (IDENT:greear@localhost [127.0.0.1]) by grok.yi.org (8.9.3/8.9.3) with ESMTP id TAA02403; Tue, 2 May 2000 19:21:51 -0700 Message-ID: <390F8D3F.3DC6F7CB@candelatech.com> Date: Tue, 02 May 2000 19:21:51 -0700 From: Ben Greear Organization: Candela Technologies X-Mailer: Mozilla 4.72 [en] (X11; U; Linux 2.2.14-5.0 i586) X-Accept-Language: en MIME-Version: 1.0 To: kuznet@ms2.inr.ac.ru CC: dlr@collab.net, netdev@oss.sgi.com Subject: Re: PATCH 2.2.14 net/core/dev.c References: <200005021448.SAA14593@ms2.inr.ac.ru> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing kuznet@ms2.inr.ac.ru wrote: > > Hello! > > > Surely there are no critical paths that access the list linearly??? > > All the places where result of device lookup is not cached. > IP tries to cache it, but not everywhere. If those are in the critical path, then things like virtual web servers (say with 20 virtual ip interfaces on them), should see a noticeable drop in performance, eh? I haven't looked at the IP code enough to know the reasons, but it would seem that the need to linearly search the device list should be few and far between, at least once a port is bound to an IP. > I am sorry, when list of thousands elements is scanned and strcmp()ed, > it is visible even in not very critical paths. True. If someone wants to run 2000 VLANs, or 2000 tunnels, or 2000 PVCs, they will have a slow time of running ifconfig -a.
Hopefully, as Linux finds its way into these kinds of devices, someone will optimize it to better handle the large number of interfaces... > So, either you tell that list of devices must fit to single screen, > or you hide them skillfully, so that nobody ever saw them > occasionally and user doing "ifconfig" always saw some finite list. ifconfig is definitely not in the critical path. If a faster interface is needed into the kernel, I'm sure someone can build one. > > > interfaces use pseudo-devices, what about Frame-Relay PVCs? ATM PVCs? > > I remember, we have already talked about this. Do you remember? (NBMA 8)) Vaguely, but I don't think I was satisfied with the answer then either. I can't think up any way to give an interface to the user, and yet not put an interface into the kernel. People shouldn't have to care what low layer interface their code uses, so that implies to me that VLANs, PVCs or whatever else must look, and in fact be, interfaces as far as the kernel is concerned. How else could you offer, say VLANs, without using interfaces? > > Look, someone could find some merits in assigning special "eth666" > to each MAC address on the wire. People do not make this usually. > Why? Because it is inconvenient! When some amount exceeds dozen, Layer 3 (IP & ARP) takes care of it for them. VLANs and PVCs are at lower layers, which are generally presented to the users as interfaces. > That's why dev_alloc_name() is limited to two digits. > And the maximum is not 99, as people extending it to 999 think, > but 15. (When output of "ip link ls" still fits to one screen 8)8)) I know of a box that is running 50 virtual IP interfaces. It seems to run just fine, so it seems the code supports more interfaces... Btw, I was planning on using source-routing (ie the ip command) in a PC with 20 real ethernet interfaces. Does this mean that the ip command will not support that?
> > Alexey Thanks, Ben -- Ben Greear (greearb@candelatech.com) http://www.candelatech.com Author of ScryMUD: scry.wanfear.com 4444 (Released under GPL) http://scry.wanfear.com http://scry.wanfear.com/~greear From owner-netdev@oss.sgi.com Tue May 2 19:17:52 2000 Received: by oss.sgi.com id ; Tue, 2 May 2000 19:17:32 -0700 Received: from pizda.ninka.net ([216.101.162.242]:19841 "EHLO pizda.ninka.net") by oss.sgi.com with ESMTP id ; Tue, 2 May 2000 19:17:21 -0700 Received: (from davem@localhost) by pizda.ninka.net (8.9.3/8.9.3) id TAA02432; Tue, 2 May 2000 19:10:33 -0700 Date: Tue, 2 May 2000 19:10:33 -0700 Message-Id: <200005030210.TAA02432@pizda.ninka.net> X-Authentication-Warning: pizda.ninka.net: davem set sender to davem@redhat.com using -f From: "David S. Miller" To: greearb@candelatech.com CC: kuznet@ms2.inr.ac.ru, dlr@collab.net, netdev@oss.sgi.com In-reply-to: <390F8D3F.3DC6F7CB@candelatech.com> (message from Ben Greear on Tue, 02 May 2000 19:21:51 -0700) Subject: Re: PATCH 2.2.14 net/core/dev.c References: <200005021448.SAA14593@ms2.inr.ac.ru> <390F8D3F.3DC6F7CB@candelatech.com> Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Date: Tue, 02 May 2000 19:21:51 -0700 From: Ben Greear > That's why dev_alloc_name() is limited to two digits. And the > maximum is not 99, as people extending it to 999 think, but > 15. (When output of "ip link ls" still fits to one screen 8)8)) I know of a box that is running 50 virtual IP interfaces. It seems to run just fine, so it seems the code supports more interfaces... Btw, I was planning on using source-routing (ie the ip command) in a PC with 20 real ethernet interfaces. Does this mean that the ip command will not support that? Please notice Alexey's smiley, and then reread his words. He isn't stating that it cannot function, he is saying it "does not function" in terms of practicality. Later, David S. 
Miller davem@redhat.com From owner-netdev@oss.sgi.com Tue May 2 19:39:52 2000 Received: by oss.sgi.com id ; Tue, 2 May 2000 19:39:43 -0700 Received: from saw.sw.com.sg ([203.120.9.98]:11396 "HELO saw.sw.com.sg") by oss.sgi.com with SMTP id ; Tue, 2 May 2000 19:39:28 -0700 Received: (qmail 29928 invoked by uid 577); 3 May 2000 02:39:22 -0000 Message-ID: <20000503103922.A29714@saw.sw.com.sg> Date: Wed, 3 May 2000 10:39:22 +0800 From: Andrey Savochkin To: kuznet@ms2.inr.ac.ru, Andrew Morton Cc: netdev@oss.sgi.com Subject: Re: tx_timeout and timer serialisation References: <390EE5BB.2AF2F1CD@uow.edu.au> <200005021549.TAA16319@ms2.inr.ac.ru> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 0.93.2i In-Reply-To: <200005021549.TAA16319@ms2.inr.ac.ru>; from "A.N.Kuznetsov" on Tue, May 02, 2000 at 07:49:00PM Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hello, On Tue, May 02, 2000 at 07:49:00PM +0400, A.N.Kuznetsov wrote: > > But eepro100 uses del_timer_sync() in tx_timeout() (not under any > > spinlock) > > Do not forget, it is under dev->xmit_lock! > > If media timer will grab it too -> deadlock. Calling del_timer_sync() may look dangerous, but media timer doesn't grab any locks (and I will keep it this way in the future). [snip] > Essentially, it is replacement for combination: > > del_timer(); > synchronize_bh(); > > used in 2.2. I.e. delete and wait for all pending BHs to complete. That's it. I made a mistake by not calling timer_exit(), I'll fix it. > > del_timer_sync() deletes and waits only for _this_ timer to complete. > It can be used from any context, provided user is sure that there > are no deadlocks. > > Alas, it has fatal bug. Namely, timer handler _code_ can be released > in between timer_exit() and return from handler. It is utterly > unlikely, but the bug is fatal. 8) I do not know how to repair > this without refcounts. I don't see a big problem here. 
We do not want to wait for the end of the handler. We want to synchronise against some operations inside the handler. If timer_exit() is called after the end of the operations, our goal is reached. Another important point is that the handler must not be called again in the same timer BH, but that's true. Best regards Andrey V. Savochkin From owner-netdev@oss.sgi.com Tue May 2 19:45:52 2000 Received: by oss.sgi.com id ; Tue, 2 May 2000 19:45:33 -0700 Received: from saw.sw.com.sg ([203.120.9.98]:12932 "HELO saw.sw.com.sg") by oss.sgi.com with SMTP id ; Tue, 2 May 2000 19:45:29 -0700 Received: (qmail 29952 invoked by uid 577); 3 May 2000 02:45:26 -0000 Message-ID: <20000503104526.B29714@saw.sw.com.sg> Date: Wed, 3 May 2000 10:45:26 +0800 From: Andrey Savochkin To: kuznet@ms2.inr.ac.ru, Andrew Morton Cc: netdev@oss.sgi.com Subject: Re: tx_timeout and timer serialisation References: <390E3143.5CF7D4AD@uow.edu.au> <200005021334.RAA14118@ms2.inr.ac.ru> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 0.93.2i In-Reply-To: <200005021334.RAA14118@ms2.inr.ac.ru>; from "A.N.Kuznetsov" on Tue, May 02, 2000 at 05:34:33PM Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hello, On Tue, May 02, 2000 at 05:34:33PM +0400, A.N.Kuznetsov wrote: > [snip] > > speedo_timer does mdio_read()s. speedo_tx_timeout() does mdio_read()s > > and mdio_write()'s. mdio functions are stateful. Race. > > Are they touched in normal rx/tx path and/or irq? If they are not, > it is easy to repair with _separate_ mdio bh protected spinlock. > > The problem can be with control registers, which are reprogrammed > at IRQ level. mdio functions are called only from timer handler, open, and ioctl. They touch only MDIO specific control register, so they can be serialised by BH protection. I'll do it for eepro100 driver when I get some time. Best regards Andrey V.
Savochkin From owner-netdev@oss.sgi.com Wed May 3 07:18:34 2000 Received: by oss.sgi.com id ; Wed, 3 May 2000 07:18:25 -0700 Received: from iwr1.iwr.uni-heidelberg.de ([129.206.104.40]:61924 "EHLO iwr1.iwr.uni-heidelberg.de") by oss.sgi.com with ESMTP id ; Wed, 3 May 2000 07:18:05 -0700 Received: from kenzo.iwr.uni-heidelberg.de (IDENT:bogdan@kenzo.iwr.uni-heidelberg.de [129.206.120.29]) by iwr1.iwr.uni-heidelberg.de (8.9.3/8.9.3) with ESMTP id QAA04287; Wed, 3 May 2000 16:18:01 +0200 (MET DST) Received: from localhost (bogdan@localhost) by kenzo.iwr.uni-heidelberg.de (8.8.7/8.8.7) with ESMTP id QAA03866; Wed, 3 May 2000 16:17:56 +0200 Date: Wed, 3 May 2000 16:17:56 +0200 (CEST) From: Bogdan Costescu To: kuznet@ms2.inr.ac.ru cc: netdev@oss.sgi.com Subject: Re: tx_timeout and timer serialisation In-Reply-To: <200005021549.TAA16319@ms2.inr.ac.ru> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Tue, 2 May 2000 kuznet@ms2.inr.ac.ru wrote: > > I haven't studied it closely; it's a _very_ differently structured > > driver from the norm. I would guess that it has been ported from > > another OS. They ifdef all the spinlocks out of existence if !__SMP__. > > Interesting... > > Indeed 8) Where can I find it? http://support.3com.com/infodeli/tools/nic/linux.htm > Do not forget, it is under dev->xmit_lock! Sorry, I didn't understand from this discussion if hard_start_xmit is protected WRT itself outside the driver or the driver should implement locks to assure that the xxx_start_xmit routine is not executed simultaneously on 2 (or more) CPUs. 
Sincerely, Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: Bogdan.Costescu@IWR.Uni-Heidelberg.De From owner-netdev@oss.sgi.com Wed May 3 08:06:55 2000 Received: by oss.sgi.com id ; Wed, 3 May 2000 08:06:45 -0700 Received: from c855439-a.pinol1.sfba.home.com ([24.14.147.74]:17680 "EHLO despot.finemaltcoding.com") by oss.sgi.com with ESMTP id ; Wed, 3 May 2000 08:06:34 -0700 Received: from finemaltcoding.com (localhost.localdomain [127.0.0.1]) by despot.finemaltcoding.com (8.9.3/8.9.3) with ESMTP id IAA15302; Wed, 3 May 2000 08:06:03 -0700 Message-ID: <3910405B.159FC8A6@finemaltcoding.com> Date: Wed, 03 May 2000 08:06:03 -0700 From: "Daniel L. Rall" Organization: "Fine Malt Coding" X-Mailer: Mozilla 4.7 [en] (X11; I; Linux 2.2.14 i686) X-Accept-Language: en MIME-Version: 1.0 To: "David S. Miller" CC: greearb@candelatech.com, kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com Subject: Re: PATCH 2.2.14 net/core/dev.c References: <200005021448.SAA14593@ms2.inr.ac.ru> <390F8D3F.3DC6F7CB@candelatech.com> <200005030210.TAA02432@pizda.ninka.net> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing "David S. Miller" wrote: > > Date: Tue, 02 May 2000 19:21:51 -0700 > From: Ben Greear > > > That's why dev_alloc_name() is limited to two digits. And the > > maximum is not 99, as people extending it to 999 think, but > > 15. (When output of "ip link ls" still fits to one screen 8)8)) > > I know of a box that is running 50 virtual IP interfaces. It seems > to run just fine, so it seems the code supports more interfaces... > > Btw, I was planning on using source-routing (ie the ip command) in > a PC with 20 real ethernet interfaces. Does this mean that the ip > command will not support that? 
> > Please notice Alexey's smiley, and then reread his words. > > He isn't stating that it cannot function, he is saying it > "does not function" in terms of practicality. So, are large numbers of network devices just frowned on in general (because one should use techniques to reduce the need for so many), or would the rework of network device storage from a list to a hash make large numbers of devices more palatable? -- Daniel Rall From owner-netdev@oss.sgi.com Wed May 3 10:22:07 2000 Received: by oss.sgi.com id ; Wed, 3 May 2000 10:21:47 -0700 Received: from minus.inr.ac.ru ([193.233.7.97]:1802 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Wed, 3 May 2000 10:21:33 -0700 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id VAA28060; Wed, 3 May 2000 21:17:06 +0400 From: kuznet@ms2.inr.ac.ru Message-Id: <200005031717.VAA28060@ms2.inr.ac.ru> Subject: Re: tx_timeout and timer serialisation To: saw@saw.sw.com.sg (Andrey Savochkin) Date: Wed, 3 May 2000 21:17:06 +0400 (MSK DST) Cc: andrewm@uow.edu.au, netdev@oss.sgi.com In-Reply-To: <20000503103922.A29714@saw.sw.com.sg> from "Andrey Savochkin" at May 3, 0 10:39:22 am X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Content-Length: 403 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hello! > I don't see a big problem here. > We do not want to wait for the end of the handler. I said _code_. I did not see it too. Until understood that scheme for unloading netdevice modules (which I had assurance to claim to be the only modules, which are unloaded really cleanly 8)8)) is still incomplete without synchronize_bh(). So, it still shares bugs with all the rest of modules. 
8) Alexey From owner-netdev@oss.sgi.com Wed May 3 11:04:08 2000 Received: by oss.sgi.com id ; Wed, 3 May 2000 11:03:58 -0700 Received: from minus.inr.ac.ru ([193.233.7.97]:30730 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Wed, 3 May 2000 11:03:32 -0700 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id WAA28470; Wed, 3 May 2000 22:03:03 +0400 From: kuznet@ms2.inr.ac.ru Message-Id: <200005031803.WAA28470@ms2.inr.ac.ru> Subject: Re: PATCH 2.2.14 net/core/dev.c To: greearb@candelatech.com (Ben Greear) Date: Wed, 3 May 2000 22:03:03 +0400 (MSK DST) Cc: dlr@collab.net, netdev@oss.sgi.com In-Reply-To: <390F8D3F.3DC6F7CB@candelatech.com> from "Ben Greear" at May 2, 0 07:21:51 pm X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Content-Length: 1525 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hello! > If those are in the critical path, then things like virtual web servers > (say with 20 virtual ip interfaces on them), should see a noticeable drop > in performance, eh? "Virtual IP interfaces" are split to separate level in 2.2. They are _not_ _devices_ and you may create dozen of thousands of them without harm. See? > finds its way into these kinds of devices, someone will optimize it to > better handle the large number of interfaces... One day, when people understood that number of files sometimes becomes pretty large, they invented "directories". Imagine, I worked on machines, which had no "directories". I do not want to remember about this experience. 8)8) It does not mean, that device list must stay suboptimal. But the things must be made in time. Plain list is enough for now, and it dictates programming style. > How else could you offer, say VLANs, without using interfaces? Do you ask me? Think! 8) > Layer 3 (IP & ARP) takes care of it for them. VLANs and PVCs are at lower layers, > which are generally presented to the users as interfaces. See above. When a level explodes, it must be split.
No questions. Invent a way to split it. > Btw, I was planning on using source-routing (ie the ip command) in a > PC with 20 real ethernet interfaces. Does this mean that the ip command > will not support that? 8)8)8) ip was written specially to parse huge amounts of data. But it helps to parse _well_ _structured_ data, rather than flat data enumerated by random numbers. Alexey From owner-netdev@oss.sgi.com Wed May 3 11:09:27 2000 Received: by oss.sgi.com id ; Wed, 3 May 2000 11:09:08 -0700 Received: from minus.inr.ac.ru ([193.233.7.97]:32522 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Wed, 3 May 2000 11:08:57 -0700 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id WAA28512; Wed, 3 May 2000 22:08:42 +0400 From: kuznet@ms2.inr.ac.ru Message-Id: <200005031808.WAA28512@ms2.inr.ac.ru> Subject: Re: tx_timeout and timer serialisation To: Bogdan.Costescu@IWR.Uni-Heidelberg.De (Bogdan Costescu) Date: Wed, 3 May 2000 22:08:42 +0400 (MSK DST) Cc: netdev@oss.sgi.com In-Reply-To: from "Bogdan Costescu" at May 3, 0 04:17:56 pm X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Content-Length: 295 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hello! > Sorry, I didn't understand from this discussion if hard_start_xmit is > protected WRT itself It is. > outside the driver or the driver should implement > locks to assure that the xxx_start_xmit routine is not executed > simultaneously on 2 (or more) CPUs. It should not. 
Alexey From owner-netdev@oss.sgi.com Wed May 3 12:44:58 2000 Received: by oss.sgi.com id ; Wed, 3 May 2000 12:44:48 -0700 Received: from tyholt.uninett.no ([158.38.60.10]:38406 "EHLO tyholt.uninett.no") by oss.sgi.com with ESMTP id ; Wed, 3 May 2000 12:44:29 -0700 Received: (from venaas@localhost) by tyholt.uninett.no (8.9.3/8.8.8) id VAA09227 for netdev@oss.sgi.com; Wed, 3 May 2000 21:44:26 +0200 (METDST) From: Stig Venaas Message-Id: <200005031944.VAA09227@tyholt.uninett.no> Subject: IPv6 source address selection To: netdev@oss.sgi.com Date: Wed, 03 May 2000 21:44:25 METDST X-Mailer: Elm [revision: 212.4] Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hi I think Linux needs better IPv6 source address selection. When there are several global non-deprecated addresses, the current code (2.3.99-pre6) simply picks the first it finds, while I would like to choose one with longest common prefix compared to the destination address. This is described in draft-ietf-ipngwg-default-addr-select-00.txt paragraph 4 rule 6, and seems to be what for instance Solaris 8 does. This is generally useful when one has several global IPv6 addresses (multihomed). An important special case is when one of the addresses is a 6to4 address (starting with 2002::/16). If the destination address is 6to4 the source address must also be 6to4. If the destionation is some other global address (3ffe or something) it must not be used. 
The following patch against 2.3.99-pre6 seems to work for me:

------8<-------
--- addrconf-2.3.99-pre6.c	Wed May  3 21:21:40 2000
+++ addrconf.c	Wed May  3 21:26:00 2000
@@ -434,12 +434,53 @@
 }
 
 /*
+ *	find the first different bit between two addresses
+ *	length of address must be a multiple of 32bits
+ *
+ *	Copied from ip6_fib.c
+ */
+static __inline__ int addr_diff(void *token1, void *token2, int addrlen)
+{
+	__u32 *a1 = token1;
+	__u32 *a2 = token2;
+	int i;
+
+	addrlen >>= 2;
+
+	for (i = 0; i < addrlen; i++) {
+		__u32 xb;
+
+		xb = a1[i] ^ a2[i];
+
+		if (xb) {
+			int j = 31;
+
+			xb = ntohl(xb);
+
+			while (test_bit(j, &xb) == 0)
+				j--;
+
+			return (i * 32 + 31 - j);
+		}
+	}
+
+	/*
+	 *	we should *never* get to this point since that
+	 *	would mean the addrs are equal
+	 */
+
+	return addrlen<<5;
+}
+
+/*
  *	Choose an apropriate source address
  *	should do:
  *	i)	get an address with an apropriate scope
  *	ii)	see if there is a specific route for the destination and use
  *		an address of the attached interface
  *	iii)	don't use deprecated addresses
+ *	iv)	if several valid addresses, use one with longest common prefix
+ *		with destination address
  */
 int ipv6_get_saddr(struct dst_entry *dst,
 		   struct in6_addr *daddr, struct in6_addr *saddr)
@@ -447,10 +488,13 @@
 	int scope;
 	struct inet6_ifaddr *ifp = NULL;
 	struct inet6_ifaddr *match = NULL;
+	struct inet6_ifaddr *best = NULL;
 	struct net_device *dev = NULL;
 	struct inet6_dev *idev;
 	struct rt6_info *rt;
 	int err;
+	int common_len;
+	int max_common_len = -1;
 
 	rt = (struct rt6_info *) dst;
 	if (rt)
@@ -481,10 +525,14 @@
 			for (ifp=idev->addr_list; ifp; ifp=ifp->if_next) {
 				if (ifp->scope == scope) {
 					if (!(ifp->flags & (IFA_F_DEPRECATED|IFA_F_TENTATIVE))) {
-						in6_ifa_hold(ifp);
-						read_unlock_bh(&idev->lock);
-						read_unlock(&addrconf_lock);
-						goto out;
+						common_len = addr_diff(daddr, &ifp->addr, 16);
+						if (common_len > max_common_len) {
+							in6_ifa_hold(ifp);
+							if (best)
+								in6_ifa_put(best);
+							best = ifp;
+							max_common_len = common_len;
+						}
 					}
 
 					if (!match && !(ifp->flags & IFA_F_TENTATIVE)) {
@@ -496,6 +544,10 @@
 				read_unlock_bh(&idev->lock);
 			}
 			read_unlock(&addrconf_lock);
+			if (best) {
+				ifp = best;
+				goto out;
+			}
 		}
 
 		if (scope == IFA_LINK)
------8<-------

Note that I did not write the addr_diff routine, I simply copied it from ip6_fib.c. If the patch looks good I guess one should reorganize the code a bit so that addr_diff isn't duplicated.

What do you think? Is this something that can go into 2.4? If necessary I can try to explain better why it's important.

Stig
-- 
Stig Venaas
UNINETT

From owner-netdev@oss.sgi.com Thu May 4 14:28:36 2000 Received: by oss.sgi.com id ; Thu, 4 May 2000 14:28:26 -0700 Received: from seattle.3com.com ([129.213.128.97]:50671 "EHLO seattle.3com.com") by oss.sgi.com with ESMTP id ; Thu, 4 May 2000 14:28:03 -0700 Received: from new-york.3com.com (new-york.3com.com [129.213.157.12]) by seattle.3com.com (8.8.8/8.8.8) with ESMTP id OAA24736 for ; Thu, 4 May 2000 14:28:03 -0700 (PDT) From: Jim_March@3com.com Received: from hqoutbound.ops.3com.com (hqoutbound.OPS.3Com.COM [139.87.48.104]) by new-york.3com.com (8.8.8/8.8.8) with SMTP id OAA25776 for ; Thu, 4 May 2000 14:28:02 -0700 (PDT) Received: by hqoutbound.ops.3com.com(Lotus SMTP MTA v4.6.7 (934.1 12-30-1999)) id 882568D5.0075AE49 ; Thu, 4 May 2000 14:25:23 -0700 X-Lotus-FromDomain: 3COM To: netdev@oss.sgi.com Message-ID: <882568D5.0075A109.00@hqoutbound.ops.3com.com> Date: Thu, 4 May 2000 14:25:53 -0700 Subject: RSIP under Linux Mime-Version: 1.0 Content-type: text/plain; charset=us-ascii Content-Disposition: inline Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing

Having just joined this mailing list, I'm looking forward to hearing about other active projects. Of particular interest to me are those of you considering adding Realm Specific IP, or other Network Address Translation protocols, to Linux, as I am currently designing an RSIP implementation.
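The longest-common-prefix rule proposed in the IPv6 source address selection patch above can be illustrated outside the kernel. What follows is only a minimal user-space sketch, not the kernel code: `struct addr6`, `common_prefix_len()` and `pick_source()` are hypothetical names introduced for illustration, and the scope, deprecated and tentative checks from the patch are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct in6_addr: 16 bytes in network order. */
struct addr6 {
	uint8_t b[16];
};

/* Number of leading bits that two addresses share (0..128). */
static int common_prefix_len(const struct addr6 *a, const struct addr6 *b)
{
	int i;

	assert(a != NULL && b != NULL);

	for (i = 0; i < 16; i++) {
		uint8_t diff = a->b[i] ^ b->b[i];

		if (diff) {
			int bits = 0;

			/* count equal leading bits inside the first differing byte */
			while (!(diff & 0x80)) {
				diff <<= 1;
				bits++;
			}
			return i * 8 + bits;
		}
	}
	return 128;	/* addresses are identical */
}

/*
 * Index of the candidate sharing the longest prefix with dst, or -1 if
 * there are no candidates. The strict '>' keeps the earliest candidate
 * on a tie, matching the comparison used in the patch.
 */
static int pick_source(const struct addr6 *dst,
		       const struct addr6 *cand, size_t n)
{
	int best = -1;
	int best_len = -1;
	size_t i;

	for (i = 0; i < n; i++) {
		int len = common_prefix_len(dst, &cand[i]);

		if (len > best_len) {
			best_len = len;
			best = (int)i;
		}
	}
	return best;
}
```

With a 6to4 destination (2002::/16) and two candidates, one under 3ffe::/16 and one under 2002::/16, `pick_source()` returns the index of the 6to4 candidate, which is the behaviour the message argues for.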
From owner-netdev@oss.sgi.com Thu May 4 15:29:56 2000 Received: by oss.sgi.com id ; Thu, 4 May 2000 15:29:46 -0700 Received: from icasun1.epfl.ch ([128.178.151.148]:16548 "EHLO icasun1.epfl.ch") by oss.sgi.com with ESMTP id ; Thu, 4 May 2000 15:29:30 -0700 Received: from ica.epfl.ch (jpmf@tcomhp33.epfl.ch [128.178.151.24]) by icasun1.epfl.ch (8.8.X/EPFL-8.1d for ICA) with ESMTP id AAA09286; Fri, 5 May 2000 00:29:16 +0200 (MET DST) Message-Id: <200005042229.AAA09286@icasun1.epfl.ch> X-Mailer: exmh version 2.1.1 10/15/1999 From: "J.P. Martin-Flatin" To: davem@redhat.com, ak@muc.de, kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com cc: "J.P. Martin-Flatin" Reply-To: "J.P. Martin-Flatin" Subject: TCP_RTO_MAX: 120 --> 240 Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Fri, 05 May 2000 00:29:14 +0200 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hi, In Linux 2.3.99pre6, TCP_RTO_MAX is set to (120*HZ) in include/net/tcp.h (line 297). I think it should be set to (240*HZ). In net/ipv4/tcp_timer.c:tcp_retransmit_timer in Linux 2.3.99pre6, there's a comment lines 590-604 saying: /* Increase the timeout each time we retransmit. Note that * we do not increase the rtt estimate. rto is initialized * from rtt, but increases here. Jacobson (SIGCOMM 88) suggests * that doubling rto each time is the least we can get away with. * In KA9Q, Karn uses this for the first few times, and then * goes to quadratic. netBSD doubles, but only goes up to *64, * and clamps at 1 to 64 sec afterwards. Note that 120 sec is * defined in the protocol as the maximum possible RTT. I guess * we'll have to use something other than TCP to talk to the * University of Mars. * * PAWS allows us longer timeouts and large windows, so once * implemented ftp to mars will work nicely. We will have to fix * the 120 second clamps though! */ I checked RFC 793 and it doesn't mention a limit of 120 secs. 
RFC 1122 says the following in Section 4.2.3.1 "Retransmission Timeout Calculation", page 96: The recommended upper and lower bounds on the RTO are known to be inadequate on large internets. The lower bound SHOULD be measured in fractions of a second (to accommodate high speed LANs) and the upper bound should be 2*MSL, i.e., 240 seconds. I therefore believe that the comment is incorrect and that TCP_RTO_MAX should be set to 240 seconds, or to be strict (240*HZ). Please confirm or correct me if I'm wrong. Thanks JP ____________________________________________________________ J.P. Martin-Flatin, EPFL-DSC-ICA, 1015 Lausanne, Switzerland Email: jp.martin-flatin@ieee.org Fax: +41-21-693-66-10 Web: http://icawww.epfl.ch/~jpmf/ From owner-netdev@oss.sgi.com Thu May 4 22:16:03 2000 Received: by oss.sgi.com id ; Thu, 4 May 2000 22:15:54 -0700 Received: from [206.24.4.33] ([206.24.4.33]:32531 "EHLO vaio.greennet") by oss.sgi.com with ESMTP id ; Thu, 4 May 2000 22:15:39 -0700 Received: from localhost (becker@localhost) by vaio.greennet (8.9.3/8.8.7) with ESMTP id BAA04311; Fri, 5 May 2000 01:15:26 -0400 Date: Fri, 5 May 2000 01:15:25 -0400 (EDT) From: Donald Becker X-Sender: becker@vaio.greennet To: kuznet@ms2.inr.ac.ru cc: Andrew Morton , netdev@oss.sgi.com Subject: Re: tx_timeout and timer serialisation In-Reply-To: <200005021334.RAA14118@ms2.inr.ac.ru> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Tue, 2 May 2000 kuznet@ms2.inr.ac.ru wrote: > Subject: Re: tx_timeout and timer serialisation Here is the write-up I have in pci-skeleton.c v2.** from http://www.scyld.com/network/index.html ftp://scyld.com/pub/network/ ________________ IIId. SMP semantics The following are serialized with respect to each other via the "xmit_lock". 
	dev->hard_start_xmit()		Transmit a packet
	dev->tx_timeout()		Transmit watchdog for stuck Tx
	dev->set_multicast_list()	Set the receive filter.
Note: The Tx timeout watchdog code is implemented by the timer routine in kernels up to 2.2.*. In 2.4.* and later the timeout code is part of the driver interface.

The following fall under the global kernel lock. The module will not be unloaded during the call, unless a call with a potential reschedule e.g. kmalloc() is called. No other synchronization assertion is made.
	dev->open()
	dev->do_ioctl()
	dev->get_stats()
Caution: The lock for dev->open() is commonly broken with request_irq() or kmalloc(). It is best to avoid any lock-breaking call in do_ioctl() and get_stats(), or additional module locking code must be implemented.

The following is self-serialized (no simultaneous entry)
	A handler registered with request_irq().
________________

> > 2: timer functions, ioctl() and get_stats() have no guarantees.
>
> Media timer is separate item, it is inivisible from outside at all.
>
> ioctl() is allowed to sleep, hence top level cannot do anything
> to serialize it wrt class 1. But it serializes it wrt itself
> and open(), close(), which is not so easy by the way. 8)

It's important for code simplicity to keep this assurance. An ioctl() usually examines or sets something to do with chip operation, just the sort of action that open() and close() are trying to do.

> get_stats() is very special thing. The problem with this is
> that it is broken by design, returning pointer to some static structure
> to code which knows nothing about its protection.

To reword slightly: it returns a pointer to an internally maintained structure. That structure consists of unsigned, aligned, word-long values which typically remain fixed in value, but may be monotonically increasing. When reporting dynamically increasing values, any consistent value is permitted to be reported e.g. the user may see either N or N+1.
(The phrase "returning a pointer to some static structure" usually implies a 'static struct foo' value.) > It is pretty > clear that if get_stats() does not modify counters, it needs > no protection. But if it _does_ modify (== touch hardware registers), > as most of ethernet drivers do, top level cannot do anything with this > either in 2.2 or in 2.3. Protection wrt IRQs is out of area of its expertise. Most hardware has good semantics: reading a value zeros the register. Thus the only bad thing that can happen is a increment-race. > I would advise to avoid touching statistics from get_stats, when it > is possible and to modify counters only in context, where they > are naturally serialized. F.e. tulip get_stats() changes only > rx_missed_errors. Does it really deserve mud with irq safe locks around? Some of the drivers note that they have an potential SMP race, but that it's mostly harmless. A race is extremely unlikely, and almost never impacts the count -- who cares in what order you add 0 to the error count. The statistics are generally non-critical values used primarily for getting a idea of the system behavior. Few people care if they have 9876 or 9877 CRC error, the problem is that it's not zero errors. > > 3: The ISR can be called during get_stats() Generally not a conflict, except on chips with register windows e.g. 8390 and 3Coms start_xmit() This frequently doesn't need a lock, if the cur_tx and dirty_tx values are incremented atomically at the proper point. ioctl() Generally not a problem media_timer() The media timers used to not do very much. They just checked link beat. Now they check duplex, flow control, timeouts, etc. and should be checked carefully at each change. tx_timeout(), etc. In some places the assumption is made that this is only called if the chip has stopped working some time ago. So no IRQ lock is put in place. > > speedo_timer does mdio_read()s. speedo_tx_timeout() does mdio_read()s > > and mdio_write()'s. 
mdio functions are stateful. Race. The original eepro100 code didn't do mdio_read() in the timer routine. The i82557 handled duplex switching itself. With the i82559 it must now check for negotiated flow control to avoid a bug with the chip sending flow control packets on half duplex links. This change does result in a race with ioctl() calls. > Are they touched in normal rx/tx path and/or irq? If they are not, > it is easy to repair with _separate_ mdio bh protected spinlock. Only the MDIO access register needs to be protected. > > Another issue: del_timer_sync(). It deletes a timer, but if that timer > > happens to be running, del_timer_sync() blocks until the timer's handler > > returns. > > Do some drivers really use this? Not that I know of. They use the regular del_timer(). Donald Becker becker@scyld.com Scyld Computing Corporation 410 Severn Ave. Suite 210 Annapolis MD 21403 From owner-netdev@oss.sgi.com Fri May 5 10:48:36 2000 Received: by oss.sgi.com id ; Fri, 5 May 2000 10:48:26 -0700 Received: from minus.inr.ac.ru ([193.233.7.97]:24333 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Fri, 5 May 2000 10:48:11 -0700 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id VAA04634; Fri, 5 May 2000 21:45:27 +0400 From: kuznet@ms2.inr.ac.ru Message-Id: <200005051745.VAA04634@ms2.inr.ac.ru> Subject: Re: TCP_RTO_MAX: 120 --> 240 To: jp.martin-flatin@ieee.org Date: Fri, 5 May 2000 21:45:27 +0400 (MSK DST) Cc: davem@redhat.com, ak@muc.de, netdev@oss.sgi.com In-Reply-To: <200005042229.AAA09286@icasun1.epfl.ch> from "J.P. Martin-Flatin" at May 5, 0 00:29:14 am X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Content-Length: 622 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hello! > In Linux 2.3.99pre6, TCP_RTO_MAX is set to (120*HZ) in include/net/tcp.h > (line 297). I think it should be set to (240*HZ). If you plan to talk to Mars, it is not enough. 
8) If you do not plan to leave Earth, 60 sec is much more than enough. 8) Another place from the same RFC: The TCP specification [TCP:1] arbitrarily assumes a value of 2 minutes for MSL. So, MSL is an arbitrary value selected to be realistic. Nowadays, MSL is 60 seconds. RFC1122 is too conservative in most of places, wrong in another ones and real implemntation fix it, when it is necessary. Alexey From owner-netdev@oss.sgi.com Fri May 5 13:43:47 2000 Received: by oss.sgi.com id ; Fri, 5 May 2000 13:43:38 -0700 Received: from tunnel.bieringer.de ([195.226.187.50]:59140 "EHLO tunnel.bieringer.de") by oss.sgi.com with ESMTP id ; Fri, 5 May 2000 13:43:26 -0700 Received: (from peter@localhost) by tunnel.bieringer.de (8.9.3/8.9.3) id XAA04592; Fri, 5 May 2000 23:43:02 +0200 Message-Id: <3.0.6.32.20000505224541.008d3940@mail.bieringer.de> X-URL: http://www.bieringer.de/pb/ X-Sender: peter@mail.bieringer.de X-Mailer: QUALCOMM Windows Eudora Light Version 3.0.6 (32) Date: Fri, 05 May 2000 22:45:41 +0200 To: netdev@oss.sgi.com From: Peter Bieringer Subject: Q: 2.2.15 default behavior for IPv4 source address selection ICMP/UDP on multi-alias host Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hi, perhaps someone could update me, how a Linux kernel select the source IPv4 address on ICMP/UDP. Following "real world" environment (multi domain hosting with IP based accounting) causes me a little bit of headache: One PC with one Ethernet interface Basic IP is: eth0: x.y.z.62 Also I defined some aliases: eth0:0 x.y.z.61 eth0:1 x.y.z.60 eth0:2 x.y.z.59 eth0:3 x.y.z.58 My amateur thoughts were that packets send by following tools ping traceroute ntp (xntp3) should have source IP address of "eth0" (*.62). Unfortunately, they uses "eth0:3" (*.58). Is it able to control upper written behavior by default (/proc ?)? I.e. 
all outgoing packets without any special binds are using the one of "eth0"? Thanks for any hints, Peter From owner-netdev@oss.sgi.com Sat May 6 06:49:30 2000 Received: by oss.sgi.com id ; Sat, 6 May 2000 06:49:21 -0700 Received: from minus.inr.ac.ru ([193.233.7.97]:48906 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Sat, 6 May 2000 06:49:03 -0700 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id RAA18924; Sat, 6 May 2000 17:48:52 +0400 From: kuznet@ms2.inr.ac.ru Message-Id: <200005061348.RAA18924@ms2.inr.ac.ru> Subject: Re: Q: 2.2.15 default behavior for IPv4 source address selection To: pb@bieringer.DE (Peter Bieringer) Date: Sat, 6 May 2000 17:48:52 +0400 (MSK DST) Cc: netdev@oss.sgi.com In-Reply-To: <3.0.6.32.20000505224541.008d3940@mail.bieringer.de> from "Peter Bieringer" at May 6, 0 01:13:05 am X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Content-Length: 461 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hello! > Basic IP is: > eth0: x.y.z.62 .... > should have source IP address of "eth0" (*.62). My thoughts are the same. > Unfortunately, they uses "eth0:3" (*.58). Really? Well, delete it then and look what will occur. 8) > Is it able to control upper written behavior by default (/proc ?)? I.e. all > outgoing packets without any special binds are using the one of "eth0"? The rules are described in one of appenices to ip-cref from iproute2. Alexey From owner-netdev@oss.sgi.com Sat May 6 08:41:42 2000 Received: by oss.sgi.com id ; Sat, 6 May 2000 08:41:31 -0700 Received: from mail.cyberus.ca ([209.195.95.1]:50058 "EHLO cyberus.ca") by oss.sgi.com with ESMTP id ; Sat, 6 May 2000 08:41:18 -0700 Received: from shell.cyberus.ca (shell [209.195.95.7]) by cyberus.ca (8.9.3/8.9.3/Cyberus Online Inc.) 
with ESMTP id LAA14747; Sat, 6 May 2000 11:41:18 -0400 (EDT) Received: from localhost (hadi@localhost) by shell.cyberus.ca (8.9.1b+Sun/8.9.3) with ESMTP id LAA16408; Sat, 6 May 2000 11:41:18 -0400 (EDT) Date: Sat, 6 May 2000 11:41:18 -0400 (EDT) From: jamal To: kuznet@ms2.inr.ac.ru cc: jp.martin-flatin@ieee.org, davem@redhat.com, Andi Kleen , netdev@oss.sgi.com Subject: Re: TCP_RTO_MAX: 120 --> 240 In-Reply-To: <200005051745.VAA04634@ms2.inr.ac.ru> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing We need to support inter-plenatary TCP/IP! ;-> OK i hope that is funny. It was to me when i walked into some presentation someone from http://www.ipnsig.org/ was giving. But this is serious stuff The presentation is at: http://www.ipnsig.org/reports/ipnnodes/index.htm Interplanetary TCP: Hmm... DoS now means some kiddie robot on the moon just dropping all SYNs going by for fun ;-> cheers, jamal On Fri, 5 May 2000 kuznet@ms2.inr.ac.ru wrote: > Hello! > > > In Linux 2.3.99pre6, TCP_RTO_MAX is set to (120*HZ) in include/net/tcp.h > > (line 297). I think it should be set to (240*HZ). > > If you plan to talk to Mars, it is not enough. 8) > If you do not plan to leave Earth, 60 sec is much more than enough. 8) > > > Another place from the same RFC: > > > The TCP specification [TCP:1] arbitrarily > assumes a value of 2 minutes for MSL. > > So, MSL is an arbitrary value selected to be realistic. > > Nowadays, MSL is 60 seconds. RFC1122 is too conservative in most > of places, wrong in another ones and real implemntation fix it, > when it is necessary. 
> > Alexey > From owner-netdev@oss.sgi.com Mon May 8 04:29:11 2000 Received: by oss.sgi.com id ; Mon, 8 May 2000 04:28:50 -0700 Received: from mailhostnew.tbit.dk ([194.182.135.150]:20409 "EHLO mailhostnew.tbit.dk") by oss.sgi.com with ESMTP id ; Mon, 8 May 2000 04:28:20 -0700 Received: from ric.tbit.dk (ric.tbit.dk [194.182.135.53]) by mailhostnew.tbit.dk (8.9.3+Sun/8.9.3) with ESMTP id NAA20691 for ; Mon, 8 May 2000 13:28:17 +0200 (MET DST) Received: (from ric@localhost) by ric.tbit.dk (8.9.3/8.9.3) id NAA24104; Mon, 8 May 2000 13:28:16 +0200 To: netdev@oss.sgi.com Subject: Re: Q: 2.2.15 default behavior for IPv4 source address selection ICMP/UDP on multi-alias host References: <3.0.6.32.20000505224541.008d3940@mail.bieringer.de> From: "Richard =?iso-8859-1?q?J=F8rgensen?=" Reply-To: ric@tbit.dk Date: 08 May 2000 13:28:15 +0200 In-Reply-To: Peter Bieringer's message of "Fri, 05 May 2000 22:45:41 +0200" Message-ID: Lines: 34 User-Agent: Gnus/5.070098 (Pterodactyl Gnus v0.98) Emacs/20.3 MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: 8bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Peter Bieringer writes: > perhaps someone could update me, how a Linux kernel select the source IPv4 > address on ICMP/UDP. I haven't read the kernel source, but I might be able to help anyway. > One PC with one Ethernet interface > > Basic IP is: > eth0: x.y.z.62 > Also I defined some aliases: > eth0:0 x.y.z.61 [...] > ping [...] should have source IP address of "eth0" (*.62). 

My experience with using aliases is that the source address is based on the routing table:

	"route add -host x.y.z.t" will cause "ping x.y.z.t" to have source address x.y.z.62

whereas

	"route add -host x.y.z.t dev eth0:0" will cause "ping x.y.z.t" to have source address x.y.z.61

The aliases you create will automatically be added to the routing table, so if you use several IP addresses belonging to the same net, pinging a host on that net will use the last alias you defined as source address.

Note: The routing table will show Iface = eth0 regardless of whether it is eth0, eth0:0, eth0:1, ...

Hope this helps.

-- 
Richard Jørgensen
System Developer, M. Sc.

From owner-netdev@oss.sgi.com Mon May 8 14:16:24 2000 Received: by oss.sgi.com id ; Mon, 8 May 2000 14:16:04 -0700 Received: from sabre-wulf.nvg.ntnu.no ([129.241.210.67]:13573 "EHLO sabre-wulf.nvg.ntnu.no") by oss.sgi.com with ESMTP id ; Mon, 8 May 2000 14:15:50 -0700 Received: from tyrell.nvg.ntnu.no ([IPv6:::ffff:129.241.210.70]:13842 "EHLO tyrell.nvg.ntnu.no" ident: "TIMEDOUT2" whoson: "-unregistered-") by sabre-wulf.nvg.ntnu.no with ESMTP id ; Mon, 8 May 2000 23:15:19 +0200 Received: (from venaas@localhost) by tyrell.nvg.ntnu.no (8.9.3/8.8.4) id XAA05575; Mon, 8 May 2000 23:15:08 +0200 From: Date: Mon, 8 May 2000 23:15:08 +0200 To: Stig Venaas Cc: netdev@oss.sgi.com Subject: Re: IPv6 source address selection Message-ID: <20000508231508.A5490@nvg.ntnu.no> References: <200005031944.VAA09227@tyholt.uninett.no> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0i In-Reply-To: <200005031944.VAA09227@tyholt.uninett.no>; from Stig.Venaas@uninett.no on Wed, May 03, 2000 at 09:44:25PM +0000 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing

Hi

I posted a suggestion for IPv6 source address selection last Wednesday but haven't heard anything. It would be nice to hear what people think of this. What do you say Alexey?
Do you agree this is useful, and is something that can be included? If the idea is good but not the patch, let me know. Should I explain why I think it's needed?

Stig
-- 
Duct tape is like the force. It has a light side, and a dark side, and it holds the universe together ... -- Carl Zwanzig

From owner-netdev@oss.sgi.com Wed May 10 07:52:58 2000 Received: by oss.sgi.com id ; Wed, 10 May 2000 07:52:49 +0000 Received: from mars.arts.u-szeged.hu ([160.114.28.163]:23816 "EHLO mars.arts.u-szeged.hu") by oss.sgi.com with ESMTP id ; Wed, 10 May 2000 07:52:34 +0000 Received: from localhost (sogor@localhost) by mars.arts.u-szeged.hu (8.9.3/8.9.3/Debian/GNU) with SMTP id JAA02291 for ; Wed, 10 May 2000 09:52:17 +0200 Date: Wed, 10 May 2000 09:52:17 +0200 (CEST) From: Sogor Laszlo Reply-To: Sogor Laszlo To: netdev@oss.sgi.com Subject: IPv4_MAPPED destination addresses Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing

Hello!

We do a SIIT (Stateless IP/ICMP Translator) for the Linux kernel (2.2.14).

-------------    --------        -------------
| IPv6 only |----| SIIT |--------| IPv4 only |
-------------    --------        -------------

The IPv6 only host must send the packet with
	source address: IPv4 TRANSLATED IPv6 (::ffff:0:0:0/96 prefix)
	dest. address: IPv4 MAPPED IPv6 (::ffff:0:0/96 prefix)

The SIIT gets the source and destination embedded addresses, drop the IPv6 header (and the extension headers), put an IPv4 header into the packet, and send it.

We have the following problem: if we want to connect to an IPv4_MAPPED address with IPv6, it opens an IPv4 socket instead of IPv6. The IPv4 only host gets the converted packet, responds it, the SIIT converts it into IPv6, and the IPv6 only host's kernel drops it, because it sent an IPv4 connect and got IPv6 packet.

We found the codes that were responsible for the "IPv6->IPv4 fall back", we commented it out.
It works, but there must be a "nice" solution. Is there any special reason that the IPv6 implementation of Linux works this way? The another problem is: if we "ping" the IPv6 only host from the IPv4 only host over the SIIT, the packet the IPv6 only host receives has IPv4 MAPPED IPv6 source address, and it doesn't response for it. shogy From owner-netdev@oss.sgi.com Wed May 10 08:02:48 2000 Received: by oss.sgi.com id ; Wed, 10 May 2000 08:02:39 +0000 Received: from mail.tiszanet.hu ([195.228.98.38]:44810 "EHLO mail.tiszanet.hu") by oss.sgi.com with ESMTP id ; Wed, 10 May 2000 08:02:27 +0000 Received: from adm3.local.tiszanet.hu [195.228.98.5] by mail.tiszanet.hu (SMTPD32-6.00) id A78D1DC80146; Wed, 10 May 2000 10:02:21 +0200 From: Dikan Gabor To: netdev@oss.sgi.com Subject: network Date: Wed, 10 May 2000 09:56:40 +0200 X-Mailer: KMail [version 1.0.29] Content-Type: text/plain MIME-Version: 1.0 Message-Id: <00051010022003.02704@adm3.local.tiszanet.hu> Content-Transfer-Encoding: 8bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hello! Do you know about a documentation how the kernel handles network packages? (From getting to sending. Which functions, etc...) 

Thank you for your help,
Gabor

From owner-netdev@oss.sgi.com Wed May 10 08:38:09 2000 Received: by oss.sgi.com id ; Wed, 10 May 2000 08:37:59 +0000 Received: from mailhostnew.tbit.dk ([194.182.135.150]:37095 "EHLO mailhostnew.tbit.dk") by oss.sgi.com with ESMTP id ; Wed, 10 May 2000 08:37:44 +0000 Received: from ric.tbit.dk (ric.tbit.dk [194.182.135.53]) by mailhostnew.tbit.dk (8.9.3+Sun/8.9.3) with ESMTP id KAA11122 for ; Wed, 10 May 2000 10:37:42 +0200 (MET DST) Received: (from ric@localhost) by ric.tbit.dk (8.9.3/8.9.3) id KAA01939; Wed, 10 May 2000 10:37:41 +0200 To: netdev@oss.sgi.com Subject: Re: IPv4_MAPPED destination addresses References: From: "Richard =?iso-8859-1?q?J=F8rgensen?=" Reply-To: ric@tbit.dk Date: 10 May 2000 10:37:40 +0200 In-Reply-To: Sogor Laszlo's message of "Wed, 10 May 2000 09:52:17 +0200 (CEST)" Message-ID: Lines: 66 User-Agent: Gnus/5.070098 (Pterodactyl Gnus v0.98) Emacs/20.3 MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: 8bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing

Sogor Laszlo writes:

> We do a SIIT (Stateless IP/ICMP Translator) for the Linux kernel (2.2.14).
[...]
> The another problem is: if we "ping" the IPv6 only host from the IPv4 only
> host over the SIIT, the packet the IPv6 only host receives has IPv4 MAPPED
> IPv6 source address, and it doesn't response for it.

If you follow RFC 2765 (SIIT), the translated ping requests have a fragmentation header added. Kernel 2.2.14 (and 2.2.15 for that matter) doesn't handle non-fragmented IPv6 packets that contain a fragmentation header. Alexey found that bug in kernel 2.3.99-pre5 and sent me a patch. I've adapted that patch to kernel 2.2.14, where it also fixes the bug. I'll append that patch below.

Regards,
Richard Jørgensen

---------------------------------------------------------------------------
diff -ur linux-2.2.14_orig/net/ipv6/ip6_input.c /usr/src/linux-2.2.14/net/ipv6/ip6_input.c
--- linux-2.2.14_orig/net/ipv6/ip6_input.c	Thu Aug 26 02:29:53 1999
+++ /usr/src/linux-2.2.14/net/ipv6/ip6_input.c	Fri Apr 28 09:49:11 2000
@@ -99,7 +99,7 @@
 	struct raw6_opt *opt;
 
 	opt = &sk->tp_pinfo.tp_raw;
-	icmph = (struct icmp6hdr *) (skb->nh.ipv6h + 1);
+	icmph = (struct icmp6hdr *) skb->h.raw;
 	return test_bit(icmph->icmp6_type, &opt->filter);
 }
 
diff -ur linux-2.2.14_orig/net/ipv6/ip6_output.c /usr/src/linux-2.2.14/net/ipv6/ip6_output.c
--- linux-2.2.14_orig/net/ipv6/ip6_output.c	Fri Apr 23 04:45:20 1999
+++ /usr/src/linux-2.2.14/net/ipv6/ip6_output.c	Fri Apr 28 10:00:04 2000
@@ -121,7 +121,7 @@
 
 	if (skb_headroom(skb) < head_room) {
 		struct sk_buff *skb2 = skb_realloc_headroom(skb, head_room);
-		kfree(skb);
+		kfree_skb(skb);
 		skb = skb2;
 		if (skb == NULL)
 			return -ENOBUFS;
diff -ur linux-2.2.14_orig/net/ipv6/reassembly.c /usr/src/linux-2.2.14/net/ipv6/reassembly.c
--- linux-2.2.14_orig/net/ipv6/reassembly.c	Fri Aug 28 04:33:09 1998
+++ /usr/src/linux-2.2.14/net/ipv6/reassembly.c	Thu Apr 27 12:38:52 2000
@@ -160,6 +160,15 @@
 		icmpv6_param_prob(skb, ICMPV6_HDR_FIELD, skb->h.raw);
 		return NULL;
 	}
+
+	if (!(fhdr->frag_off & __constant_htons(0xFFF9))) {
+		/* It is not a fragmnted frame */
+		skb->h.raw += sizeof(struct frag_hdr);
+		ipv6_statistics.Ip6ReasmOKs++;
+
+		return &fhdr->nexthdr;
+	}
+
 	if (atomic_read(&ip6_frag_mem) > sysctl_ip6frag_high_thresh)
 		frag_prune();
 

From owner-netdev@oss.sgi.com Wed May 10 17:32:44 2000 Received: by oss.sgi.com id ; Wed, 10 May 2000 17:32:34 +0000 Received: from minus.inr.ac.ru ([193.233.7.97]:44560 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Wed, 10 May 2000 17:32:18 +0000 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id VAA22760; Wed, 10 May 2000 21:31:45 +0400 From: kuznet@ms2.inr.ac.ru Message-Id:
<200005101731.VAA22760@ms2.inr.ac.ru> Subject: Re: IPv4_MAPPED destination addresses To: sogor@mars.ARts.u-szeged.HU Date: Wed, 10 May 2000 21:31:45 +0400 (MSK DST) Cc: netdev@oss.sgi.com In-Reply-To: from "Sogor Laszlo" at May 10, 0 12:13:11 pm X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Content-Length: 404 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hello! > We found the codes that were responsible for the "IPv6->IPv4 fall back", > we commented it out. And broke all the stack, right? Mapped addresses must never leave host and host cannot use mapped addresses talking to IPv6 world. Seems, I do not understand your problem at all. > It works, but there must be a "nice" solution. I do not see any solution now, neither ugly nor nice. Alexey From owner-netdev@oss.sgi.com Wed May 10 18:43:14 2000 Received: by oss.sgi.com id ; Wed, 10 May 2000 18:43:04 +0000 Received: from mars.arts.u-szeged.hu ([160.114.28.163]:52491 "EHLO mars.arts.u-szeged.hu") by oss.sgi.com with ESMTP id ; Wed, 10 May 2000 18:42:42 +0000 Received: from localhost (sogor@localhost) by mars.arts.u-szeged.hu (8.9.3/8.9.3/Debian/GNU) with SMTP id UAA19334; Wed, 10 May 2000 20:42:22 +0200 Date: Wed, 10 May 2000 20:42:22 +0200 (CEST) From: Sogor Laszlo To: kuznet@ms2.inr.ac.ru cc: netdev@oss.sgi.com Subject: Re: IPv4_MAPPED destination addresses In-Reply-To: <200005101731.VAA22760@ms2.inr.ac.ru> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing hello! On Wed, 10 May 2000 kuznet@ms2.inr.ac.ru wrote: > > We found the codes that were responsible for the "IPv6->IPv4 fall back", > > we commented it out. > > And broke all the stack, right? > > Mapped addresses must never leave host and host cannot use > mapped addresses talking to IPv6 world. Seems, I do not understand > your problem at all. 
The SIIT only converts IPv6 packets with IPv4 MAPPED IPv6 destination and IPv4 TRANSLATED IPv6 source addresses. So to send a packet from an IPv6 only host to an IPv4 only host we MUST send an IPv6 packet with MAPPED/TRANSLATED addresses (d/s) over the translator. shogy From owner-netdev@oss.sgi.com Wed May 10 18:50:24 2000 Received: by oss.sgi.com id ; Wed, 10 May 2000 18:50:04 +0000 Received: from minus.inr.ac.ru ([193.233.7.97]:4625 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Wed, 10 May 2000 18:49:53 +0000 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id WAA23515; Wed, 10 May 2000 22:49:45 +0400 From: kuznet@ms2.inr.ac.ru Message-Id: <200005101849.WAA23515@ms2.inr.ac.ru> Subject: Re: IPv4_MAPPED destination addresses To: sogor@mars.arts.u-szeged.hu (Sogor Laszlo) Date: Wed, 10 May 2000 22:49:45 +0400 (MSK DST) Cc: netdev@oss.sgi.com In-Reply-To: from "Sogor Laszlo" at May 10, 0 08:42:22 pm X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Content-Length: 313 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hello! > The SIIT only converts IPv6 packets with IPv4 MAPPED IPv6 destination > and IPv4 TRANSLATED IPv6 source addresses. Translated ones are fully OK. Mapped ones are not to be used for communication, their scope is limited by top level socket API, they never appear as _addresses_ in some packets. 
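[Annotation added to the archive, not part of the original message: the distinction between the two address forms discussed here can be sketched in userspace C. This is only a minimal illustration, assuming the RFC 2765 layouts ::ffff:a.b.c.d for IPv4-mapped and ::ffff:0:a.b.c.d for IPv4-translated addresses. The helper names zero_range(), is_v4_mapped(), is_v4_translated() and classify() are hypothetical and do not come from the kernel source.]

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>

/* Hypothetical helper: 1 if bytes [from, to) of the address are all zero. */
static int zero_range(const struct in6_addr *a, int from, int to)
{
    int i;

    for (i = from; i < to; i++)
        if (a->s6_addr[i] != 0)
            return 0;
    return 1;
}

/* IPv4-mapped, ::ffff:a.b.c.d: 80 zero bits, 16 one bits, then the IPv4 address. */
static int is_v4_mapped(const struct in6_addr *a)
{
    return zero_range(a, 0, 10) &&
           a->s6_addr[10] == 0xff && a->s6_addr[11] == 0xff;
}

/* IPv4-translated, ::ffff:0:a.b.c.d: 64 zero bits, 16 one bits,
 * 16 zero bits, then the IPv4 address. */
static int is_v4_translated(const struct in6_addr *a)
{
    return zero_range(a, 0, 8) &&
           a->s6_addr[8] == 0xff && a->s6_addr[9] == 0xff &&
           zero_range(a, 10, 12);
}

/* Parse a textual IPv6 address and name the class it falls into. */
const char *classify(const char *text)
{
    struct in6_addr a;

    if (inet_pton(AF_INET6, text, &a) != 1)
        return "invalid";
    if (is_v4_mapped(&a))
        return "mapped";
    if (is_v4_translated(&a))
        return "translated";
    return "other";
}
```

[With these assumptions, classify("::ffff:192.0.2.1") reports "mapped" and classify("::ffff:0:c000:201") reports "translated"; the two forms differ only in where the 16 one bits sit.]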
Alexey

From owner-netdev@oss.sgi.com Wed May 10 19:12:57 2000
Received: by oss.sgi.com id ; Thu, 11 May 2000 02:12:37 +0000
Received: from [202.102.223.33] ([202.102.223.33]:57625 "EHLO mx1.ustc.edu.cn") by oss.sgi.com with ESMTP id ; Thu, 11 May 2000 02:12:20 +0000
Received: from ustc.edu.cn (hpe25.nic.ustc.edu.cn [202.38.64.1]) by mx1.ustc.edu.cn (8.8.7/8.8.6) with SMTP id KAA09567; Thu, 11 May 2000 10:31:11 -0800
Received: from tarn.isdn.ustc.edu.cn by ustc.edu.cn with ESMTP (8.6.10/16.2) id KAA11374; Thu, 11 May 2000 10:17:06 +0800
Date: Thu, 11 May 2000 10:12:59 +0800 (CST)
From: Wang Hui
X-Sender: hwang@tarn.isdn.ustc.edu.cn
To: Sogor Laszlo
cc: netdev@oss.sgi.com, 6bone@ISI.EDU
Subject: Re: IPv4_MAPPED destination addresses
In-Reply-To:
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing

> We do a SIIT (Stateless IP/ICMP Translator) for the Linux kernel (2.2.14).
>
> -------------    --------        -------------
> | IPv6 only |----| SIIT |--------| IPv4 only |
> -------------    --------        -------------
> The IPv6 only host must send the packet with
> source address: IPv4 TRANSLATED IPv6 (::ffff:0:0:0/96 prefix)
> dest. address: IPv4 MAPPED IPv6 (::ffff:0:0/96 prefix)
>
> The SIIT gets the source and destination embedded addresses, drop the IPv6
> header (and the extension headers), put an IPv4 header into the packet,
> and send it.
>
> We have the following problem: if we want to connect to an IPv4_MAPPED
> address with IPv6, it opens an IPv4 socket instead of IPv6.
>

Here we have a problem because our box is not a real *IPv6-only* box, i.e., it has an IPv4 implementation, which makes the box a dual-stack box. In a dual-stack kernel, when it finds an IPv4-mapped address as the destination address, the kernel will use IPv4.
So there is a rule: a dual-stack kernel doesn't need a SIIT, since it can understand both IPv6 and IPv4, while an IPv6-only or IPv4-only box needs SIIT (RFC 2765) or NAT-PT (RFC 2766).

--Wang Hui.
IPv4 is IP before.
http://v6RT.ecn.6test.edu.cn/

From owner-netdev@oss.sgi.com Wed May 10 19:30:48 2000
Received: by oss.sgi.com id ; Thu, 11 May 2000 02:30:27 +0000
Received: from smtprch1.nortelnetworks.com ([192.135.215.14]:59383 "EHLO smtprch1.nortel.com") by oss.sgi.com with ESMTP id ; Thu, 11 May 2000 02:30:14 +0000
Received: from zsngd101.asiapac.nortel.com (actually znsgd101) by smtprch1.nortel.com; Wed, 10 May 2000 21:25:08 -0500
Received: from zctwb003.asiapac.nortel.com ([47.152.32.111]) by zsngd101.asiapac.nortel.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21) id K192HCCM; Thu, 11 May 2000 10:24:58 +0800
Received: from pwold011.asiapac.nortel.com ([47.181.193.45]) by zctwb003.asiapac.nortel.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21) id J8YN6Z9X; Thu, 11 May 2000 12:25:01 +1000
Received: from uow.edu.au (IDENT:akpm@localhost [127.0.0.1]) by pwold011.asiapac.nortel.com (8.9.3/8.9.3) with ESMTP id MAA05381; Thu, 11 May 2000 12:24:54 +1000
Message-ID: <391A19F6.5C72A6E0@uow.edu.au>
Date: Thu, 11 May 2000 02:24:54 +0000
X-Sybari-Space: 00000000 00000000 00000000
From: Andrew Morton
X-Mailer: Mozilla 4.61 [en] (X11; I; Linux 2.3.99-pre5 i686)
X-Accept-Language: en
MIME-Version: 1.0
To: kuznet@ms2.inr.ac.ru
CC: netdev@oss.sgi.com
Subject: Re: tx_timeout and timer serialisation
References: <390EE5BB.2AF2F1CD@uow.edu.au> from "Andrew Morton" at May 3, 0 00:27:07 am <200005021549.TAA16319@ms2.inr.ac.ru>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing

kuznet@ms2.inr.ac.ru wrote:
>
> Hello!

Hello indeed. I'm catching up on a week away...

[ Regarding del_timer_sync() ]

> Alas, it has fatal bug.
Namely, timer handler _code_ can be released
> in between timer_exit() and return from handler. It is utterly
> unlikely, but the bug is fatal. 8) I do not know how to repair
> this without refcounts.

Why does the handler have to call timer_exit() at all?

Could we not clear timer->running in run_timer_list()? That would certainly protect us from the problem you identify...

--
-akpm-

From owner-netdev@oss.sgi.com Thu May 11 06:10:44 2000
Received: by oss.sgi.com id ; Thu, 11 May 2000 13:10:25 +0000
Received: from minus.inr.ac.ru ([193.233.7.97]:53006 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Thu, 11 May 2000 13:09:55 +0000
Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id RAA32574; Thu, 11 May 2000 17:09:29 +0400
From: kuznet@ms2.inr.ac.ru
Message-Id: <200005111309.RAA32574@ms2.inr.ac.ru>
Subject: Re: tx_timeout and timer serialisation
To: andrewm@uow.edu.au (Andrew Morton)
Date: Thu, 11 May 2000 17:09:29 +0400 (MSK DST)
Cc: netdev@oss.sgi.com
In-Reply-To: <391A19F6.5C72A6E0@uow.edu.au> from "Andrew Morton" at May 11, 0 02:24:54 am
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Content-Length: 657
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing

Hello!

> Why does the handler have to call timer_exit() at all?
>
> Could we not clear timer->running in run_timer_list()? That would
> certainly protect us from the problem you identify...

Timers are self-destructable as rule. See? Normal usage for timer is to have it allocated inside an object and timer event destroys the object together with timer. In this case we have to use refcounts external to timer to avoid races.

Actually, existing *_timer primitives are very inconvenient. And I did not find any good way to improve them. Essentially, del_timer_sync(), timer->running and mod_timer() returning value are all that I was able to do.
Alexey

From owner-netdev@oss.sgi.com Thu May 11 06:27:14 2000
Received: by oss.sgi.com id ; Thu, 11 May 2000 13:27:05 +0000
Received: from minus.inr.ac.ru ([193.233.7.97]:13071 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Thu, 11 May 2000 13:26:54 +0000
Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id RAA00573; Thu, 11 May 2000 17:26:27 +0400
From: kuznet@ms2.inr.ac.ru
Message-Id: <200005111326.RAA00573@ms2.inr.ac.ru>
Subject: Re: IPv4_MAPPED destination addresses
To: sogor@mars.arts.u-szeged.hu (Sogor Laszlo)
Date: Thu, 11 May 2000 17:26:27 +0400 (MSK DST)
Cc: netdev@oss.sgi.com
In-Reply-To: from "Sogor Laszlo" at May 10, 0 09:13:11 pm
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Content-Length: 955
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing

Hello!

> > > that is not IPv6-capable. In addition to its use in the API this protocol
> > > uses IPv4-mapped addresses in the IPv6 packets to refer to an IPv4 node.

After thinking a bit I understood one evident thing eventually. (I apologize that process took so long time 8))

SIIT is absolutely incompatible with standard transition mechanisms and cannot co-exist with them on one machine and even on one link. And it is not negative, it is quite positive statement.

Essentially, you have to create new compile time option (CONFIG_IPV6_ONLY), grep kernel source and delete _ALL_ the places, where mapped IPv6 addresses are checked or created. In SIIT mode mapped addresses have nothing special. After this all the things will work.

Please, send me patch after you will make this. I am not sure that it will be ever included to kernel (SIIT is too semantically broken to be taken seriously), but at least we will have something to include.
Alexey From owner-netdev@oss.sgi.com Thu May 11 10:12:00 2000 Received: by oss.sgi.com id ; Thu, 11 May 2000 17:11:40 +0000 Received: from imail.knoware.nl ([195.64.48.18]:65032 "HELO imail.knoware.nl") by oss.sgi.com with SMTP id ; Thu, 11 May 2000 17:11:22 +0000 Received: from bobo.knoware.nl (bobo.knoware.nl [195.64.36.76]) by imail.knoware.nl (Postfix) with ESMTP id D4E2DBD610 for ; Thu, 11 May 2000 19:11:19 +0200 (CEST) Received: from starbreeze.knoware.nl (starbreeze.knoware.nl [195.64.36.7]) by bobo.knoware.nl (Postfix) with ESMTP id 7210C46A8 for ; Thu, 11 May 2000 19:11:19 +0200 (CEST) Date: Thu, 11 May 2000 19:11:19 +0200 (CEST) From: Spark X-Sender: hugo@starbreeze.knoware.nl To: netdev@oss.sgi.com Subject: Linux and IPv6 Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hi everybody, I'm Hugo and i'm working for an ISP in the Netherlands. Recently i decided to delve into IPv6 and started experimenting with the Linux IPv6 stuff like net-tools ipv6 and inet6-apps. The net result of my attempts are that i can see an ipv6 address on my eth0 interface, but i can't even ping it.. ping (from net-tool ipv6) complains about not finding icmp6. Mainly i'm trying to find some info like a quickstart guide to start with IPv6 on a test LAN and later to connect to 6bone. Can anybody help me with this? The linux kernel just provided this mailinglist as a place to find things out about IPv6. I also read the IPv6 HOWTO at http://www.bieringer.de/linux/IPv6/.. but i think i'm still missing certain things here since i'm a newbie to IPv6.. The distro i'm using is RedHat 6.2, most machines are just default installs with the packaged kernel, one of my own workstations is running the 2.3.99-pre3 kernel. Thanks! Hugo -------------------------------------------------------------- "That i'm paranoid doesn't mean they aren't out to get me!" 
-------------------------------------------------------------- Hugo Trippaers email personal: spark@knoware.nl System Engineer (RHCE) email @ work : trippaers@knoware.nl Knoware B.V. www : http://www.knoware.nl From owner-netdev@oss.sgi.com Thu May 11 10:35:00 2000 Received: by oss.sgi.com id ; Thu, 11 May 2000 17:34:50 +0000 Received: from sabre-wulf.nvg.ntnu.no ([129.241.210.67]:15372 "EHLO sabre-wulf.nvg.ntnu.no") by oss.sgi.com with ESMTP id ; Thu, 11 May 2000 17:34:37 +0000 Received: from tyrell.nvg.ntnu.no ([IPv6:::ffff:129.241.210.70]:1796 "EHLO tyrell.nvg.ntnu.no" ident: "TIMEDOUT2" whoson: "-unregistered-") by sabre-wulf.nvg.ntnu.no with ESMTP id ; Thu, 11 May 2000 19:34:10 +0200 Received: (from venaas@localhost) by tyrell.nvg.ntnu.no (8.9.3/8.8.4) id TAA01284; Thu, 11 May 2000 19:34:00 +0200 From: Date: Thu, 11 May 2000 19:33:59 +0200 To: Spark Cc: netdev@oss.sgi.com Subject: Re: Linux and IPv6 Message-ID: <20000511193359.A1231@nvg.ntnu.no> References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0i In-Reply-To: ; from spark@knoware.nl on Thu, May 11, 2000 at 07:11:19PM +0200 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Thu, May 11, 2000 at 07:11:19PM +0200, Spark wrote: > I'm Hugo and i'm working for an ISP in the Netherlands. Recently i decided > to delve into IPv6 and started experimenting with the Linux IPv6 stuff > like net-tools ipv6 and inet6-apps. The net result of my attempts are > that i can see an ipv6 address on my eth0 interface, but i can't even ping > it.. ping (from net-tool ipv6) complains about not finding icmp6. That one is easy, just add icmp6 58 ICMP6 # ICMPv6 to /etc/protocols Stig -- Duct tape is like the force. It has a light side, and a dark side, and it holds the universe together ... 
-- Carl Zwanzig From owner-netdev@oss.sgi.com Thu May 11 12:27:22 2000 Received: by oss.sgi.com id ; Thu, 11 May 2000 19:27:03 +0000 Received: from imail.knoware.nl ([195.64.48.18]:25605 "HELO imail.knoware.nl") by oss.sgi.com with SMTP id ; Thu, 11 May 2000 19:26:56 +0000 Received: from bobo.knoware.nl (bobo.knoware.nl [195.64.36.76]) by imail.knoware.nl (Postfix) with ESMTP id 593D1BD68F; Thu, 11 May 2000 21:26:51 +0200 (CEST) Received: from sunbeam.spark.knoware.nl (sunbeam.spark.knoware.nl [195.64.36.194]) by bobo.knoware.nl (Postfix) with ESMTP id 6CA9846A7; Thu, 11 May 2000 21:26:49 +0200 (CEST) Date: Thu, 11 May 2000 21:25:19 +0200 (CEST) From: Spark X-Sender: spark@sunbeam.spark.knoware.nl To: venaas@nvg.ntnu.no Cc: netdev@oss.sgi.com Subject: Re: Linux and IPv6 In-Reply-To: <20000511193359.A1231@nvg.ntnu.no> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Thanks Stig! It works now.. Strange enough the 58 entry was in /etc/protocols, but with the wrong name: icmpv6 58 ICMPV6 # Internet Control Message Protocol version 6 Greetings, Hugo On Thu, 11 May 2000 venaas@nvg.ntnu.no wrote: > On Thu, May 11, 2000 at 07:11:19PM +0200, Spark wrote: > > I'm Hugo and i'm working for an ISP in the Netherlands. Recently i decided > > to delve into IPv6 and started experimenting with the Linux IPv6 stuff > > like net-tools ipv6 and inet6-apps. The net result of my attempts are > > that i can see an ipv6 address on my eth0 interface, but i can't even ping > > it.. ping (from net-tool ipv6) complains about not finding icmp6. > > That one is easy, just add > > icmp6 58 ICMP6 # ICMPv6 > > to /etc/protocols > > Stig > > -- > Duct tape is like the force. It has a light side, and a dark side, and > it holds the universe together ... 
> -- Carl Zwanzig
>

From owner-netdev@oss.sgi.com Thu May 11 12:34:53 2000
Received: by oss.sgi.com id ; Thu, 11 May 2000 19:34:33 +0000
Received: from mea.tmt.tele.fi ([194.252.70.162]:2460 "EHLO mea.tmt.tele.fi") by oss.sgi.com with ESMTP id ; Thu, 11 May 2000 19:34:17 +0000
Received: (mea@mea.tmt.tele.fi) by mea.tmt.tele.fi id ; Thu, 11 May 2000 22:34:06 +0300
Date: Thu, 11 May 2000 22:34:06 +0300
From: Matti Aarnio
To: netdev@oss.sgi.com
Subject: Multiple default routes at multihomed machine ?
Message-ID: <20000511223406.B829@mea.tmt.tele.fi>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing

I have machines which I would like to multihome - attach to several different networks. Doing that isn't quite trivial, as we run IP source address verification at edges, and these two (or more) machine interfaces end up at different allowed sets.

These machines do *not* route in between interfaces.

Used networks are also firewalled with stateful FW-1 (and/or some similar), so that reply packets going wrong way will fail.

That is:

    eth0: A.B.1.2/24 gw A.B.1.1
    eth1: A.B.2.2/24 gw A.B.2.1
    eth2: A.B.3.2/24 gw A.B.3.1
    def-gw A.B.1.1

Locally initiated connections/streams go out via default-gw, but incoming ones should send packets back via gateways related to the interface they came in from.

TCP connections are bound on interface address (or alias), so finding return path should be trivial ? UDP servers bound to fully specified interface addresses can also find return paths, and unbound wild-cards do get what they deserve..

Can this be done already with tools like ANK's IP, or do we need some kernel development ?
/Matti Aarnio From owner-netdev@oss.sgi.com Sun May 14 03:25:01 2000 Received: by oss.sgi.com id ; Sun, 14 May 2000 10:24:41 +0000 Received: from tunnel.bieringer.de ([195.226.187.50]:41490 "EHLO convert rfc822-to-8bit tunnel.bieringer.de") by oss.sgi.com with ESMTP id ; Sun, 14 May 2000 10:24:17 +0000 Received: (from peter@localhost) by tunnel.bieringer.de (8.9.3/8.9.3) id MAA24928; Sun, 14 May 2000 12:24:14 +0200 Message-Id: <3.0.6.32.20000514122646.00836100@mail.bieringer.de> X-URL: http://www.bieringer.de/pb/ X-Sender: peter@mail.bieringer.de X-Mailer: QUALCOMM Windows Eudora Light Version 3.0.6 (32) Date: Sun, 14 May 2000 12:26:46 +0200 To: ric@tbit.dk, netdev@oss.sgi.com From: Peter Bieringer Subject: Re: Q: 2.2.15 default behavior for IPv4 ... (solved) In-Reply-To: References: <3.0.6.32.20000505224541.008d3940@mail.bieringer.de> Mime-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 8BIT Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing At 13:28 08.05.2000 +0200, Richard Jørgensen wrote: >> perhaps someone could update me, how a Linux kernel select the source IPv4 >> address on ICMP/UDP. > >I haven't read the kernel source, but I might be able to help anyway. > >> One PC with one Ethernet interface >> >> Basic IP is: >> eth0: x.y.z.62 >> Also I defined some aliases: >> eth0:0 x.y.z.61 >[...] >> ping [...] should have source IP address of "eth0" (*.62). > >My experience with using aliases is that the source address is based >on the routing table: >"route add -host x.y.x.t" will cause "ping x.y.z.t" to have source >address x.y.z.62 whereas >"route add -host x.y.x.t dev eth0:0" will cause "ping x.y.z.t" to have >source address x.y.z.61 > >The aliases you create will automatically be added to the routing, so >if you use several ip-adresses belonging to the same net, pinging a >host on that net will use the last alias you defined as source address. 
> >Note: The routing table will show Iface = eth0 regardless of whether >it is eth0, eth0:0, eth0:1, ... Good hint, this solved the problem. You're right, I got about 4 default route entries after setting 1 basic and 3 alias IPs. If I've removed 3 of them, the first IP address set (eth0) is used on outgoing packets. Because of on 2 other similar RedHat 6.2 installations this won't occur, I've looked for diffs and found one: /etc/sysconfig/network must contain: GATEWAYDEV="eth0" Afterwards, only one default route is set up, not per each alias. Thanks for helping! Peter From owner-netdev@oss.sgi.com Sun May 14 17:25:12 2000 Received: by oss.sgi.com id ; Mon, 15 May 2000 00:24:51 +0000 Received: from studserv.stud.uni-hannover.de ([130.75.176.2]:58288 "EHLO studserv.stud.uni-hannover.de") by oss.sgi.com with ESMTP id ; Mon, 15 May 2000 00:24:33 +0000 Received: from mindless.com (a057.home.uni-hannover.de [130.75.232.57]) by studserv.stud.uni-hannover.de (8.9.3/8.9.3/1) with ESMTP id CAA05196; Mon, 15 May 2000 02:24:23 +0200 (MET DST) Message-ID: <391F3E57.F17F4649@mindless.com> Date: Mon, 15 May 2000 02:01:35 +0200 From: Bernd Kischnick X-Mailer: Mozilla 4.5 (Macintosh; I; PPC) X-Accept-Language: de,en MIME-Version: 1.0 To: netdev@oss.sgi.com CC: linux-net@vger.rutgers.edu, linux-kernel@vger.rutgers.edu Subject: PATCH: drivers/net/ncr885e.c Content-Type: multipart/mixed; boundary="------------C6AD24CAE5415C2363570D5C" Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing This is a multi-part message in MIME format. --------------C6AD24CAE5415C2363570D5C Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Hi all, this patch for 2.3.99-pre8 fixes a compilation error. 
- Bernd

--------------C6AD24CAE5415C2363570D5C
Content-Type: text/plain; charset=us-ascii; x-mac-type="54455854"; x-mac-creator="522A6368"; name="ncr885e.patch"
Content-Transfer-Encoding: 7bit
Content-Description: Unknown Document
Content-Disposition: inline; filename="ncr885e.patch"

diff -ru linux/drivers/net/ncr885e.c my/linux/drivers/net/ncr885e.c
--- linux/drivers/net/ncr885e.c Fri Mar 3 07:46:07 2000
+++ my/linux/drivers/net/ncr885e.c Sun May 14 02:31:12 2000
@@ -1393,13 +1393,7 @@
 #endif /* NCR885E_DEBUG_MII */
-int
-init_module(void)
-{
-        return ncr885e_probe();
-}
-
-static void __exit cleanup_module(void)
+static void __exit ncr885e_cleanup(void)
 {
         struct ncr885e_private *np;

--------------C6AD24CAE5415C2363570D5C--

From owner-netdev@oss.sgi.com Mon May 15 09:03:50 2000
Received: by oss.sgi.com id ; Mon, 15 May 2000 16:03:31 +0000
Received: from mail.gmx.net ([194.221.183.63]:61000 "HELO mail3.gmx.net") by oss.sgi.com with SMTP id ; Mon, 15 May 2000 16:03:16 +0000
Received: (qmail 7190 invoked by uid 0); 15 May 2000 16:03:10 -0000
Received: from p3e9e0377.dip.t-dialin.net (HELO flux.local) (62.158.3.119) by mail.gmx.net with SMTP; 15 May 2000 16:03:10 -0000
Received: from thomas by flux.local with local (Exim 3.13 #1) id 12rNPJ-00008s-00 for netdev@oss.sgi.com; Mon, 15 May 2000 18:07:49 +0200
Date: Mon, 15 May 2000 18:07:49 +0200
From: Thomas Moestl
To: netdev@oss.sgi.com
Subject: ICMPv6 Echo Reply bug in Linux 2.2.15 ?
Message-ID: <20000515180749.A479@flux.local>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 1.0.1i
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing

Hi!

I was building a ping implementation for IPv6, when I noticed a strange thing: When I send an ICMP echo request to the link-local address of my network adaptor (the one that is automatically assigned to an eth?
interface), I get an ICMP echo reply (with matching id and seq) from ::1, and no reply from the address I pinged. This is probably not very important, but it surely is not compliant with RFC 2463, which states that the reply must come from the address the Echo Request went to. Has anybody any idea where the problem is?

Here is my configuration:

----------
ip addr ls:
1: lo: mtu 3924 qdisc noqueue
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
2: sit0@NONE: mtu 1480 qdisc noop
    link/sit 0.0.0.0 brd 0.0.0.0
3: eth0: mtu 1500 qdisc pfifo_fast qlen 100
    link/ether 00:00:86:39:0c:b0 brd ff:ff:ff:ff:ff:ff
    inet 192.168.0.89/24 brd 192.168.0.255 scope global eth0
    inet6 fe80::200:86ff:fe39:cb0/10 scope link
----------
ip route ls:
192.168.0.0/24 dev eth0 proto kernel scope link src 192.168.0.89
127.0.0.0/8 dev lo scope link
default via 192.168.0.3 dev eth0 metric 1
----------
ip link ls:
1: lo: mtu 3924 qdisc noqueue
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: sit0@NONE: mtu 1480 qdisc noop
    link/sit 0.0.0.0 brd 0.0.0.0
3: eth0: mtu 1500 qdisc pfifo_fast qlen 100
    link/ether 00:00:86:39:0c:b0 brd ff:ff:ff:ff:ff:ff
----------

The IP that I pinged was fe80::200:86ff:fe39:cb0 in this case. I tested this on a network without any IPv6 routers. If somebody needs it, I can post some code.
Thomas -- ------------------------------------------------------------------------------- Thomas Moestl --- http://home.t-online.de/home/Moestl/ gpg/pgp key fingerprint: 6011 FAD1 73FF 775F 052F A022 A813 81AE CFE6 C8BB From owner-netdev@oss.sgi.com Tue May 16 13:46:32 2000 Received: by oss.sgi.com id ; Tue, 16 May 2000 20:46:13 +0000 Received: from 221.juhasz.rlab.telia.net ([192.36.218.221]:32436 "EHLO 221.juhasz.rlab.telia.net") by oss.sgi.com with ESMTP id ; Tue, 16 May 2000 20:46:05 +0000 Received: from b221 ([127.0.0.1] helo=sch.bme.hu ident=cell) by 221.juhasz.rlab.telia.net with esmtp (Exim 3.12 #1 (Debian)) id 12roL7-0003M0-00 for ; Tue, 16 May 2000 22:53:17 +0200 Message-ID: <3921B53C.D0C0A237@sch.bme.hu> Date: Tue, 16 May 2000 22:53:16 +0200 From: Marcell GAL X-Mailer: Mozilla 4.72 [en] (X11; I; Linux 2.3.41 i686) X-Accept-Language: hu, en MIME-Version: 1.0 To: netdev@oss.sgi.com Subject: Re: PATCH 2.2.14 net/core/dev.c Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hi Guys, Having many ATM VCs, each having its own ethernet segment inside I am exactly in the same situation as the VLAN guys. (the kernel code for rfc2684/~1483/ bridged encapsulation is coming out soon, those who want an early look /perhaps audit/, please write to me) My users want many interfaces, they want to ifconfig them, see counters, set MAC addresses, they want whatever they do with their physical ethernet interfaces. Several hundred interfaces per physical ATM card. (still less than 409x VLANs though :) I tried hard to follow Alexey's hints ("Think!"). I still do not know a better way than using the _existing_ tools (eg. ifconfig) for exactly what they were written for. If a user chooses to have a longer list of interfaces than her terminal can handle ;) I do not see a reason to restrict her. 
Besides having a longer (but acceptable for 99%) creation and listing time for interfaces, the only remarkable thing was /proc/sys/net/ipv4/conf. (In our case only 145 entries were accessible, so we had to use the conf/default directory -ugly, eh?- before creating the devices and forget about changing those values thereafter.)

I am not arguing for patching dev_alloc_name(); I used my own name allocation (like Werner's net/atm/clip.c). I am just curious how I am supposed to 'split the level' and 'hide' those interfaces and still have the comfort of ifconfig and route, sorry, I mean ip. All this with a reasonable amount of work and a maintainable set of patches, so there might be a slight chance of getting into 2.4.

Actually some users create a large number of interfaces with the same address and netmask, and have many route entries so that the kernel chooses the right one (and the kernel handles it right and efficiently; thx, Alexey). In this case it seems more reasonable to make one interface and attach several ATM VCs to it and have a small in-device routing (or we can call it perm-arp) to choose among the VCs. However this raises proxy-arp issues (we have to proxy-arp because the segments in different VCs do not otherwise see each other). This is possible to solve, but not in a nice way. (Now that they are different interfaces the generic arp just solves this...)

Sorry to stir this again, maybe I just want to hear that a large number of traditional interfaces is OK for now until we get back to this at 2.5.

---- and now something completely different (not the larch)

- brdev->stats.tx_packets++;
- brdev->stats.tx_bytes += skb->len;
+ atomic_inc (&brdev->stats.tx_packets);
+ atomic_add (skb->len, &brdev->stats.tx_bytes);

I know that for SMP the atomic is nicer but for statistics the simpler should do just fine (and I see the simple one more often in the kernel, but it might be just inheritance.)
For now I'll define dev_stats_inc() and dev_stats_add() macros so that this decision can be done later, but I am interested in the official policy. Thank you, Cell From owner-netdev@oss.sgi.com Tue May 16 18:02:40 2000 Received: by oss.sgi.com id ; Wed, 17 May 2000 01:02:19 +0000 Received: from cx97923-a.phnx3.az.home.com ([24.9.112.194]:54542 "EHLO grok.yi.org") by oss.sgi.com with ESMTP id ; Wed, 17 May 2000 01:02:04 +0000 Received: from candelatech.com (IDENT:greear@localhost [127.0.0.1]) by grok.yi.org (8.9.3/8.9.3) with ESMTP id SAA07871; Tue, 16 May 2000 18:36:32 -0700 Message-ID: <3921F79F.62BC95CB@candelatech.com> Date: Tue, 16 May 2000 18:36:31 -0700 From: Ben Greear Organization: Candela Technologies X-Mailer: Mozilla 4.72 [en] (X11; U; Linux 2.2.14-5.0 i586) X-Accept-Language: en MIME-Version: 1.0 To: Marcell GAL , "netdev@oss.sgi.com" Subject: Re: PATCH 2.2.14 net/core/dev.c [Many interfaces issue] References: <3921B53C.D0C0A237@sch.bme.hu> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Marcell GAL wrote: > > Hi Guys, > > Having many ATM VCs, each having its own ethernet segment inside > I am exactly in the same situation as the VLAN guys. > (the kernel code for rfc2684/~1483/ bridged encapsulation is coming out > soon, > those who want an early look /perhaps audit/, please write to me) Does that mean we will be able to do cool things like bridge a PVC to/from a VLAN? > My users want many interfaces, they want to ifconfig them, see counters, > set MAC addresses, they want whatever they do with their physical > ethernet interfaces. Several hundred interfaces per physical ATM card. > (still less than 409x VLANs though :) Heh, if you bridged a single PVC to a single VLAN (ie PVCoE), then you could have twice as many interfaces!! :) > ;) I do not see a reason to restrict her. 
> Besides having a longer (but acceptable for 99%) creation and listing > time for interfaces > the only remarkable thing was /proc/sys/net/ipv4/conf. > (in our case only 145 entries were accessible, so we had to use the > conf/default directory > -ugly,eh?- before creating the devices and forget about changing those > values thereafter) Why is the limit 145...is that how many can fit into a single 4k 'block' in the proc FS? > Sorry to stir this again, maybe I just want to hear that large number of > traditional interfaces is OK for now until we get back to this at 2.5 If you, or anyone else, can think up a scheme that fixes this perceived problem, please let the list or me know so I can consider changes to the VLAN code too... Enjoy, Ben -- Ben Greear (greearb@candelatech.com) http://www.candelatech.com Author of ScryMUD: scry.wanfear.com 4444 (Released under GPL) http://scry.wanfear.com http://scry.wanfear.com/~greear From owner-netdev@oss.sgi.com Thu May 18 07:05:41 2000 Received: by oss.sgi.com id ; Thu, 18 May 2000 14:05:21 +0000 Received: from smtprch2.nortelnetworks.com ([192.135.215.15]:27087 "EHLO smtprch2.nortel.com") by oss.sgi.com with ESMTP id ; Thu, 18 May 2000 14:05:02 +0000 Received: from zrchb213.us.nortel.com (actually zrchb213) by smtprch2.nortel.com; Thu, 18 May 2000 09:01:37 -0500 Received: from zctwb003.asiapac.nortel.com ([47.152.32.111]) by zrchb213.us.nortel.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21) id LCMTN6PR; Thu, 18 May 2000 09:04:25 -0500 Received: from pwold011.asiapac.nortel.com ([47.181.193.45]) by zctwb003.asiapac.nortel.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21) id KXY5680G; Fri, 19 May 2000 00:04:25 +1000 Received: from uow.edu.au (IDENT:akpm@[47.181.194.1]) by pwold011.asiapac.nortel.com (8.9.3/8.9.3) with ESMTP id AAA22457; Fri, 19 May 2000 00:04:25 +1000 Message-ID: <3923F8CD.AECBDA6D@uow.edu.au> Date: Fri, 19 May 2000 00:06:05 +1000 X-Sybari-Space: 00000000 
00000000 00000000 From: Andrew Morton X-Mailer: Mozilla 4.7 [en] (X11; I; Linux 2.2.14-15mdk i586) X-Accept-Language: en MIME-Version: 1.0 To: kuznet@ms2.inr.ac.ru CC: netdev@oss.sgi.com Subject: Re: tx_timeout and timer serialisation References: <391A19F6.5C72A6E0@uow.edu.au> from "Andrew Morton" at May 11, 0 02:24:54 am <200005111309.RAA32574@ms2.inr.ac.ru> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-Orig: Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing kuznet@ms2.inr.ac.ru wrote: > > Hello! > > > Why does the handler have to call timer_exit() at all? > > > > Could we not clear timer->running in run_timer_list()? That would > > certainly protect us from the problem you identify... > > Timers are self-destructable as rule. See? Normal usage > for timer is to have it allocated inside an object and > timer event detroys the object together with timer. > In this case we have to use refcounts external to timer > to avoid races. > > Actually, existing *_timer primitives are very inconvenient. > And I did not find any good way to improve them. Essentially, > del_timer_sync(), timer->running and mod_timer() returning > value are all that I was able to do. > I think there's a race in the timer code at present: int del_timer_sync(struct timer_list * timer) { int ret = 0; for (;;) { unsigned long flags; int running; spin_lock_irqsave(&timerlist_lock, flags); ** The timer handler could be running now. It can delete the timer and kfree it, or reuse its memory for something else, or turn it into a semantically different timer ** ret += detach_timer(timer); timer->list.next = timer->list.prev = 0; ** uh-oh ** Still, my immediate concern is this: I'll be spending the next working through the old net drivers. One very common theme/bug in these is the pattern: xxx_close() { ... del_timer(); release(some_resources); ... 
} xxx_timer() { use(some_resources); } These need to be turned into del_timer_sync()'s [1]. This means I have to add timer_exit() calls as well. I'd like you to confirm that we cannot move the timer_exit() functionality into run_timer_list(): static inline void run_timer_list(void) { ... timer->list.next = timer->list.prev = NULL; timer_set_running(timer); spin_unlock_irq(&timerlist_lock); fn(data); + timer->running = 0; spin_lock_irq(&timerlist_lock); goto repeat; Are you saying that we can't do this because we should not touch the timer after its handler has executed? (Even though del_timer_sync can do this - see above :)) [1]: del_timer_sync() is only needed if the driver is to support SMP. With a lot of the old ISA drivers this is a lost cause. The probability of breaking the driver for UP is sufficiently high, and the usefulness of making it SMP-aware is sufficiently low that we should simply say #ifdef CONFIG_SMP #error This driver does not support SMP #endif I'll be doing this if the SMP fixes look risky. -- -akpm- From owner-netdev@oss.sgi.com Thu May 18 07:31:41 2000 Received: by oss.sgi.com id ; Thu, 18 May 2000 14:31:21 +0000 Received: from adsl-151-196-242-17.bellatlantic.net ([151.196.242.17]:49658 "EHLO vaio.greennet") by oss.sgi.com with ESMTP id ; Thu, 18 May 2000 14:31:07 +0000 Received: from localhost (becker@localhost) by vaio.greennet (8.9.3/8.8.7) with ESMTP id KAA08226; Thu, 18 May 2000 10:34:02 -0400 Date: Thu, 18 May 2000 10:34:02 -0400 (EDT) From: Donald Becker X-Sender: becker@vaio.greennet To: Andrew Morton cc: netdev@oss.sgi.com Subject: Re: tx_timeout and timer serialisation In-Reply-To: <3923F8CD.AECBDA6D@uow.edu.au> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Fri, 19 May 2000, Andrew Morton wrote: > kuznet@ms2.inr.ac.ru wrote: > > Timers are self-destructable as rule. See? 
Normal usage > > for timer is to have it allocated inside an object and > > timer event detroys the object together with timer. I'm curious -- what code does this? > Still, my immediate concern is this: > > I'll be spending the next working through the > old net drivers. One very common theme/bug in these is the pattern: > > xxx_close() > { > ... > del_timer(); > release(some_resources); > ... > } > > xxx_timer() > { > use(some_resources); > } I don't see the semantic problem here. This was the recommended way to use the timer routines. If the semantics have changed, there should be new names for the changed semantics. It would be useful to distinguish between "bugs" and "interfaces changes that have made the following no longer correct since version X.Y.Z". > With a lot of the old ISA drivers this is a lost cause. The probability > of breaking the driver for UP is sufficiently high, and the usefulness > of making it SMP-aware is sufficiently low that we should simply say > > #ifdef CONFIG_SMP > #error This driver does not support SMP > #endif Donald Becker becker@scyld.com Scyld Computing Corporation 410 Severn Ave. 
Suite 210 Annapolis MD 21403 From owner-netdev@oss.sgi.com Thu May 18 08:11:42 2000 Received: by oss.sgi.com id ; Thu, 18 May 2000 15:11:22 +0000 Received: from smtprch2.nortelnetworks.com ([192.135.215.15]:33001 "EHLO smtprch2.nortel.com") by oss.sgi.com with ESMTP id ; Thu, 18 May 2000 15:11:06 +0000 Received: from zrchb213.us.nortel.com (actually zrchb213) by smtprch2.nortel.com; Thu, 18 May 2000 10:05:39 -0500 Received: from zctwb003.asiapac.nortel.com ([47.152.32.111]) by zrchb213.us.nortel.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21) id LCMTN9HY; Thu, 18 May 2000 10:08:27 -0500 Received: from pwold011.asiapac.nortel.com ([47.181.193.45]) by zctwb003.asiapac.nortel.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21) id KXY569AG; Fri, 19 May 2000 01:08:28 +1000 Received: from uow.edu.au (IDENT:akpm@[47.181.194.1]) by pwold011.asiapac.nortel.com (8.9.3/8.9.3) with ESMTP id BAA22856 for ; Fri, 19 May 2000 01:08:29 +1000 Message-ID: <392407D4.BE586507@uow.edu.au> Date: Fri, 19 May 2000 01:10:12 +1000 X-Sybari-Space: 00000000 00000000 00000000 From: Andrew Morton X-Mailer: Mozilla 4.7 [en] (X11; I; Linux 2.2.14-15mdk i586) X-Accept-Language: en MIME-Version: 1.0 To: "netdev@oss.sgi.com" Subject: Tx queueing Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-Orig: Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing A number of drivers do this: start_xmit() { netif_stop_queue() ... if (room for another packet) netif_wake_queue() ... } I suspect this is a simple port from the dev->tbusy days. It would seem to be more sensible to do start_xmit() { ... if (!room for another packet) netif_stop_queue() } but the functional difference here is that we are no longer scheduling another BH run, so if there are additional packets queued "up there" then their presentation to the driver will be delayed until **this CPU** makes another BH run. 
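[Archive note: the second start_xmit() shape above, where the queue is stopped only once the ring is full and reopened from the Tx-completion path, can be sketched as a small userspace model. This is only an illustration under stated assumptions: struct model_dev, TX_RING_SIZE and the model_* functions are hypothetical names, and the queue_stopped flag merely stands in for netif_stop_queue()/netif_wake_queue(). It is not code from any driver discussed in this thread.]

```c
#include <assert.h>
#include <stdbool.h>

#define TX_RING_SIZE 4u   /* hypothetical ring depth */

/* Hypothetical stand-in for a driver's private state; not a kernel struct. */
struct model_dev {
    unsigned int head;    /* count of packets handed to the ring */
    unsigned int tail;    /* count of packets the hardware has completed */
    bool queue_stopped;   /* models netif_stop_queue()/netif_wake_queue() */
};

static inline unsigned int model_tx_in_flight(const struct model_dev *dev)
{
    return dev->head - dev->tail;
}

/* Transmit path: queue one packet, stop the queue only when the ring fills. */
static inline void model_start_xmit(struct model_dev *dev)
{
    assert(!dev->queue_stopped);          /* the stack must not call us while stopped */
    dev->head++;
    if (model_tx_in_flight(dev) == TX_RING_SIZE)
        dev->queue_stopped = true;        /* stands in for netif_stop_queue() */
}

/* Tx-completion interrupt: reclaim one slot and reopen the queue if needed. */
static inline void model_tx_done(struct model_dev *dev)
{
    if (model_tx_in_flight(dev) == 0)
        return;                           /* nothing outstanding to reclaim */
    dev->tail++;
    if (dev->queue_stopped)
        dev->queue_stopped = false;       /* stands in for netif_wake_queue() */
}
```

[In this model the queue stays open for the first TX_RING_SIZE - 1 packets, closes on the packet that fills the ring, and reopens on the next completion. It does not model the BH scheduling delay that the message is concerned about.]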
For devices which have a Tx packet ring or decent FIFO I don't expect this to be a problem, because the Tx ISR will call netif_wake_queue() and the subsequent BH run will keep stuffing packets into the Tx ring until it's full. But for devices which have very limited Tx buffering there may be a lost opportunity to refill the Tx buffer earlier. Seems unlikely to me. Do we see a problem with the above approach? Or is the benefit so small that it's not worth bothering about? Incidentally, Alexey. You should change int cpu = smp_processor_id(); to const int cpu = smp_processor_id(); This allows GCC to generate _much_ better code on UP. It only saves 50-60 insns in dev.c, but it's free... Also, I'm still attracted to the idea of dequeueing packets within the driver (the 'pull' model) rather than stuffing them in via qdisc_restart() and the BH callback. A while back Don said: > The BSD stack uses the scheme of dequeuing packets in the ISR. This was a > good design in the VAX days, and with primative hardware that handled only > single packets. But it has horrible cache behavior, needs an extra lock, > and can result the interrupt service routine running a very long time, > blocking interrupts. I never understood the point about cache behaviour. Perhaps he was referring to the benefit which a sequence of short loops has over a single, long loop? And nowadays we only block interrupts for this device (or things on this device's IRQ?). One advantage which the 'pull' model has is with CPU/NIC bonding. AFAIK, the only way at present of bonding a NIC to a CPU is via the IRQ. This is fine for the ISR and the BH callback, but at present the direct userland->socket->qdisc->driver path will be executed on a random CPU. Moving some of this into the ISR will make bonding more effective. Or teach qdisc_restart() to simply queue packets and rely on the CPU-specific softnet callback to do the transmit. Probably doesn't make much diff. 
Of course, all this is simply noise without benchmarks... Has anyone done any serious work with NIC/CPU bonding? -- -akpm- From owner-netdev@oss.sgi.com Thu May 18 09:21:02 2000 Received: by oss.sgi.com id ; Thu, 18 May 2000 16:20:43 +0000 Received: from web118.yahoomail.com ([205.180.60.99]:64774 "HELO web118.yahoomail.com") by oss.sgi.com with SMTP id ; Thu, 18 May 2000 16:20:32 +0000 Received: (qmail 7230 invoked by uid 60001); 18 May 2000 16:20:30 -0000 Message-ID: <20000518162030.7229.qmail@web118.yahoomail.com> Received: from [156.153.255.126] by web118.yahoomail.com; Thu, 18 May 2000 09:20:30 PDT Date: Thu, 18 May 2000 09:20:30 -0700 (PDT) From: Cacophonix Subject: Re: Tx queueing To: Andrew Morton Cc: netdev@oss.sgi.com MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing --- Andrew Morton wrote: > > Of course, all this is simply noise without benchmarks... > netperf aggregate outbound tcp thruput (Mb/s) between linux-linux and linux-bsd, using a single GA620 2.2.16-1 => 1cpu = 496, 2cpu = 458 2.3.99-pre6 => 1cpu = 568, 2cpu = 626 freebsd 4.0 => 1cpu = 365, 2cpu = 373 
From owner-netdev@oss.sgi.com Thu May 18 09:35:33 2000 Received: by oss.sgi.com id ; Thu, 18 May 2000 16:35:13 +0000 Received: from adsl-151-196-242-17.bellatlantic.net ([151.196.242.17]:32508 "EHLO vaio.greennet") by oss.sgi.com with ESMTP id ; Thu, 18 May 2000 16:34:50 +0000 Received: from localhost (becker@localhost) by vaio.greennet (8.9.3/8.8.7) with ESMTP id MAA09575; Thu, 18 May 2000 12:37:53 -0400 Date: Thu, 18 May 2000 12:37:53 -0400 (EDT) From: Donald Becker X-Sender: becker@vaio.greennet To: Andrew Morton cc: "netdev@oss.sgi.com" Subject: Re: Tx queueing In-Reply-To: <392407D4.BE586507@uow.edu.au> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Fri, 19 May 2000, Andrew Morton wrote: > A number of drivers do this: > > start_xmit() > { > netif_stop_queue() > ... > if (room for another packet) > netif_wake_queue() > ... > } > > I suspect this is a simple port from the dev->tbusy days. Yes, it's likely from a blind search-and-substitute. Using stop_queue() and wake_queue() in the transmit path is part of the reason why the 2.3 changes are bogus. It is not possible, with these semantics, to write a driver that works well in both pre-2.3 and 2.3. > It would seem to be more sensible to do > > start_xmit() > { > ... > if (!room for another packet) > netif_stop_queue() > } This doesn't give us a way to set dev->tbusy, which is required for all pre-2.3 kernels. > Also, I'm still attracted to the idea of dequeueing packets within the > driver (the 'pull' model) rather than stuffing them in via > qdisc_restart() and the BH callback. A while back Don said: > > > The BSD stack uses the scheme of dequeuing packets in the ISR. This was a > > good design in the VAX days, and with primative hardware that handled only > > single packets. 
But it has horrible cache behavior, needs an extra lock, > > and can result the interrupt service routine running a very long time, > > blocking interrupts. > > I never understood the point about cache behaviour. Perhaps he was > referring to the benefit which a sequence of short loops has over a > single, long loop? There are several issues. Here are a few. A 'pull' semantic has the code tracing everywhere, with the obvious bad cache impact. You need both a 'pull' transmit routine and a 'push' transmit routine. > And nowadays we only block interrupts for this > device (or things on this device's IRQ?). Perhaps all on that IRQ. But you didn't really need to service the SCSI controller or the other network interfaces, did you? > Has anyone done any serious work with NIC/CPU bonding? The Mindcraft "benchmark" is superficially obvious, but the big network difference was that they were apparently using the TCP/IP checksum hardware on the i82559. This has far more effect on SMP performance than anything else that was done. We didn't even find out that the chip had the feature until months later, and still don't have the documentation on how to use it. Donald Becker becker@scyld.com Scyld Computing Corporation 410 Severn Ave. 
Suite 210 Annapolis MD 21403 From owner-netdev@oss.sgi.com Thu May 18 10:02:53 2000 Received: by oss.sgi.com id ; Thu, 18 May 2000 17:02:43 +0000 Received: from smtprch1.nortelnetworks.com ([192.135.215.14]:30138 "EHLO smtprch1.nortel.com") by oss.sgi.com with ESMTP id ; Thu, 18 May 2000 17:02:35 +0000 Received: from zrchb213.us.nortel.com (actually zrchb213) by smtprch1.nortel.com; Thu, 18 May 2000 12:00:23 -0500 Received: from zctwb003.asiapac.nortel.com ([47.152.32.111]) by zrchb213.us.nortel.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21) id LCMT3CRG; Thu, 18 May 2000 11:59:58 -0500 Received: from pwold011.asiapac.nortel.com ([47.181.193.45]) by zctwb003.asiapac.nortel.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21) id KXY569BZ; Fri, 19 May 2000 02:59:58 +1000 Received: from uow.edu.au (IDENT:akpm@[47.152.41.30]) by pwold011.asiapac.nortel.com (8.9.3/8.9.3) with ESMTP id CAA26775; Fri, 19 May 2000 02:59:54 +1000 Message-ID: <392421F1.249E284F@uow.edu.au> Date: Fri, 19 May 2000 03:01:37 +1000 X-Sybari-Space: 00000000 00000000 00000000 From: Andrew Morton X-Mailer: Mozilla 4.7 [en] (X11; I; Linux 2.2.14-15mdk i586) X-Accept-Language: en MIME-Version: 1.0 To: Donald Becker CC: "netdev@oss.sgi.com" Subject: Re: Tx queueing References: <392407D4.BE586507@uow.edu.au> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Donald Becker wrote: > > ... > > > > start_xmit() > > { > > ... > > if (!room for another packet) > > netif_stop_queue() > > } > > This doesn't give us a way to set dev->tbusy, which is required for all > pre-2.3 kernels. Burn them boats :) I've been looking for a compatibility header which maps netif_stop_queue() into dev->tbusy manipulation, but I can't find one. Have I missed something? > ... > > Perhaps all on that IRQ. 
But you didn't really need to service the SCSI > controller or the other network interfaces, did you? :) Think of the cache impact! > > Has anyone done any serious work with NIC/CPU bonding? > > The Mindcraft "benchmark" is superficially obvious, but the big network > difference was that they were apparently using the TCP/IP checksum hardware > on the i82559. This has far more effect on SMP performance than anything > else that was done. We didn't even find out that the chip had the feature > until months later, and still don't have the documentation on how to use > it. Does it make that much difference? When we discussed this starting April 24 the consensus seemed to be that the overhead of s/w checksumming is in the noise floor when combined with the copy. Anyway, I have religious objections to hardware IP checksums. I've seen several instances where they would have failed to detect errors which software checksumming would detect: - cs89x0 with Rx DMA on a too-fast EISA bus was dropping bits when DMAing into main memory. If the checksum had been calculated on the NIC, it would have said "OK", while the in-memory packet was corrupted. - 3c59x bug (I forget what it was, but DMA pointers were getting changed at the wrong time) causing in-memory corruption. Again, the NIC would have told us it was OK. Layer 3 checksums are there to detect layer 3 errors. Ergo, they should be put at the top of layer 3, not at the top of L2. 
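[Archive note: the argument above is that a checksum verified on the NIC says nothing about the copy of the packet that finally lands in main memory. A minimal sketch of that point follows, using the ones'-complement Internet checksum described in RFC 1071. The function name inet_checksum() and the helper in_memory_copy_ok() are hypothetical, chosen for illustration; this is not the kernel's checksum code and it is not tuned for speed.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Ones'-complement Internet checksum (RFC 1071) over a byte buffer,
 * treating the data as big-endian 16-bit words. Intended for
 * packet-sized buffers, so the 32-bit accumulator cannot overflow.
 */
static inline uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;

    assert(data != NULL || len == 0);
    while (len > 1) {
        sum += (uint32_t)((data[0] << 8) | data[1]);
        data += 2;
        len -= 2;
    }
    if (len == 1)                      /* odd trailing byte, padded with zero */
        sum += (uint32_t)(data[0] << 8);
    while (sum >> 16)                  /* fold the carries back in */
        sum = (sum & 0xffffu) + (sum >> 16);
    return (uint16_t)~sum;
}

/*
 * Software check performed on the in-memory copy, after DMA.
 * 'expected' is the checksum the sender computed.
 */
static inline bool in_memory_copy_ok(const uint8_t *buf, size_t len,
                                     uint16_t expected)
{
    return inet_checksum(buf, len) == expected;
}
```

[If a bit is dropped between the NIC and main memory, a check done on the NIC has already passed, while in_memory_copy_ok() run over the buffer the stack actually reads will fail. The byte sequence 00 01 f2 03 f4 f5 f6 f7 is the worked example from RFC 1071 and yields 0x220d.]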
-- -akpm- From owner-netdev@oss.sgi.com Thu May 18 11:05:43 2000 Received: by oss.sgi.com id ; Thu, 18 May 2000 18:05:34 +0000 Received: from adsl-151-196-242-17.bellatlantic.net ([151.196.242.17]:54780 "EHLO vaio.greennet") by oss.sgi.com with ESMTP id ; Thu, 18 May 2000 18:05:18 +0000 Received: from localhost (becker@localhost) by vaio.greennet (8.9.3/8.8.7) with ESMTP id OAA10224; Thu, 18 May 2000 14:08:20 -0400 Date: Thu, 18 May 2000 14:08:20 -0400 (EDT) From: Donald Becker X-Sender: becker@vaio.greennet To: Andrew Morton cc: "netdev@oss.sgi.com" Subject: Re: Tx queueing In-Reply-To: <392421F1.249E284F@uow.edu.au> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Fri, 19 May 2000, Andrew Morton wrote: > I've been looking for a compatibility header which maps > netif_stop_queue() into dev->tbusy manipulation, but I can't find one. > Have I missed something? David Hinds has compatibility macros. But he can be more pragmatic about the network drivers than I am. It only has to work well enough for his PC Card system, it doesn't have to be The Right Thing. > > Perhaps all on that IRQ. But you didn't really need to service the SCSI > > controller or the other network interfaces, did you? > > :) Think of the cache impact! And those damn user programs are always flushing *my* cache lines. We should just kill them off so that the network drivers can operate unmolested. > > > Has anyone done any serious work with NIC/CPU bonding? > > > > The Mindcraft "benchmark" is superficially obvious, but the big network > > difference was that they were apparently using the TCP/IP checksum hardware > > on the i82559. This has far more effect on SMP performance than anything > > else that was done. We didn't even find out that the chip had the feature > > until months later, and still don't have the documentation on how to use > > it. 
> > Does it make that much difference? When we discussed this starting > April 24 the consensus seemed to be that the overhead of s/w > checksumming is in the noise floor when combined with the copy. Tx checksumming vs. Rx checksumming. Checksumming is almost free if you are in the midst of doing the copying anyway. If you Rx checksum on one processor, and use the data on another processor, it is astonishingly costly. > Anyway, I have religious objections to hardware IP checksums. As do I, as I have stated in years past, but... Donald Becker becker@scyld.com Scyld Computing Corporation 410 Severn Ave. Suite 210 Annapolis MD 21403 From owner-netdev@oss.sgi.com Thu May 18 11:25:34 2000 Received: by oss.sgi.com id ; Thu, 18 May 2000 18:25:23 +0000 Received: from laurin.munich.netsurf.de ([194.64.166.1]:34282 "EHLO laurin.munich.netsurf.de") by oss.sgi.com with ESMTP id ; Thu, 18 May 2000 18:25:05 +0000 Received: from fred.muc.de (none@ns1241.munich.netsurf.de [195.180.235.241]) by laurin.munich.netsurf.de (8.9.3/8.9.3) with ESMTP id UAA00018; Thu, 18 May 2000 20:25:00 +0200 (MET DST) Received: from andi by fred.muc.de with local (Exim 2.05 #1) id 12sUi7-0007Sp-00; Thu, 18 May 2000 20:07:51 +0200 Date: Thu, 18 May 2000 20:07:51 +0200 From: Andi Kleen To: Donald Becker Cc: netdev@oss.sgi.com Subject: Re: Tx queueing Message-ID: <20000518200751.A28688@fred.muc.de> References: <392407D4.BE586507@uow.edu.au> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 0.95.4us In-Reply-To: ; from Donald Becker on Thu, May 18, 2000 at 06:36:32PM +0200 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Thu, May 18, 2000 at 06:36:32PM +0200, Donald Becker wrote: > > Has anyone done any serious work with NIC/CPU bonding? > > The Mindcraft "benchmark" is superficially obvious, but the big network > difference was that they were apparently using the TCP/IP checksum hardware > on the i82559. 
This has far more effect on SMP performance than anything > else that was done. We didn't even find out that the chip had the feature > until months later, and still don't have the documentation on how to use > it. The Intel e100 driver implicitly documents it in C source. They unfortunately use a very ugly method to implement it (own UDP/IP header parsing instead of CHECKSUM_HW). I was even porting it to your driver, until I discovered that my eepro100 had a far too old chip to implement it (i82557 stepping 1). BTW, newer eepro100 driver versions like to often flood the network (collision light straight red) with bogus packets. The interrupts triggered on the other system on the same segment lead to near solid livelock. That didn't happen with the older driver in straight 2.2.13. -Andi -- This is like TV. I don't like TV. From owner-netdev@oss.sgi.com Thu May 18 15:57:24 2000 Received: by oss.sgi.com id ; Thu, 18 May 2000 22:57:04 +0000 Received: from ertpg14e1.nortelnetworks.com ([47.234.0.35]:14736 "EHLO ertpg14e1.nortelnetworks.com") by oss.sgi.com with ESMTP id ; Thu, 18 May 2000 22:56:50 +0000 Received: from zsngd101.asiapac.nortel.com (actually znsgd101) by ertpg14e1.nortelnetworks.com; Thu, 18 May 2000 11:37:44 -0400 Received: from zctwb003.asiapac.nortel.com ([47.152.32.111]) by zsngd101.asiapac.nortel.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21) id LCNJYKQF; Thu, 18 May 2000 23:37:38 +0800 Received: from pwold011.asiapac.nortel.com ([47.181.193.45]) by zctwb003.asiapac.nortel.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21) id KXY569A6; Fri, 19 May 2000 01:37:38 +1000 Received: from uow.edu.au (IDENT:akpm@[47.181.194.1]) by pwold011.asiapac.nortel.com (8.9.3/8.9.3) with ESMTP id BAA23053; Fri, 19 May 2000 01:37:38 +1000 Message-ID: <39240EA9.389AE2FC@uow.edu.au> Date: Fri, 19 May 2000 01:39:21 +1000 X-Sybari-Space: 00000000 00000000 00000000 From: Andrew Morton X-Mailer: Mozilla 4.7 [en] (X11; 
I; Linux 2.2.14-15mdk i586) X-Accept-Language: en MIME-Version: 1.0 To: Donald Becker CC: netdev@oss.sgi.com Subject: Re: tx_timeout and timer serialisation References: <3923F8CD.AECBDA6D@uow.edu.au> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Donald Becker wrote: > > On Fri, 19 May 2000, Andrew Morton wrote: > > kuznet@ms2.inr.ac.ru wrote: > > > > Timers are self-destructable as rule. See? Normal usage > > > for timer is to have it allocated inside an object and > > > timer event detroys the object together with timer. > > I'm curious -- what code does this? me too. > > Still, my immediate concern is this: > > > > I'll be spending the next working through the > > old net drivers. One very common theme/bug in these is the pattern: > > > > xxx_close() > > { > > ... > > del_timer(); > > release(some_resources); > > ... > > } > > > > xxx_timer() > > { > > use(some_resources); > > } > > I don't see the semantic problem here. Well, the timer handler could be executing on another CPU _while_ the timer is being deleted. So when del_timer() returns, the handler is still executing! So in this example, CPU0 could execute use(some_resources) AFTER CPU1 has done release(some_resources). del_timer_sync() is a 2.3 function which spins until the handler tells the world that it has finished (by clearing timer->running). > This was the recommended way to use the timer routines. If the semantics > have changed, there should be new names for the changed semantics. I agree. I assume what happened was that when the timer code went SMP, the synchronous semantics of del_timer() became asynchronous but the name was unchanged. 
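[Archive note: the race described above (del_timer() returning while the handler is still executing on another CPU) can be shown with a deterministic, single-threaded model. Everything below is a hypothetical sketch: struct model_timer and the model_* functions are invented names, the "other CPU" is simulated by an explicit callback, and the real 2.3 del_timer_sync() differs in detail. The only thing carried over from the message is the idea of a running flag that the handler clears when it finishes.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical timer state; not the kernel's struct timer_list. */
struct model_timer {
    bool pending;   /* queued, handler not yet started */
    bool running;   /* handler currently executing on the other CPU */
};

/* The timer fires: the handler starts running on the other CPU. */
static inline void model_handler_enter(struct model_timer *t)
{
    t->pending = false;
    t->running = true;
}

/* The handler announces it is finished (the timer_exit() step). */
static inline void model_handler_exit(struct model_timer *t)
{
    t->running = false;
}

/* del_timer()-like: dequeue if pending, ignore a handler already running. */
static inline bool model_del_timer(struct model_timer *t)
{
    bool was_pending = t->pending;

    t->pending = false;
    return was_pending;
}

/* Resources used by the handler may be released only in this state. */
static inline bool model_safe_to_release(const struct model_timer *t)
{
    return !t->pending && !t->running;
}

/*
 * del_timer_sync()-like: dequeue, then wait until the handler is done.
 * 'other_cpu_step' stands in for the other CPU making progress while
 * this CPU spins.
 */
static inline bool model_del_timer_sync(struct model_timer *t,
                                        void (*other_cpu_step)(struct model_timer *))
{
    bool was_pending = model_del_timer(t);

    assert(other_cpu_step != NULL);
    while (t->running)
        other_cpu_step(t);
    return was_pending;
}
```

[With the handler already entered, model_del_timer() returns immediately and model_safe_to_release() is still false, which is the xxx_close() bug pattern. model_del_timer_sync() returns only after the running flag has been cleared.]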
I believe what _should_ have been done when converting this to SMP was to - manage the "handler is running" flag _outside_ the handler - preserve del_timer()'s synchronous semantics (via the spin) - introduce a new function del_timer_async() which has the handler-can-still-be-running semantics. But this is a historical guess. I'm not sure if 2.2 is clean in this regard. It doesn't have del_timer_sync(). Spent ten minutes peering at 2.2, and I can't see why, if some random piece of kernel code does a del_timer() in IRQ or process context, concurrent execution of the post-del_timer() code and the handler will not occur. > It would be useful to distinguish between "bugs" and "interfaces changes > that have made the following no longer correct since version X.Y.Z". If you come into the game late enough, these are synonymous :) But you're right - it is the latter. -- -akpm- From owner-netdev@oss.sgi.com Thu May 18 19:00:59 2000 Received: by oss.sgi.com id ; Fri, 19 May 2000 02:00:50 +0000 Received: from mail.cyberus.ca ([209.195.95.1]:12473 "EHLO cyberus.ca") by oss.sgi.com with ESMTP id ; Fri, 19 May 2000 02:00:30 +0000 Received: from shell.cyberus.ca (shell [209.195.95.7]) by cyberus.ca (8.9.3/8.9.3/Cyberus Online Inc.) with ESMTP id WAA26173; Thu, 18 May 2000 22:00:29 -0400 (EDT) Received: from localhost (hadi@localhost) by shell.cyberus.ca (8.9.1b+Sun/8.9.3) with ESMTP id WAA14662; Thu, 18 May 2000 22:00:29 -0400 (EDT) Date: Thu, 18 May 2000 22:00:29 -0400 (EDT) From: jamal To: Andrew Morton cc: "netdev@oss.sgi.com" Subject: Re: Tx queueing In-Reply-To: <392407D4.BE586507@uow.edu.au> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing OK, Andrew, i know i am not as entertaining as some people, but i can try ... Damn Aussie! On Fri, 19 May 2000, Andrew Morton wrote: > A number of drivers do this: > > start_xmit() > { > netif_stop_queue() > ... 
> if (room for another packet) > netif_wake_queue() > ... > } > > I suspect this is a simple port from the dev->tbusy days. > > It would seem to be more sensible to do > > start_xmit() > { > ... > if (!room for another packet) > netif_stop_queue() > } > > but the functional difference here is that we are no longer scheduling > another BH run, so if there are additional packets queued "up there" > then their presentation to the driver will be delayed until **this CPU** > makes another BH run. > This seems fine to me. In 2.3, the device is already serialized by the txmit lock at this point. So your proposal should be fine. > For devices which have a Tx packet ring or decent FIFO I don't expect > this to be a problem, because the Tx ISR will call netif_wake_queue() To be correct: It is the invocation of the interrupt handler (which could be caused by quite a few sources other than tx completion) that forces the reclamation of the TX DMA ring descs. > and the subsequent BH run will keep stuffing packets into the Tx ring > until it's full. But for devices which have very limited Tx buffering > there may be a lost opportunity to refill the Tx buffer earlier. Seems > unlikely to me. > tx desc harvesting might be key here. Donald's drivers typically do the reclamation in the interrupt path. I have tried to do it in both the tx and rx paths on the tulip (locking of course, which Donald dislikes so much ;->) by setting thresholds such that you only do it on the txmit if the number of available descs is < 1/2 of total. This greatly reduces the number of txnobuffs. Of course this is impossible to do without locks ;-> Perhaps Donald has some words of wisdom about his choices. > > Also, I'm still attracted to the idea of dequeueing packets within the > driver (the 'pull' model) rather than stuffing them in via > qdisc_restart() and the BH callback. You will have to rewrite a _lot_ of the upper layers' code. And you will really have to prove the benefit of going down this path. 
What events will activate the pull? QoS will probably totally break. > A while back Don said: > > > The BSD stack uses the scheme of dequeuing packets in the ISR. This was a > > good design in the VAX days, and with primative hardware that handled only > > single packets. But it has horrible cache behavior, needs an extra lock, > > and can result the interrupt service routine running a very long time, > > blocking interrupts. > > I never understood the point about cache behaviour. Perhaps he was > referring to the benefit which a sequence of short loops has over a > single, long loop? And nowadays we only block interrupts for this > device (or things on this device's IRQ?). > Interrupt means context switch? > One advantage which the 'pull' model has is with CPU/NIC bonding. > AFAIK, the only way at present of bonding a NIC to a CPU is via the > IRQ. This is fine for the ISR and the BH callback, but at present the > direct userland->socket->qdisc->driver path will be executed on a random > CPU. Moving some of this into the ISR will make bonding more effective. > > Or teach qdisc_restart() to simply queue packets and rely on the > CPU-specific softnet callback to do the transmit. Probably doesn't make > much diff. > > Of course, all this is simply noise without benchmarks... > > Has anyone done any serious work with NIC/CPU bonding? You can do NIC CPU bonding today in 2.3 using IRQ affinity. cheers, jamal From owner-netdev@oss.sgi.com Thu May 18 19:03:19 2000 Received: by oss.sgi.com id ; Fri, 19 May 2000 02:03:00 +0000 Received: from mail.cyberus.ca ([209.195.95.1]:59833 "EHLO cyberus.ca") by oss.sgi.com with ESMTP id ; Fri, 19 May 2000 02:02:43 +0000 Received: from shell.cyberus.ca (shell [209.195.95.7]) by cyberus.ca (8.9.3/8.9.3/Cyberus Online Inc.) 
with ESMTP id WAA26997; Thu, 18 May 2000 22:02:43 -0400 (EDT) Received: from localhost (hadi@localhost) by shell.cyberus.ca (8.9.1b+Sun/8.9.3) with ESMTP id WAA14666; Thu, 18 May 2000 22:02:43 -0400 (EDT) Date: Thu, 18 May 2000 22:02:43 -0400 (EDT) From: jamal To: Cacophonix cc: netdev@oss.sgi.com Subject: Re: Tx queueing In-Reply-To: <20000518162030.7229.qmail@web118.yahoomail.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Thu, 18 May 2000, Cacophonix wrote: > netperf aggregate outbound tcp thruput (Mb/s) between linux-linux > and linux-bsd, using a single GA620 > > 2.2.16-1 => 1cpu = 496, 2cpu = 458 > 2.3.99-pre6 => 1cpu = 568, 2cpu = 626 > freebsd 4.0 => 1cpu = 365, 2cpu = 373 Amy Fong and I have done some tests with much higher throughputs on 2.3 with the GA-620. What was the h/ware you used? What was the MTU you used? cheers, jamal From owner-netdev@oss.sgi.com Thu May 18 19:14:39 2000 Received: by oss.sgi.com id ; Fri, 19 May 2000 02:14:29 +0000 Received: from mail.cyberus.ca ([209.195.95.1]:28349 "EHLO cyberus.ca") by oss.sgi.com with ESMTP id ; Fri, 19 May 2000 02:14:01 +0000 Received: from shell.cyberus.ca (shell [209.195.95.7]) by cyberus.ca (8.9.3/8.9.3/Cyberus Online Inc.) 
with ESMTP id WAA00465; Thu, 18 May 2000 22:14:00 -0400 (EDT) Received: from localhost (hadi@localhost) by shell.cyberus.ca (8.9.1b+Sun/8.9.3) with ESMTP id WAA14677; Thu, 18 May 2000 22:14:00 -0400 (EDT) Date: Thu, 18 May 2000 22:14:00 -0400 (EDT) From: jamal To: Donald Becker cc: Andrew Morton , "netdev@oss.sgi.com" Subject: Re: Tx queueing In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Thu, 18 May 2000, Donald Becker wrote: > Using stop_queue() and wake_queue() in the transmit path is part of the > reason why the 2.3 changes are bogus. Typically, the stop will be done on the tx path and the wakeup on the receive path. > It is not possible, with these semantics, to write a driver that works well > in both pre-2.3 and 2.3. > Well, i think backward compatibility is a philosophy that Linux has never conformed to anyway ;-> I would say it is a strength not a weakness ;-> > This doesn't give us a way to set dev->tbusy, which is required for all > pre-2.3 kernels. > You are right. cheers, jamal From owner-netdev@oss.sgi.com Thu May 18 19:18:10 2000 Received: by oss.sgi.com id ; Fri, 19 May 2000 02:17:59 +0000 Received: from mail.cyberus.ca ([209.195.95.1]:62142 "EHLO cyberus.ca") by oss.sgi.com with ESMTP id ; Fri, 19 May 2000 02:17:51 +0000 Received: from shell.cyberus.ca (shell [209.195.95.7]) by cyberus.ca (8.9.3/8.9.3/Cyberus Online Inc.) 
with ESMTP id WAA01793; Thu, 18 May 2000 22:17:51 -0400 (EDT) Received: from localhost (hadi@localhost) by shell.cyberus.ca (8.9.1b+Sun/8.9.3) with ESMTP id WAA14681; Thu, 18 May 2000 22:17:51 -0400 (EDT) Date: Thu, 18 May 2000 22:17:51 -0400 (EDT) From: jamal To: Donald Becker cc: Andrew Morton , "netdev@oss.sgi.com" Subject: Re: Tx queueing In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Am i overposting? yikes. Should have read everything first. On Thu, 18 May 2000, Donald Becker wrote: > On Fri, 19 May 2000, Andrew Morton wrote: > > > I've been looking for a compatibility header which maps > > netif_stop_queue() into dev->tbusy manipulation, but I can't find one. > > Have I missed something? > > David Hinds has compatibility macros. But he can be more pragmatic about > the network drivers than I am. It only has to work well enough for his PC > Card system, it doesn't have to be The Right Thing. > Jes Sorensen has probably got the cleanest interface i have seen. Look at the acenic driver. 
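
A minimal sketch of what such a compatibility mapping could look like, written as a userspace model so that it compiles on its own. It is not taken from acenic or from David Hinds' macros. The struct model_device type and the compat_* names are hypothetical; the only detail borrowed from this thread is that stopping the queue on a pre-2.3 kernel amounts to setting bit 0 of dev->tbusy.

```c
#include <assert.h>

/* Userspace stand-in for the pre-2.3 struct device (hypothetical;
 * only the one field this sketch needs). */
struct model_device {
	unsigned long tbusy;	/* bit 0 set: transmitter busy */
};

#define MODEL_TBUSY_BIT 0x1UL

/* Stop the queue: on a pre-2.3 kernel this means "mark tbusy". */
static inline void compat_netif_stop_queue(struct model_device *dev)
{
	dev->tbusy |= MODEL_TBUSY_BIT;
}

/* Wake the queue: clear the busy bit. A real 2.2 driver would also
 * have to kick the network bottom half so that queued packets get
 * sent; that step is left out of this model. */
static inline void compat_netif_wake_queue(struct model_device *dev)
{
	dev->tbusy &= ~MODEL_TBUSY_BIT;
}

/* Query helper so callers never touch tbusy directly. */
static inline int compat_netif_queue_stopped(const struct model_device *dev)
{
	return (dev->tbusy & MODEL_TBUSY_BIT) != 0;
}

/* Small self-check showing the intended stop/wake round trip. */
static inline void compat_selftest(void)
{
	struct model_device dev = { 0 };

	compat_netif_stop_queue(&dev);
	assert(compat_netif_queue_stopped(&dev));
	compat_netif_wake_queue(&dev);
	assert(!compat_netif_queue_stopped(&dev));
}
```

The model uses plain bit arithmetic; kernel code would presumably need atomic bit operations instead, since the transmit path and the interrupt handler can touch the flag concurrently.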
cheers, jamal From owner-netdev@oss.sgi.com Thu May 18 20:16:29 2000 Received: by oss.sgi.com id ; Fri, 19 May 2000 03:16:20 +0000 Received: from saw.sw.com.sg ([203.120.9.98]:10129 "HELO saw.sw.com.sg") by oss.sgi.com with SMTP id ; Fri, 19 May 2000 03:16:11 +0000 Received: (qmail 1798 invoked by uid 577); 19 May 2000 03:16:08 -0000 Message-ID: <20000519111608.A1774@saw.sw.com.sg> Date: Fri, 19 May 2000 11:16:08 +0800 From: Andrey Savochkin To: Andi Kleen Cc: netdev@oss.sgi.com Subject: Re: Tx queueing References: <392407D4.BE586507@uow.edu.au> <20000518200751.A28688@fred.muc.de> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 0.93.2i In-Reply-To: <20000518200751.A28688@fred.muc.de>; from "Andi Kleen" on Thu, May 18, 2000 at 08:07:51PM Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Thu, May 18, 2000 at 08:07:51PM +0200, Andi Kleen wrote: > The Intel e100 driver implicitely documents it in C source. They unfortunately > use a very ugly method to implement it (own UDP/IP header parsing instead > of CHECKSUM_HW). I was even porting it to your driver, until I discovered > that my eepro100 had a far too old chip to implement it (i82557 stepping 1). Could you share your code? I would like to implement it finally in the mainstream kernel when I get time. Your code may help to save a few brain-cycles :-) > > BTW, newer eepro100 driver versions like to often flood the network (collision > light straight red) with bogus packets. The interrupts triggered > on the other system on the same segment lead to near solid livelock. > > That didn't happen with the older driver in straight 2.2.13. This problem is known for 2.3.99pre kernels and was fixed about day or two ago. I'll send the patch to Linus shortly. 
But, if you suffer right now, grab the driver from ftp://ftp.sw.com.sg/pub/Linux/people/saw/kernel/v2.3/ ftp://ftp.swusa.com/pub/Linux/people/saw/kernel/v2.3/ Best regards Andrey From owner-netdev@oss.sgi.com Thu May 18 22:21:29 2000 Received: by oss.sgi.com id ; Fri, 19 May 2000 05:21:10 +0000 Received: from [203.44.49.131] ([203.44.49.131]:54802 "HELO relay.sausage.com") by oss.sgi.com with SMTP id ; Fri, 19 May 2000 05:20:37 +0000 Received: (qmail 15255 invoked by uid 505); 19 May 2000 05:20:29 -0000 Received: from peterd@sausage.com by sausage.com.au with scan4virus-0.19 (sweep: 1.8/3.33. . Clean. Processed in 0.499892 secs); 19/05/2000 15:20:28 Received: from unknown (HELO mail.sausage.com.au) (203.44.49.151) by 203.44.49.131 with SMTP; 19 May 2000 05:20:28 -0000 Received: from peter ([172.16.24.86]) by mail.sausage.com.au (Post.Office MTA v3.5.3 release 223 ID# 0-59745U200L100S0V35) with SMTP id au for ; Fri, 19 May 2000 15:21:38 +1000 Message-ID: <002101bfc151$f053d1c0$561810ac@sausage.com.au> From: "Peter Dettman" To: Subject: Bug Report Date: Fri, 19 May 2000 15:20:25 +1000 MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 5.00.2615.200 X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2615.200 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Here follows my first Linux bug report. I have tried to follow the recommended layout and include all relevant info., but if there are more questions, feel free to get back to me at peterd@sausage.com Pete. [1.] One line summary of the problem: Unhandled kernel paging request whilst running VNC server. [2.] Full description of the problem/report: This happens fairly often when I am running Xvnc, but only when the VNC client is actually connected. Combine this with the stack trace info. and it looks like a problem with networking. [3.] 
Keywords (i.e., modules, networking, kernel): kernel, networking [4.] Kernel version (from /proc/version): 2.2.14-5.0 ( RedHat 6.2 ) [5.] Output of Oops.. message (if applicable) with symbolic information resolved (see Documentation/oops-tracing.txt) Options used: -V (default) -o /lib/modules/2.2.14-5.0/ (default) -k /proc/ksyms (default) -l /proc/modules (default) -m /usr/src/linux/System.map (default) -c 1 (default) Unable to handle kernel paging request at virtual address 00140098 current->tss.cr3 = 018c8000, %cr3 = 018c8000 *pde = 00000000 Oops: 0002 CPU: 0 EIP: 0010:[] EFLAGS: 00010002 eax: 00000000 ebx: 00000202 ecx: c2b67d74 edx: 00140098 esi: c2422090 edi: c2b67c40 ebp: c1723da8 esp: c1723dac ds: 0018 es: 0018 ss: 0018 Process Xvnc (pid: 1298, process nr: 48, stackpage=c1723000) Stack: c0166b83 c2b67d74 c110ad80 c2b67cf0 c2b67cf0 c2b67c40 c2b67cf0 c2b67cf0 c2b67cf0 c014df8c c3fff980 00000008 c110ad80 c110a92c c016708b c2b67c40 c110a900 c110ad80 00000008 00000001 00000001 c2b67cf0 c1723e78 00000001 Call Trace: [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] Code: 89 02 85 c0 74 03 89 50 04 b8 01 00 00 00 eb 03 90 31 c0 c7 >>EIP: c01113a7 Trace: c0166b83 Trace: c014df8c Trace: c016708b Trace: c0162e6a Trace: c01716f0 Trace: c0169818 Trace: c0171773 Trace: c014ba20 Code: c01113a7 00000000 <_EIP>: <=== Code: c01113a7 0: 89 02 mov %eax,(%edx) <=== Code: c01113a9 2: 85 c0 test %eax,%eax Code: c01113ab 4: 74 03 je c01113b0 Code: c01113ad 6: 89 50 04 mov %edx,0x4(%eax) Code: c01113b0 9: b8 01 00 00 00 mov $0x1,%eax Code: c01113b5 e: eb 03 jmp c01113ba Code: c01113b7 10: 90 nop Code: c01113b8 11: 31 c0 xor %eax,%eax Code: c01113ba 13: c7 00 00 00 00 00 movl $0x0,(%eax) [6.] A small shell script or example program which triggers the problem (if possible) I am running VNC server v3.3.3r1 on Linux (Xvnc) and connecting to it from a Windows VNC client v3.3.3r2. 
The actual crash is of course a little unpredictable, but I have only seen it when the client was actually connected.

[7.] Environment

[7.1.] Software (add the output of the ver_linux script here)

Linux devnull 2.2.14-5.0 #10 Tue May 16 15:34:25 EST 2000 i586 unknown

Kernel modules         2.3.10-pre1
Gnu C                  egcs-2.91.66
Binutils               2.9.5.0.22
Linux C Library        2.1.3
Dynamic linker         ldd (GNU libc) 2.1.3
Procps                 2.0.6
Mount                  2.10f
Net-tools              1.54
Console-tools          0.3.3
Sh-utils               2.0
Modules Loaded         lockd sunrpc de4x5

[7.2.] Processor information (from /proc/cpuinfo):

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 5
model           : 2
model name      : Pentium 75 - 200
stepping        : 12
cpu MHz         : 133.639029
fdiv_bug        : no
hlt_bug         : no
sep_bug         : no
f00f_bug        : yes
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr mce cx8
bogomips        : 53.25

[7.3.] Module information (from /proc/modules):

lockd                  31592   1 (autoclean)
sunrpc                 53540   1 (autoclean) [lockd]
de4x5                  41268   1 (autoclean)

[7.4.] SCSI information (from /proc/scsi/scsi)

N/A

[7.5.] Other information that might be relevant to the problem (please look in /proc and include all information that you think to be relevant):

The box has a single PCI network card, with 'SMC' emblazoned on it; the chips say 'Digital'. The driver that RedHat selected for it was 'de4x5'. There might be a mismatch here, but I've had no networking problems apart from this.

[X.]
Other notes, patches, fixes, workarounds: VNC is available for free download from http://www.uk.research.att.com/vnc From owner-netdev@oss.sgi.com Fri May 19 07:11:54 2000 Received: by oss.sgi.com id ; Fri, 19 May 2000 14:11:44 +0000 Received: from rrzd1.rz.uni-regensburg.de ([132.199.1.6]:44811 "EHLO rrzd1.rz.uni-regensburg.de") by oss.sgi.com with ESMTP id ; Fri, 19 May 2000 14:11:32 +0000 Received: from rss1.rz.uni-regensburg.de (rss1.rz.uni-regensburg.de [132.199.1.200]) by rrzd1.rz.uni-regensburg.de (8.9.3/8.9.3-KLW-Linux-0.1) with SMTP id QAA15509 for ; Fri, 19 May 2000 16:11:04 +0200 Received: (qmail 29561 invoked from network); 19 May 2000 16:11:29 +0200 Received: from rrzc3.rz.uni-regensburg.de (132.199.38.3) by rss1.rz.uni-regensburg.de with QMQP; 19 May 2000 16:11:29 +0200 Received: from localhost (sendmail-bs@127.0.0.1) by localhost with SMTP; 19 May 2000 16:11:28 +0200 Date: Fri, 19 May 2000 16:11:28 +0200 (MET DST) From: Rolf Schillinger X-Sender: scr19100@rrzc3.rz.uni-regensburg.de To: netdev@oss.sgi.com Subject: kernel oops in > 2.3.99-pre7 Message-ID: MIME-Version: 1.0 Content-Type: MULTIPART/Mixed; BOUNDARY="-559023410-1804928587-958744721=:4143" Content-ID: Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing This message is in MIME format. The first part should be readable text, while the remaining parts are likely unreadable without MIME-aware tools. Send mail to mime@docserver.cac.washington.edu for more info. ---559023410-1804928587-958744721=:4143 Content-Type: TEXT/PLAIN; CHARSET=US-ASCII Content-ID: Hi all, I hope to have picked the right address for this bug-report as it is my first and also the first time dealing with oopsa in general. I get an oops when doing isdnctrl dial ipppX. Somewhere along the pre7-3 or -4 this behaviour started. It could be connected with a crash I had around that time that forced me to fsck. That fsck had to repair lots of errors on the fs. 
But then again I replaced all necessary libs to the best of my knowledge and it still happened. Now I am up to pre9-2 and still no go. Here is the oops: ksymoops 2.3.4 on i686 2.3.99-pre9. Options used -V (default) -k /proc/ksyms (default) -l /proc/modules (default) -o /lib/modules/2.3.99-pre9/ (default) -m /boot/System.map-2.3.99-pre9 (default) Warning: You did not tell me where to find symbol information. I will assume that the log matches the kernel and modules that are running right now and I'll use the default options above for symbol resolution. If the current kernel and/or modules do not match the log, you can get more accurate output by telling me the kernel version and where to find map, modules, ksyms etc. ksymoops -h explains the options. Error (regular_file): read_system_map stat /boot/System.map-2.3.99-pre9 failed c01b0533 Oops: 0002 CPU: 0 EIP: 0010:[] Using defaults from ksymoops -t elf32-i386 -a i386 eax: 00000045 ebx: cfafa320 ecx: c02386d4 edx: c02386d4 esi: 000080fd edi: ceda0000 ebp: cccc89a0 esp: c024de64 ds: 0018 es: 0018 ss: 0018 Process swapper (pid: 0, stackpage=c024d000) Stack: c021f4e0 d0843efc afafa320 ceda0000 d0843efc afafa320 afafa320 ceda0000 ccc89c00 ccc89a00 ccc89c10 d0843e93 ccc89c00 ccc89a00 cfafa320 000080fe cfafa320 cfafa320 ccc89c10 ccc89a00 00000000 d083b38b ccc89c00 ccc89a00 Call trace: [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] <[] [] [] [] [] [] [] [] [] [] [] [] Code: c7 05 00 00 00 00 00 00 00 00 83 c4 08 8b 44 24 0c 8b 40 2c >>EIP; c01b0533 <__kfree_skb+1b/100> <===== Trace; c021f4e0 Trace; d0843efc <[bttv].bss.end+7d/141e1> Trace; d0843efc <[bttv].bss.end+7d/141e1> Trace; d0843c93 <[bttv]__module_author+12f3/13b0> Trace; d083b39b <[bttv]bttv_ioctl+877/21e0> Trace; d083b41a <[bttv]bttv_ioctl+8f6/21e0> Trace; d08354c9 <[i2c-algo-bit].rodata.start+2e9/969> Trace; d086ed27 <[isdn]isdn_net_getphones+4b/d4> Trace; d0865a25 <[msp3400]__module_parm_mixer+61df/781a> Trace; d0865a4f <[msp3400]__module_parm_mixer+6209/781a> 
Trace; c011ec72 Trace; c011c4e3 Trace; c011c438 Trace; c011c331 Trace; c010bf64 Trace; c0110a7c Trace; c0110a7c Trace; c0108ae0 Trace; c0108b49 Trace; c0105000 Trace; c010018d Before first symbol Code; c01b0533 <__kfree_skb+1b/100> 00000000 <_EIP>: Code; c01b0533 <__kfree_skb+1b/100> <===== 0: c7 05 00 00 00 00 00 movl $0x0,0x0 <===== Code; c01b053a <__kfree_skb+22/100> 7: 00 00 00 Code; c01b053d <__kfree_skb+25/100> a: 83 c4 08 add $0x8,%esp Code; c01b0540 <__kfree_skb+28/100> d: 8b 44 24 0c mov 0xc(%esp,1),%eax Code; c01b0544 <__kfree_skb+2c/100> 11: 8b 40 2c mov 0x2c(%eax),%eax Aiee, killing interrupt handler Kernel panic: Attempted to kill the idle task My Hardware is as follows: Biostar M6VBE motherboard, P-III 450, 256MB, Teledat PCI (AVM Fritz PCI), Matrox G400 single head, creative dxr3 dvd (no module loaded at that stage), Hauppauge wintv based on bt878. silverado:/home/rolf# sh /usr/src/linux/scripts/ver_linux -- Versions installed: (if some fields are empty or look -- unusual then possibly you have very old versions) Linux silverado 2.2.14 #2 Wed May 10 21:00:11 CEST 2000 i686 unknown Kernel modules 2.3.11 Gnu C 2.95.2 Binutils 2.9.5.0.31 Linux C Library 2.1.3 Dynamic linker ldd: version 1.9.11 Procps . Mount 2.10f Net-tools 2.05 Kbd 0.99 Sh-utils 2.0g Modules Loaded hisax isdn msp3400 tuner bttv i2c videodev snd-pcm-oss snd-pcm-plugin snd-mixer-oss snd-card-es1938 snd-es1938 snd-pcm snd-timer snd-mixer snd soundcore The module list doesnt represent the modules loaded when it oopses At that point the modules loaded are hisax isdn msp3400 tuner bttv i2c videov in version from stock pre2.3. the lspci -vvv output is included as attachment. What's left to say is that all is well with 2.2.14 and with pre8-final it dialled 3 times before oopsing without being able to connect tho. I hope I supplied the correct information and want to thank you all for making linux possible. 
Bis bald, Rolf ---559023410-1804928587-958744721=:4143 Content-Type: TEXT/PLAIN; CHARSET=US-ASCII; NAME="lspci.silverado" Content-Transfer-Encoding: BASE64 Content-ID: Content-Description: Content-Disposition: ATTACHMENT; FILENAME="lspci.silverado" [base64-encoded attachment lspci.silverado (lspci -vvv output) omitted] ---559023410-1804928587-958744721=:4143--
From owner-netdev@new-oss Fri May 19 11:35:12 2000 Received: (from majordomo@localhost) by new-oss (8.10.1/8.10.1) id e4JIZCu01323 for netdev-outgoing; Fri, 19 May 2000 11:35:12 -0700 X-Authentication-Warning: new-oss: majordomo set sender to owner-netdev@oss.sgi.com using -f Received: from vaio.greennet ([206.24.4.33]) by new-oss (8.10.1/8.10.1) with ESMTP id e4JIZAr01320 for ; Fri, 19 May 2000 11:35:11 -0700 Received: from localhost (becker@localhost) by vaio.greennet (8.9.3/8.8.7) with ESMTP id PAA04236; Fri, 19 May 2000 15:37:47 -0400 Date: Fri, 19 May 2000 15:37:46 -0400 (EDT) From: Donald Becker X-Sender: becker@vaio.greennet To: Andrew Morton cc: netdev@oss.sgi.com Subject: Re: tx_timeout and timer serialisation In-Reply-To: <39240EA9.389AE2FC@uow.edu.au> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk On Fri, 19 May 2000, Andrew Morton wrote: > Donald Becker wrote: > > > kuznet@ms2.inr.ac.ru wrote: > > > > Timers are self-destructable as rule. See? Normal usage > > > > for timer is to have it allocated inside an object and > > > > timer event detroys the object together with timer. > > I'm curious -- what code does this? > me too.
I think this is the crux of this specific problem: someone has assumed that
  - timers deal only with objects allocated specifically for that instance,
  - the timer function always frees the object.
This assumption is flawed
  - it's not the way timers are used
  - the timer function cannot delete its own object, since del_timer()
    would lead to memory leaks

> > I don't see the semantic problem here.
>
> Well, the timer handler could be executing on another CPU _while_ the
> timer is being deleted. So when del_timer() returns, the handler is
> still executing! So in this example, CPU0 could execute
> use(some_resources) AFTER CPU1 has done release(some_resources).

When I saw the claim that timer handling was flawed in 2.3, I had guessed some similar scenario. But I couldn't imagine someone thinking that such flawed semantics would be reasonable.

> > This was the recommended way to use the timer routines. If the semantics
> > have changed, there should be new names for the changed semantics.
>
> I agree. I assume what happened was that when the timer code went SMP,
> the synchronous semantics of del_timer() became asynchronous but the
> name was unchanged.
>
> I believe what _should_ have been done when converting this to SMP was
> to
>
> - manage the "handler is running" flag _outside_ the handler
> - preserve del_timer()'s synchronous semantics (via the spin)
> - introduce a new function del_timer_async() which has the
>   handler-can-still-be-running semantics.

That would be the only reasonable way to handle this. People would quickly find out that del_timer_async() is almost useless, because the timer routine is presumably doing something, and that something might happen at some unpredictable future time.

Keep in mind that with modversions, the following will not work
   #define del_timer(..) working_del_timer(..)
nor will
   static inline del_timer() {...}

> I'm not sure if 2.2 is clean in this regard. It doesn't have
> del_timer_sync().
Spent ten minutes peering at 2.2, and I can't see > why, if some random piece of kernel code does a del_timer() in IRQ or > process context, concurrent execution of the post-del_timer() code and > the handler will not occur. The problem window for net drivers is presumably small, since the timer routines run for minimal time every few seconds. The only thing that might take more than a handful of microseconds is a MII read, which can take tens of microseconds, and the routines are run as little as every 60 seconds. The risk is that the timer handler re-adds itself, when the driver thinks that it has been shut down. > > It would be useful to distinguish between "bugs" and "interfaces changes > > that have made the following no longer correct since version X.Y.Z". Donald Becker becker@scyld.com Scyld Computing Corporation 410 Severn Ave. Suite 210 Annapolis MD 21403 From owner-netdev@oss.sgi.com Fri May 19 13:20:55 2000 Received: (from majordomo@localhost) by oss.sgi.com (8.10.1/8.10.1) id e4JKKtl01710 for netdev-outgoing; Fri, 19 May 2000 13:20:55 -0700 X-Authentication-Warning: oss.sgi.com: majordomo set sender to owner-netdev@oss.sgi.com using -f Received: from havoc.gtf.org (IDENT:root@panic.ohr.gatech.edu [130.207.47.194]) by oss.sgi.com (8.10.1/8.10.1) with ESMTP id e4JKKsU01707 for ; Fri, 19 May 2000 13:20:54 -0700 Received: from mandrakesoft.com (adsl-77-228-135.atl.bellsouth.net [216.77.228.135]) by havoc.gtf.org (8.9.3/8.9.3) with ESMTP id QAA23820; Fri, 19 May 2000 16:20:47 -0400 Message-ID: <3925A238.95C10DAE@mandrakesoft.com> Date: Fri, 19 May 2000 16:21:12 -0400 From: Jeff Garzik Organization: MandrakeSoft X-Mailer: Mozilla 4.72 [en] (X11; U; Linux 2.2.15 i686) X-Accept-Language: en MIME-Version: 1.0 To: Andrew Morton CC: Donald Becker , "netdev@oss.sgi.com" Subject: Re: Tx queueing References: <392407D4.BE586507@uow.edu.au> <392421F1.249E284F@uow.edu.au> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: 
owner-netdev@oss.sgi.com
Precedence: bulk

Andrew Morton wrote:
> Donald Becker wrote:
> > ...
> > >
> > > start_xmit()
> > > {
> > >     ...
> > >     if (!room for another packet)
> > >         netif_stop_queue()
> > > }
> >
> > This doesn't give us a way to set dev->tbusy, which is required for all
> > pre-2.3 kernels.
>
> Burn them boats :)
>
> I've been looking for a compatibility header which maps
> netif_stop_queue() into dev->tbusy manipulation, but I can't find one.
> Have I missed something?

IMNSHO you should try the compatibility module at
http://gtf.org/garzik/drivers/kcompat24/

It provides the necessary glue to port 2.3.x/2.4.0 drivers back to 2.2.x. Note that it's still an early development version, but it is already being used in a couple places.

> > > Has anyone done any serious work with NIC/CPU bonding?
> >
> > The Mindcraft "benchmark" is superficially obvious, but the big network
> > difference was that they were apparently using the TCP/IP checksum hardware
> > on the i82559. This has far more effect on SMP performance than anything
> > else that was done. We didn't even find out that the chip had the feature
> > until months later, and still don't have the documentation on how to use
> > it.
>
> Does it make that much difference? When we discussed this starting
> April 24 the consensus seemed to be that the overhead of s/w
> checksumming is in the noise floor when combined with the copy.
>
> Anyway, I have religious objections to hardware IP checksums.
>
> I've seen several instances where they would have failed to detect
> errors which software checksumming would detect:
[...]

That's the bugger of the problem. I take the opposite stance because I want to fully maximize my hardware's capabilities -- if it can do good and accurate checksums, then I would certainly prefer to offload that processing.
But if the network hardware has problems with checksumming, finding the cause of the problem is often more difficult than normal since it is easy to blame the other side, or unrelated hardware, for the checksumming problems.

	Jeff

--
Jeff Garzik              | Liberty is always dangerous, but
Building 1024            | it is the safest thing we have.
MandrakeSoft, Inc.       |      -- Harry Emerson Fosdick

From owner-netdev@oss.sgi.com Fri May 19 13:16:13 2000
Received: (from majordomo@localhost) by oss.sgi.com (8.10.1/8.10.1) id e4JKGDn01675 for netdev-outgoing; Fri, 19 May 2000 13:16:13 -0700
X-Authentication-Warning: oss.sgi.com: majordomo set sender to owner-netdev@oss.sgi.com using -f
Received: from havoc.gtf.org (IDENT:root@panic.ohr.gatech.edu [130.207.47.194]) by oss.sgi.com (8.10.1/8.10.1) with ESMTP id e4JKG2U01670 for ; Fri, 19 May 2000 13:16:02 -0700
Received: from mandrakesoft.com (adsl-77-228-135.atl.bellsouth.net [216.77.228.135]) by havoc.gtf.org (8.9.3/8.9.3) with ESMTP id QAA23633; Fri, 19 May 2000 16:15:46 -0400
Message-ID: <3925A0F3.D3184229@mandrakesoft.com>
Date: Fri, 19 May 2000 16:15:47 -0400
From: Jeff Garzik
Organization: MandrakeSoft
X-Mailer: Mozilla 4.72 [en] (X11; U; Linux 2.2.15 i686)
X-Accept-Language: en
MIME-Version: 1.0
To: Donald Becker
CC: Andrew Morton , "netdev@oss.sgi.com"
Subject: Re: Tx queueing
References:
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-netdev@oss.sgi.com
Precedence: bulk

Donald Becker wrote:
>
> On Fri, 19 May 2000, Andrew Morton wrote:
>
> > A number of drivers do this:
> >
> > start_xmit()
> > {
> >     netif_stop_queue()
> >     ...
> >     if (room for another packet)
> >         netif_wake_queue()
> >     ...
> > }
> >
> > I suspect this is a simple port from the dev->tbusy days.

Not only is it simple, it's wrong wrong wrong. The only exceptions are going to be older NICs which only support a single Tx packet buffer (3c501 is like this?).
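
A minimal userspace sketch of the alternative that Andrew proposed in this thread: stop the queue only when the ring has no room left, and wake it once a slot frees up. The ring size, struct model_nic and the model_* functions are hypothetical and stand in for a real driver's state. Placing the wakeup in a Tx-completion handler is an assumption; the posts here only say that the wakeup belongs outside the transmit path.

```c
#include <assert.h>

#define MODEL_TX_RING_SIZE 4U	/* hypothetical ring depth */

/* Model of the driver state that matters for Tx flow control. */
struct model_nic {
	unsigned int cur_tx;	/* next slot start_xmit will fill */
	unsigned int dirty_tx;	/* oldest slot not yet completed */
	int queue_stopped;	/* stands in for netif_stop/wake_queue() */
};

/* Number of packets handed to the hardware and not yet completed. */
static inline unsigned int model_tx_in_flight(const struct model_nic *nic)
{
	return nic->cur_tx - nic->dirty_tx;
}

/* Transmit path: queue one packet, stop only when the ring is full.
 * Returns 0 on success, 1 if called while the queue is stopped. */
static inline int model_start_xmit(struct model_nic *nic)
{
	if (nic->queue_stopped)
		return 1;

	assert(model_tx_in_flight(nic) < MODEL_TX_RING_SIZE);
	nic->cur_tx++;

	if (model_tx_in_flight(nic) == MODEL_TX_RING_SIZE)
		nic->queue_stopped = 1;	/* netif_stop_queue() */
	return 0;
}

/* Completion path: retire one packet, then reopen the queue. */
static inline void model_tx_complete(struct model_nic *nic)
{
	if (model_tx_in_flight(nic) == 0)
		return;

	nic->dirty_tx++;
	if (nic->queue_stopped)
		nic->queue_stopped = 0;	/* netif_wake_queue() */
}
```

With this shape the transmit path never stops and restarts the queue for every packet, which is the behaviour the quoted pattern above has when there is still room in the ring.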
Andrew (or anyone else) -- do you know any of existing drivers which use this broken logic? That logic was used in a few of my softnet conversions, but was quickly replaced with bug free and more correct code. > Yes, it's likely from a blind search-and-substitute. > Using stop_queue() and wake_queue() in the transmit path is part of the > reason why the 2.3 changes are bogus. > It is not possible, with these semantics, to write a driver that works well > in both pre-2.3 and 2.3. Since the example logic above is wrong that's not surprising :) There should not be a problem creating drivers for both 2.2 and 2.3 once broken code such as the above is corrected. > > It would seem to be more sensible to do > > > > start_xmit() > > { > > ... > > if (!room for another packet) > > netif_stop_queue() > > } > > This doesn't give us a way to set dev->tbusy, which is required for all > pre-2.3 kernels. Why not? Maybe I am missing something. A 2.2.x implementation of stop_queue should set dev->tbusy's bit 0. That's what acenic and my kcompat software both do. > > Has anyone done any serious work with NIC/CPU bonding? > > The Mindcraft "benchmark" is superficially obvious, but the big network > difference was that they were apparently using the TCP/IP checksum hardware > on the i82559. This has far more effect on SMP performance than anything > else that was done. We didn't even find out that the chip had the feature > until months later, and still don't have the documentation on how to use > it. Has anyone tried to get NDA'd doc from Intel? Andrey? Donald? Some companies are reluctant to give out databooks, but in my experience very few of those companies are in turn reluctant to give out reference source code and databooks under an NDA which allows for open source development. Regards, Jeff -- Jeff Garzik | Liberty is always dangerous, but Building 1024 | it is the safest thing we have. MandrakeSoft, Inc. 
| -- Harry Emerson Fosdick From owner-netdev@oss.sgi.com Fri May 19 13:26:32 2000 Received: (from majordomo@localhost) by oss.sgi.com (8.10.1/8.10.1) id e4JKQWM01750 for netdev-outgoing; Fri, 19 May 2000 13:26:32 -0700 X-Authentication-Warning: oss.sgi.com: majordomo set sender to owner-netdev@oss.sgi.com using -f Received: from Altitude.CAM.ORG (Altitude.CAM.ORG [198.168.100.1]) by oss.sgi.com (8.10.1/8.10.1) with ESMTP id e4JKQVU01747 for ; Fri, 19 May 2000 13:26:32 -0700 Received: (from uucp@localhost) by Altitude.CAM.ORG (8.9.3/8.9.3) with UUCP id PAA03800 for netdev@oss.sgi.com; Fri, 19 May 2000 15:11:41 -0400 (EDT) Received: (from hendrik@localhost) by topoi.cam.org (8.9.3/8.9.3/SuSE Linux 8.9.3-0.1) id OAA16131 for netdev@oss.sgi.com; Fri, 19 May 2000 14:33:08 -0400 From: Hendrik Boom Message-Id: <200005191833.OAA16131@topoi.cam.org> Subject: Re: Tx queueing In-Reply-To: from jamal at "May 18, 2000 10:14:00 pm" To: netdev@oss.sgi.com Date: Fri, 19 May 2000 14:33:07 -0400 (EDT) X-Mailer: ELM [version 2.4ME+ PL60 (25)] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk > > Well, i think backward compatibility is a philosophy that Linux has never > conformed to anyways ;-> > I would say it is a strength not a weakness ;-> Actually, I find backwards compatibility to be much better in Linux than in other OSes for the same hardware. I suspect no one is using OS upgrades as a tool for forcing purchase of new application software here.
From owner-netdev@oss.sgi.com Fri May 19 14:42:06 2000 Received: (from majordomo@localhost) by oss.sgi.com (8.10.1/8.10.1) id e4JLg6m02508 for netdev-outgoing; Fri, 19 May 2000 14:42:06 -0700 X-Authentication-Warning: oss.sgi.com: majordomo set sender to owner-netdev@oss.sgi.com using -f Received: from havoc.gtf.org (IDENT:root@panic.ohr.gatech.edu [130.207.47.194]) by oss.sgi.com (8.10.1/8.10.1) with ESMTP id e4JLg5U02505 for ; Fri, 19 May 2000 14:42:05 -0700 Received: from mandrakesoft.com (adsl-77-228-135.atl.bellsouth.net [216.77.228.135]) by havoc.gtf.org (8.9.3/8.9.3) with ESMTP id RAA27792; Fri, 19 May 2000 17:42:00 -0400 Message-ID: <3925B528.51973F08@mandrakesoft.com> Date: Fri, 19 May 2000 17:42:00 -0400 From: Jeff Garzik Organization: MandrakeSoft X-Mailer: Mozilla 4.72 [en] (X11; U; Linux 2.2.15 i686) X-Accept-Language: en MIME-Version: 1.0 To: Andrey Savochkin CC: Donald Becker , netdev@oss.sgi.com Subject: Re: 2.2.15 with eepro100: eth0: Too much work at interrupt References: <4.2.2.20000516141302.00b53f00@pop3.ir.corsair.com> <4.2.2.20000516141302.00b53f00@pop3.ir.corsair.com> <20000517102928.A24697@saw.sw.com.sg> <4.2.2.20000517090400.00c1e1d0@pop3.ir.corsair.com> <20000518180559.G23500@valinux.com> <20000519091409.D1125@saw.sw.com.sg> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Andrey Savochkin wrote: > On Thu, May 18, 2000 at 06:05:59PM -0700, Dragan Stancevic wrote: > > From looking at the error messages that you are getting I second Andrey's > > mail, this is happening because the driver is experiencing problems when > > reading the eeprom and fails to detect your PHY. Unless you have no PHY :^) > > > > The driver has issues with reading the eeprom, it doesn't drive the > > eeprom serial clock correctly. By not following clock frequency > > specification it clocks data "into/out of" eeprom incorrectly. > > It may be some other problems...
> Original Donald's driver worked with udelay(100) until Donald himself
> replaced the udelay by a delay implemented as a single inw()!

A tangent subject: as I mentioned in a previous message, this inw() is really readw() in disguise, because eepro100 defaults to MMIO operation. This is fairly common in many Becker drivers, and also not a little bit misleading.

I have converted 8139too over to a scheme which defines driver-specific register read/write macros, for example

#ifdef USE_IO
#define EE_W16(reg,data)	outw((data), ioaddr+(reg))
#define EE_W16_F		EE_W16
#else
#define EE_W16(reg,data)	writew((data), ioaddr+(reg))
#define EE_W16_F(reg,data)	do { \
		writew((data), ioaddr+(reg)); \
		readw(ioaddr+(reg)); } while (0)
#endif

The purpose is twofold -- stop the misleading practice of redefining inw to be an MMIO operation -- and also make use of Donald's standard "ioaddr" temp var in the macro, to make code more readable.

I think that redefining inw (etc.) is not good because it leads to bugs and confusion. In general redefining a well-known function is not good practice.

Donald, maybe you could be convinced to consider an alternative to '#define inw readw' in the next rev of your code?

Regards, Jeff -- Jeff Garzik | Liberty is always dangerous, but Building 1024 | it is the safest thing we have. MandrakeSoft, Inc.
| -- Harry Emerson Fosdick From owner-netdev@oss.sgi.com Fri May 19 15:02:48 2000 Received: (from majordomo@localhost) by oss.sgi.com (8.10.1/8.10.1) id e4JM2m202668 for netdev-outgoing; Fri, 19 May 2000 15:02:48 -0700 X-Authentication-Warning: oss.sgi.com: majordomo set sender to owner-netdev@oss.sgi.com using -f Received: from havoc.gtf.org (IDENT:root@panic.ohr.gatech.edu [130.207.47.194]) by oss.sgi.com (8.10.1/8.10.1) with ESMTP id e4JM2lU02665 for ; Fri, 19 May 2000 15:02:48 -0700 Received: from mandrakesoft.com (adsl-77-228-135.atl.bellsouth.net [216.77.228.135]) by havoc.gtf.org (8.9.3/8.9.3) with ESMTP id SAA28648; Fri, 19 May 2000 18:02:41 -0400 Message-ID: <3925BA01.CC49AC1@mandrakesoft.com> Date: Fri, 19 May 2000 18:02:41 -0400 From: Jeff Garzik Organization: MandrakeSoft X-Mailer: Mozilla 4.72 [en] (X11; U; Linux 2.2.15 i686) X-Accept-Language: en MIME-Version: 1.0 To: Andrew Morton CC: "netdev@oss.sgi.com" Subject: Re: Tx queueing References: <392407D4.BE586507@uow.edu.au> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Andrew Morton wrote: > > A number of drivers do this: > > start_xmit() > { > netif_stop_queue() > ... > if (room for another packet) > netif_wake_queue() > ... > } Which ones? They need fixing.. > For devices which have a Tx packet ring or decent FIFO I don't expect > this to be a problem, because the Tx ISR will call netif_wake_queue() > and the subsequent BH run will keep stuffing packets into the Tx ring > until it's full. But for devices which have very limited Tx buffering > there may be a lost opportunity to refill the Tx buffer earlier. Seems > unlikely to me. If you do something like that, it seems like it should be dependent on certain thresholds, not necessarily occurring all the time. And you'd want to sync with the Tx reaper called from the interrupt handler, too. 
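[Archive note: the Tx flow control argued for in this thread (stop the queue only when the ring has just filled, wake it from the Tx completion path) can be sketched as a small userspace model. This is only an illustration, not code from any driver: struct tx_model, TX_RING_SIZE and the model_* functions are hypothetical names, and the queue_stopped flag stands in for what netif_stop_queue()/netif_wake_queue() would change in a real driver. The wake threshold and the locking against the interrupt handler that Jeff mentions are deliberately left out.]

```c
/* Userspace model of the Tx flow control discussed above; not kernel code.
 * All names here are hypothetical stand-ins for driver-private state. */
#include <assert.h>
#include <stdbool.h>

#define TX_RING_SIZE 4u

struct tx_model {
    unsigned int cur_tx;    /* count of packets ever queued */
    unsigned int dirty_tx;  /* count of packets ever reclaimed */
    bool queue_stopped;     /* models the stack-visible "stopped" state */
};

/* Number of ring slots currently owned by the (imaginary) hardware. */
unsigned int tx_in_flight(const struct tx_model *m)
{
    return m->cur_tx - m->dirty_tx;
}

/* Transmit path: queue one packet, then stop only if the ring is now full.
 * Returns false if the caller should not have called us at all. */
bool model_start_xmit(struct tx_model *m)
{
    if (m->queue_stopped || tx_in_flight(m) == TX_RING_SIZE)
        return false;
    m->cur_tx++;
    if (tx_in_flight(m) == TX_RING_SIZE)
        m->queue_stopped = true;    /* netif_stop_queue() in a real driver */
    return true;
}

/* Tx completion path: reclaim up to `done` slots, wake if room opened up. */
void model_tx_complete(struct tx_model *m, unsigned int done)
{
    if (done > tx_in_flight(m))
        done = tx_in_flight(m);
    m->dirty_tx += done;
    if (m->queue_stopped && tx_in_flight(m) < TX_RING_SIZE)
        m->queue_stopped = false;   /* netif_wake_queue() in a real driver */
}
```

[In this model the transmit path never wakes the queue itself, so the stop/wake pair criticised above cannot occur; whether a real driver should wake on the first free slot or wait for a threshold is the open question in the message.]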
Jeff -- Jeff Garzik | Liberty is always dangerous, but Building 1024 | it is the safest thing we have. MandrakeSoft, Inc. | -- Harry Emerson Fosdick From owner-netdev@oss.sgi.com Fri May 19 15:06:38 2000 Received: (from majordomo@localhost) by oss.sgi.com (8.10.1/8.10.1) id e4JM6cP02699 for netdev-outgoing; Fri, 19 May 2000 15:06:38 -0700 X-Authentication-Warning: oss.sgi.com: majordomo set sender to owner-netdev@oss.sgi.com using -f Received: from havoc.gtf.org (IDENT:root@panic.ohr.gatech.edu [130.207.47.194]) by oss.sgi.com (8.10.1/8.10.1) with ESMTP id e4JM6bU02696 for ; Fri, 19 May 2000 15:06:37 -0700 Received: from mandrakesoft.com (adsl-77-228-135.atl.bellsouth.net [216.77.228.135]) by havoc.gtf.org (8.9.3/8.9.3) with ESMTP id SAA28813; Fri, 19 May 2000 18:06:31 -0400 Message-ID: <3925BB00.B1CDDFE7@mandrakesoft.com> Date: Fri, 19 May 2000 18:06:56 -0400 From: Jeff Garzik Organization: MandrakeSoft X-Mailer: Mozilla 4.72 [en] (X11; U; Linux 2.2.15 i686) X-Accept-Language: en MIME-Version: 1.0 To: Donald Becker CC: Andrew Morton , netdev@oss.sgi.com Subject: Re: tx_timeout and timer serialisation References: Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Donald Becker wrote: > On Fri, 19 May 2000, Andrew Morton wrote: > > I'll be spending the next working through the > > old net drivers. One very common theme/bug in these is the pattern: > > > > xxx_close() > > { > > ... > > del_timer(); > > release(some_resources); > > ... > > } > > > > xxx_timer() > > { > > use(some_resources); > > } > > I don't see the semantic problem here. > This was the recommended way to use the timer routines. If the semantics > have changed, there should be new names for the changed semantics. There doesn't seem to be anything in 2.2.x to prevent this sort of race at del_timer time. It always seemed to me like a driver-specific wait queue was needed for certain points in the close() process, like this. 
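[Archive note: the xxx_close()/xxx_timer() race quoted above can be made concrete with a deterministic userspace model. This is a sketch under stated assumptions, not the kernel's timer implementation: struct timer_model and the model_* functions are hypothetical, and the model assumes only what the thread describes, namely that a plain del_timer() dequeues a pending timer but does not wait for a handler that has already started on another CPU.]

```c
/* Userspace model of the del_timer()/close() race; not kernel code.
 * All names here are hypothetical. */
#include <assert.h>
#include <stdbool.h>

struct timer_model {
    bool pending;   /* queued, handler not yet started */
    bool running;   /* handler body is executing on some CPU */
};

/* The timer core picks the timer up: it leaves the pending list
 * before the handler body starts to run. */
void model_handler_enter(struct timer_model *t)
{
    if (t->pending) {
        t->pending = false;
        t->running = true;
    }
}

/* The handler body has returned. */
void model_handler_exit(struct timer_model *t)
{
    t->running = false;
}

/* Plain del_timer() as modelled here: dequeue only.
 * Returns true if the timer was still pending. */
bool model_del_timer(struct timer_model *t)
{
    bool was_pending = t->pending;
    t->pending = false;
    return was_pending;
}

/* What close() actually needs before releasing resources:
 * the timer is neither queued nor executing. */
bool model_timer_quiesced(const struct timer_model *t)
{
    return !t->pending && !t->running;
}
```

[Stepping the model through "handler starts, close() calls del_timer(), handler finishes" shows the window in which the timer is no longer pending yet the handler can still touch resources that close() is about to release.]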
Jeff -- Jeff Garzik | Liberty is always dangerous, but Building 1024 | it is the safest thing we have. MandrakeSoft, Inc. | -- Harry Emerson Fosdick From owner-netdev@oss.sgi.com Fri May 19 17:26:00 2000 Received: (from majordomo@localhost) by oss.sgi.com (8.10.1/8.10.1) id e4K0Q0p03953 for netdev-outgoing; Fri, 19 May 2000 17:26:00 -0700 X-Authentication-Warning: oss.sgi.com: majordomo set sender to owner-netdev@oss.sgi.com using -f Received: from smtprch1.nortel.com (smtprch1.nortelnetworks.com [192.135.215.14]) by oss.sgi.com (8.10.1/8.10.1) with ESMTP id e4K0PuU03949 for ; Fri, 19 May 2000 17:25:56 -0700 Received: from zsngd101.asiapac.nortel.com (actually znsgd101) by smtprch1.nortel.com; Fri, 19 May 2000 19:15:54 -0500 Received: from zctwb003.asiapac.nortel.com ([47.152.32.111]) by zsngd101.asiapac.nortel.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21) id LCNJZCC9; Sat, 20 May 2000 08:15:27 +0800 Received: from pwold011.asiapac.nortel.com ([47.181.193.45]) by zctwb003.asiapac.nortel.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21) id KXY560T4; Sat, 20 May 2000 10:15:28 +1000 Received: from uow.edu.au (IDENT:akpm@[47.181.194.1]) by pwold011.asiapac.nortel.com (8.9.3/8.9.3) with ESMTP id KAA08428; Sat, 20 May 2000 10:15:27 +1000 Message-ID: <3925D985.51472B30@uow.edu.au> Date: Sat, 20 May 2000 10:17:09 +1000 X-Sybari-Space: 00000000 00000000 00000000 From: Andrew Morton X-Mailer: Mozilla 4.7 [en] (X11; I; Linux 2.2.14-15mdk i586) X-Accept-Language: en MIME-Version: 1.0 To: Jeff Garzik CC: "netdev@oss.sgi.com" Subject: Re: Tx queueing References: <392407D4.BE586507@uow.edu.au> <3925BA01.CC49AC1@mandrakesoft.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Jeff Garzik wrote: > > Andrew Morton wrote: > > > > A number of drivers do this: > > > > start_xmit() > > { > > netif_stop_queue() > > ... 
> > if (room for another packet) > > netif_wake_queue() > > ... > > } > > Which ones? They need fixing.. 3c50?.c. Probably others... I'm disinclined to change these: - They're slow anyway. - Increased possibility of breaking them - Broken 2.2/kcompat24 compatibility > > For devices which have a Tx packet ring or decent FIFO I don't expect > > this to be a problem, because the Tx ISR will call netif_wake_queue() > > and the subsequent BH run will keep stuffing packets into the Tx ring > > until it's full. But for devices which have very limited Tx buffering > > there may be a lost opportunity to refill the Tx buffer earlier. Seems > > unlikely to me. > > If you do something like that, it seems like it should be dependent on > certain thresholds, not necessarily occurring all the time. And you'd > want to sync with the Tx reaper called from the interrupt handler, too. I guess if you need more performance out of most of the Linux net drivers, you get it with a chequebook. That leaves us only really caring about performance on a handful of drivers, which is good. So. - No performance tweaks for [E]ISA drivers in 2.3. - Correctness fixes if they're obvious. - If the SMP-safety fixes are not obvious, mark the driver as UP-only. Sound sensible?
-- -akpm- From owner-netdev@oss.sgi.com Fri May 19 17:44:49 2000 Received: (from majordomo@localhost) by oss.sgi.com (8.10.1/8.10.1) id e4K0in904371 for netdev-outgoing; Fri, 19 May 2000 17:44:49 -0700 X-Authentication-Warning: oss.sgi.com: majordomo set sender to owner-netdev@oss.sgi.com using -f Received: from vaio.greennet ([206.24.4.33]) by oss.sgi.com (8.10.1/8.10.1) with ESMTP id e4K0ilU04367 for ; Fri, 19 May 2000 17:44:48 -0700 Received: from localhost (becker@localhost) by vaio.greennet (8.9.3/8.8.7) with ESMTP id UAA05845 for ; Fri, 19 May 2000 20:48:15 -0400 Date: Fri, 19 May 2000 20:48:15 -0400 (EDT) From: Donald Becker X-Sender: becker@vaio.greennet To: netdev@oss.sgi.com Subject: Re: tx_timeout and timer serialisation In-Reply-To: <3925BB00.B1CDDFE7@mandrakesoft.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk On Fri, 19 May 2000, Jeff Garzik wrote: > > I don't see the semantic problem here. > > This was the recommended way to use the timer routines. If the semantics > > have changed, there should be new names for the changed semantics. > > There doesn't seem to be anything in 2.2.x to prevent this sort of race > at del_timer time. It always seemed to me like a driver-specific wait > queue was needed for certain points in the close() process, like this. There is no "wait queue" that can cover broken semantics. The expected semantics must be "remove this timer from the kernel timer control". After calling del_timer(&timer), - kfree(timer.data) is safe - a module with the timer.function() code may be immediately removed. It's possible for each driver to add locks so that #1 is true, but there is no work-around if #2 does not hold true. The timer function might have released its lock, but still be exiting. Donald Becker becker@scyld.com Scyld Computing Corporation 410 Severn Ave. 
Suite 210 Annapolis MD 21403 From owner-netdev@oss.sgi.com Fri May 19 19:23:28 2000 Received: (from majordomo@localhost) by oss.sgi.com (8.10.1/8.10.1) id e4K2NSm05476 for netdev-outgoing; Fri, 19 May 2000 19:23:28 -0700 X-Authentication-Warning: oss.sgi.com: majordomo set sender to owner-netdev@oss.sgi.com using -f Received: from cyberus.ca (mail.cyberus.ca [209.195.95.1]) by oss.sgi.com (8.10.1/8.10.1) with ESMTP id e4K2NRU05473 for ; Fri, 19 May 2000 19:23:27 -0700 Received: from shell.cyberus.ca (shell [209.195.95.7]) by cyberus.ca (8.9.3/8.9.3/Cyberus Online Inc.) with ESMTP id WAA05599; Fri, 19 May 2000 22:23:27 -0400 (EDT) Received: from localhost (hadi@localhost) by shell.cyberus.ca (8.9.1b+Sun/8.9.3) with ESMTP id WAA17193; Fri, 19 May 2000 22:23:27 -0400 (EDT) Date: Fri, 19 May 2000 22:23:27 -0400 (EDT) From: jamal To: davem@redhat.com, kuznet@ms2.inr.ac.ru, Andi Kleen , SteveW@ACM.org cc: netdev@oss.sgi.com Subject: [PATCH]: Cruft fixes Message-ID: MIME-Version: 1.0 Content-Type: MULTIPART/MIXED; BOUNDARY="-559023410-758783491-958789407=:17134" Sender: owner-netdev@oss.sgi.com Precedence: bulk This message is in MIME format. The first part should be readable text, while the remaining parts are likely unreadable without MIME-aware tools. Send mail to mime@docserver.cac.washington.edu for more info. ---559023410-758783491-958789407=:17134 Content-Type: TEXT/PLAIN; charset=US-ASCII This patch removes a bit of cruft. Please double check. 
cheers, jamal ---559023410-758783491-958789407=:17134 Content-Type: APPLICATION/octet-stream; name="cruft-patch.gz" Content-Transfer-Encoding: BASE64 Content-ID: Content-Description: Content-Disposition: attachment; filename="cruft-patch.gz" H4sICPz0JTkAA2NydWZ0LXBhdGNoAJ2UbW+iQBDHX8OnmOTeqEAFH6jFXKJp bWvOYtPi9eWGLoslpYuBxfNe3He/WSDaVrQPYnY2u/Nn5jezrGEYsErZoM2Z aEertV0MJEvo8zLO2QlVbhION/5fsPpgdZxOz+n2oGOapqppWiE1OPvzBXlv UMpHIzD6A/0UtGIcjVRQCKEJz4TPBXkSOGtMvGtyS6a3v+2mDujgLmYzXZG/ dgvGcQwBW0eUZdBq424RO6VrXTWUxjqJglbT0lWtVFVqFf4NVVCNd9w9HEiS i1Uu9pPuOt3uYeYPpD17x2uZVl8SV/Y4c1PfAtfDfoSqH2T109V+qhb+j1HW ijrb0hR8Vs8u+Ep7jG989xqwlhADfqebAaPS+CEpZ/tJ9x2c1JN+Rtw9e0V8 Zp8WxKU9Rnzhjsmd9xF0wEmKh4l9q7k0SRkirPey7lvb724P+bDIdsxXzR30 zyRpaSSofH5EYcBCOJ+7l9Mr4k48cv1ALmfzB1zx7uYz1cBCiIiCL5KXiBIB GBKDkSBNVquIL+EnjL35zfScTN2p1zCbQ1X7gi9Ur895Fi05CyBO0K/ShZS8 +Nkzyiz0POSyScIQXUz5MowS4921C437cgH372+nLpnNz3+RhSvN5GK4K37E aZwHrB1HPN+0Sy329OTpfUlPHatT04dP6weOZe1aYuM1YoNmb28TthEs5SDP jKJUCDlP2TLKcANpGhEX8BgJWbrKGVe2vi/+hjz69DlOljuHN4VT3hYOu1V5 bZumvOtafaSMCdkcXG9kIs1pUXFSYkMLrQ4166VCJr9/8C7H93jiFt6kNl6I yuLDGqr/AaocK170BgAA ---559023410-758783491-958789407=:17134-- From owner-netdev@oss.sgi.com Fri May 19 19:32:38 2000 Received: (from majordomo@localhost) by oss.sgi.com (8.10.1/8.10.1) id e4K2WcD05508 for netdev-outgoing; Fri, 19 May 2000 19:32:38 -0700 X-Authentication-Warning: oss.sgi.com: majordomo set sender to owner-netdev@oss.sgi.com using -f Received: from vaio.greennet ([206.24.4.33]) by oss.sgi.com (8.10.1/8.10.1) with ESMTP id e4K2WaU05505 for ; Fri, 19 May 2000 19:32:37 -0700 Received: from localhost (becker@localhost) by vaio.greennet (8.9.3/8.8.7) with ESMTP id WAA05881; Fri, 19 May 2000 22:35:57 -0400 Date: Fri, 19 May 2000 22:35:57 -0400 (EDT) From: Donald Becker X-Sender: becker@vaio.greennet To: Jeff Garzik cc: Andrew Morton , "netdev@oss.sgi.com" Subject: Re: Tx queueing In-Reply-To: <3925A0F3.D3184229@mandrakesoft.com> Message-ID: MIME-Version: 1.0 Content-Type: 
TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk On Fri, 19 May 2000, Jeff Garzik wrote: > > On Fri, 19 May 2000, Andrew Morton wrote: > > > > > A number of drivers do this: > > > > > > start_xmit() > > > { > > > netif_stop_queue() > > > ... > > > if (room for another packet) > > > netif_wake_queue() > > > ... > > > } > > > > > > I suspect this is a simple port from the dev->tbusy days. > > Not only is it simple, it's wrong wrong wrong... > Andrew (or anyone else) -- do you know of any existing drivers which use > this broken logic? > > That logic was used in a few of my softnet conversions, but was quickly > replaced with bug free and more correct code. The drivers still "exist", even if your copy has been updated. We are not claiming that bad code can't be changed, just that it was bad code replacing good. > The only exceptions are > going to be older NICs which only support a single Tx packet buffer > (3c501 is like this?). The 3c501 is never ready to transmit again. It's rarely even ready to receive. A better approach for primitive hardware is for the driver to keep its own Tx queue of a few packets ready to transmit. But very few devices need this. > There should not be a problem creating drivers for both 2.2 and 2.3 once > broken code such as the above is corrected. The interface shouldn't have been changed without first testing if it could be implemented correctly. Merely guessing that a design is reasonable isn't the same as demonstrating that it works. The drivers have been able to be mostly backwards compatible for a long time. That demonstrates that the original interface was reasonable. It should also create a minimum requirement for interface changes: "you must be *this* much better to justify a change". I'm not averse to having drivers depend on new interface features, when the new features have offered significant benefits. An obvious case is using skb_reserve() to improve cache performance.
The eepro100 driver would require a new structure if it couldn't use skb_reserve(), and thus it is not backwards compatible with old kernels. > > The Mindcraft "benchmark" is superficially obvious, but the big network > > difference was that they were apparently using the TCP/IP checksum hardware > > on the i82559. This has far more effect on SMP performance than anything > > else that was done. We didn't even find out that the chip had the feature > > until months later, and still don't have the documentation on how to use > > it. > > Has anyone tried to get NDA'd doc from Intel? Andrey? Donald? Intel is much more open with documentation than it used to be, but it's still not easy. Getting the first i82557 documentation took six months of negotiating an NDA that would allow releasing the driver. I've visited Intel in Portland three times on my own $$. I thought that I would get the i82559 docs when I met with them in November, but it didn't happen. I visited the Intel people in Santa Clara earlier this month: they are much more open, but they develop things based on the 21*4* network hardware, not the i8255* chips. > Some companies are reluctant to give out databooks, but in my experience > very few of those companies are in turn reluctant to give out reference > source code and databooks under an NDA which allows for open source > development. Donald Becker becker@scyld.com Scyld Computing Corporation 410 Severn Ave.
Suite 210 Annapolis MD 21403 From owner-netdev@oss.sgi.com Fri May 19 21:27:23 2000 Received: (from majordomo@localhost) by oss.sgi.com (8.10.1/8.10.1) id e4K4RNK05850 for netdev-outgoing; Fri, 19 May 2000 21:27:23 -0700 X-Authentication-Warning: oss.sgi.com: majordomo set sender to owner-netdev@oss.sgi.com using -f Received: from saw.sw.com.sg (saw.sw.com.sg [203.120.9.98]) by oss.sgi.com (8.10.1/8.10.1) with SMTP id e4K4RMU05847 for ; Fri, 19 May 2000 21:27:22 -0700 Received: (qmail 7729 invoked by uid 577); 20 May 2000 04:27:15 -0000 Message-ID: <20000520122715.A7682@saw.sw.com.sg> Date: Sat, 20 May 2000 12:27:15 +0800 From: Andrey Savochkin To: Donald Becker Cc: netdev@oss.sgi.com, Jeff Garzik , Andrew Morton Subject: Re: tx_timeout and timer serialisation References: <3925BB00.B1CDDFE7@mandrakesoft.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 0.93.2i In-Reply-To: ; from "Donald Becker" on Fri, May 19, 2000 at 08:48:15PM Sender: owner-netdev@oss.sgi.com Precedence: bulk On Fri, May 19, 2000 at 08:48:15PM -0400, Donald Becker wrote: > On Fri, 19 May 2000, Jeff Garzik wrote: > > > > I don't see the semantic problem here. > > > This was the recommended way to use the timer routines. If the semantics > > > have changed, there should be new names for the changed semantics. > > > > There doesn't seem to be anything in 2.2.x to prevent this sort of race > > at del_timer time. It always seemed to me like a driver-specific wait > > queue was needed for certain points in the close() process, like this. > > There is no "wait queue" that can cover broken semantics. > > The expected semantics must be "remove this timer from the kernel timer > control". > After calling del_timer(&timer), > - kfree(timer.data) is safe > - a module with the timer.function() code may be immediately removed. > > It's possible for each driver to add locks so that #1 is true, but there is > no work-around if #2 does not hold true. 
The timer function might have > released its lock, but still be exiting. #2 is not true. I suppose, del_timer() call should be wrapped by start_bh_atomic/end_bh_atomic pair which provides global serialization against BHs. Best regards Andrey From owner-netdev@oss.sgi.com Fri May 19 21:33:46 2000 Received: (from majordomo@localhost) by oss.sgi.com (8.10.1/8.10.1) id e4K4Xk605867 for netdev-outgoing; Fri, 19 May 2000 21:33:46 -0700 X-Authentication-Warning: oss.sgi.com: majordomo set sender to owner-netdev@oss.sgi.com using -f Received: from saw.sw.com.sg (saw.sw.com.sg [203.120.9.98]) by oss.sgi.com (8.10.1/8.10.1) with SMTP id e4K4XjU05864 for ; Fri, 19 May 2000 21:33:45 -0700 Received: (qmail 7754 invoked by uid 577); 20 May 2000 04:33:39 -0000 Message-ID: <20000520123339.B7682@saw.sw.com.sg> Date: Sat, 20 May 2000 12:33:39 +0800 From: Andrey Savochkin To: Jeff Garzik Cc: "netdev@oss.sgi.com" Subject: Re: Tx queueing References: <3925A0F3.D3184229@mandrakesoft.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 0.93.2i In-Reply-To: <3925A0F3.D3184229@mandrakesoft.com>; from "Jeff Garzik" on Fri, May 19, 2000 at 04:15:47PM Sender: owner-netdev@oss.sgi.com Precedence: bulk On Fri, May 19, 2000 at 04:15:47PM -0400, Jeff Garzik wrote: > Has anyone tried to get NDA'd doc from Intel? Andrey? Donald? > > Some companies are reluctant to give out databooks, but in my experience > very few of those companies are in turn reluctant to give out reference > source code and databooks under an NDA which allows for open source > development. 
First of all, we may try to use hardware checksumming without NDA documentation :-) Andrey From owner-netdev@oss.sgi.com Fri May 19 22:22:07 2000 Received: (from majordomo@localhost) by oss.sgi.com (8.10.1/8.10.1) id e4K5M7e05944 for netdev-outgoing; Fri, 19 May 2000 22:22:07 -0700 X-Authentication-Warning: oss.sgi.com: majordomo set sender to owner-netdev@oss.sgi.com using -f Received: from ertpg14e1.nortelnetworks.com (ertpg14e1.nortelnetworks.com [47.234.0.35]) by oss.sgi.com (8.10.1/8.10.1) with ESMTP id e4K5M6U05941 for ; Fri, 19 May 2000 22:22:06 -0700 Received: from zsngd101.asiapac.nortel.com (actually znsgd101) by ertpg14e1.nortelnetworks.com; Sat, 20 May 2000 01:20:55 -0400 Received: from zctwb003.asiapac.nortel.com ([47.152.32.111]) by zsngd101.asiapac.nortel.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21) id LCNJZJMQ; Sat, 20 May 2000 13:20:50 +0800 Received: from pwold011.asiapac.nortel.com ([47.181.193.45]) by zctwb003.asiapac.nortel.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21) id KXY560W5; Sat, 20 May 2000 15:20:51 +1000 Received: from uow.edu.au (IDENT:akpm@[47.181.194.1]) by pwold011.asiapac.nortel.com (8.9.3/8.9.3) with ESMTP id PAA09803; Sat, 20 May 2000 15:20:45 +1000 Message-ID: <39262113.19447850@uow.edu.au> Date: Sat, 20 May 2000 15:22:27 +1000 X-Sybari-Space: 00000000 00000000 00000000 From: Andrew Morton X-Mailer: Mozilla 4.7 [en] (X11; I; Linux 2.2.14-15mdk i586) X-Accept-Language: en MIME-Version: 1.0 To: Andrey Savochkin CC: Donald Becker , netdev@oss.sgi.com, Jeff Garzik Subject: Re: tx_timeout and timer serialisation References: <3925BB00.B1CDDFE7@mandrakesoft.com> , ; from "Donald Becker" on Fri, May 19, 2000 at 08:48:15PM <20000520122715.A7682@saw.sw.com.sg> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Andrey Savochkin wrote: > > I suppose, del_timer() call should be wrapped by > 
start_bh_atomic/end_bh_atomic pair which provides global serialization > against BHs. Those functions aren't present in 2.3... I guess they were unneeded because all the old-style BH's are sequentially run from within a single new-style tasklet, under global_bh_lock. So the old-style handlers are globally serialised. I have just written a little kernel module which has confirmed that the handler-keeps-running-after-del_timer bug exists in both 2.2.14 and 2.3.99-pre9. Not good. Very not good, IMO. I think I have a fix which will preserve the current (completely magical) timer semantics. It's ugly as sin, but that's the price. Later today, perhaps. The test module is at http://www.uow.edu.au/~andrewm/timertest.tar.gz ( I wanna know why my kernel thread shows up in ps as "insmod timertest.o" but everyone else's has nifty names like "[kflushd]" ) -- -akpm- From owner-netdev@oss.sgi.com Fri May 19 22:34:14 2000 Received: (from majordomo@localhost) by oss.sgi.com (8.10.1/8.10.1) id e4K5YEl05993 for netdev-outgoing; Fri, 19 May 2000 22:34:14 -0700 X-Authentication-Warning: oss.sgi.com: majordomo set sender to owner-netdev@oss.sgi.com using -f Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.10.1/8.10.1) with ESMTP id e4K5YEU05990 for ; Fri, 19 May 2000 22:34:14 -0700 Received: (from davem@localhost) by pizda.ninka.net (8.9.3/8.9.3) id WAA18905; Fri, 19 May 2000 22:24:39 -0700 Date: Fri, 19 May 2000 22:24:39 -0700 Message-Id: <200005200524.WAA18905@pizda.ninka.net> X-Authentication-Warning: pizda.ninka.net: davem set sender to davem@redhat.com using -f From: "David S. 
Miller" To: andrewm@uow.edu.au CC: saw@saw.sw.com.sg, becker@scyld.com, netdev@oss.sgi.com, jgarzik@mandrakesoft.com In-reply-to: <39262113.19447850@uow.edu.au> (message from Andrew Morton on Sat, 20 May 2000 15:22:27 +1000) Subject: Re: tx_timeout and timer serialisation References: <3925BB00.B1CDDFE7@mandrakesoft.com> , ; from "Donald Becker" on Fri, May 19, 2000 at 08:48:15PM <20000520122715.A7682@saw.sw.com.sg> <39262113.19447850@uow.edu.au> Sender: owner-netdev@oss.sgi.com Precedence: bulk Date: Sat, 20 May 2000 15:22:27 +1000 From: Andrew Morton I have just written a little kernel module which has confirmed that the handler-keeps-running-after-del_timer bug exists in both 2.2.14 and 2.3.99-pre9. Not good. Very not good, IMO. I just noticed this thread, and has del_timer_sync been mentioned yet? That is what should be used to make sure the timer is done in 2.3.x, unless something else prevents its use (locking conflict). ( I wanna know why my kernel thread shows up in ps as "insmod timertest.o" but everyone else's has nifty names like "[kflushd]" ) sprintf(current->comm, "nifty name"); Later, David S.
Miller davem@redhat.com
From owner-netdev@oss.sgi.com Fri May 19 22:36:04 2000 Received: (from majordomo@localhost) by oss.sgi.com (8.10.1/8.10.1) id e4K5a4s06003 for netdev-outgoing; Fri, 19 May 2000 22:36:04 -0700 X-Authentication-Warning: oss.sgi.com: majordomo set sender to owner-netdev@oss.sgi.com using -f Received: from saw.sw.com.sg (saw.sw.com.sg [203.120.9.98]) by oss.sgi.com (8.10.1/8.10.1) with SMTP id e4K5a3U06000 for ; Fri, 19 May 2000 22:36:03 -0700 Received: (qmail 8211 invoked by uid 577); 20 May 2000 05:35:57 -0000 Message-ID: <20000520133557.A8149@saw.sw.com.sg> Date: Sat, 20 May 2000 13:35:57 +0800 From: Andrey Savochkin To: Andrew Morton Cc: Donald Becker , netdev@oss.sgi.com, Jeff Garzik Subject: Re: tx_timeout and timer serialisation References: <3925BB00.B1CDDFE7@mandrakesoft.com> , ; <20000520122715.A7682@saw.sw.com.sg> <39262113.19447850@uow.edu.au> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 0.93.2i In-Reply-To: <39262113.19447850@uow.edu.au>; from "Andrew Morton" on Sat, May 20, 2000 at 03:22:27PM Sender: owner-netdev@oss.sgi.com Precedence: bulk

On Sat, May 20, 2000 at 03:22:27PM +1000, Andrew Morton wrote:
> Andrey Savochkin wrote:
> >
> > I suppose, del_timer() call should be wrapped by
> > start_bh_atomic/end_bh_atomic pair which provides global serialization
> > against BHs.
>
> Those functions aren't present in 2.3... I guess they were unneeded
> because all the old-style BH's are sequentially run from within a single
> new-style tasklet, under global_bh_lock. So the old-style handlers are
> globally serialised.

I was addressing the remarks about 2.2 kernels.

We need to serialize the timer handler with outside code (which does not run
in BH or IRQ context). For 2.2, the del_timer call should be in a bh_atomic
section. In 2.3 kernels we no longer have the bh_atomic stuff, but we have the
del_timer_sync function.
del_timer_sync solves most serialization problems, but not with module unloading and freeing the memory where the code resides. So, 2.2 kernels have a perfect solution which hasn't been widely deployed yet. 2.3 kernels are really problematic ones. > I have just written a little kernel module which has confirmed that the > handler-keeps-running-after-del_timer bug exists in both 2.2.14 and > 2.3.99-pre9. Not good. Very not good, IMO. > > I think I have a fix which will preserve the current (completely > magical) timer semantics. It's ugly as sin, but that's the price. > Later today, perhaps. > > The test module is at http://www.uow.edu.au/~andrewm/timertest.tar.gz > > ( I wanna know why my kernel thread shows up in ps as "insmod > timertest.o" but everyone else's has nifty names like "[kflushd]" ) Because you haven't replaced the process name. Look at, for example, kswapd start in mm/vmscan.c Best regards Andrey From owner-netdev@oss.sgi.com Fri May 19 22:38:14 2000 Received: (from majordomo@localhost) by oss.sgi.com (8.10.1/8.10.1) id e4K5cEm06013 for netdev-outgoing; Fri, 19 May 2000 22:38:14 -0700 X-Authentication-Warning: oss.sgi.com: majordomo set sender to owner-netdev@oss.sgi.com using -f Received: from saw.sw.com.sg (saw.sw.com.sg [203.120.9.98]) by oss.sgi.com (8.10.1/8.10.1) with SMTP id e4K5cCU06010 for ; Fri, 19 May 2000 22:38:13 -0700 Received: (qmail 8237 invoked by uid 577); 20 May 2000 05:38:07 -0000 Message-ID: <20000520133807.B8149@saw.sw.com.sg> Date: Sat, 20 May 2000 13:38:07 +0800 From: Andrey Savochkin To: "David S. 
Miller" Cc: becker@scyld.com, netdev@oss.sgi.com, jgarzik@mandrakesoft.com, andrewm@uow.edu.au Subject: Re: tx_timeout and timer serialisation References: <3925BB00.B1CDDFE7@mandrakesoft.com> , ; <20000520122715.A7682@saw.sw.com.sg> <39262113.19447850@uow.edu.au> <200005200524.WAA18905@pizda.ninka.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 0.93.2i In-Reply-To: <200005200524.WAA18905@pizda.ninka.net>; from "David S. Miller" on Fri, May 19, 2000 at 10:24:39PM Sender: owner-netdev@oss.sgi.com Precedence: bulk On Fri, May 19, 2000 at 10:24:39PM -0700, David S. Miller wrote: > Date: Sat, 20 May 2000 15:22:27 +1000 > From: Andrew Morton > > I have just written a little kernel module which has confirmed that the > handler-keeps-running-after-del_timer bug exists in both 2.2.14 and > 2.3.99-pre9. Not good. Very not good, IMO. > > I just noticed this thread, and has del_timer_sync been mentioned yet? > That is what should be used to make sure the timer is done in 2.3.x, > unless something else prevents it's usage (locking conflict). del_timer_sync doesn't ensure that the timer has really exited (as opposite to just calling timer_exit()). We cannot free the code segment where the timer handler resides even with del_timer_sync! 
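A minimal user-space sketch of the gap described above, under stated assumptions: it is a single-threaded model, not kernel code, and every model_* name is hypothetical. model_timer_exit() stands in for a handler calling timer_exit() on itself; the point is only that the "running" flag a waiter would poll drops before the handler's code has finished executing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical single-threaded model; none of these are kernel APIs. */
struct model_timer {
    void (*function)(struct model_timer *);
    bool running;      /* what a del_timer_sync-style waiter would poll */
    bool code_active;  /* true while handler instructions still execute */
    bool gap_seen;     /* set if running is false while code is active */
};

/* Stand-in for the handler announcing completion (cf. timer_exit()). */
static void model_timer_exit(struct model_timer *t)
{
    t->running = false;
}

/* Example handler: announces exit, then still runs its epilogue. */
static void model_handler(struct model_timer *t)
{
    model_timer_exit(t);
    /* A waiter polling t->running would proceed now, yet we are still here. */
    if (!t->running && t->code_active)
        t->gap_seen = true;
}

/* Stand-in for the timer core invoking the handler. */
static void model_run_timer(struct model_timer *t)
{
    assert(t->function != NULL);
    t->running = true;
    t->code_active = true;
    t->function(t);
    /* Only after the call returns is the handler's code really unused. */
    t->code_active = false;
}
```

In this model, freeing the handler's code is only safe once code_active is false, which the handler itself cannot signal; a real implementation would also need proper locking or atomics, which the sketch leaves out.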
Andrey From owner-netdev@oss.sgi.com Fri May 19 22:52:55 2000 Received: (from majordomo@localhost) by oss.sgi.com (8.10.1/8.10.1) id e4K5qtm06092 for netdev-outgoing; Fri, 19 May 2000 22:52:55 -0700 X-Authentication-Warning: oss.sgi.com: majordomo set sender to owner-netdev@oss.sgi.com using -f Received: from smtprch2.nortel.com (smtprch2.nortelnetworks.com [192.135.215.15]) by oss.sgi.com (8.10.1/8.10.1) with ESMTP id e4K5qtU06089 for ; Fri, 19 May 2000 22:52:55 -0700 Received: from zrchb213.us.nortel.com (actually zrchb213) by smtprch2.nortel.com; Sat, 20 May 2000 00:49:26 -0500 Received: from zctwb003.asiapac.nortel.com ([47.152.32.111]) by zrchb213.us.nortel.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21) id LH75VYRG; Sat, 20 May 2000 00:52:16 -0500 Received: from pwold011.asiapac.nortel.com ([47.181.193.45]) by zctwb003.asiapac.nortel.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21) id KXY560XC; Sat, 20 May 2000 15:52:16 +1000 Received: from uow.edu.au (IDENT:akpm@[47.181.194.1]) by pwold011.asiapac.nortel.com (8.9.3/8.9.3) with ESMTP id PAA09951; Sat, 20 May 2000 15:52:16 +1000 Message-ID: <39262875.B380A44E@uow.edu.au> Date: Sat, 20 May 2000 15:53:57 +1000 X-Sybari-Space: 00000000 00000000 00000000 From: Andrew Morton X-Mailer: Mozilla 4.7 [en] (X11; I; Linux 2.2.14-15mdk i586) X-Accept-Language: en MIME-Version: 1.0 To: Jeff Garzik CC: Donald Becker , "netdev@oss.sgi.com" Subject: Re: Tx queueing References: <3925A0F3.D3184229@mandrakesoft.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-Orig: Sender: owner-netdev@oss.sgi.com Precedence: bulk Jeff Garzik wrote: > > > > It would seem to be more sensible to do > > > > > > start_xmit() > > > { > > > ... > > > if (!room for another packet) > > > netif_stop_queue() > > > } > > > > This doesn't give us a way to set dev->tbusy, which is required for all > > pre-2.3 kernels. > > Why not? Maybe I am missing something. 
A 2.2.x implementation of
> stop_queue should set dev->tbusy's bit 0. That's what acenic and my
> kcompat software both do.

What I meant was, if the 2.2 driver did this:

	start_xmit()
	{
		dev->tbusy = 1;
		...
	}

and the 2.3 driver does this:

	start_xmit()
	{
		netif_stop_queue();
		...
	}

then the simple #define obviously preserves compatibility. But once we
remove the up-front netif_stop_queue() (which is quite legit because
some of the tbusy functionality is handled by dev->xmit_lock in 2.3)
then we have nowhere to put the 'tbusy = 1' macro.

Do we really think it's worth preserving the compatibility? I mean, the
2.3 series is where we place actual algorithmic enhancements, but doing
this in 2.2 is politically incorrect - 2.2 is supposed to be bugfixes
only (I think). So they diverge __by design__. If you want to put some
nifty new feature into your 2.3 driver and it's also a 2.2 driver then
you have a problem?

This approach places the onus on maintainers to be familiar with both
series, and to conscientiously feed bugfixes back to 2.2 (and watch Alan
fumble them :)), but that's not too hard.

Which drivers are using kcompat24?
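For illustration, the "stop only when there is no room for another packet" pattern quoted earlier in this message can be modelled in user space roughly as follows. This is a sketch under assumptions, not driver code: the ring depth, struct model_dev and all model_* functions are hypothetical stand-ins for the real netif_stop_queue()/netif_wake_queue() calls and the driver's Tx ring.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_TX_RING_SIZE 4   /* hypothetical ring depth */

/* Hypothetical device state; queue_stopped plays the role of tbusy bit 0. */
struct model_dev {
    bool queue_stopped;
    unsigned int tx_in_flight;
};

static void model_netif_stop_queue(struct model_dev *dev)
{
    dev->queue_stopped = true;
}

static void model_netif_wake_queue(struct model_dev *dev)
{
    dev->queue_stopped = false;
}

/* Returns 0 if the packet was queued, 1 if the queue was stopped (busy). */
static int model_start_xmit(struct model_dev *dev)
{
    if (dev->queue_stopped)
        return 1;
    assert(dev->tx_in_flight < MODEL_TX_RING_SIZE);
    dev->tx_in_flight++;
    /* No up-front stop: only stop once the ring has become full. */
    if (dev->tx_in_flight == MODEL_TX_RING_SIZE)
        model_netif_stop_queue(dev);
    return 0;
}

/* Tx-complete path: free one slot and let the stack transmit again. */
static void model_tx_complete(struct model_dev *dev)
{
    assert(dev->tx_in_flight > 0);
    dev->tx_in_flight--;
    model_netif_wake_queue(dev);
}
```

In this shape the only place the queue is stopped is the ring-full branch at the end of the transmit routine, which is why a 2.2-style 'tbusy = 1' at function entry has no natural counterpart.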
-- -akpm- From owner-netdev@oss.sgi.com Fri May 19 23:29:57 2000 Received: (from majordomo@localhost) by oss.sgi.com (8.10.1/8.10.1) id e4K6Tvs06602 for netdev-outgoing; Fri, 19 May 2000 23:29:57 -0700 X-Authentication-Warning: oss.sgi.com: majordomo set sender to owner-netdev@oss.sgi.com using -f Received: from smtprch1.nortel.com (smtprch1.nortelnetworks.com [192.135.215.14]) by oss.sgi.com (8.10.1/8.10.1) with ESMTP id e4K6TuU06599 for ; Fri, 19 May 2000 23:29:56 -0700 Received: from zsngd101.asiapac.nortel.com (actually znsgd101) by smtprch1.nortel.com; Sat, 20 May 2000 01:22:54 -0500 Received: from zctwb003.asiapac.nortel.com ([47.152.32.111]) by zsngd101.asiapac.nortel.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21) id LCNJZKSM; Sat, 20 May 2000 14:22:28 +0800 Received: from pwold011.asiapac.nortel.com ([47.181.193.45]) by zctwb003.asiapac.nortel.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21) id KXY560XH; Sat, 20 May 2000 16:22:29 +1000 Received: from uow.edu.au (IDENT:akpm@[47.181.194.1]) by pwold011.asiapac.nortel.com (8.9.3/8.9.3) with ESMTP id QAA10092; Sat, 20 May 2000 16:22:26 +1000 Message-ID: <39262F88.5107CEA8@uow.edu.au> Date: Sat, 20 May 2000 16:24:08 +1000 X-Sybari-Space: 00000000 00000000 00000000 From: Andrew Morton X-Mailer: Mozilla 4.7 [en] (X11; I; Linux 2.2.14-15mdk i586) X-Accept-Language: en MIME-Version: 1.0 To: Andrey Savochkin CC: "David S. Miller" , becker@scyld.com, netdev@oss.sgi.com, jgarzik@mandrakesoft.com Subject: Re: tx_timeout and timer serialisation References: <3925BB00.B1CDDFE7@mandrakesoft.com> , ; <20000520122715.A7682@saw.sw.com.sg> <39262113.19447850@uow.edu.au> <200005200524.WAA18905@pizda.ninka.net>, <200005200524.WAA18905@pizda.ninka.net>; from "David S. 
Miller" on Fri, May 19, 2000 at 10:24:39PM <20000520133807.B8149@saw.sw.com.sg> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1878 Lines: 60 Andrey Savochkin wrote: > > On Fri, May 19, 2000 at 10:24:39PM -0700, David S. Miller wrote: > > Date: Sat, 20 May 2000 15:22:27 +1000 > > From: Andrew Morton > > > > I have just written a little kernel module which has confirmed that the > > handler-keeps-running-after-del_timer bug exists in both 2.2.14 and > > 2.3.99-pre9. Not good. Very not good, IMO. > > > > I just noticed this thread, and has del_timer_sync been mentioned yet? > > That is what should be used to make sure the timer is done in 2.3.x, > > unless something else prevents it's usage (locking conflict). > > del_timer_sync doesn't ensure that the timer has really exited (as opposite > to just calling timer_exit()). > We cannot free the code segment where the timer handler resides even > with del_timer_sync! yes, there's that. Plus del_timer_sync() is a bit racy anyway. Quoting myself: int del_timer_sync(struct timer_list * timer) { int ret = 0; for (;;) { unsigned long flags; int running; spin_lock_irqsave(&timerlist_lock, flags); ** The timer handler could be running now. It can delete the timer and kfree it, or reuse its memory for something else, or turn it into a semantically different timer ** ret += detach_timer(timer); timer->list.next = timer->list.prev = 0; ** uh-oh ** Plus it requires exit_timer(), plus it doesn't exist in 2.2 :( Regarding this: ( I wanna know why my kernel thread shows up in ps as "insmod timertest.o" but everyone else's has nifty names like "[kflushd]" ) sprintf(current->comm, "nifty name"); Yes, I did that. My niftyname is "timertest". And indeed, 'killall timertest' works, but 'ps' still shows it as 'insmod timertest.o'. 2.2 as well as 2.3. Not a very high priority item. 
-- -akpm- From owner-netdev@oss.sgi.com Sat May 20 00:07:54 2000 Received: (from majordomo@localhost) by oss.sgi.com (8.10.1/8.10.1) id e4K77sc07149 for netdev-outgoing; Sat, 20 May 2000 00:07:54 -0700 X-Authentication-Warning: oss.sgi.com: majordomo set sender to owner-netdev@oss.sgi.com using -f Received: from vaio.greennet (adsl-151-196-242-17.bellatlantic.net [151.196.242.17]) by oss.sgi.com (8.10.1/8.10.1) with ESMTP id e4K77pU07146 for ; Sat, 20 May 2000 00:07:52 -0700 Received: from localhost (becker@localhost) by vaio.greennet (8.9.3/8.8.7) with ESMTP id DAA07660; Sat, 20 May 2000 03:11:14 -0400 Date: Sat, 20 May 2000 03:11:14 -0400 (EDT) From: Donald Becker X-Sender: becker@vaio.greennet To: Andrey Savochkin cc: netdev@oss.sgi.com Subject: Re: tx_timeout and timer serialisation In-Reply-To: <20000520122715.A7682@saw.sw.com.sg> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1937 Lines: 45 On Sat, 20 May 2000, Andrey Savochkin wrote: > On Fri, May 19, 2000 at 08:48:15PM -0400, Donald Becker wrote: > > On Fri, 19 May 2000, Jeff Garzik wrote: > > > > > > I don't see the semantic problem here. > > > > This was the recommended way to use the timer routines. If the semantics > > > > have changed, there should be new names for the changed semantics. > > > > > > There doesn't seem to be anything in 2.2.x to prevent this sort of race > > > at del_timer time. It always seemed to me like a driver-specific wait > > > queue was needed for certain points in the close() process, like this. > > > > There is no "wait queue" that can cover broken semantics. > > > > The expected semantics must be "remove this timer from the kernel timer > > control". > > After calling del_timer(&timer), > > - kfree(timer.data) is safe > > - a module with the timer.function() code may be immediately removed. 
> > > > It's possible for each driver to add locks so that #1 is true, but there is > > no work-around if #2 does not hold true. The timer function might have > > released its lock, but still be exiting. > > #2 is not true. > I suppose, del_timer() call should be wrapped by > start_bh_atomic/end_bh_atomic pair which provides global serialization > against BHs. I stand corrected -- this does appear to be a valid work-around in 2.2+SMP to avoid the timer handler running when we call del_timer(). But you will agree that it's a ugly hack that shouldn't need to be done by driver code. It's also potentially inefficient -- del_timer() doesn't need to protect against arbitrary BHes, only against the timer BH or just our specific timer. I had been assuming that the 2.2+SMP the behavior was the del_timer() was always safe in netdevice->stop(), since netdevice->stop() was protected by the "big kernel lock". Donald Becker becker@scyld.com Scyld Computing Corporation 410 Severn Ave. Suite 210 Annapolis MD 21403 From owner-netdev@oss.sgi.com Sat May 20 00:22:25 2000 Received: (from majordomo@localhost) by oss.sgi.com (8.10.1/8.10.1) id e4K7MP607252 for netdev-outgoing; Sat, 20 May 2000 00:22:25 -0700 X-Authentication-Warning: oss.sgi.com: majordomo set sender to owner-netdev@oss.sgi.com using -f Received: from vaio.greennet (adsl-151-196-242-17.bellatlantic.net [151.196.242.17]) by oss.sgi.com (8.10.1/8.10.1) with ESMTP id e4K7MNU07249 for ; Sat, 20 May 2000 00:22:23 -0700 Received: from localhost (becker@localhost) by vaio.greennet (8.9.3/8.8.7) with ESMTP id DAA07666; Sat, 20 May 2000 03:25:54 -0400 Date: Sat, 20 May 2000 03:25:53 -0400 (EDT) From: Donald Becker X-Sender: becker@vaio.greennet To: "David S. 
Miller" cc: netdev@oss.sgi.com Subject: Re: tx_timeout and timer serialisation In-Reply-To: <200005200524.WAA18905@pizda.ninka.net> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1096 Lines: 27 On Fri, 19 May 2000, David S. Miller wrote: > I just noticed this thread, and has del_timer_sync been mentioned yet? > That is what should be used to make sure the timer is done in 2.3.x, > unless something else prevents it's usage (locking conflict). Yes, it was used as an example of badness. "Something broke with an interface. It should have been obvious that it broke, if for no other reason than a new function had to be created that did what the old function used to do." The new interface/semantics should have been named del_timer_async(), with del_timer() being the synchronous version. Given the questionable semantics of the _async() version, I doubt that there would have been much of a demand to use it. Backwards compatibility is pointlessly made more painful by such a change. It's almost as bad as changing a interface function's argument count without changing the name, or changing the element order of a structure that is commonly statically initialized. Donald Becker becker@scyld.com Scyld Computing Corporation 410 Severn Ave. 
Suite 210 Annapolis MD 21403 From owner-netdev@oss.sgi.com Sat May 20 00:40:55 2000 Received: (from majordomo@localhost) by oss.sgi.com (8.10.1/8.10.1) id e4K7etO07364 for netdev-outgoing; Sat, 20 May 2000 00:40:55 -0700 X-Authentication-Warning: oss.sgi.com: majordomo set sender to owner-netdev@oss.sgi.com using -f Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.10.1/8.10.1) with ESMTP id e4K7esU07361 for ; Sat, 20 May 2000 00:40:54 -0700 Received: (from davem@localhost) by pizda.ninka.net (8.9.3/8.9.3) id AAA19020; Sat, 20 May 2000 00:31:21 -0700 Date: Sat, 20 May 2000 00:31:21 -0700 Message-Id: <200005200731.AAA19020@pizda.ninka.net> X-Authentication-Warning: pizda.ninka.net: davem set sender to davem@redhat.com using -f From: "David S. Miller" To: becker@scyld.com CC: netdev@oss.sgi.com In-reply-to: (message from Donald Becker on Sat, 20 May 2000 03:25:53 -0400 (EDT)) Subject: Re: tx_timeout and timer serialisation References: Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 451 Lines: 13 Date: Sat, 20 May 2000 03:25:53 -0400 (EDT) From: Donald Becker Backwards compatibility is pointlessly made more painful by such a change. It's almost as bad as changing a interface function's argument count without changing the name, or changing the element order of a structure that is commonly statically initialized. del_timer deschedules a timer, no more, no less. Later, David S. 
Miller davem@redhat.com From owner-netdev@oss.sgi.com Sat May 20 00:51:12 2000 Received: (from majordomo@localhost) by oss.sgi.com (8.10.1/8.10.1) id e4K7pCk07417 for netdev-outgoing; Sat, 20 May 2000 00:51:12 -0700 X-Authentication-Warning: oss.sgi.com: majordomo set sender to owner-netdev@oss.sgi.com using -f Received: from havoc.gtf.org (IDENT:root@panic.ohr.gatech.edu [130.207.47.194]) by oss.sgi.com (8.10.1/8.10.1) with ESMTP id e4K7pCU07414 for ; Sat, 20 May 2000 00:51:12 -0700 Received: from mandrakesoft.com (adsl-77-228-135.atl.bellsouth.net [216.77.228.135]) by havoc.gtf.org (8.9.3/8.9.3) with ESMTP id DAA12652; Sat, 20 May 2000 03:50:30 -0400 Message-ID: <392643C6.3B2784A7@mandrakesoft.com> Date: Sat, 20 May 2000 03:50:30 -0400 From: Jeff Garzik Organization: MandrakeSoft X-Mailer: Mozilla 4.72 [en] (X11; U; Linux 2.2.15 i686) X-Accept-Language: en MIME-Version: 1.0 To: jamal CC: davem@redhat.com, kuznet@ms2.inr.ac.ru, Andi Kleen , SteveW@ACM.org, netdev@oss.sgi.com Subject: Re: [PATCH]: Cruft fixes References: Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 351 Lines: 15 jamal wrote: > > This patch removes a bit of cruft. Please double check. These structs could probably be moved over to the new way of initializing structs... Jeff -- Jeff Garzik | Liberty is always dangerous, but Building 1024 | it is the safest thing we have. MandrakeSoft, Inc. 
| -- Harry Emerson Fosdick
From owner-netdev@oss.sgi.com Sat May 20 01:24:05 2000 Received: (from majordomo@localhost) by oss.sgi.com (8.10.1/8.10.1) id e4K8O5t07611 for netdev-outgoing; Sat, 20 May 2000 01:24:05 -0700 X-Authentication-Warning: oss.sgi.com: majordomo set sender to owner-netdev@oss.sgi.com using -f Received: from laurin.munich.netsurf.de (laurin.munich.netsurf.de [194.64.166.1]) by oss.sgi.com (8.10.1/8.10.1) with ESMTP id e4K8O3U07608 for ; Sat, 20 May 2000 01:24:04 -0700 Received: from fred.muc.de (none@ns1196.munich.netsurf.de [195.180.235.196]) by laurin.munich.netsurf.de (8.9.3/8.9.3) with ESMTP id KAA23258; Sat, 20 May 2000 10:23:46 +0200 (MET DST) Received: from andi by fred.muc.de with local (Exim 2.05 #1) id 12t4Y4-00019e-00; Sat, 20 May 2000 10:23:52 +0200 Date: Sat, 20 May 2000 10:23:52 +0200 From: Andi Kleen To: jamal Cc: davem@redhat.com, kuznet@ms2.inr.ac.ru, Andi Kleen , SteveW@ACM.org, netdev@oss.sgi.com Subject: Re: [PATCH]: Cruft fixes Message-ID: <20000520102352.A4219@fred.muc.de> References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 0.95.4us In-Reply-To: ; from jamal on Sat, May 20, 2000 at 04:23:31AM +0200 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 649 Lines: 17

On Sat, May 20, 2000 at 04:23:31AM +0200, jamal wrote:
>
>
> This patch removes a bit of cruft. Please double check.

Sorry, my original judgement was wrong. The RX path uses it to distinguish
between old protocols that don't support SMP multithreading yet and new
multithreaded protocols. Unfortunately this breaks the fastroute packet
socket detection hack as you noted.

The right fix is to add a flag word to packet_type that distinguishes
packet sockets and non-locked protocols, and use that instead of the
overloaded data member. Such a flag word would fit nicely between
type and dev; there should be a 2-byte gap there anyways.
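A rough sketch of what such a flag word could look like, assuming a cut-down stand-in for packet_type. The model_* structs, the flag names and their values are all hypothetical, and the "no size change" check only holds on ABIs where pointers are aligned to at least 4 bytes.

```c
#include <assert.h>
#include <stddef.h>

struct model_dev;  /* opaque stand-in for the device structure */

/* Cut-down stand-in for the existing layout: padding follows 'type'. */
struct model_ptype_old {
    unsigned short type;
    struct model_dev *dev;
    void *data;
};

/* Same layout with a 16-bit flag word placed in the padding gap. */
struct model_ptype_new {
    unsigned short type;
    unsigned short flags;
    struct model_dev *dev;
    void *data;
};

/* Hypothetical flag bits; names and values are illustrative only. */
#define MODEL_PT_PACKET_SOCKET 0x0001
#define MODEL_PT_NON_LOCKED    0x0002

static int model_is_packet_socket(const struct model_ptype_new *pt)
{
    assert(pt != NULL);
    return (pt->flags & MODEL_PT_PACKET_SOCKET) != 0;
}

static int model_is_non_locked(const struct model_ptype_new *pt)
{
    assert(pt != NULL);
    return (pt->flags & MODEL_PT_NON_LOCKED) != 0;
}

/* True when the flag word fits in existing padding on this ABI. */
static int model_layout_unchanged(void)
{
    return sizeof(struct model_ptype_new) == sizeof(struct model_ptype_old) &&
           offsetof(struct model_ptype_new, dev) ==
           offsetof(struct model_ptype_old, dev);
}
```

With explicit flag bits, the receive path would test the flag word rather than inferring the protocol kind from whether the data member is set.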
-Andi
From owner-netdev@oss.sgi.com Sat May 20 01:28:04 2000 Received: (from majordomo@localhost) by oss.sgi.com (8.10.1/8.10.1) id e4K8S4d07636 for netdev-outgoing; Sat, 20 May 2000 01:28:04 -0700 X-Authentication-Warning: oss.sgi.com: majordomo set sender to owner-netdev@oss.sgi.com using -f Received: from vaio.greennet (adsl-151-196-242-17.bellatlantic.net [151.196.242.17]) by oss.sgi.com (8.10.1/8.10.1) with ESMTP id e4K8S2U07633 for ; Sat, 20 May 2000 01:28:02 -0700 Received: from localhost (becker@localhost) by vaio.greennet (8.9.3/8.8.7) with ESMTP id EAA07960; Sat, 20 May 2000 04:31:33 -0400 Date: Sat, 20 May 2000 04:31:33 -0400 (EDT) From: Donald Becker X-Sender: becker@vaio.greennet To: "David S. Miller" cc: netdev@oss.sgi.com Subject: Re: tx_timeout and timer serialisation In-Reply-To: <200005200731.AAA19020@pizda.ninka.net> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 817 Lines: 24

On Sat, 20 May 2000, David S. Miller wrote:

> del_timer deschedules a timer, no more, no less.

That's a little too trite of an answer to be useful.

Immediately after the return of del_timer(&timer), I expect
  - kfree(timer.data) is safe
  - the timer.function() code may immediately be 'rmmod'ed

Specifically, I expect that timer.function() will not be running on
another processor when del_timer(&timer) returns.

If this doesn't hold true, the timer.function() might still be in the
function entry phase and the entire timer.function() could be run after
del_timer() has nominally finished. And if that's acceptable behavior, I
have a new version of del_timer() that's very small and fast ;->.

Donald Becker becker@scyld.com
Scyld Computing Corporation
410 Severn Ave.
Suite 210 Annapolis MD 21403
From owner-netdev@oss.sgi.com Sat May 20 02:47:03 2000 Received: (from majordomo@localhost) by oss.sgi.com (8.10.1/8.10.1) id e4K9l3308065 for netdev-outgoing; Sat, 20 May 2000 02:47:03 -0700 X-Authentication-Warning: oss.sgi.com: majordomo set sender to owner-netdev@oss.sgi.com using -f Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.10.1/8.10.1) with ESMTP id e4K9l2U08062 for ; Sat, 20 May 2000 02:47:02 -0700 Received: (from davem@localhost) by pizda.ninka.net (8.9.3/8.9.3) id CAA19089; Sat, 20 May 2000 02:37:31 -0700 Date: Sat, 20 May 2000 02:37:31 -0700 Message-Id: <200005200937.CAA19089@pizda.ninka.net> X-Authentication-Warning: pizda.ninka.net: davem set sender to davem@redhat.com using -f From: "David S. Miller" To: becker@scyld.com CC: netdev@oss.sgi.com In-reply-to: (message from Donald Becker on Sat, 20 May 2000 04:31:33 -0400 (EDT)) Subject: Re: tx_timeout and timer serialisation References: Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 575 Lines: 16

   Date: Sat, 20 May 2000 04:31:33 -0400 (EDT)
   From: Donald Becker

   And if that's acceptable behavior, I have a new version of del_timer()
   that's very small and fast ;->.

How do you accomplish this and still respect the environment of the
timer function itself, i.e. that the timer is not scheduled and,
outside of self-inflicted locking issues, the timer function may
add its timer?

Does your solution involve holding the timer list lock during the
timer function invocation? If so, wait for troubles...

Later,
David S.
Miller davem@redhat.com From owner-netdev@oss.sgi.com Sat May 20 08:41:30 2000 Received: (from majordomo@localhost) by oss.sgi.com (8.10.1/8.10.1) id e4KFfUm10510 for netdev-outgoing; Sat, 20 May 2000 08:41:30 -0700 X-Authentication-Warning: oss.sgi.com: majordomo set sender to owner-netdev@oss.sgi.com using -f Received: from smtprch2.nortel.com (smtprch2.nortelnetworks.com [192.135.215.15]) by oss.sgi.com (8.10.1/8.10.1) with ESMTP id e4KFfTn10507 for ; Sat, 20 May 2000 08:41:29 -0700 Received: from zrchb213.us.nortel.com (actually zrchb213) by smtprch2.nortel.com; Sat, 20 May 2000 06:22:46 -0500 Received: from zctwb003.asiapac.nortel.com ([47.152.32.111]) by zrchb213.us.nortel.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21) id LH75V5B6; Sat, 20 May 2000 06:25:36 -0500 Received: from pwold011.asiapac.nortel.com ([47.181.193.45]) by zctwb003.asiapac.nortel.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21) id KXY560YY; Sat, 20 May 2000 21:25:36 +1000 Received: from uow.edu.au (IDENT:akpm@[47.181.194.1]) by pwold011.asiapac.nortel.com (8.9.3/8.9.3) with ESMTP id VAA11317; Sat, 20 May 2000 21:25:36 +1000 Message-ID: <39267696.ACCE4DF3@uow.edu.au> Date: Sat, 20 May 2000 21:27:18 +1000 X-Sybari-Space: 00000000 00000000 00000000 From: Andrew Morton X-Mailer: Mozilla 4.7 [en] (X11; I; Linux 2.2.14-15mdk i586) X-Accept-Language: en MIME-Version: 1.0 To: Andrey Savochkin CC: netdev@oss.sgi.com Subject: Re: tx_timeout and timer serialisation References: <3925BB00.B1CDDFE7@mandrakesoft.com> , ; <20000520122715.A7682@saw.sw.com.sg> <39262113.19447850@uow.edu.au>, <39262113.19447850@uow.edu.au>; from "Andrew Morton" on Sat, May 20, 2000 at 03:22:27PM <20000520133557.A8149@saw.sw.com.sg> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-Orig: Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 486 Lines: 15 Andrey Savochkin wrote: > > We need to serialize the timer handler with 
outside code (which runs not in > BH or IRQ context). For 2.2 del_timer call should be in bh_atomic section. hmm.. I'm not too familiar with that part of 2.2. Could you please cook up a patch to show me what you mean, then I'll torture it a bit. Does the BH synchronisation work within a BH or an IRQ? What happens if we call del_timer within a BH or IRQ when the handler is currently running? -- -akpm- From owner-netdev@oss.sgi.com Sat May 20 10:30:31 2000 Received: (from majordomo@localhost) by oss.sgi.com (8.10.1/8.10.1) id e4KHUVI11081 for netdev-outgoing; Sat, 20 May 2000 10:30:31 -0700 X-Authentication-Warning: oss.sgi.com: majordomo set sender to owner-netdev@oss.sgi.com using -f Received: from havoc.gtf.org (IDENT:root@panic.ohr.gatech.edu [130.207.47.194]) by oss.sgi.com (8.10.1/8.10.1) with ESMTP id e4KHUUn11078 for ; Sat, 20 May 2000 10:30:30 -0700 Received: from mandrakesoft.com (adsl-77-228-135.atl.bellsouth.net [216.77.228.135]) by havoc.gtf.org (8.9.3/8.9.3) with ESMTP id NAA26922; Sat, 20 May 2000 13:30:21 -0400 Message-ID: <3926CBC8.EABB157@mandrakesoft.com> Date: Sat, 20 May 2000 13:30:48 -0400 From: Jeff Garzik Organization: MandrakeSoft X-Mailer: Mozilla 4.72 [en] (X11; U; Linux 2.2.15 i686) X-Accept-Language: en MIME-Version: 1.0 To: Andrew Morton CC: "netdev@oss.sgi.com" Subject: Re: Tx queueing References: <392407D4.BE586507@uow.edu.au> <3925BA01.CC49AC1@mandrakesoft.com> <3925D985.51472B30@uow.edu.au> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1050 Lines: 38 Andrew Morton wrote: > Jeff Garzik wrote: > > Which [drivers have wrong softnet logic]? They need fixing.. > 3c50?.c. Probably others... > > I'm disinclined to change these: > > - They're slow anyway. > - Increased possiblity of breaking them > - Broken 2.2/kcompat24 compatibility I should note that drivers with the logic in question (start_xmit: stop... 
start if not full) caused transmit timeouts and other nasties when used in modern PCI drivers. That's why I consider such logic a bug, not just a correctness or performance issue. > So. > > - No performance tweaks for [E]ISA drivers in 2.3. > - Correctness fixes if they're obvious. > - If the SMP-safety fixes are not obvious, mark the driver as UP-only. > > Sound sensible? * highly agree with the first * and the second * I think we should go through drivers and mark them SMP-safe... :) Jeff -- Jeff Garzik | Liberty is always dangerous, but Building 1024 | it is the safest thing we have. MandrakeSoft, Inc. | -- Harry Emerson Fosdick From owner-netdev@oss.sgi.com Sat May 20 10:36:31 2000 Received: (from majordomo@localhost) by oss.sgi.com (8.10.1/8.10.1) id e4KHaVA11159 for netdev-outgoing; Sat, 20 May 2000 10:36:31 -0700 X-Authentication-Warning: oss.sgi.com: majordomo set sender to owner-netdev@oss.sgi.com using -f Received: from web124.yahoomail.com (web124.yahoomail.com [205.180.60.192]) by oss.sgi.com (8.10.1/8.10.1) with SMTP id e4KHaVn11156 for ; Sat, 20 May 2000 10:36:31 -0700 Received: (qmail 22755 invoked by uid 60001); 20 May 2000 17:36:24 -0000 Message-ID: <20000520173624.22754.qmail@web124.yahoomail.com> Received: from [156.153.255.134] by web124.yahoomail.com; Sat, 20 May 2000 10:36:24 PDT Date: Sat, 20 May 2000 10:36:24 -0700 (PDT) From: Cacophonix Subject: Re: Tx queueing To: jamal Cc: netdev@oss.sgi.com MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 444 Lines: 15 --- jamal wrote: > > Amy Fong and I have done some tests with much higher throughputs on 2.3 > with the GA-620. What was the h/ware you used? What was the MTU you used? > PIII/500, 32 bit PCI, 1500 byte MTU. Vanilla driver (i.e, none of the driver parameters changed). __________________________________________________ Do You Yahoo!? Send instant messages & get email alerts with Yahoo! Messenger. 
http://im.yahoo.com/ From owner-netdev@oss.sgi.com Sat May 20 10:37:20 2000 Received: (from majordomo@localhost) by oss.sgi.com (8.10.1/8.10.1) id e4KHbK011167 for netdev-outgoing; Sat, 20 May 2000 10:37:20 -0700 X-Authentication-Warning: oss.sgi.com: majordomo set sender to owner-netdev@oss.sgi.com using -f Received: from cyberus.ca (mail.cyberus.ca [209.195.95.1]) by oss.sgi.com (8.10.1/8.10.1) with ESMTP id e4KHbJn11164 for ; Sat, 20 May 2000 10:37:19 -0700 Received: from shell.cyberus.ca (shell [209.195.95.7]) by cyberus.ca (8.9.3/8.9.3/Cyberus Online Inc.) with ESMTP id NAA03254; Sat, 20 May 2000 13:37:17 -0400 (EDT) Received: from localhost (hadi@localhost) by shell.cyberus.ca (8.9.1b+Sun/8.9.3) with ESMTP id NAA17728; Sat, 20 May 2000 13:37:13 -0400 (EDT) Date: Sat, 20 May 2000 13:37:13 -0400 (EDT) From: jamal To: Andi Kleen cc: davem@redhat.com, kuznet@ms2.inr.ac.ru, SteveW@ACM.org, netdev@oss.sgi.com Subject: Re: [PATCH]: Cruft fixes In-Reply-To: <20000520102352.A4219@fred.muc.de> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 865 Lines: 30 Thanks Andi, Please ignore the patch; I'll send a fresh one later which i'll probably get killed for ;-> We shall see. cheers, jamal On Sat, 20 May 2000, Andi Kleen wrote: > On Sat, May 20, 2000 at 04:23:31AM +0200, jamal wrote: > > > > > > This patch removes a bit of cruft. Please double check. > > Sorry, my original judgement was wrong. The RX path uses it to distingush > between old protocols that don't support SMP multithreading yet and new > multithreaded protocols. Unfortunately this breaks the fastroute packet > socket detection hack as you noted. > > The right fix is to add a flag word to packet_type that distingushes > packet sockets and non locked protocols, and use that instead of the > overloaded data member. Such a flag word would fit nicely between > type and dev, there should be a 2 bytes gap there anyways. 
> > > -Andi > From owner-netdev@oss.sgi.com Sat May 20 11:11:51 2000 Received: (from majordomo@localhost) by oss.sgi.com (8.10.1/8.10.1) id e4KIBpW11527 for netdev-outgoing; Sat, 20 May 2000 11:11:51 -0700 X-Authentication-Warning: oss.sgi.com: majordomo set sender to owner-netdev@oss.sgi.com using -f Received: from havoc.gtf.org (IDENT:root@panic.ohr.gatech.edu [130.207.47.194]) by oss.sgi.com (8.10.1/8.10.1) with ESMTP id e4KIBon11524 for ; Sat, 20 May 2000 11:11:50 -0700 Received: from mandrakesoft.com (adsl-77-228-135.atl.bellsouth.net [216.77.228.135]) by havoc.gtf.org (8.9.3/8.9.3) with ESMTP id OAA28112; Sat, 20 May 2000 14:11:41 -0400 Message-ID: <3926D55D.766781B2@mandrakesoft.com> Date: Sat, 20 May 2000 14:11:41 -0400 From: Jeff Garzik Organization: MandrakeSoft X-Mailer: Mozilla 4.72 [en] (X11; U; Linux 2.2.15 i686) X-Accept-Language: en MIME-Version: 1.0 To: Andrew Morton CC: Andrey Savochkin , Donald Becker , netdev@oss.sgi.com, Alan Cox Subject: Re: tx_timeout and timer serialisation References: <3925BB00.B1CDDFE7@mandrakesoft.com> , ; from "Donald Becker" on Fri, May 19, 2000 at 08:48:15PM <20000520122715.A7682@saw.sw.com.sg> <39262113.19447850@uow.edu.au> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1630 Lines: 40 Andrew Morton wrote: > I have just written a little kernel module which has confirmed that the > handler-keeps-running-after-del_timer bug exists in both 2.2.14 and > 2.3.99-pre9. Not good. Very not good, IMO. This ties neatly together a public thread and a private thread. >From what I can gather, The timer semantics change which concerns Donald occurred when the new timers and SMP were written (2.1.?). In old 2.0 kernels, SMP in the kernel context didn't really matter due to the BKL-related synchronization. 
When the new timers and SMP came about in 2.1.x days, suddenly it was possible for a timer to be running on one CPU, after del_timer successfully returned. The 2.3.x timer->running change seems like not enough, because there is still a race between the time the function calls timer_exit(), and the time that the module can be unloaded. In order to guarantee an accurate timer_is_running() value, should timer_set_running() and timer_exit() instead be called from the core kernel code, instead of the driver? Whenever the code is in the driver, there will be a small race between timer_exit() time and the time when the timer function is actually complete. AFAICS from this, 2.2.x drivers might be exiting while their timer routine is still running. And 2.3.x drivers will do this too, until every one is updated to call timer_set_running, timer_exit, and to check timer_is_running. Is that a correct assessment? Jeff -- Jeff Garzik | Liberty is always dangerous, but Building 1024 | it is the safest thing we have. MandrakeSoft, Inc. 
| -- Harry Emerson Fosdick

From owner-netdev@oss.sgi.com Sat May 20 11:38:23 2000 Received: (from majordomo@localhost) by oss.sgi.com (8.10.1/8.10.1) id e4KIcNQ11853 for netdev-outgoing; Sat, 20 May 2000 11:38:23 -0700 X-Authentication-Warning: oss.sgi.com: majordomo set sender to owner-netdev@oss.sgi.com using -f Received: from havoc.gtf.org (IDENT:root@panic.ohr.gatech.edu [130.207.47.194]) by oss.sgi.com (8.10.1/8.10.1) with ESMTP id e4KIcMn11850 for ; Sat, 20 May 2000 11:38:22 -0700 Received: from mandrakesoft.com (adsl-77-228-135.atl.bellsouth.net [216.77.228.135]) by havoc.gtf.org (8.9.3/8.9.3) with ESMTP id OAA28726; Sat, 20 May 2000 14:38:14 -0400 Message-ID: <3926DBAE.92459AD5@mandrakesoft.com> Date: Sat, 20 May 2000 14:38:38 -0400 From: Jeff Garzik Organization: MandrakeSoft X-Mailer: Mozilla 4.72 [en] (X11; U; Linux 2.2.15 i686) X-Accept-Language: en MIME-Version: 1.0 To: Andrew Morton CC: Donald Becker , "netdev@oss.sgi.com" Subject: Re: Tx queueing References: <3925A0F3.D3184229@mandrakesoft.com> <39262875.B380A44E@uow.edu.au> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 2959 Lines: 90

Andrew Morton wrote:
>
> Jeff Garzik wrote:
> >
> > > > It would seem to be more sensible to do
> > > >
> > > > start_xmit()
> > > > {
> > > >         ...
> > > >         if (!room for another packet)
> > > >                 netif_stop_queue()
> > > > }
> > >
> > > This doesn't give us a way to set dev->tbusy, which is required for all
> > > pre-2.3 kernels.
> >
> > Why not? Maybe I am missing something. A 2.2.x implementation of
> > stop_queue should set dev->tbusy's bit 0. That's what acenic and my
> > kcompat software both do.
>
> What I meant was, if the 2.2 driver did this:
>
> start_xmit()
> {
>         dev->tbusy = 1;
>         ...
> }
>
> and the 2.3 driver does this:
>
> start_xmit()
> {
>         netif_stop_queue();
>         ...
> }
>
> then the simple #define obviously preserves compatibility.
But once
> we remove the up-front netif_stop_queue() (which is quite legit because
> some of the tbusy functionality is handled by dev->xmit_lock in 2.3)
> then we have nowhere to put the 'tbusy = 1' macro.

tbusy=1 occurs at device open time, so the logic is preserved.

For 2.2.x drivers though I don't think start_xmit is serialized (is it?), which would imply the need for testing something, or at least doing your own serialization via spinlocks. (2.2.x drivers did a test_and_set(&tbusy) IIRC)

> Do we really think it's worth preserving the compatibility? I mean, the
> 2.3 series is where we place actual algorithmic enhancements, but doing
> this in 2.2 is politically incorrect - 2.2 is supposed to be bugfixes
> only (I think). So they diverge __by design__. If you want to put some
> nifty new feature into your 2.3 driver and it's also a 2.2 driver then
> you have a problem?

If a 2.3.x driver is proven stable and superior, people will naturally want to backport it. Heck, people are still downloading Donald's latest stuff and trying to stuff it into their 2.0.x kernel. I'm not bending over backwards for 2.2.x compatibility, but I recognize that some people want to do so.

> This approach places the onus on maintainers to be familiar with both
> series, and to conscientiously feed bugfixes back to 2.2 (and watch Alan
> fumble them :)), but that's not too hard.

FWIW I am not doing that for the corresponding drivers I maintain in 2.3.x. Anybody who reports a problem for 2.2.x rtl8139 or tulip will probably get a response, but not a patch unless things are really critical [and one of Donald's drivers doesn't already solve the problem]. There are a lot of changes in 2.3.x, and work yet to be done, so my hands are full there...

> Which drivers are using kcompat24?
AFAIK, right now 8139too and i810_rng work with it, and emu10k1 (SB Live sound driver) uses a modified version of kcompat24 (see 2.2/emu_wrapper.*) Jeff -- Jeff Garzik | Liberty is always dangerous, but Building 1024 | it is the safest thing we have. MandrakeSoft, Inc. | -- Harry Emerson Fosdick From owner-netdev@oss.sgi.com Sat May 20 13:13:27 2000 Received: (from majordomo@localhost) by oss.sgi.com (8.10.1/8.10.1) id e4KKDRf12433 for netdev-outgoing; Sat, 20 May 2000 13:13:27 -0700 X-Authentication-Warning: oss.sgi.com: majordomo set sender to owner-netdev@oss.sgi.com using -f Received: from smtprch2.nortel.com (smtprch2.nortelnetworks.com [192.135.215.15]) by oss.sgi.com (8.10.1/8.10.1) with ESMTP id e4KKDRn12430 for ; Sat, 20 May 2000 13:13:27 -0700 Received: from zrchb213.us.nortel.com (actually zrchb213) by smtprch2.nortel.com; Sat, 20 May 2000 15:10:28 -0500 Received: from zctwb003.asiapac.nortel.com ([47.152.32.111]) by zrchb213.us.nortel.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21) id LH75V844; Sat, 20 May 2000 15:13:18 -0500 Received: from pwold011.asiapac.nortel.com ([47.181.193.45]) by zctwb003.asiapac.nortel.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21) id KXY56057; Sun, 21 May 2000 06:13:15 +1000 Received: from uow.edu.au (IDENT:akpm@[47.181.194.1]) by pwold011.asiapac.nortel.com (8.9.3/8.9.3) with ESMTP id GAA21429; Sun, 21 May 2000 06:13:10 +1000 Message-ID: <3926F23C.9FEB5E66@uow.edu.au> Date: Sun, 21 May 2000 06:14:52 +1000 X-Sybari-Space: 00000000 00000000 00000000 From: Andrew Morton X-Mailer: Mozilla 4.7 [en] (X11; I; Linux 2.2.14-15mdk i586) X-Accept-Language: en MIME-Version: 1.0 To: Jeff Garzik CC: Andrey Savochkin , Donald Becker , netdev@oss.sgi.com, Alan Cox Subject: Re: tx_timeout and timer serialisation References: <3925BB00.B1CDDFE7@mandrakesoft.com> , ; from "Donald Becker" on Fri, May 19, 2000 at 08:48:15PM <20000520122715.A7682@saw.sw.com.sg> 
<39262113.19447850@uow.edu.au> <3926D55D.766781B2@mandrakesoft.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-Orig: Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 5609 Lines: 172 Jeff Garzik wrote: > > Andrew Morton wrote: > > I have just written a little kernel module which has confirmed that the > > handler-keeps-running-after-del_timer bug exists in both 2.2.14 and > > 2.3.99-pre9. Not good. Very not good, IMO. > > This ties neatly together a public thread and a private thread. > > >From what I can gather, > The timer semantics change which concerns Donald occurred when the new > timers and SMP were written (2.1.?). In old 2.0 kernels, SMP in the > kernel context didn't really matter due to the BKL-related > synchronization. When the new timers and SMP came about in 2.1.x days, > suddenly it was possible for a timer to be running on one CPU, after > del_timer successfully returned. Sounds right. The issue now is that the accepted API for timers is *so magical* that it's very hard to fix them. - Timer handlers can readd timers - timer handlers can delete timers - timer handlers can kfree() the memory which contained the darn timer_struct. this means that neither del_timer() nor run_timer_list() is allowed to touch the timer_struct after (or even during) the handler run. I think the sanest thing to do is to separate out the timer storage management: alloc_timer(), free_timer(), register_timer, unregister_timer(). Botching sane handling into the current permissive API isn't proving pretty. I do have a mostly-fix coded up. 
It maintains state arrays:

        timer_struct *running_timers[NR_CPUS];

        run_timer_list()
        {
                running_timers[this_cpu] = timer;
                timer.function();
                running_timers[this_cpu] = 0;
        }

        del_timer()
        {
                if (timer_is_in(running_timers))
                        spin_until_it_isnt();
                do_funky_locking()
                maybe_spin_again()
        }

This is fairly sane for 2 CPUs, but there may be problems with three:

        CPU0 is running the handler
        CPU1 is spinning in del_timer
        CPU2 re-adds the timer

I think I'm not detecting CPU2's action when the del_timer spin completes.

We really, really need to be able to touch the timer_struct after the handler has run, so we can maintain state within it. Then we can manage the storage and refcount the timers.

(As an old C++ hack, this sort of crap makes me want to scream. It would be _so_ easy...)

> The 2.3.x timer->running change seems like not enough, because there is
> still a race between the time the function calls timer_exit(), and the
> time that the module can be unloaded. In order to guarantee an accurate
> timer_is_running() value, should timer_set_running() and timer_exit()
> instead be called from the core kernel code, instead of the driver?

Yes it should. But the problem is, the timer_exit() function can't be performed by the core kernel because the handler could have kfree()'d (or recycled) the memory which contains the timer. This is why run_timer_list() is careful to not touch the timer after calling the handler. And this is why my proto-patch maintains state outside the struct timer_list.

> Whenever the code is in the driver, there will be a small race between
> timer_exit() time and the time when the timer function is actually
> complete.
>
> AFAICS from this, 2.2.x drivers might be exiting while their timer
> routine is still running. And 2.3.x drivers will do this too, until
> every one is updated to call timer_set_running, timer_exit, and to check
> timer_is_running.
>
> Is that a correct assessment?
Yes, but there are several other, more serious failure scenarios with the current code:

        mainline()                      handler()
        ==========                      =========
                                        enter xxx_timer()
        del_timer()
        kfree(some_resource)
                                        access(some_resource)

        mainline()                      handler()
        ==========                      =========
                                        enter xxx_timer()
        del_timer()
        kfree(timer)
                                        add_timer(timer)

This one's the worst. The timer is pending, but the timer_struct is in kfree'ed memory. As Alexey would say, "Your kernel has been destroyed secretly and fatally".

The current del_timer_sync() is close to being OK. But its problem is that after timer_synchronise() returns, the timer_struct may have been kfree'ed by the handler, and del_timer_sync then reads and writes it, potentially corrupting something it doesn't own.

Anyway, I've put a very-proto-patch up at

        http://www.uow.edu.au/~andrewm/timer.patch

and I have updated the timertest module:

        http://www.uow.edu.au/~andrewm/timertest.tar.gz

(the kernel thread now shows up correctly in ps. I had to set

        current->mm->arg_start = current->mm->arg_end = 0;

Thanks, Andrey!)

timer.patch makes all del_timer()s synchronous. del_timer_sync() maps onto the synchronous del_timer(). exit_timer() is a no-op. It's ugly, not obviously correct, but the fastpath is still fast (could be faster).

It does introduce the potential for deadlocks which Alexey identified:

        mainline()                      handler()
        ==========                      =========
        spin_lock(some_lock)
                                        spin_lock(some_lock)
        del_timer()
          [ spins forever ]

This is manageable, but has to be fixed on a case-by-case basis.

The alternative to all of the above is to simply throw up our hands in horror and start again:

- kernel maintains a freelist of struct new_timer_structs
- new_timer_struct is refcounted
- New API:

        new_timer_struct *alloc_timer()
        free_timer(new_timer_struct *)       (refcounted)
        register_timer(new_timer_struct *)
        unregister_timer(new_timer_struct *)

mnm:/usr/src/linux-2.3.99-pre9-2> egrep -r "add_timer|mod_timer" . | wc -l
    696

Any volunteers?
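
[Editorial sketch, not part of the original message.] The refcounted API proposed above can be modelled in user space roughly as follows. This is only an illustration under stated assumptions: the struct layout, the `pending` flag, the `run_timer()` expiry helper, the `live_timers` counter and the self-freeing handler are all hypothetical, the freelist is replaced by plain calloc()/free(), and the model is single-threaded, so it says nothing about the SMP locking the thread is really worried about. It only shows why a core-held reference would let the expiry path touch the timer after the handler returns.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical user-space model; none of these names exist in the kernel. */
struct new_timer {
    int refcount;                          /* owner reference + core reference */
    int pending;                           /* 1 while registered */
    void (*function)(struct new_timer *);  /* expiry handler */
};

int live_timers;                           /* allocations not yet released */

struct new_timer *alloc_timer(void (*fn)(struct new_timer *))
{
    struct new_timer *t = calloc(1, sizeof(*t));
    if (!t)
        return NULL;
    t->refcount = 1;                       /* the caller's reference */
    t->function = fn;
    live_timers++;
    return t;
}

/* Drop one reference; the storage is released only with the last one. */
void free_timer(struct new_timer *t)
{
    assert(t->refcount > 0);
    if (--t->refcount == 0) {
        live_timers--;
        free(t);
    }
}

void register_timer(struct new_timer *t)   { t->pending = 1; }
void unregister_timer(struct new_timer *t) { t->pending = 0; }

/* Assumed core expiry path: pin the timer across the handler call so it
 * may still be touched afterwards, even if the handler freed it. */
void run_timer(struct new_timer *t)
{
    if (!t->pending)
        return;
    t->refcount++;                         /* the core's reference */
    t->pending = 0;
    t->function(t);
    free_timer(t);                         /* drops the core's reference */
}

/* Example handler that releases its own timer, as the current API allows. */
void self_freeing_handler(struct new_timer *t)
{
    free_timer(t);
}
```

In this model a handler that frees its own timer only drops the owner's reference; the memory goes away when run_timer() drops the core's reference after the handler has returned.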
-- 
-akpm-

From owner-netdev@oss.sgi.com Sat May 20 16:49:51 2000 Received: (from majordomo@localhost) by oss.sgi.com (8.10.1/8.10.1) id e4KNnpm13491 for netdev-outgoing; Sat, 20 May 2000 16:49:51 -0700 X-Authentication-Warning: oss.sgi.com: majordomo set sender to owner-netdev@oss.sgi.com using -f Received: from cyberus.ca (mail.cyberus.ca [209.195.95.1]) by oss.sgi.com (8.10.1/8.10.1) with ESMTP id e4KNnon13488 for ; Sat, 20 May 2000 16:49:50 -0700 Received: from shell.cyberus.ca (shell [209.195.95.7]) by cyberus.ca (8.9.3/8.9.3/Cyberus Online Inc.) with ESMTP id TAA07552; Sat, 20 May 2000 19:49:49 -0400 (EDT) Received: from localhost (hadi@localhost) by shell.cyberus.ca (8.9.1b+Sun/8.9.3) with ESMTP id TAA18215; Sat, 20 May 2000 19:49:49 -0400 (EDT) Date: Sat, 20 May 2000 19:49:46 -0400 (EDT) From: jamal To: Jeff Garzik cc: Andrew Morton , Donald Becker , "netdev@oss.sgi.com" Subject: Re: Tx queueing In-Reply-To: <3926DBAE.92459AD5@mandrakesoft.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 2449 Lines: 75

On Sat, 20 May 2000, Jeff Garzik wrote:

> Andrew Morton wrote:
> >
> > Jeff Garzik wrote:
> > >
> > > > > It would seem to be more sensible to do
> > > > >
> > > > > start_xmit()
> > > > > {
> > > > >         ...
> > > > >         if (!room for another packet)
> > > > >                 netif_stop_queue()
> > > > > }
> > > >
> > > > This doesn't give us a way to set dev->tbusy, which is required for all
> > > > pre-2.3 kernels.
> > >
> > > Why not? Maybe I am missing something. A 2.2.x implementation of
> > > stop_queue should set dev->tbusy's bit 0. That's what acenic and my
> > > kcompat software both do.
> >
> > What I meant was, if the 2.2 driver did this:
> >
> > start_xmit()
> > {
> >         dev->tbusy = 1;
> >         ...
> > }
> >
> > and the 2.3 driver does this:
> >
> > start_xmit()
> > {
> >         netif_stop_queue();
> >         ...
> > }
> >
> > then the simple #define obviously preserves compatibility.
But once
> > we remove the up-front netif_stop_queue() (which is quite legit because
> > some of the tbusy functionality is handled by dev->xmit_lock in 2.3)
> > then we have nowhere to put the 'tbusy = 1' macro.
>
> tbusy=1 occurs at device open time, so the logic is preserved.
>
> For 2.2.x drivers though I don't think start_xmit is serialized (is
> it?), which would imply the need for testing something, or at least
> doing your own serialization via spinlocks. (2.2.x drivers did a
> test_and_set(&tbusy) IIRC)
>

dev->xmit_lock in 2.3 serializes the device. Note, 2.3 serializes per device, i.e. at a much finer SMP granularity than 2.2, which is serialized/protected by start/end_bh_atomic(). In a way, nothing to do with tbusy really. Just an improvement on start/end_bh_atomic().

tbusy stands for "transmission is busy". Donald might be able to provide better history. It was added, in my understanding, to stop the host processor from overwhelming the already overloaded NIC. In the simple case, when the DMA ring is filled up. Or that weird card that someone mentioned as being able to FIFO a single packet.

netif_stop_queue() does the _exact_ tbusy functionality (with additional removal of old cruft). Perhaps what Andrew has proposed is more efficient (but breaks backward compatibility), but to claim that the original was totally wrong has me scratching my head.

Jeff, when you say some modern PCI hardware has problems with the described semantics: can you provide more details?
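
[Editorial sketch, not part of the original message.] The mapping discussed in this thread, where a 2.2-side stop_queue sets bit 0 of dev->tbusy, could look roughly like the user-space model below. Everything here is a labelled assumption: `struct model_device`, `struct model_ring`, `RING_SIZE` and the `compat_*`/`model_*` names are hypothetical stand-ins, not the real kcompat or acenic code, and a real wake would also have to get the network layer to retry the queue, which is left out.

```c
#include <assert.h>

/* Hypothetical stand-in for the 2.2 struct device; only tbusy is modelled. */
struct model_device {
    unsigned long tbusy;
};

/* Hypothetical transmit ring with a fixed number of slots. */
#define RING_SIZE 4
struct model_ring {
    int used;
};

/* Assumed 2.2-side shims: bit 0 of tbusy carries "queue stopped". */
void compat_netif_stop_queue(struct model_device *dev)
{
    dev->tbusy |= 1UL;
}

void compat_netif_wake_queue(struct model_device *dev)
{
    dev->tbusy &= ~1UL;   /* a real wake would also reschedule transmission */
}

int compat_netif_queue_stopped(const struct model_device *dev)
{
    return (dev->tbusy & 1UL) != 0;
}

/* Transmit path in the "stop only when full" style quoted above. */
void model_start_xmit(struct model_device *dev, struct model_ring *ring)
{
    ring->used++;                          /* queue packet for xmit */
    if (ring->used == RING_SIZE)           /* no room for another packet */
        compat_netif_stop_queue(dev);
}

/* Tx-complete path: one slot is free again, so let the stack resume. */
void model_tx_done(struct model_device *dev, struct model_ring *ring)
{
    ring->used--;
    if (ring->used < RING_SIZE)
        compat_netif_wake_queue(dev);
}
```

With these shims the busy bit is set only once the ring is actually full and cleared as soon as a slot frees up, which is the behaviour both sides of the thread appear to want from a backported driver.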
cheers, jamal From owner-netdev@oss.sgi.com Sat May 20 19:50:36 2000 Received: (from majordomo@localhost) by oss.sgi.com (8.10.1/8.10.1) id e4L2oaV14083 for netdev-outgoing; Sat, 20 May 2000 19:50:36 -0700 X-Authentication-Warning: oss.sgi.com: majordomo set sender to owner-netdev@oss.sgi.com using -f Received: from vaio.greennet (adsl-151-196-249-21.bellatlantic.net [151.196.249.21]) by oss.sgi.com (8.10.1/8.10.1) with ESMTP id e4L2oZn14080 for ; Sat, 20 May 2000 19:50:35 -0700 Received: from localhost (becker@localhost) by vaio.greennet (8.9.3/8.8.7) with ESMTP id WAA09650; Sat, 20 May 2000 22:54:17 -0400 Date: Sat, 20 May 2000 22:54:17 -0400 (EDT) From: Donald Becker X-Sender: becker@vaio.greennet To: "David S. Miller" cc: netdev@oss.sgi.com Subject: Re: tx_timeout and timer serialisation In-Reply-To: <200005200937.CAA19089@pizda.ninka.net> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 761 Lines: 23 On Sat, 20 May 2000, David S. Miller wrote: > Date: Sat, 20 May 2000 04:31:33 -0400 (EDT) > From: Donald Becker > > And if that's acceptable behavior, I have a new version of > del_timer() that's very small and fast ;->. > > How do you accomplish this and still respect the environment of the > timer function itself, ie. that the timer is not scheduled and outside > of self-inflicted locking issues the timer function may add it's timer. You trimmed a bit much, and missed the ";->". My comment was that if the timer routine may be run after del_timer() is called, then del_timer() might as well not do anything. Donald Becker becker@scyld.com Scyld Computing Corporation 410 Severn Ave. 
Suite 210 Annapolis MD 21403 From owner-netdev@oss.sgi.com Sun May 21 07:16:58 2000 Received: (from majordomo@localhost) by oss.sgi.com (8.10.1/8.10.1) id e4LEGwG16110 for netdev-outgoing; Sun, 21 May 2000 07:16:58 -0700 X-Authentication-Warning: oss.sgi.com: majordomo set sender to owner-netdev@oss.sgi.com using -f Received: from mail.inka.de (mail@quechua.inka.de [212.227.14.2]) by oss.sgi.com (8.10.1/8.10.1) with ESMTP id e4LEGun16107 for ; Sun, 21 May 2000 07:16:57 -0700 Received: from dungeon.inka.de by mail.inka.de with uucp (rmailwrap 0.4) id 12tWXH-0002JB-00; Sun, 21 May 2000 16:16:55 +0200 Received: by dungeon.inka.de (Postfix, from userid 1000) id 188E5B7886; Sun, 21 May 2000 16:16:46 +0200 (CEST) Date: Sun, 21 May 2000 16:16:45 +0200 From: Andreas Jellinghaus To: netdev@oss.sgi.com Subject: appy hopcount to tunnel header or not ? Message-ID: <20000521161645.A2979@dungeon.inka.de> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: 8bit User-Agent: Mutt/1.0.1i Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 281 Lines: 10 linux 2.2.15 does: a packet with hopcount 3 (for example) will get a tunnel header with hop count 3. so any tracepath6 will show lots of "no response" hops: ipv4 machines between the two sit tunnel endpoints. shouldn´t a sit tunnel be a virtual single hop connection ? 
andreas From majordomo@oss.sgi.com Sun May 21 18:44:59 2000 Received: (from localhost user: 'majordomo', uid#102) by oss.sgi.com id ; Sun, 21 May 2000 18:44:49 -0700 Received: from saw.sw.com.sg ([203.120.9.98]:10130 "HELO saw.sw.com.sg") by oss.sgi.com with SMTP id ; Sun, 21 May 2000 18:44:30 -0700 Received: (qmail 12249 invoked by uid 577); 22 May 2000 01:44:19 -0000 Message-ID: <20000522094419.A12225@saw.sw.com.sg> Date: Mon, 22 May 2000 09:44:19 +0800 From: Andrey Savochkin To: Andrew Morton Cc: netdev@oss.sgi.com Subject: Re: tx_timeout and timer serialisation References: <3925BB00.B1CDDFE7@mandrakesoft.com> , ; <20000520122715.A7682@saw.sw.com.sg> <39262113.19447850@uow.edu.au>, <39262113.19447850@uow.edu.au>; <20000520133557.A8149@saw.sw.com.sg> <39267696.ACCE4DF3@uow.edu.au> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 0.93.2i In-Reply-To: <39267696.ACCE4DF3@uow.edu.au>; from "Andrew Morton" on Sat, May 20, 2000 at 09:27:18PM Sender: Majordomo List Manager Fake-Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 1228 Lines: 41 Hello, On Sat, May 20, 2000 at 09:27:18PM +1000, Andrew Morton wrote: > Andrey Savochkin wrote: > > > > We need to serialize the timer handler with outside code (which runs not in > > BH or IRQ context). For 2.2 del_timer call should be in bh_atomic section. > > hmm.. I'm not too familiar with that part of 2.2. Could you please cook > up a patch to show me what you mean, then I'll torture it a bit. For eepro100 it should be --- eepro100.c Tue Apr 4 11:05:23 2000 +++ eepro100.c-timer Mon May 22 09:35:52 2000 @@ -1799,7 +1799,9 @@ dev->name, inw(ioaddr + SCBStatus)); /* Shut off the media monitoring timer. */ + start_bh_atomic(); del_timer(&sp->timer); + end_bh_atomic(); /* Shutting down the chip nicely fails to disable flow control. So.. */ outl(PortPartialReset, ioaddr + SCBPort); > > Does the BH synchronisation work within a BH or an IRQ? 
What happens if

No.

> we call del_timer within a BH or IRQ when the handler is currently
> running?

As far as I remember, all BHs were single threaded in 2.2 kernels, so if we're in BH, handler can't run on another CPU. And start_bh_atomic may be safely called from BHs and IRQs. It does nothing special in this case.

Best regards
Andrey

From majordomo@oss.sgi.com Sun May 21 19:10:09 2000 Received: (from localhost user: 'majordomo', uid#102) by oss.sgi.com id ; Sun, 21 May 2000 19:09:59 -0700 Received: from panic.ohr.gatech.edu ([130.207.47.194]:34568 "EHLO havoc.gtf.org") by oss.sgi.com with ESMTP id ; Sun, 21 May 2000 19:09:50 -0700 Received: from mandrakesoft.com (adsl-77-228-135.atl.bellsouth.net [216.77.228.135]) by havoc.gtf.org (8.9.3/8.9.3) with ESMTP id VAA12089; Sun, 21 May 2000 21:08:05 -0400 Message-ID: <39288893.DBFE549E@mandrakesoft.com> Date: Sun, 21 May 2000 21:08:35 -0400 From: Jeff Garzik Organization: MandrakeSoft X-Mailer: Mozilla 4.72 [en] (X11; U; Linux 2.2.15 i686) X-Accept-Language: en MIME-Version: 1.0 To: jamal CC: Andrew Morton , Donald Becker , "netdev@oss.sgi.com" Subject: Re: Tx queueing References: Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: Majordomo List Manager Fake-Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 1186 Lines: 51

jamal wrote:
> Jeff, when you say some modern PCI hardware has
> problems with the described semantics: can you provide more details?

I was referring to PCI drivers, not PCI hardware. What I meant was that my experience has shown that some of the early softnet conversions (example 'A') caused transmit timeouts quite easily, until updated to look like example B.
Example A:

        drv_start_xmit()
        {
                netif_stop_queue()
                /* queue packet for xmit */
                if (!tx_full)
                        netif_start_queue()
        }

        interrupt()
        {
                /* Tx'd a packet */
                if (tx_full)
                        netif_stop_queue()
                else
                        netif_wake_queue()
        }

Example B:

        drv_start_xmit()
        {
                /* queue packet for xmit */
                if (tx_full)
                        netif_stop_queue()
        }

        interrupt()
        {
                /* Tx'd a packet */
                if (!tx_full)
                        netif_wake_queue()
        }

As a further note, since many of the PCI drivers do multiple iterations of "work," I wonder if it would be useful to postpone the netif_wake_queue() call until after the work loop in the interrupt handler completes.

        Jeff

-- 
Jeff Garzik              | Liberty is always dangerous, but
Building 1024            | it is the safest thing we have.
MandrakeSoft, Inc.       |      -- Harry Emerson Fosdick

From majordomo@oss.sgi.com Sun May 21 19:11:39 2000 Received: (from localhost user: 'majordomo', uid#102) by oss.sgi.com id ; Sun, 21 May 2000 19:11:29 -0700 Received: from rrzd1.rz.uni-regensburg.de ([132.199.1.6]:63502 "EHLO rrzd1.rz.uni-regensburg.de") by oss.sgi.com with ESMTP id ; Sun, 21 May 2000 19:11:11 -0700 Received: from rss1.rz.uni-regensburg.de (rss1.rz.uni-regensburg.de [132.199.1.200]) by rrzd1.rz.uni-regensburg.de (8.9.3/8.9.3-KLW-Linux-0.1) with SMTP id AAA03080 for ; Mon, 22 May 2000 00:10:15 +0200 Received: (qmail 19793 invoked from network); 22 May 2000 00:10:39 +0200 Received: from rrzc3.rz.uni-regensburg.de (132.199.38.3) by rss1.rz.uni-regensburg.de with QMQP; 22 May 2000 00:10:39 +0200 Received: from localhost (sendmail-bs@127.0.0.1) by localhost with SMTP; 22 May 2000 00:10:41 +0200 Date: Mon, 22 May 2000 00:10:41 +0200 (MET DST) From: Rolf Schillinger X-Sender: scr19100@rrzc3.rz.uni-regensburg.de To: netdev@oss.sgi.com Subject: kernel oops in > 2.3.99-pre7 (fwd) Message-ID: MIME-Version: 1.0 Content-Type: MULTIPART/Mixed; BOUNDARY="-559023410-1804928587-958744721=:4143" Content-ID: Sender: Majordomo List Manager Fake-Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing
Content-Length: 14847 Lines: 274 This message is in MIME format. The first part should be readable text, while the remaining parts are likely unreadable without MIME-aware tools. Send mail to mime@docserver.cac.washington.edu for more info. ---559023410-1804928587-958744721=:4143 Content-Type: TEXT/PLAIN; CHARSET=US-ASCII Content-ID: Hi all again, I must have been totally blind with the last try on ksymoops. I hope the one in the attachment is better now. the substantial part of my first email: I get an oops when doing isdnctrl dial ipppX. Somewhere along the pre7-3 or -4 this behaviour started. It could be connected with a crash I had around that time that forced me to fsck. That fsck had to repair lots of errors on the fs. But then again I replaced all necessary libs to the best of my knowledge and it still happened. Now I am up to pre9-2 and still no go. Here is the oops: My Hardware is as follows: Biostar M6VBE motherboard, P-III 450, 256MB, Teledat PCI (AVM Fritz PCI), Matrox G400 single head, creative dxr3 dvd (no module loaded at that stage), Hauppauge wintv based on bt878. silverado:/home/rolf# sh /usr/src/linux/scripts/ver_linux -- Versions installed: (if some fields are empty or look -- unusual then possibly you have very old versions) Linux silverado 2.2.14 #2 Wed May 10 21:00:11 CEST 2000 i686 unknown Kernel modules 2.3.11 Gnu C 2.95.2 Binutils 2.9.5.0.31 Linux C Library 2.1.3 Dynamic linker ldd: version 1.9.11 Procps . Mount 2.10f Net-tools 2.05 Kbd 0.99 Sh-utils 2.0g Modules Loaded hisax isdn msp3400 tuner bttv i2c videodev snd-pcm-oss snd-pcm-plugin snd-mixer-oss snd-card-es1938 snd-es1938 snd-pcm snd-timer snd-mixer snd soundcore The module list doesnt represent the modules loaded when it oopses At that point the modules loaded are hisax isdn msp3400 tuner bttv i2c videov in version from stock pre2.3. the lspci -vvv output is included as attachment. 
What's left to say is that all is well with 2.2.14 and with pre8-final it dialled 3 times before oopsing without being able to connect tho. Bis bald, Rolf

---559023410-1804928587-958744721=:4143 Content-Type: TEXT/PLAIN; CHARSET=US-ASCII; NAME="lspci.silverado" Content-Transfer-Encoding: BASE64 Content-ID: Content-Description: Content-Disposition: ATTACHMENT; FILENAME="lspci.silverado"

[base64-encoded attachment "lspci.silverado" (lspci output) omitted]

---559023410-1804928587-958744721=:4143 Content-Type: TEXT/PLAIN; charset=US-ASCII; name=oops Content-Transfer-Encoding: BASE64 Content-ID: Content-Description: Content-Disposition: attachment; filename=oops

[base64-encoded attachment "oops" omitted: ksymoops 2.3.4 output for 2.3.99-pre9, reporting "Oops: 0002" with EIP c01b0533 in dev_queue_xmit_nit+4f/c8 and ending in "Kernel panic: Attempted to kill the idle task"]

---559023410-1804928587-958744721=:4143--

From owner-netdev@oss.sgi.com Sun May 21 20:32:15 2000 Received: by oss.sgi.com id ; Sun, 21 May 2000 20:31:56 -0700 Received: from saw.sw.com.sg ([203.120.9.98]:24722 "HELO saw.sw.com.sg") by oss.sgi.com with SMTP id ; Sun, 21 May 2000 20:31:52 -0700 Received: (qmail 13587 invoked by uid 577); 22 May 2000 03:31:45 -0000 Message-ID: <20000522113145.A13568@saw.sw.com.sg> Date: Mon, 22 May 2000 11:31:45 +0800 From: Andrey Savochkin To: Jeff Garzik , jamal Cc: Andrew Morton , "netdev@oss.sgi.com" Subject: Re: Tx queueing References: <39288893.DBFE549E@mandrakesoft.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 0.93.2i In-Reply-To: <39288893.DBFE549E@mandrakesoft.com>; from "Jeff Garzik" on Sun, May 21, 2000 at 09:08:35PM Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 629 Lines: 28

Hello,

On Sun, May 21, 2000 at 09:08:35PM -0400, Jeff Garzik wrote:
> I was referring to PCI drivers, not PCI hardware.
What I meant was that
> my experience has shown that some of the early softnet conversions
> (example 'A') caused transmit timeouts quite easily, until updated to
> look like example B.
>
> Example A:
>
> drv_start_xmit() {
>         netif_stop_queue()
>         /* queue packet for xmit */
>         if (!tx_full)
>                 netif_start_queue()
> }
> interrupt() {
>         /* Tx'd a packet */
>         if (tx_full)
>                 netif_stop_queue()
>         else
>                 netif_wake_queue()
> }

I don't see what's wrong with A.

Best regards
Andrey

From owner-netdev@oss.sgi.com Mon May 22 01:15:16 2000 Received: by oss.sgi.com id ; Mon, 22 May 2000 01:14:56 -0700 Received: from pizda.ninka.net ([216.101.162.242]:41355 "EHLO pizda.ninka.net") by oss.sgi.com with ESMTP id ; Mon, 22 May 2000 01:14:34 -0700 Received: (from davem@localhost) by pizda.ninka.net (8.9.3/8.9.3) id BAA21122; Mon, 22 May 2000 01:04:15 -0700 Date: Mon, 22 May 2000 01:04:15 -0700 Message-Id: <200005220804.BAA21122@pizda.ninka.net> X-Authentication-Warning: pizda.ninka.net: davem set sender to davem@redhat.com using -f From: "David S. Miller" To: jmorris@intercode.com.au CC: netdev@oss.sgi.com, rusty@linuxcare.com.au In-reply-to: (message from James Morris on Sat, 20 May 2000 04:59:29 +1000 (EST)) Subject: Re: [PATCH] netfilter ip_queue notifier fix wrt 2.3.99pre9-2 References: Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 213 Lines: 9

Rusty, just send off the accumulated fixes such as this one to myself or
Linus when you feel they are ready. I'm going to send the ipv6 netfilter
stuff to Linus tonight.

Later,
David S.
Miller davem@redhat.com From owner-netdev@oss.sgi.com Mon May 22 04:14:46 2000 Received: by oss.sgi.com id ; Mon, 22 May 2000 04:14:36 -0700 Received: from smtprch2.nortelnetworks.com ([192.135.215.15]:51840 "EHLO smtprch2.nortel.com") by oss.sgi.com with ESMTP id ; Mon, 22 May 2000 04:14:15 -0700 Received: from zrchb213.us.nortel.com (actually zrchb213) by smtprch2.nortel.com; Mon, 22 May 2000 06:08:22 -0500 Received: from zctwb003.asiapac.nortel.com ([47.152.32.111]) by zrchb213.us.nortel.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21) id LH75WXVC; Mon, 22 May 2000 06:11:14 -0500 Received: from pwold011.asiapac.nortel.com ([47.181.193.45]) by zctwb003.asiapac.nortel.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21) id KXY57GQA; Mon, 22 May 2000 21:11:15 +1000 Received: from uow.edu.au (IDENT:akpm@[47.181.194.1]) by pwold011.asiapac.nortel.com (8.9.3/8.9.3) with ESMTP id VAA03932; Mon, 22 May 2000 21:11:17 +1000 Message-ID: <39291650.D15A17CB@uow.edu.au> Date: Mon, 22 May 2000 21:13:20 +1000 X-Sybari-Space: 00000000 00000000 00000000 From: Andrew Morton X-Mailer: Mozilla 4.7 [en] (X11; I; Linux 2.2.14-15mdk i586) X-Accept-Language: en MIME-Version: 1.0 To: Andrey Savochkin CC: netdev@oss.sgi.com Subject: Re: tx_timeout and timer serialisation References: <3925BB00.B1CDDFE7@mandrakesoft.com> , ; <20000520122715.A7682@saw.sw.com.sg> <39262113.19447850@uow.edu.au>, <39262113.19447850@uow.edu.au>; <20000520133557.A8149@saw.sw.com.sg> <39267696.ACCE4DF3@uow.edu.au>, <39267696.ACCE4DF3@uow.edu.au>; from "Andrew Morton" on Sat, May 20, 2000 at 09:27:18PM <20000522094419.A12225@saw.sw.com.sg> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-Orig: Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 1523 Lines: 51 Andrey Savochkin wrote: > > Hello, Hi, Andrey. > ... 
> For eepro100 it should be
>
> --- eepro100.c        Tue Apr 4 11:05:23 2000
> +++ eepro100.c-timer  Mon May 22 09:35:52 2000
> @@ -1799,7 +1799,9 @@
>                            dev->name, inw(ioaddr + SCBStatus));
>
>         /* Shut off the media monitoring timer. */
> +       start_bh_atomic();
>         del_timer(&sp->timer);
> +       end_bh_atomic();
>
>         /* Shutting down the chip nicely fails to disable flow control. So.. */
>         outl(PortPartialReset, ioaddr + SCBPort);

hmm.. But if the timer handler was running before the
start_bh_atomic(), it will continue to run during and after the
del_timer().

_Somewhere_ the mainline code needs to spin until the timer handler has
finished. So we need a lock to serialise del_timer() wrt the handlers.
It can be either the global_bh_lock or the timerlist_lock; it doesn't
matter a lot. I'm using the timerlist_lock at present because it looks
to me like global_bh_lock has its head on the chopping block - sometime
we're going to make timer handlers non-serialised and things which play
with global_bh_lock will probably break.

> ...
>
> As far as I remember, all BHs were single threaded in 2.2 kernels, so if
> we're in BH, handler can't run on another CPU.

That is also the case in 2.3. Timer handlers are globally serialised.
Only one handler can be running at a time, system-wide.

Where's Alexey, BTW?
If he's busily coding a fix for this I'm gonna strangle him :)

--
-akpm-

From owner-netdev@oss.sgi.com Mon May 22 04:24:37 2000 Received: by oss.sgi.com id ; Mon, 22 May 2000 04:24:27 -0700 Received: from saw.sw.com.sg ([203.120.9.98]:7315 "HELO saw.sw.com.sg") by oss.sgi.com with SMTP id ; Mon, 22 May 2000 04:24:09 -0700 Received: (qmail 17536 invoked by uid 577); 22 May 2000 11:23:58 -0000 Message-ID: <20000522192357.A17503@saw.sw.com.sg> Date: Mon, 22 May 2000 19:23:57 +0800 From: Andrey Savochkin To: Andrew Morton Cc: netdev@oss.sgi.com Subject: Re: tx_timeout and timer serialisation References: , ; <20000520122715.A7682@saw.sw.com.sg> <39262113.19447850@uow.edu.au>, <39262113.19447850@uow.edu.au>; <20000520133557.A8149@saw.sw.com.sg> <39267696.ACCE4DF3@uow.edu.au>, <39267696.ACCE4DF3@uow.edu.au>; <20000522094419.A12225@saw.sw.com.sg> <39291650.D15A17CB@uow.edu.au> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 0.93.2i In-Reply-To: <39291650.D15A17CB@uow.edu.au>; from "Andrew Morton" on Mon, May 22, 2000 at 09:13:20PM Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 1422 Lines: 44

On Mon, May 22, 2000 at 09:13:20PM +1000, Andrew Morton wrote:
> Andrey Savochkin wrote:
> > ...
> > For eepro100 it should be
> >
> > --- eepro100.c        Tue Apr 4 11:05:23 2000
> > +++ eepro100.c-timer  Mon May 22 09:35:52 2000
> > @@ -1799,7 +1799,9 @@
> >                            dev->name, inw(ioaddr + SCBStatus));
> >
> >         /* Shut off the media monitoring timer. */
> > +       start_bh_atomic();
> >         del_timer(&sp->timer);
> > +       end_bh_atomic();
> >
> >         /* Shutting down the chip nicely fails to disable flow control. So.. */
> >         outl(PortPartialReset, ioaddr + SCBPort);
>
> hmm.. But if the timer handler was running before the
> start_bh_atomic(), it will continue to run during and after the
> del_timer().

Timers run from timer BH.
start_bh_atomic() gives a guarantee that upon its exit no BHs are running and
BHs are disabled until end_bh_atomic().
Certainly, it applies only to calls not from BH/IRQ context.
And I repeat, that's 2.2 kernel code.

>
> _Somewhere_ the mainline code needs to spin until the timer handler has
> finished.

That's start_bh_atomic()/wait_on_bh().
See include/asm-i386/softirq.h -> arch/i386/kernel/irq.c

[snip]
> Where's Alexey, BTW? If he's busily coding a fix for this I'm gonna
> strangle him :)

In Moscow :-)
Last time I spoke with him he was busy with rewriting of fast retransmit
logic and congestion avoidance.

Andrey

From owner-netdev@oss.sgi.com Mon May 22 09:00:38 2000 Received: by oss.sgi.com id ; Mon, 22 May 2000 09:00:28 -0700 Received: from mail.cyberus.ca ([209.195.95.1]:2489 "EHLO cyberus.ca") by oss.sgi.com with ESMTP id ; Mon, 22 May 2000 09:00:20 -0700 Received: from shell.cyberus.ca (shell [209.195.95.7]) by cyberus.ca (8.9.3/8.9.3/Cyberus Online Inc.) with ESMTP id MAA27465; Mon, 22 May 2000 12:00:19 -0400 (EDT) Received: from localhost (hadi@localhost) by shell.cyberus.ca (8.9.1b+Sun/8.9.3) with ESMTP id MAA20184; Mon, 22 May 2000 12:00:18 -0400 (EDT) Date: Mon, 22 May 2000 12:00:18 -0400 (EDT) From: jamal To: Jeff Garzik cc: Andrew Morton , Donald Becker , "netdev@oss.sgi.com" Subject: Re: Tx queueing In-Reply-To: <39288893.DBFE549E@mandrakesoft.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 453 Lines: 41

On Sun, 21 May 2000, Jeff Garzik wrote:

> Example A:
>
> drv_start_xmit() {
spinlock
>         netif_stop_queue()
>         /* queue packet for xmit */
>         if (!tx_full)
>                 netif_start_queue()
> }
unlock
> interrupt() {
spinlock
>         /* Tx'd a packet */
>         if (tx_full)
>                 netif_stop_queue()
>         else
>                 netif_wake_queue()
> }
>
unlock

that should do it; you are already doing it anyways.
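
A minimal user-space sketch of the locked variant annotated above, written as plain C so it can be compiled outside the kernel. Everything prefixed sim_ is hypothetical: the atomic_flag spin loop stands in for the driver's spinlock, queue_stopped stands in for the effect of netif_stop_queue()/netif_wake_queue(), and TX_RING_SIZE is an arbitrary depth. It also simplifies Example A by stopping the queue only when the ring fills, instead of stopping it on every entry to the transmit routine.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define TX_RING_SIZE 4u         /* hypothetical ring depth */

struct sim_nic {
    atomic_flag lock;           /* stands in for the driver's spinlock */
    unsigned int queued;        /* packets currently owned by the "hardware" */
    bool queue_stopped;         /* models the stopped/awake state of the Tx queue */
};

static void sim_lock(struct sim_nic *nic)
{
    /* Spin until the previous holder clears the flag. */
    while (atomic_flag_test_and_set_explicit(&nic->lock, memory_order_acquire))
        ;
}

static void sim_unlock(struct sim_nic *nic)
{
    atomic_flag_clear_explicit(&nic->lock, memory_order_release);
}

/* Transmit path: returns 0 if the packet was queued, 1 if the queue is stopped. */
static int sim_start_xmit(struct sim_nic *nic)
{
    int busy = 0;

    sim_lock(nic);
    if (nic->queue_stopped) {
        busy = 1;
    } else {
        nic->queued++;                      /* queue packet for xmit */
        assert(nic->queued <= TX_RING_SIZE);
        if (nic->queued == TX_RING_SIZE)
            nic->queue_stopped = true;      /* ring full: stop the queue */
    }
    sim_unlock(nic);
    return busy;
}

/* Tx-done interrupt: reap one packet and wake the queue if there is room. */
static void sim_tx_interrupt(struct sim_nic *nic)
{
    sim_lock(nic);
    if (nic->queued > 0)
        nic->queued--;                      /* Tx'd a packet */
    if (nic->queue_stopped && nic->queued < TX_RING_SIZE)
        nic->queue_stopped = false;         /* room again: wake the queue */
    sim_unlock(nic);
}

/* Single-threaded walk-through; returns 0 when every step behaves as expected. */
int sim_demo(void)
{
    struct sim_nic nic = { ATOMIC_FLAG_INIT, 0, false };
    unsigned int i;

    for (i = 0; i < TX_RING_SIZE; i++)
        if (sim_start_xmit(&nic) != 0)
            return 1;                       /* ring had room, must be accepted */
    if (!nic.queue_stopped)
        return 2;                           /* full ring must stop the queue */
    if (sim_start_xmit(&nic) != 1)
        return 3;                           /* stopped queue must refuse */
    sim_tx_interrupt(&nic);
    if (nic.queue_stopped || nic.queued != TX_RING_SIZE - 1)
        return 4;                           /* one slot reaped, queue awake */
    return sim_start_xmit(&nic);            /* accepted again */
}
```

Calling sim_demo() from a main() should return 0. The only point of the sketch is that the fullness test and the queue state change happen under one lock in both paths, so the interrupt side cannot wake the queue between the transmit side's test and its stop.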
I hope Donald doesn't flame me ;->

cheers,
jamal

From owner-netdev@oss.sgi.com Mon May 22 15:55:04 2000 Received: by oss.sgi.com id ; Mon, 22 May 2000 15:54:54 -0700 Received: from [206.24.4.33] ([206.24.4.33]:15885 "EHLO vaio.greennet") by oss.sgi.com with ESMTP id ; Mon, 22 May 2000 15:54:36 -0700 Received: from localhost (becker@localhost) by vaio.greennet (8.9.3/8.8.7) with ESMTP id SAA06476; Mon, 22 May 2000 18:54:38 -0400 Date: Mon, 22 May 2000 18:54:37 -0400 (EDT) From: Donald Becker X-Sender: becker@vaio.greennet To: jamal cc: "netdev@oss.sgi.com" Subject: Re: Tx queueing In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=X-UNKNOWN Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 2167 Lines: 81

On Mon, 22 May 2000, jamal wrote:
> On Sun, 21 May 2000, Jeff Garzik wrote:
>
> > Example A:
> >
> > drv_start_xmit() {
> spinlock
> >         netif_stop_queue()
> >         /* queue packet for xmit */
> >         if (!tx_full)
> >                 netif_start_queue()
> > }
> unlock
..
> that should do it; you are already doing it anyways.
>
> I hope Donald doesn't flame me ;->

What a foolish, foolish hope.

There are some chips where the transmit routines don't need to be locked
against other activity such as the interrupt handler, only against
simultaneous entry of drv_start_xmit() by multiple processors.
Always having a spinlock adds overhead.

What we want for those chips is
        netif_block_tx(dev);            /* Block other func. entries on pre-2.3. */
        ...
        if (full) {
                np->tx_full = 1;
                netif_pause_tx(dev);    /* We must later do netif_wake_queue()!! */
        } else
                netif_unblock_tx(dev);

To make it backward-compatible it would be
        netif_block_tx(dev, timeout_handler, timeout);
(C.f. David Hinds' version: tx_timeout_check(dev, tx_timeout) ).
Or we could just insist that all netdrivers implement their own watchdogs.

Some driver *will* need to do
        netif_block_tx(dev);            /* Block other Tx entries. */
        spinlock(np->tx_window_spinlock);
        change_chip_state(ioaddr);
        ...
        change_chip_state_back(ioaddr);
        unlock(np->tx_window_spinlock);
        if (full)
                netif_pause_tx(dev);
        else
                netif_unblock_tx(dev);

________________
/* 2.3.43+ implicitly blocks simultaneous entry. */
#define netif_block_tx(dev) do { } while (0)
#define netif_unblock_tx(dev) do { } while (0)
#else
#define netif_block_tx(dev) set_bit(0, (void *)&(dev)->tbusy)
#define netif_unblock_tx(dev) clear_bit(0, (void *)&(dev)->tbusy)
or
#define netif_block_tx(dev, timeout_handler, timeout) \
do { if (test_and_set_bit(0, (void *)&(dev)->tbusy) != 0) { \
        if ((timeout_handler) == 0 || \
                jiffies - (dev)->trans_start < (timeout)) return 1; \
        else (timeout_handler)(dev); \
} } while (0)
________________

Donald Becker becker@scyld.com
Scyld Computing Corporation
410 Severn Ave. Suite 210
Annapolis MD 21403

From owner-netdev@oss.sgi.com Tue May 23 04:49:39 2000 Received: by oss.sgi.com id ; Tue, 23 May 2000 04:49:29 -0700 Received: from tmp.netguide.dk ([130.227.158.8]:32517 "EHLO netguide.netguide.dk") by oss.sgi.com with ESMTP id ; Tue, 23 May 2000 04:49:09 -0700 Received: from netguide.dk (popepc.netguide.dk [130.227.158.148]) by netguide.netguide.dk (8.9.3/8.9.3) with ESMTP id MAA18315; Tue, 23 May 2000 12:46:05 +0200 Message-ID: <392A700A.DA8CE369@netguide.dk> Date: Tue, 23 May 2000 13:48:26 +0200 From: "Povl H. Pedersen" Reply-To: pope@netguide.dk Organization: NetGuide Danmark Aps X-Mailer: Mozilla 4.7 [en] (WinNT; U) X-Accept-Language: en MIME-Version: 1.0 To: netdev@oss.sgi.com CC: pope@netguide.dk Subject: TCP/IP and multihomed single IF source addresses Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 1940 Lines: 54

First, I am not subscribed to the mailing list, so I would appreciate CC
of mails to the list.

BACKGROUND

I recently had problems with lots of collisions on one of my
non-switched ethernet segments, and spent quite some time localizing the
source, and have now tracked it down to what I would call an unwanted
feature in the TCP/IP implementation of a multihomed single interface
host. The problem exists both with Linux 2.0 / 2.2 as well as AIX 4.1.5.
I am not sure if the problem is in the kernel, or in libnet.

EXAMPLE

My configuration is:

HOST 1
eth0   - 10.11.12.2/24
eth0:1 - 192.168.1.2/24 (or using IP alias)
eth1   - 10.250.240.2/24 (or whatever)
gw     - 10.11.12.1/24
There exists a static route to 192.168.1.0/24 using 192.168.1.2 as
gateway

HOST 2
eth0   - 192.168.1.3/24
gw     - 192.168.1.1/24

Host 1 + 2 are sitting on the same ethernet segment / collision domain.
The gateway (Cisco router) is multihomed on a single interface.

If host 1 opens a connection to host 2, the source address will be
10.11.12.2, which is NOT on the same IP network as host 2. So packets are
transmitted 10.11.12.2 --inside machine--> 192.168.1.2 --> 192.168.1.3.
And response packets from host 2 go 192.168.1.3 -> 192.168.1.1 ->
10.11.12.2, and are thus transmitted twice on the physical network.

IDEA FOR SOLUTION

Modify the IP stack or just TCP to be intelligent about picking source
addresses. At least outgoing TCP connections should pick the closest
source address it knows of. This is already done if having multiple
interfaces, so it should be do-able to expand it to handle multiple IPs
on the same IF, or use the logical eth0:x interfaces in its search
algorithm.

I am willing to start coding, but I would like to get some feedback
first. Searching on the Internet, it looks like the BSD KAME project
might include it in their IPv6 implementation, but for the next few
years, IPv4 is what I will be using.

Povl H.
Pedersen
pope@netguide.dk

From owner-netdev@oss.sgi.com Wed May 24 06:42:23 2000 Received: by oss.sgi.com id ; Wed, 24 May 2000 06:42:14 -0700 Received: from orinoco.cisco.com ([171.69.161.57]:32678 "EHLO orinoco.cisco.com") by oss.sgi.com with ESMTP id ; Wed, 24 May 2000 06:42:07 -0700 Received: from stcampos-lptp1 (dhcp-aus-162-184.cisco.com [171.69.162.184]) by orinoco.cisco.com (8.8.8/2.6/Cisco List Logging/8.8.8) with SMTP id IAA26621 for ; Wed, 24 May 2000 08:41:56 -0500 (CDT) From: "Stephen Campos" To: Subject: Packet capture with af_packet.c (Shared memory between user/kernel space) Date: Wed, 24 May 2000 08:43:10 -0500 Message-ID: <000001bfc585$ffed5790$b8a245ab@stcampos-lptp1.cisco.com> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook 8.5, Build 4.71.2173.0 Importance: Normal X-MimeOLE: Produced By Microsoft MimeOLE V4.72.3110.3 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 246 Lines: 9

Anyone know of any sample code for taking advantage of the new work in
af_packet.c regarding mmap kernel/user memory? I understand that someone
wrote some code to share memory between user/kernel space for packet
capture.

Thanks
-- SC

From owner-netdev@oss.sgi.com Wed May 24 08:11:14 2000 Received: by oss.sgi.com id ; Wed, 24 May 2000 08:11:04 -0700 Received: from nevald.k-net.dtu.dk ([130.225.71.226]:44696 "HELO nevald.k-net.dk") by oss.sgi.com with SMTP id ; Wed, 24 May 2000 08:10:44 -0700 Received: from akp-3.bergsoe.k-net.dk (akp-3.bergsoe.dtu.dk [192.38.219.27]) by nevald.k-net.dk (Postfix) with SMTP id 704AF3C231 for ; Wed, 24 May 2000 17:10:32 +0200 (CEST) Received: (qmail 11836 invoked by uid 9); 24 May 2000 15:10:32 -0000 To: netdev@oss.sgi.com Path: not-for-mail From: "Anders K.
Pedersen" Newsgroups: akp.lists.linux.netdev Subject: 3c59x.c v0.99H 24May00 failed with two 3c905 Boomerangs Date: Wed, 24 May 2000 17:10:27 +0200 Organization: AKP Consult I/S Message-ID: <392BF0E3.2A6D4DFC@akp.dk> NNTP-Posting-Host: akp-1.bergsoe.dtu.dk Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="------------52EAAA4DD7524BCF2BD76B94" X-Trace: akp-3.bergsoe.k-net.dk 959181032 11834 192.38.218.231 (24 May 2000 15:10:32 GMT) X-Complaints-To: newsmaster@akp.dk NNTP-Posting-Date: 24 May 2000 15:10:32 GMT X-Mailer: Mozilla 4.73 [en] (Win98; U) X-Accept-Language: da,en Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 6207 Lines: 100 This is a multi-part message in MIME format. --------------52EAAA4DD7524BCF2BD76B94 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit I tried the 3c59x.c v0.99H 24May00 on a machine with two 3c905 Boomerang cards. It generated a lot of error messages in the kernel log, and neither card were reachable through the network, after starting this driver. I have attached the kernel messages as a gzip compressed file. Kernel is 2.2.15, and the last startup of the driver in the attached log is with the standard driver from this kernel, which we have not experienced any problems with. Regards, Anders K. 
Pedersen

--------------52EAAA4DD7524BCF2BD76B94 Content-Type: application/x-gzip; name="kernel.gz" Content-Transfer-Encoding: base64 Content-Disposition: inline; filename="kernel.gz"

[base64-encoded gzip attachment "kernel.gz" (kernel log) omitted]

--------------52EAAA4DD7524BCF2BD76B94--

From owner-netdev@oss.sgi.com Wed May 24 10:11:25 2000 Received: by oss.sgi.com id ; Wed, 24 May 2000 10:11:15 -0700 Received: from ertpg14e1.nortelnetworks.com ([47.234.0.35]:53429 "EHLO ertpg14e1.nortelnetworks.com") by oss.sgi.com with ESMTP id ; Wed, 24 May 2000 10:11:06 -0700 Received: from zsngd101.asiapac.nortel.com
(actually znsgd101) by ertpg14e1.nortelnetworks.com; Wed, 24 May 2000 11:35:13 -0400 Received: from zctwb003.asiapac.nortel.com ([47.152.32.111]) by zsngd101.asiapac.nortel.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21) id LCNJ6TLH; Wed, 24 May 2000 23:35:08 +0800 Received: from pwold011.asiapac.nortel.com ([47.181.193.45]) by zctwb003.asiapac.nortel.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21) id KXY57JJR; Thu, 25 May 2000 01:35:12 +1000 Received: from uow.edu.au (IDENT:akpm@[47.181.194.47]) by pwold011.asiapac.nortel.com (8.9.3/8.9.3) with ESMTP id BAA28867; Thu, 25 May 2000 01:35:14 +1000 Message-ID: <392BF736.F95AB01@uow.edu.au> Date: Thu, 25 May 2000 01:37:26 +1000 X-Sybari-Space: 00000000 00000000 00000000 From: Andrew Morton X-Mailer: Mozilla 4.7 [en] (X11; I; Linux 2.2.14-15mdk i586) X-Accept-Language: en MIME-Version: 1.0 To: "Anders K. Pedersen" CC: netdev@oss.sgi.com Subject: Re: 3c59x.c v0.99H 24May00 failed with two 3c905 Boomerangs References: <392BF0E3.2A6D4DFC@akp.dk> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 809 Lines: 23 "Anders K. Pedersen" wrote: > > I tried the 3c59x.c v0.99H 24May00 on a machine with two 3c905 > Boomerang cards. It generated a lot of error messages in the kernel > log, and neither card were reachable through the network, after > starting this driver. I have attached the kernel messages as a gzip > compressed file. > > Kernel is 2.2.15, and the last startup of the driver in the attached > log is with the standard driver from this kernel, which we have not > experienced any problems with. A "catastrophic Host error". Receive underruns on both NICs, caused by the host reading data faster than the NIC is receiving it. That's a new one. This is a different machine, I assume? 
Is there anything in common which could cause them to play up? I'll send you a few other drivers to try.. -- -akpm- From owner-netdev@oss.sgi.com Wed May 24 10:48:55 2000 Received: by oss.sgi.com id ; Wed, 24 May 2000 10:48:45 -0700 Received: from nevald.k-net.dtu.dk ([130.225.71.226]:35733 "HELO nevald.k-net.dk") by oss.sgi.com with SMTP id ; Wed, 24 May 2000 10:48:35 -0700 Received: from akp-3.bergsoe.k-net.dk (akp-3.bergsoe.dtu.dk [192.38.219.27]) by nevald.k-net.dk (Postfix) with SMTP id 484353C486 for ; Wed, 24 May 2000 19:48:24 +0200 (CEST) Received: (qmail 16678 invoked by uid 9); 24 May 2000 17:48:24 -0000 To: netdev@oss.sgi.com Path: not-for-mail From: "Anders K. Pedersen" Newsgroups: akp.lists.linux.netdev Subject: Re: 3c59x.c v0.99H 24May00 failed with two 3c905 Boomerangs Date: Wed, 24 May 2000 19:48:16 +0200 Organization: AKP Consult I/S Message-ID: <392C15E0.13D9BFCA@akp.dk> References: <392BF0E3.2A6D4DFC@akp.dk> <392BF736.F95AB01@uow.edu.au> NNTP-Posting-Host: akp-1.bergsoe.dtu.dk Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-Trace: akp-3.bergsoe.k-net.dk 959190504 16676 192.38.218.231 (24 May 2000 17:48:24 GMT) X-Complaints-To: newsmaster@akp.dk NNTP-Posting-Date: 24 May 2000 17:48:24 GMT To: Andrew Morton X-Mailer: Mozilla 4.73 [en] (Win98; U) X-Accept-Language: da,en Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 1233 Lines: 33 Andrew Morton wrote: > "Anders K. Pedersen" wrote: > > I tried the 3c59x.c v0.99H 24May00 on a machine with two 3c905 > > Boomerang cards. It generated a lot of error messages in the kernel > > log, and neither card were reachable through the network, after > > starting this driver. I have attached the kernel messages as a gzip > > compressed file. 
> >
> > Kernel is 2.2.15, and the last startup of the driver in the attached
> > log is with the standard driver from this kernel, which we have not
> > experienced any problems with.
>
> A "catastrophic Host error". Receive underruns on both NICs, caused by
> the host reading data faster than the NIC is receiving it. That's a new
> one.
>
> This is a different machine, I assume? Is there anything in common
> which could cause them to play up?

It is a pentium 133 (with F0 0F bug) working as a firewall for a 256 kbit
Internet connection, so the load values on this machine are very low
(close to zero most of the time). There are no resource conflicts that I
am aware of - eth0 is at int 10 base address 0xe000 and eth1 is at int
11 base address 0xd800.

> I'll send you a few other drivers to try..

I have sent the results directly to you.

Regards,
Anders K. Pedersen

From owner-netdev@oss.sgi.com Wed May 24 12:17:28 2000
Received: by oss.sgi.com id ; Wed, 24 May 2000 12:17:08 -0700
Received: from colorfullife.com ([216.156.138.34]:41991 "EHLO colorfullife.com") by oss.sgi.com with ESMTP id ; Wed, 24 May 2000 12:16:59 -0700
Received: from colorfullife.com (localhost [127.0.0.1]) by colorfullife.com (8.9.3/8.9.3) with ESMTP id RAA30685; Wed, 24 May 2000 17:30:10 -0400
Message-ID: <392C38BF.F45193DA@colorfullife.com>
Date: Wed, 24 May 2000 22:17:03 +0200
From: Manfred Spraul
X-Mailer: Mozilla 4.61 [en] (X11; I; Linux 2.3.99-pre9 i686)
X-Accept-Language: en, de
MIME-Version: 1.0
To: davem@redhat.com, linux-net@vger.rutgers.edu, netdev@oss.sgi.com
Subject: sock_alloc_send_skb(): wrong GFP flags?
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing
Content-Length: 947
Lines: 42

from net/core/sock.c, sock_alloc_send_skb():

>>>>>>>
	if (fallback) {
		/* The buffer get won't block, or use the atomic queue.
		 * It does produce annoying no free page messages still.
		 */
		skb = alloc_skb(size, GFP_BUFFER);
		if (skb)
			break;
		try_size = fallback;
	}
	skb = alloc_skb(try_size, sk->allocation);
<<<<<<<<

IMHO the flags are wrong:
Someone modified GFP_BUFFER between 2.2.15 and 2.3.99-pre9:

was:
	GFP_BUFFER == GFP_MED|GFP_WAIT
now:
	GFP_BUFFER == GFP_HIGH|GFP_WAIT

It will eat the memory for atomic allocations, and it will sleep instead
of downgrading to fallback.

What about
	sk->allocation & (~(GFP_WAIT|GFP_IO|GFP_HIGH))

Should I write a patch?

Btw, I'd reorder the allocations in alloc_skb():
I'm sure that the actual data kmalloc fails far more often than the
skb_head alloc. We should allocate the data area first, and then the
skb_head.

I'm not subscribed to linux-net, please cc

--
Manfred

From owner-netdev@oss.sgi.com Wed May 24 13:21:56 2000
Received: by oss.sgi.com id ; Wed, 24 May 2000 13:21:36 -0700
Received: from lightning.swansea.uk.linux.org ([194.168.151.1]:53281 "EHLO the-village.bc.nu") by oss.sgi.com with ESMTP id ; Wed, 24 May 2000 13:21:13 -0700
Received: from alan by the-village.bc.nu with local (Exim 2.12 #1) id 12uiY4-0006Eg-00; Wed, 24 May 2000 22:18:40 +0100
Subject: Re: sock_alloc_send_skb(): wrong GFP flags?
To: manfreds@colorfullife.com (Manfred Spraul)
Date: Wed, 24 May 2000 22:18:38 +0100 (BST)
Cc: davem@redhat.com, linux-net@vger.rutgers.edu, netdev@oss.sgi.com
In-Reply-To: <392C38BF.F45193DA@colorfullife.com> from "Manfred Spraul" at May 24, 2000 10:17:03 PM
X-Mailer: ELM [version 2.5 PL1]
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Message-Id:
From: Alan Cox
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing
Content-Length: 557
Lines: 24

> IMHO the flags are wrong:
> Someone modified GFP_BUFFER between 2.2.15 and 2.3.99-pre9:
>
> was:
> GFP_BUFFER == GFP_MED|GFP_WAIT
> now:
> GFP_BUFFER == GFP_HIGH|GFP_WAIT
>
> It will eat the memory for atomic allocations, and it will sleep instead
> of downgrading to fallback.

It needs to allocate non-atomic memory only

> Should I write a patch?

Please. I had completely missed that

> I'm sure that the actual data kmalloc fails far more often than the
> skb_head alloc. We should allocate the data area first, and then the
> skb_head.

True

From owner-netdev@oss.sgi.com Thu May 25 10:33:17 2000
Received: by oss.sgi.com id ; Thu, 25 May 2000 10:32:57 -0700
Received: from minus.inr.ac.ru ([193.233.7.97]:50954 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Thu, 25 May 2000 10:32:36 -0700
Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id WAA21251; Thu, 25 May 2000 22:31:42 +0400
From: kuznet@ms2.inr.ac.ru
Message-Id: <200005251831.WAA21251@ms2.inr.ac.ru>
Subject: Re: tx_timeout and timer serialisation
To: andrewm@uow.edu.au (Andrew Morton)
Date: Thu, 25 May 2000 22:31:42 +0400 (MSK DST)
Cc: netdev@oss.sgi.com
In-Reply-To: <3923F8CD.AECBDA6D@uow.edu.au> from "Andrew Morton" at May 19, 0 00:06:05 am
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Content-Length: 1235
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing

Hello!

> 	for (;;) {
> 		unsigned long flags;
> 		int running;
>
> 		spin_lock_irqsave(&timerlist_lock, flags);
>
> ** The timer handler could be running now.

Of course! That is exactly the situation in which del_timer_sync()
differs from del_timer().

> It can delete the
> timer and kfree it, or reuse its memory for something else,
> or turn it into a semantically different timer **

Yes, and in this case you cannot use del_timer_sync() and
have to use generic reference counting scheme.

del_timer_sync() is used by the process which _owns_ this timer
and has the exclusive right to destroy it. See?
If the timer handler is self-restartable, del_timer_sync() guarantees
that the timer is not running after exit from del_timer_sync(),
so that you may destroy it safely.

Another (more complicated) scheme is used by TCP (net/ipv4/tcp_timer.c),
by neighbour cache (core/neighbour.c) etc.
In these cases timer "thread" has equal rights with process threads and we have to use exact reference counting to wait for last timer user. del_timer_sync() does nothing useful in this case, but it also does not fail, because anyone operating on timer holds it with reference count. Alexey From owner-netdev@oss.sgi.com Fri May 26 09:57:48 2000 Received: by oss.sgi.com id ; Fri, 26 May 2000 09:57:26 -0700 Received: from smtprch1.nortelnetworks.com ([192.135.215.14]:55957 "EHLO smtprch1.nortel.com") by oss.sgi.com with ESMTP id ; Thu, 25 May 2000 15:56:37 -0700 Received: from zsngd101.asiapac.nortel.com (actually znsgd101) by smtprch1.nortel.com; Thu, 25 May 2000 18:20:28 -0500 Received: from zctwb003.asiapac.nortel.com ([47.152.32.111]) by zsngd101.asiapac.nortel.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21) id LCNJ6X41; Fri, 26 May 2000 07:20:23 +0800 Received: from pwold011.asiapac.nortel.com ([47.181.193.45]) by zctwb003.asiapac.nortel.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21) id LT9PTSZ1; Fri, 26 May 2000 09:20:29 +1000 Received: from uow.edu.au (IDENT:akpm@localhost [127.0.0.1]) by pwold011.asiapac.nortel.com (8.9.3/8.9.3) with ESMTP id JAA11611; Fri, 26 May 2000 09:20:27 +1000 Message-ID: <392DB53B.2EB21297@uow.edu.au> Date: Thu, 25 May 2000 23:20:27 +0000 X-Sybari-Space: 00000000 00000000 00000000 From: Andrew Morton X-Mailer: Mozilla 4.61 [en] (X11; I; Linux 2.2.16pre4 i686) X-Accept-Language: en MIME-Version: 1.0 To: kuznet@ms2.inr.ac.ru CC: netdev Subject: Re: tx_timeout and timer serialisation References: <3923F8CD.AECBDA6D@uow.edu.au> from "Andrew Morton" at May 19, 0 00:06:05 am <200005251831.WAA21251@ms2.inr.ac.ru> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing You're back! I posted a patch to the async del_timer() behaviour onto linux-kernel yesterday. 
I think we need to change the default behaviour of del_timer() to be async, and then run around and fix the resulting deadlocks. If we accept anything less, we have tens or even hundreds of subtle bugs. I'd be very interested in your opinions. -- -akpm- From owner-netdev@oss.sgi.com Fri May 26 09:57:50 2000 Received: by oss.sgi.com id ; Fri, 26 May 2000 09:57:26 -0700 Received: from ertpg14e1.nortelnetworks.com ([47.234.0.35]:61658 "EHLO ertpg14e1.nortelnetworks.com") by oss.sgi.com with ESMTP id ; Thu, 25 May 2000 15:27:08 -0700 Received: from zsngd101.asiapac.nortel.com (actually znsgd101) by ertpg14e1.nortelnetworks.com; Thu, 25 May 2000 19:22:34 -0400 Received: from zctwb003.asiapac.nortel.com ([47.152.32.111]) by zsngd101.asiapac.nortel.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21) id LCNJ6X4M; Fri, 26 May 2000 07:22:29 +0800 Received: from pwold011.asiapac.nortel.com ([47.181.193.45]) by zctwb003.asiapac.nortel.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21) id LT9PTSZJ; Fri, 26 May 2000 09:22:35 +1000 Received: from uow.edu.au (IDENT:akpm@localhost [127.0.0.1]) by pwold011.asiapac.nortel.com (8.9.3/8.9.3) with ESMTP id JAA11621; Fri, 26 May 2000 09:22:38 +1000 Message-ID: <392DB5BE.9841537F@uow.edu.au> Date: Thu, 25 May 2000 23:22:38 +0000 X-Sybari-Space: 00000000 00000000 00000000 From: Andrew Morton X-Mailer: Mozilla 4.61 [en] (X11; I; Linux 2.2.16pre4 i686) X-Accept-Language: en MIME-Version: 1.0 To: kuznet@ms2.inr.ac.ru CC: netdev@oss.sgi.com Subject: Re: tx_timeout and timer serialisation References: <3923F8CD.AECBDA6D@uow.edu.au> from "Andrew Morton" at May 19, 0 00:06:05 am <200005251831.WAA21251@ms2.inr.ac.ru> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing kuznet@ms2.inr.ac.ru wrote: > > ... 
> > It can delete the
> > timer and kfree it, or reuse its memory for something else,
> > or turn it into a semantically different timer **
>
> Yes, and in this case you cannot use del_timer_sync() and
> have to use generic reference counting scheme.

To use refcounting we need a new sort of timer, so the timer core can
manage their storage. So instead of having:

struct some_struct {
	...
	struct timer_list timer;
	...
}

we have:

struct some_struct {
	...
	struct new_timer *timer;
	...
}

--
-akpm-

From owner-netdev@oss.sgi.com Fri May 26 09:57:51 2000
Received: by oss.sgi.com id ; Fri, 26 May 2000 09:57:27 -0700
Received: from minus.inr.ac.ru ([193.233.7.97]:60685 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Fri, 26 May 2000 09:13:04 -0700
Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id UAA32576; Fri, 26 May 2000 20:17:24 +0400
From: kuznet@ms2.inr.ac.ru
Message-Id: <200005261617.UAA32576@ms2.inr.ac.ru>
Subject: Re: tx_timeout and timer serialisation
To: andrewm@uow.edu.au (Andrew Morton)
Date: Fri, 26 May 2000 20:17:24 +0400 (MSK DST)
Cc: netdev@oss.sgi.com, davem@redhat.com (Dave Miller)
In-Reply-To: <392DB5BE.9841537F@uow.edu.au> from "Andrew Morton" at May 25, 0 11:22:38 pm
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Content-Length: 336
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing

Hello!

> I posted a patch to the async del_timer() behaviour onto linux-kernel
> yesterday.

I'll look now.

> To use refcounting we need a new sort of timer, so the timer core can
> manage their storage.

Khm... Well, it is a really universal solution, but I am not sure
of its impact on performance. It looks like redundancy...
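
To make the storage question concrete, here is a minimal user-space sketch of a timer whose lifetime is governed by a reference count. Apart from the name struct new_timer, which comes from the message above, everything in it is hypothetical: the fields and the helpers new_timer_alloc(), new_timer_get(), new_timer_put(), new_timer_add(), new_timer_del() and new_timer_fire() are not kernel interfaces. It is single-threaded and leaves out the locking a real timer core would need.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted timer; not an existing kernel API. */
struct new_timer {
	int refcnt;                      /* owner + queue references */
	int pending;                     /* 1 while the timer is queued */
	void (*function)(unsigned long);
	unsigned long data;
};

static int timers_freed;             /* lets the demo see when storage goes away */
static int handler_runs;

static struct new_timer *new_timer_alloc(void (*fn)(unsigned long),
					 unsigned long data)
{
	struct new_timer *t = malloc(sizeof(*t));

	if (!t)
		return NULL;
	t->refcnt = 1;                   /* the owner's reference */
	t->pending = 0;
	t->function = fn;
	t->data = data;
	return t;
}

static void new_timer_get(struct new_timer *t)
{
	t->refcnt++;
}

/* Drop one reference; whoever drops the last one frees the storage. */
static void new_timer_put(struct new_timer *t)
{
	assert(t->refcnt > 0);
	if (--t->refcnt == 0) {
		free(t);
		timers_freed++;
	}
}

/* Queue the timer: the queue takes its own reference. */
static void new_timer_add(struct new_timer *t)
{
	if (!t->pending) {
		new_timer_get(t);
		t->pending = 1;
	}
}

/* Dequeue the timer; returns 1 if it was pending, 0 otherwise. */
static int new_timer_del(struct new_timer *t)
{
	if (!t->pending)
		return 0;
	t->pending = 0;
	new_timer_put(t);                /* give back the queue's reference */
	return 1;
}

/* What the core would do at expiry: the queue's reference covers the handler. */
static void new_timer_fire(struct new_timer *t)
{
	assert(t->pending);
	t->pending = 0;
	t->function(t->data);
	new_timer_put(t);
}

static void demo_handler(unsigned long data)
{
	handler_runs += (int)data;
}

/* Returns 0 if the reference counts behave as described. */
int new_timer_demo(void)
{
	struct new_timer *a = new_timer_alloc(demo_handler, 1);
	struct new_timer *b = new_timer_alloc(demo_handler, 1);

	if (!a || !b)
		return -1;

	new_timer_add(a);
	new_timer_put(a);                /* owner lets go while the timer is pending */
	if (timers_freed != 0)
		return -1;               /* the queue still holds a reference */
	new_timer_fire(a);               /* handler runs, then the last reference goes */
	if (handler_runs != 1 || timers_freed != 1)
		return -1;

	new_timer_add(b);
	if (new_timer_del(b) != 1 || new_timer_del(b) != 0)
		return -1;               /* second delete finds nothing pending */
	new_timer_put(b);                /* owner's reference was the last one */
	return (handler_runs == 1 && timers_freed == 2) ? 0 : -1;
}
```

The point of the pointer-instead-of-embedded-struct layout is visible in new_timer_demo(): the owner can drop its reference while the timer is still queued, and the storage only disappears after the handler has returned. The cost raised in the reply is also visible: every add, delete and expiry touches the counter.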
Alexey From owner-netdev@oss.sgi.com Fri May 26 09:57:56 2000 Received: by oss.sgi.com id ; Fri, 26 May 2000 09:57:27 -0700 Received: from minus.inr.ac.ru ([193.233.7.97]:7438 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Fri, 26 May 2000 09:24:05 -0700 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id VAA00756; Fri, 26 May 2000 21:23:51 +0400 From: kuznet@ms2.inr.ac.ru Message-Id: <200005261723.VAA00756@ms2.inr.ac.ru> Subject: Re: ICMPv6 Echo Reply bug in Linux 2.2.15 ? To: tmoestl@gmx.NET (Thomas Moestl) Date: Fri, 26 May 2000 21:23:51 +0400 (MSK DST) Cc: netdev@oss.sgi.com In-Reply-To: <20000515180749.A479@flux.local> from "Thomas Moestl" at May 15, 0 08:13:14 pm X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Content-Length: 97 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hello! > If somebody needs it, I can post some code. Please, post some tcpdump better. Alexey From owner-netdev@oss.sgi.com Fri May 26 09:57:56 2000 Received: by oss.sgi.com id ; Fri, 26 May 2000 09:57:27 -0700 Received: from Cantor.suse.de ([194.112.123.193]:41734 "HELO Cantor.suse.de") by oss.sgi.com with SMTP id ; Fri, 26 May 2000 09:40:25 -0700 Received: from Hermes.suse.de (Hermes.suse.de [194.112.123.136]) by Cantor.suse.de (Postfix) with ESMTP id 78D411E0F5 for ; Fri, 26 May 2000 12:09:08 +0200 (MEST) Received: from Thor.suse.de (Thor.suse.de [10.10.11.1]) by Hermes.suse.de (Postfix) with ESMTP id 50A5410A026 for ; Fri, 26 May 2000 12:09:08 +0200 (MEST) Received: by Thor.suse.de (Postfix, from userid 628) id E95638D791; Fri, 26 May 2000 12:09:06 +0200 (CEST) Date: Fri, 26 May 2000 12:09:06 +0200 From: Olaf Hering To: netdev@oss.sgi.com Subject: tap devices, ddp, multicast Message-ID: <20000526120906.B30208@suse.de> Reply-To: netdev@oss.sgi.com Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0pre3i Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: 
rfc822;netdev-outgoing

Hi,

is there a way to use the ddp protocol over a tap device?
I use Mac-on-Linux (http://www.maconlinux.org) and have a running
network connection, over eth0 or over a tap device with MASQ.

I can see other Macs in the Apple chooser if I go over eth0, but
when I use tap0 the local netatalk server can't start multicast.
This is not enabled in the ethertap driver; the following patch solves
this:

--- linux-2.2.14.SuSE/drivers/net/Config.in Mon May 8 09:42:18 2000
+++ OLAF/drivers/net/Config.in Sun May 14 16:57:59 2000
@@ -23,6 +23,9 @@
 if [ "$CONFIG_EXPERIMENTAL" = "y" ]; then
   if [ "$CONFIG_NETLINK" = "y" ]; then
     tristate 'Ethertap network tap' CONFIG_ETHERTAP
+    if [ "$CONFIG_ETHERTAP" != "n" ]; then
+      bool 'Enable Multicast for tap devices' CONFIG_ETHERTAP_MC
+    fi
   fi
 fi

netatalk is now happy, but it can't register the atalk devices.
I can only access them via the IP address.

I use these commands to set up the tap0 device:

/sbin/ifconfig tap0 192.168.200.1 netmask 255.255.255.0 up arp
/sbin/route add -host 192.168.200.1 tap0
echo "1" > /proc/sys/net/ipv4/conf/tap0/proxy_arp
/sbin/arp -s 192.168.200.2 FE:FD:00:00:00:00 pub
echo "1" > /proc/sys/net/ipv4/ip_forward

After mol is up and running I have to use another hw address, otherwise
MacOS tries to talk to itself.

/sbin/ifconfig tap0 down
/sbin/ifconfig tap0 multicast hw ether FE:FD:00:00:00:01 up

Now I have an IP network between Linux and MacOS.
Is there a way to enable ddp over the tap device? Or is this not
possible per definition?

Gruss Olaf

--
$ man clone

BUGS
Main feature not yet implemented...
From owner-netdev@oss.sgi.com Fri May 26 10:57:26 2000
Received: by oss.sgi.com id ; Fri, 26 May 2000 10:57:16 -0700
Received: from minus.inr.ac.ru ([193.233.7.97]:32784 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Fri, 26 May 2000 10:56:56 -0700
Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id WAA03942; Fri, 26 May 2000 22:56:44 +0400
From: kuznet@ms2.inr.ac.ru
Message-Id: <200005261856.WAA03942@ms2.inr.ac.ru>
Subject: Re: tx_timeout and timer serialisation
To: andrewm@uow.edu.au (Andrew Morton)
Date: Fri, 26 May 2000 22:56:44 +0400 (MSK DST)
Cc: netdev@oss.sgi.com
In-Reply-To: <392DB53B.2EB21297@uow.edu.au> from "Andrew Morton" at May 25, 0 11:20:27 pm
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Content-Length: 531
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing

Hello!

> I posted a patch to the async del_timer() behaviour onto linux-kernel
> yesterday.
>
> I think we need to change the default behaviour of del_timer() to be
> async, and then run around and fix the resulting deadlocks. If we accept
> anything less, we have tens or even hundreds of subtle bugs.
>
> I'd be very interested in your opinions.

Seems, I like this. 8)

Only I still do not understand how it works. 8)
I am lost in these patches...

Could you prepare one large patch doing all of the described things?
Alexey From owner-netdev@oss.sgi.com Fri May 26 11:26:37 2000 Received: by oss.sgi.com id ; Fri, 26 May 2000 11:26:26 -0700 Received: from pop.gmx.net ([194.221.183.20]:54053 "HELO mail.gmx.net") by oss.sgi.com with SMTP id ; Fri, 26 May 2000 11:26:21 -0700 Received: (qmail 8621 invoked by uid 0); 26 May 2000 19:26:14 -0000 Received: from pc19ebf72.dip.t-dialin.net (HELO flux.local) (193.158.191.114) by mail04.rzmi.gmx.net with SMTP; 26 May 2000 19:26:14 -0000 Received: from thomas by flux.local with local (Exim 3.13 #1) id 12vPq4-00007B-00; Fri, 26 May 2000 21:32:08 +0200 Date: Fri, 26 May 2000 21:32:08 +0200 From: Thomas Moestl To: kuznet@ms2.inr.ac.ru Cc: netdev@oss.sgi.com Subject: Re: ICMPv6 Echo Reply bug in Linux 2.2.15 ? Message-ID: <20000526213208.B350@flux.local> References: <20000515180749.A479@flux.local> <200005261723.VAA00756@ms2.inr.ac.ru> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0.1i In-Reply-To: <200005261723.VAA00756@ms2.inr.ac.ru>; from kuznet@ms2.inr.ac.ru on Fri, May 26, 2000 at 09:23:51PM +0400 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hi! > > If somebody needs it, I can post some code. > Please, post some tcpdump better. OK, here it comes. The problem seems to be a little different from what I first thought, though. 
tcpdump -i eth0 shows no traffic (eth0 was the interface whose link-local
address I tried to ping)
-----------------------------------------------------------------------------
tcpdump -i lo
-----------------------------------------------------------------------------
21:06:53.017227 ::1 > fe80::200:86ff:fe39:cb0 icmpv6: echo request
21:06:53.017227 ::1 > fe80::200:86ff:fe39:cb0 icmpv6: echo request
21:06:53.017275 fe80::200:86ff:fe39:cb0 > ::1 icmpv6: echo reply
21:06:53.017275 fe80::200:86ff:fe39:cb0 > ::1 icmpv6: echo reply
-----------------------------------------------------------------------------
, which is certainly strange, because recvfrom actually returns ::1 in the
from field in my code! (btw, the double packets listed are a bug in my
libpcap or tcpdump, I think).
So, the ICMPv6 module seems to be ok, and the error is somewhere later in
the chain ;-)

I will post some code anyway, perhaps it's my fault ;-)
As you can see, I use recvfrom in the relevant part, and almost immediately
put the address out, so I hope I did not clobber it. The address is however
::1, although the packet source address must have been correct.
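
One detail that may be relevant when reading the listing that follows: its visible part passes ICMP6_ECHO_REQUEST through the socket filter. If that socket is the one being read, it would see the looped-back request, whose source is ::1, rather than the reply; that would match both the ::1 from recvfrom and the matching id and sequence number. This is a guess from the part of the code that survives here, not a confirmed diagnosis. A minimal sketch of a filter that passes only echo replies, using the macros from <netinet/icmp6.h>; the helper names echo_reply_only_filter() and apply_icmp6_filter() are made up for illustration:

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/icmp6.h>

/* Hypothetical helper: block every ICMPv6 type, then let echo replies through. */
void echo_reply_only_filter(struct icmp6_filter *f)
{
	ICMP6_FILTER_SETBLOCKALL(f);
	ICMP6_FILTER_SETPASS(ICMP6_ECHO_REPLY, f);
}

/*
 * Hypothetical helper: install the filter on a raw ICMPv6 socket.
 * Returns the setsockopt() result (0 on success, -1 on error).
 */
int apply_icmp6_filter(int sock, const struct icmp6_filter *f)
{
	return setsockopt(sock, IPPROTO_ICMPV6, ICMP6_FILTER, f, sizeof(*f));
}
```

With such a filter on the receiving socket, an echo request that loops back over lo is dropped before recvfrom ever sees it, so the source address returned should be that of the replying interface.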
----------------------------------------------------------------------------- static int ping6(struct in6_addr a, int timeout, int rep) { char buf[1024]; int i,tm; int rve=1; int len; int isock,osock; struct icmp6_filter f; struct sockaddr_in6 from; struct icmp6_hdr icmpd; struct icmp6_hdr *icmpp; struct msghdr msg; unsigned short id=(unsigned short)(rand()&0xffff); socklen_t sl; ICMP6_FILTER_SETBLOCKALL(&f); ICMP6_FILTER_SETPASS(ICMP6_ECHO_REQUEST,&f); for (i=0;iicmp6_id),ntohs(icmpp->icmp6_id)==id?"OK":"mismatch", ntohs(icmpp->icmp6_seq),ntohs(icmpp->icmp6_seq)<=i?"OK":"mismatch"); if (ntohs(icmpp->icmp6_id)==id && ntohs(icmpp->icmp6_seq)<=i) { close(osock); close(isock); return (i-ntohs(icmpp->icmp6_seq))*timeout+tm; /* return the number of ticks */ } } } else { if (errno!=EAGAIN) { close(osock); close(isock); return -1; /* error */ } } usleep(100000); tm++; } while (tm; Mon, 29 May 2000 20:38:57 -0700 Received: from linuxcare.com.au ([203.29.91.49]:30980 "EHLO front.linuxcare.com.au") by oss.sgi.com with ESMTP id ; Mon, 29 May 2000 20:38:42 -0700 Received: from halfway (penicillin.linuxcare.com.au [10.61.2.27]) by front.linuxcare.com.au (8.9.3/8.9.3/Debian 8.9.3-21) with ESMTP id OAA20877; Tue, 30 May 2000 14:38:07 +1000 X-Authentication-Warning: front.linuxcare.com.au: Host penicillin.linuxcare.com.au [10.61.2.27] claimed to be halfway Received: from linuxcare.com.au (localhost [127.0.0.1]) by halfway (Postfix) with ESMTP id 48AAA8189; Tue, 30 May 2000 14:08:10 +0930 (CST) From: Rusty Russell To: kuznet@ms2.inr.ac.ru Cc: netdev@oss.sgi.com Subject: Re: tx_timeout and timer serialisation In-reply-to: Your message of "Thu, 11 May 2000 17:09:29 +0400." 
<200005111309.RAA32574@ms2.inr.ac.ru>
Date: Tue, 30 May 2000 14:08:10 +0930
Message-Id: <20000530043810.48AAA8189@halfway>
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing

In message <200005111309.RAA32574@ms2.inr.ac.ru> you write:
> Actually, existing *_timer primitives are very inconvenient.
> And I did not find any good way to improve them. Essentially,
> del_timer_sync(), timer->running and mod_timer() returning
> value are all that I was able to do.

There's still a tiny race with unloading modules between timer_exit()
and return in the timer handler.

A better interface would be to make the timerfn return a pointer to
the timer:

	struct timer_list *function(unsigned long data);

Get rid of the braindead timer_exit() macro, and inside
run_timer_list, do:

#ifdef CONFIG_SMP
	if (timer->function(timer->data)) {
		timer->running = 0;
		mb();
	}
#else
	timer->function(timer->data);
#endif

Then del_timer_sync() becomes the default, and del_timer_async() is
used for self-deleting timers and special effects.

(BTW, returning a pointer not an int so bugs like `kfree(timer);
return timer;' are more obvious).

So can we live with the current braindeath in 2.4?
Rusty.
--
Hacking time.

From owner-netdev@oss.sgi.com Mon May 29 23:35:17 2000
Received: by oss.sgi.com id ; Mon, 29 May 2000 23:35:08 -0700
Received: from cx97923-a.phnx3.az.home.com ([24.9.112.194]:18185 "EHLO grok.yi.org") by oss.sgi.com with ESMTP id ; Mon, 29 May 2000 23:34:48 -0700
Received: from candelatech.com (IDENT:greear@localhost [127.0.0.1]) by grok.yi.org (8.9.3/8.9.3) with ESMTP id BAA19517; Tue, 30 May 2000 01:10:36 -0700
Message-ID: <3933777C.E562388C@candelatech.com>
Date: Tue, 30 May 2000 01:10:36 -0700
From: Ben Greear
Organization: Candela Technologies
X-Mailer: Mozilla 4.72 [en] (X11; U; Linux 2.2.14-5.0 i586)
X-Accept-Language: en
MIME-Version: 1.0
To: Andrew Morton , netdev@oss.sgi.com
Subject: Re: Plans for 2.5 / 2.6 ???
References: <39331883.DA18E159@candelatech.com> <39331B48.9BADB2B7@uow.edu.au>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing

Andrew Morton wrote:
>
> Ben Greear wrote:
> >
> > I'd like to get an 802.1Q VLAN patch in...
>
> Yep.
>
> It would be useful if you could write a little web page which tells us
> what to do and to circulate that on netdev@oss.sgi.com
>
> Need to think about how the n/w driver maintainers can actually test the
> new feature as well...

The patch, and instructions for use/installation can be found at:
http://scry.wanfear.com/~greear/vlan.html

A small change is needed to get it to patch against the latest 2.3.X
kernel, I'll try to update that patch soon (The fix should be obvious
to kernel hackers..)

To test, you just need two machines and a NIC or two in each. Could
probably even work out a way to loop back between two NICs on a single
machine...

The drivers need to be able to handle pkts that are 4 bytes bigger than
normal, but they should still have the MTU of 1500, when seen as a
regular ethernet device. Not sure the best way to do this...but some
drivers already work (in 2.2.X at least), and a patch for the 3-com can
be found in the HOWTO linked off my page...

Basically, if
ping foo.local.net
and
ping -s 1472 (1500 total payload, right?) foo.local.net
works, where foo is a VLAN device, then the driver is probably OK.

Ethereal, and a modified tcpdump, distributed with the VLAN pkg on my
page, can snoop/decode the VLAN packets. Ethereal is better and
prettier, IMHO :)

There is another VLAN implementation linked from my page as well...
The driver issues will be similar with either VLAN implementation,
however.
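
The size arithmetic behind the ping -s 1472 check, and behind the extra 4 bytes a driver has to accept, can be written out as a small sketch. It assumes an IPv4 header without options (20 bytes), an 8-byte ICMP echo header, a 14-byte Ethernet header, a 4-byte 802.1Q tag and a 4-byte frame check sequence. The names below are illustrative only and do not come from the VLAN patch.

```c
#include <assert.h>   /* used by the checks that accompany this sketch */

enum {
	EX_ETH_HLEN  = 14,   /* destination MAC + source MAC + ethertype */
	EX_VLAN_HLEN = 4,    /* 802.1Q tag inserted after the source MAC */
	EX_ETH_FCS   = 4,    /* frame check sequence */
	EX_IPV4_HLEN = 20,   /* assumption: no IP options */
	EX_ICMP_HLEN = 8     /* ICMP echo header */
};

/* Size of the IP packet produced by "ping -s payload". */
int ping_ip_len(int payload)
{
	return EX_IPV4_HLEN + EX_ICMP_HLEN + payload;
}

/* On-wire frame size, FCS included, for an IP packet of ip_len bytes. */
int eth_frame_len(int ip_len, int vlan_tagged)
{
	return EX_ETH_HLEN + (vlan_tagged ? EX_VLAN_HLEN : 0) + ip_len + EX_ETH_FCS;
}
```

Under those assumptions, ping -s 1472 produces a 1500-byte IP packet, which is exactly the MTU. Untagged, that is a 1518-byte frame; with the VLAN tag it becomes 1522 bytes, and those 4 extra bytes are what the driver must tolerate while still reporting an MTU of 1500.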
Ben -- Ben Greear (greearb@candelatech.com) http://www.candelatech.com Author of ScryMUD: scry.wanfear.com 4444 (Released under GPL) http://scry.wanfear.com http://scry.wanfear.com/~greear From owner-netdev@oss.sgi.com Tue May 30 00:14:38 2000 Received: by oss.sgi.com id ; Tue, 30 May 2000 00:14:28 -0700 Received: from smtprch2.nortelnetworks.com ([192.135.215.15]:19861 "EHLO smtprch2.nortel.com") by oss.sgi.com with ESMTP id ; Tue, 30 May 2000 00:14:15 -0700 Received: from zrchb213.us.nortel.com (actually zrchb213) by smtprch2.nortel.com; Tue, 30 May 2000 03:11:15 -0500 Received: from zctwb003.asiapac.nortel.com ([47.152.32.111]) by zrchb213.us.nortel.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21) id LW2B9X76; Tue, 30 May 2000 03:14:07 -0500 Received: from pwold011.asiapac.nortel.com ([47.181.193.45]) by zctwb003.asiapac.nortel.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21) id LT9PT8X1; Tue, 30 May 2000 18:14:16 +1000 Received: from uow.edu.au (IDENT:akpm@notebook3.asiapac.nortel.com [47.181.194.44]) by pwold011.asiapac.nortel.com (8.9.3/8.9.3) with ESMTP id SAA09160; Tue, 30 May 2000 18:14:00 +1000 Message-ID: <393378F4.4AC33E9@uow.edu.au> Date: Tue, 30 May 2000 18:16:52 +1000 X-Sybari-Space: 00000000 00000000 00000000 From: Andrew Morton X-Mailer: Mozilla 4.7 [en] (X11; I; Linux 2.2.14-15mdk i586) X-Accept-Language: en MIME-Version: 1.0 To: Rusty Russell CC: kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com Subject: Re: tx_timeout and timer serialisation References: Your message of "Thu, 11 May 2000 17:09:29 +0400." <200005111309.RAA32574@ms2.inr.ac.ru> <20000530043810.48AAA8189@halfway> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-Orig: Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Rusty Russell wrote: > > In message <200005111309.RAA32574@ms2.inr.ac.ru> you write: > > Actually, existing *_timer primitives are very inconvenient. 
> > And I did not find any good way to improve them. Essentially,
> > del_timer_sync(), timer->running and mod_timer() returning
> > value are all that I was able to do.
>
> There's still a tiny race with unloading modules between timer_exit()
> and return in the timer handler.

Yep.

> A better interface would be to make the timerfn return a pointer to
> the timer:
>
> 	struct timer_list *function(unsigned long data);
>
> Get rid of the braindead timer_exit() macro, and inside
> run_timer_list, do:
>
> #ifdef CONFIG_SMP
> 	if (timer->function(timer->data)) {
> 		timer->running = 0;
> 		mb();
> 	}
> #else
> 	timer->function(timer->data);
> #endif

Yes. My patch (which is going out in about two hours...) has the same
characteristics, but records the fact that a timer is running by writing
a pointer to it into a global var in kernel/timer.c. In del_timer, spin
until that pointer is not equal to the timer-being-deleted.

So my patch does not require every timer handler to be visited. But
this is not necessarily an advantage! They _all_ need looking at. It
doesn't take very long - I've audited about 100 del_timer calls over
three evenings (500 to go) but some of the fixes are non-obvious.

> Then del_timer_sync() becomes the default, and del_timer_async() is
> used for self-deleting timers and special effects.

Yes. I talked Alan into this idea, but now I don't think it's prudent.
It seems that about 70% of calls to del_timer are racy and 20% are
deadlocky :( Some are both racy with async and deadlocky with sync! So
either way we lose.

What I will now propose is:

- Fix del_timer_sync

- Rename del_timer to del_timer_async

- #define del_timer del_timer_async

  OK, a no-op so far.

- Spend a couple of weeks working on net, scsi, ide, drivers/net
  migrating all del_timer calls to either del_timer_async or
  del_timer_sync.

- Change the #define to #define del_timer del_timer_sync, see what
  happens. There will be some deadlocks, but the fancy deadlock
  detection code will catch them quickly.
- Spend the next year banishing del_timer _completely_. So the presence of del_timer indicates "unaudited, possibly buggy" code. > (BTW, returning a pointer not an int so bugs like `kfree(timer); > return timer;' are more obvious). > > So can we live with the current braindeath in 2.4? Ship with known bugs? Please, no. We need to at least fix them in areas which are important to SMP. core, net, disk. From owner-netdev@oss.sgi.com Tue May 30 04:22:39 2000 Received: by oss.sgi.com id ; Tue, 30 May 2000 04:22:30 -0700 Received: from ertpg14e1.nortelnetworks.com ([47.234.0.35]:6645 "EHLO ertpg14e1.nortelnetworks.com") by oss.sgi.com with ESMTP id ; Tue, 30 May 2000 04:22:09 -0700 Received: from zsngd101.asiapac.nortel.com (actually znsgd101) by ertpg14e1.nortelnetworks.com; Tue, 30 May 2000 08:12:03 -0400 Received: from zctwb003.asiapac.nortel.com ([47.152.32.111]) by zsngd101.asiapac.nortel.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21) id LCNJ7HB6; Tue, 30 May 2000 20:11:57 +0800 Received: from pwold011.asiapac.nortel.com ([47.181.193.45]) by zctwb003.asiapac.nortel.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21) id LT9PT85Q; Tue, 30 May 2000 22:12:04 +1000 Received: from uow.edu.au (IDENT:akpm@[47.181.194.44]) by pwold011.asiapac.nortel.com (8.9.3/8.9.3) with ESMTP id WAA10445; Tue, 30 May 2000 22:11:50 +1000 Message-ID: <3933B0B2.50AB5EA1@uow.edu.au> Date: Tue, 30 May 2000 22:14:42 +1000 X-Sybari-Trust: b6c52953 04003204 04004e00 From: Andrew Morton X-Mailer: Mozilla 4.7 [en] (X11; I; Linux 2.2.14-15mdk i586) X-Accept-Language: en MIME-Version: 1.0 To: Alexey Kuznetsov CC: "netdev@oss.sgi.com" Subject: [timers] net/ipv4 Content-Type: multipart/mixed; boundary="------------858EFF1C8DC71475B0F1489C" Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing This is a multi-part message in MIME format. 
--------------858EFF1C8DC71475B0F1489C Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Hi, Alexey. You're the first cab off the rank :) Here are the results of reviewing net/ipv4/* for timer deletion safety. I believe there are some races in there. I have marked these with the string REVIEWME. Additional commentary is in ipv4.txt I have attached a proposed patch in which I have made _all_ timer handlers use del_timer_async, so this patch is a no-op. (I'm really getting sick of staring at timer handlers) -- -akpm- --------------858EFF1C8DC71475B0F1489C Content-Type: text/plain; charset=us-ascii; name="net_ipv4.patch" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="net_ipv4.patch" --- linux-2.4.0test1-ac5/net/ipv4/igmp.c Wed Apr 26 22:52:03 2000 +++ linux-akpm/net/ipv4/igmp.c Tue May 30 21:22:23 2000 @@ -142,7 +142,7 @@ static __inline__ void igmp_stop_timer(struct ip_mc_list *im) { spin_lock_bh(&im->lock); - if (del_timer(&im->timer)) + if (del_timer_async(&im->timer)) atomic_dec(&im->refcnt); im->tm_running=0; im->reporter = 0; @@ -165,7 +165,7 @@ { spin_lock_bh(&im->lock); im->unsolicit_count = 0; - if (del_timer(&im->timer)) { + if (del_timer_async(&im->timer)) { if ((long)(im->timer.expires-jiffies) < max_delay) { add_timer(&im->timer); im->tm_running=1; --- linux-2.4.0test1-ac5/net/ipv4/ip_fragment.c Sat Apr 29 21:22:41 2000 +++ linux-akpm/net/ipv4/ip_fragment.c Tue May 30 21:24:29 2000 @@ -150,7 +150,7 @@ qp->iph->saddr == saddr && qp->iph->daddr == daddr && qp->iph->protocol == protocol) { - del_timer(&qp->timer); + del_timer_async(&qp->timer); break; } } @@ -170,7 +170,7 @@ struct ipfrag *fp; /* Stop the timer for this entry. */ - del_timer(&qp->timer); + del_timer_async(&qp->timer); /* Remove this entry from the "incomplete datagrams" queue. 
*/ if(qp->next) --- linux-2.4.0test1-ac5/net/ipv4/ipmr.c Mon May 15 21:24:15 2000 +++ linux-akpm/net/ipv4/ipmr.c Tue May 30 21:29:45 2000 @@ -756,10 +756,15 @@ uc->mfc_mcastgrp == c->mfc_mcastgrp) { *cp = uc->next; if (atomic_dec_and_test(&cache_resolve_queue_len)) - del_timer(&ipmr_expire_timer); + del_timer_async(&ipmr_expire_timer); break; } } + /* + * REVIEWME: this function can be called from process context, yet I + * did a del_timer_async. Can we use del_timer_sync? Can the handler + * and this code fight each other? + */ spin_unlock_bh(&mfc_unres_lock); if (uc) { --- linux-2.4.0test1-ac5/net/ipv4/route.c Sat Apr 29 21:22:41 2000 +++ linux-akpm/net/ipv4/route.c Tue May 30 21:34:47 2000 @@ -395,7 +395,14 @@ spin_lock_bh(&rt_flush_lock); - if (del_timer(&rt_flush_timer) && delay > 0 && rt_deadline) { +/* + * REVIEWME: rt_cache_flush() can be called from process context. The above + * spinlock won't help us - rt_check_expire() could be running now (and doesn't use + * the lock anyway + * Should we use del_timer_sync() here? + */ + + if (del_timer_async(&rt_flush_timer) && delay > 0 && rt_deadline) { long tmo = (long)(rt_deadline - now); /* If flush timer is already running --- linux-2.4.0test1-ac5/net/ipv4/tcp_timer.c Mon May 15 21:24:15 2000 +++ linux-akpm/net/ipv4/tcp_timer.c Tue May 30 21:59:17 2000 @@ -81,7 +81,7 @@ * The delayed ack timer can be set if we are changing the * retransmit timer when removing acked frames. */ - if (timer_pending(&tp->probe_timer) && del_timer(&tp->probe_timer)) + if (timer_pending(&tp->probe_timer) && del_timer_async(&tp->probe_timer)) __sock_put(sk); if (when > TCP_RTO_MAX) { printk(KERN_DEBUG "reset_xmit_timer sk=%p when=0x%lx, caller=%p\n", sk, when, NET_CALLER(sk)); @@ -110,14 +110,17 @@ { struct tcp_opt *tp = &sk->tp_pinfo.af_tcp; - if(timer_pending(&tp->retransmit_timer) && del_timer(&tp->retransmit_timer)) +/* + * REVIEWME: this function can be called from process context. 
See ipv4.txt + */ + if(timer_pending(&tp->retransmit_timer) && del_timer_async(&tp->retransmit_timer)) __sock_put(sk); - if(timer_pending(&tp->delack_timer) && del_timer(&tp->delack_timer)) + if(timer_pending(&tp->delack_timer) && del_timer_async(&tp->delack_timer)) __sock_put(sk); tp->ack.blocked = 0; - if(timer_pending(&tp->probe_timer) && del_timer(&tp->probe_timer)) + if(timer_pending(&tp->probe_timer) && del_timer_async(&tp->probe_timer)) __sock_put(sk); - if(timer_pending(&sk->timer) && del_timer(&sk->timer)) + if(timer_pending(&sk->timer) && del_timer_async(&sk->timer)) __sock_put(sk); } @@ -368,8 +371,12 @@ tw->pprev_death = NULL; tcp_tw_put(tw); if (--tcp_tw_count == 0) - del_timer(&tcp_tw_timer); + del_timer_async(&tcp_tw_timer); } +/* + * REVIEWME: this function can be called from process context. Can it race with tcp_twkill()? + * I think it can, because tcp_twkill could be spinning on tw_death_lock() on another CPU right now! + */ spin_unlock(&tw_death_lock); } @@ -508,7 +515,7 @@ out: if ((tcp_tw_count -= killed) == 0) - del_timer(&tcp_tw_timer); + del_timer_async(&tcp_tw_timer); net_statistics[smp_processor_id()*2].TimeWaitKilled += killed; spin_unlock(&tw_death_lock); } @@ -710,7 +717,13 @@ void tcp_delete_keepalive_timer (struct sock *sk) { - if (timer_pending(&sk->timer) && del_timer (&sk->timer)) +/* + * REVIEWME. tcp_keepalive_timer can be called from process context. Can it race + * against the timer handler? + * + * Also, do we need timer_pending here? It'll usually return true, perhaps? 
+ */ + if (timer_pending(&sk->timer) && del_timer_async(&sk->timer)) __sock_put(sk); } --- linux-2.4.0test1-ac5/include/net/tcp.h Mon May 15 21:24:15 2000 +++ linux-akpm/include/net/tcp.h Tue May 30 22:05:45 2000 @@ -1491,7 +1491,7 @@ return; }; - if (timer_pending(timer) && del_timer(timer)) + if (timer_pending(timer) && del_timer_async(timer)) __sock_put(sk); } --------------858EFF1C8DC71475B0F1489C Content-Type: text/plain; charset=us-ascii; name="ipv4.txt" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="ipv4.txt" igmp.c ====== igmp_start_timer() OK as long as always called from BH context. igmp_mod_timer() OK as long as always called from BH context. ip_fragment.c ============= ip_find() It says "We are always in BH context", so let's trust that. ip_free() Also called only from BH context. ipmr.c ====== ipbr_mfc_add() ALERT! Can be called from process context! Added REVIEWME route.c ======= ALERT! rt_cache_flush() is called from process context and can apparently race wrt rt_check_expire(). Added a REVIEWME comment. tcp_timer.c =========== tcp_reset_xmit_timer() OK, always called from timer handlers. ** Why bother calling timer_pending in here? The common case will be that the timer _is_ pending, so just call del_timer_async??? tcp_clear_xmit_timers() Called from tcp_disconnect() Called from tcp_listen_stop() Called from tcp_close() ALERT! tcp_clear_xmit_timers() is called from process context. The retransmit_timer, delack_timer, probe_timer and sk->timer could still be running! I used del_timer_async(), but this needs reviewing. tcp_tw_deschedule() Called from tcp_timewait_state_process() Called from tcp_v4_rcv(). OK, BH context. Called from tcp_v4_check_established() Called from tcp_v4_hash_connecting() Called from tcp_connect() ALERT! tcp_tw_deschedule() called from process context! Added a REVIEWME, used del_timer_async(). tcp_twcal_tick() OK, it's a timer handler. 
tcp_delete_keepalive_timer() Called from tcp_listen_stop() Called from tcp_close() ALERT! tcp_delete_keepalive_timer() called from process context! Can it race with the handler? Added a REVIEWME. include/net/tcp_timer.h ======================= tcp_clear_xmit_timer(): Called from tcp_ack_probe() Called from tcp_ack() OK, BH only, methinks. Called from tcp_ack() OK, BH only. ** IS the timer_pending needed? --------------858EFF1C8DC71475B0F1489C-- From owner-netdev@oss.sgi.com Tue May 30 05:11:49 2000 Received: by oss.sgi.com id ; Tue, 30 May 2000 05:11:29 -0700 Received: from smtprch1.nortelnetworks.com ([192.135.215.14]:35238 "EHLO smtprch1.nortel.com") by oss.sgi.com with ESMTP id ; Tue, 30 May 2000 05:11:00 -0700 Received: from zrchb213.us.nortel.com (actually zrchb213) by smtprch1.nortel.com; Tue, 30 May 2000 08:06:13 -0500 Received: from zctwb003.asiapac.nortel.com ([47.152.32.111]) by zrchb213.us.nortel.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21) id LW2B96FT; Tue, 30 May 2000 08:06:05 -0500 Received: from pwold011.asiapac.nortel.com ([47.181.193.45]) by zctwb003.asiapac.nortel.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21) id LT9PT86M; Tue, 30 May 2000 23:06:14 +1000 Received: from uow.edu.au (IDENT:akpm@[47.181.194.44]) by pwold011.asiapac.nortel.com (8.9.3/8.9.3) with ESMTP id XAA10704; Tue, 30 May 2000 23:06:02 +1000 Message-ID: <3933BD65.4E106F8@uow.edu.au> Date: Tue, 30 May 2000 23:08:53 +1000 X-Sybari-Space: 00000000 00000000 00000000 From: Andrew Morton X-Mailer: Mozilla 4.7 [en] (X11; I; Linux 2.2.14-15mdk i586) X-Accept-Language: en MIME-Version: 1.0 To: Alexey Kuznetsov CC: "netdev@oss.sgi.com" Subject: [timers] net/core/* Content-Type: multipart/mixed; boundary="------------6231A3208CF66BAB14DCDDCF" Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing This is a multi-part message in MIME format. 
--------------6231A3208CF66BAB14DCDDCF Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit A couple of possible races. neigh_del_timer() looks ugly. Fixed a race in whitehole_close(). We could just apply these patches and revisit the REVIEWME comments at some time in the future of course. Or you could just tell me to buzz off :) -- -akpm- --------------6231A3208CF66BAB14DCDDCF Content-Type: text/plain; charset=us-ascii; name="core.txt" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="core.txt" dst.c ===== dst_gc_run() OK, this is the timer handler, so del_timer_async is safe. ** Why bother deleting the timer in the timer handler? It's gone! __dst_free() Called from dst_free() (include/net/dst.h) Called from decnet/dn_route.c:dn_dst_check_expire() OK, timer handler Called from decnet/dn_route.c:dn_dst_gc() Called from dst.c:dst_alloc() (Only place?) Called from decnet/dn_route.c:dn_route_output_slow() Called from dn_route_output() Called from dn_cache_getroute() Called from dn_connect() Called from process context. Called from decnet/dn_route.c:dn_route_input_slow() Called from ipv4/route.c Called from ipv4/route.c: Called from ipv6/ip6_fib.c: Called from ipv6/route.c: OK, we have one path by which __dst_free() can be called from process context. I'm dazed and confused. neighbour.c =========== neigh_del_timer() Called from neigh_ifdown() Called from neigh_table_clear() Called from decnet/dn_neigh.c:dn_neigh_cleanup() Called from process context Called from ipv6/ndisc.c:ndisc_cleanup() Called from process context neigh_if_down() uses write_lock_bh(&tbl->lock) and write_lock(&n->lock); neigh_timer_handler() uses write_lock(&neigh->lock); Oh how very sticky. We can't use del_timer_sync() because it'll deadlock. But neigh_release() will race horridly against neigh_timer_handler(). I used del_timer_async(). Added REVIEWME. neigh_proxy_process() OK, this is a timer handler. ** the del_timer_async() seems unnecessary. 
pneigh_enqueue() Called from ipv4/arp.c:arp_rcv(). OK, BH context. Called from ipv6/ndisc.c:ndisc_rcv() OK, BH context. **EXPORTED TO MODULES** neigh_table_clear() Uses del_timer_sync() profile.c ========= whitehole_close() Racy. Use del_timer_sync(). --------------6231A3208CF66BAB14DCDDCF Content-Type: text/plain; charset=us-ascii; name="net_core.patch" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="net_core.patch" --- linux-2.4.0test1-ac5/net/core/dst.c Mon May 15 21:24:15 2000 +++ linux-akpm/net/core/dst.c Tue May 30 22:32:41 2000 @@ -50,8 +50,10 @@ return; } - - del_timer(&dst_gc_timer); +/* + * REVIEWME: I don't think we need to delete the timer at all here + */ + del_timer_async(&dst_gc_timer); dstp = &dst_garbage_list; while ((dst = *dstp) != NULL) { if (atomic_read(&dst->__refcnt)) { @@ -128,13 +130,16 @@ dst->next = dst_garbage_list; dst_garbage_list = dst; if (dst_gc_timer_inc > DST_GC_INC) { - del_timer(&dst_gc_timer); + del_timer_async(&dst_gc_timer); dst_gc_timer_inc = DST_GC_INC; dst_gc_timer_expires = DST_GC_MIN; dst_gc_timer.expires = jiffies + dst_gc_timer_expires; add_timer(&dst_gc_timer); } - +/* + * REVIEWME: this function can be called from process context. See core.txt. + * Can it race? + */ spin_unlock_bh(&dst_lock); } --- linux-2.4.0test1-ac5/net/core/neighbour.c Sat Apr 29 21:22:41 2000 +++ linux-akpm/net/core/neighbour.c Tue May 30 23:01:46 2000 @@ -156,8 +156,12 @@ static int neigh_del_timer(struct neighbour *n) { +/* + * REVIEWME. Can be called from process context, can't use del_timer_sync. I think we + * have a problem here. See core.txt. 
+ */ if (n->nud_state & NUD_IN_TIMER) { - if (del_timer(&n->timer)) { + if (del_timer_async(&n->timer)) { neigh_release(n); return 1; } @@ -1016,7 +1020,7 @@ } else if (!sched_next || tdif < sched_next) sched_next = tdif; } - del_timer(&tbl->proxy_timer); + del_timer_async(&tbl->proxy_timer); if (sched_next) { tbl->proxy_timer.expires = jiffies + sched_next; add_timer(&tbl->proxy_timer); @@ -1036,7 +1040,7 @@ } skb->stamp.tv_sec = 0; skb->stamp.tv_usec = now + sched_next; - if (del_timer(&tbl->proxy_timer)) { + if (del_timer_async(&tbl->proxy_timer)) { long tval = tbl->proxy_timer.expires - now; if (tval < sched_next) sched_next = tval; --- linux-2.4.0test1-ac5/net/core/profile.c Wed Apr 26 22:51:21 2000 +++ linux-akpm/net/core/profile.c Tue May 30 23:02:36 2000 @@ -172,7 +172,7 @@ static int whitehole_close(struct net_device *dev) { - del_timer(&whitehole_timer); + del_timer_sync(&whitehole_timer); return 0; } --------------6231A3208CF66BAB14DCDDCF-- From owner-netdev@oss.sgi.com Tue May 30 10:09:28 2000 Received: by oss.sgi.com id ; Tue, 30 May 2000 10:09:08 -0700 Received: from minus.inr.ac.ru ([193.233.7.97]:28942 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Tue, 30 May 2000 09:00:23 -0700 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id UAA12445; Tue, 30 May 2000 20:59:28 +0400 From: kuznet@ms2.inr.ac.ru Message-Id: <200005301659.UAA12445@ms2.inr.ac.ru> Subject: Re: [timers] net/ipv4 To: andrewm@uow.edu.au (Andrew Morton) Date: Tue, 30 May 2000 20:59:28 +0400 (MSK DST) Cc: netdev@oss.sgi.com In-Reply-To: <3933B0B2.50AB5EA1@uow.edu.au> from "Andrew Morton" at May 30, 0 10:14:42 pm X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Content-Length: 3500 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hello! 
[ Andi, please, look at the end ]

> --- linux-2.4.0test1-ac5/net/ipv4/ipmr.c Mon May 15 21:24:15 2000
> +++ linux-akpm/net/ipv4/ipmr.c Tue May 30 21:29:45 2000
> @@ -756,10 +756,15 @@
>                  uc->mfc_mcastgrp == c->mfc_mcastgrp) {
>                      *cp = uc->next;
>                      if (atomic_dec_and_test(&cache_resolve_queue_len))
> -                        del_timer(&ipmr_expire_timer);
> +                        del_timer_async(&ipmr_expire_timer);
>                      break;
>              }
>      }
> +    /*
> +     * REVIEWME: this function can be called from process context, yet I
> +     * did a del_timer_async. Can we use del_timer_sync? Can the handler
> +     * and this code fight each other?
> +     */

ipmr_expire_timer is static stateless timer. There are no problems with it.

Even not counting that del_timer_sync is deadlocky, it should not be
ever used for static timers, especially doing large work.
It would be useless loss of time, not more.

> + * REVIEWME: rt_cache_flush() can be called from process context. The above
> + * spinlock won't help us - rt_check_expire() could be running now (and doesn't use
> + * the lock anyway
> + * Should we use del_timer_sync() here?
> + */

It is the same as above.

> @@ -110,14 +110,17 @@
>  {
>      struct tcp_opt *tp = &sk->tp_pinfo.af_tcp;
>
> -    if(timer_pending(&tp->retransmit_timer) && del_timer(&tp->retransmit_timer))
> +/*
> + * REVIEWME: this function can be called from process context. See ipv4.txt
> + */

Certainly. So what?

> @@ -368,8 +371,12 @@
>          tw->pprev_death = NULL;
>          tcp_tw_put(tw);
>          if (--tcp_tw_count == 0)
> -            del_timer(&tcp_tw_timer);
> +            del_timer_async(&tcp_tw_timer);
>      }
> +/*
> + * REVIEWME: this function can be called from process context. Can it race with tcp_twkill()?
> + * I think it can, because tcp_twkill could be spinning on tw_death_lock() on another CPU right now!
> + */
>      spin_unlock(&tw_death_lock);
>  }
>

Let it to spin. No problems.

> @@ -710,7 +717,13 @@
>
>  void tcp_delete_keepalive_timer (struct sock *sk)
>  {
> -    if (timer_pending(&sk->timer) && del_timer (&sk->timer))
> +/*
> + * REVIEWME. tcp_keepalive_timer can be called from process context. Can it race
> + * against the timer handler?

Of course. I repeat again and again, TCP use _reference_ _counting_.
so that it is insensitive to any races and orthogonal to the problem
under discussion.

> + *
> + * Also, do we need timer_pending here? It'll usually return true, perhaps?
> + */
> +    if (timer_pending(&sk->timer) && del_timer_async(&sk->timer))
>          __sock_put(sk);

It is optimization, del_timer() is too expensive to be used in TCP.

> igmp.c
> ======
>
> igmp_start_timer()
>
> OK as long as always called from BH context.
>
> igmp_mod_timer()
>
> OK as long as always called from BH context.

Forget about this argument. Words "always called from BH context"
are non-sense in 2.4 net/ipv4. The code is invariant wrt BH<->process
context and 99% of code is called from both of contexts.

> ip_fragment.c
> =============
>
> ip_find()
>
> It says "We are always in BH context", so let's trust that.

It lies, it has the same bug as ipv6/reassembly.c

> I used del_timer_async(), but this needs reviewing.

Andrew, I repeat again, all the timers in net/ipv4 are supposed
to be del_timer_async(). See?

Bugs in ipv6 and ipv4 defragmenter are noticed, they will be fixed
after talking to Andi, he had some massive changes to defragmenter.

Andi, what is status of that your patch? May I change defragmenter
to fix timers there?
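The reference-counting argument above can be illustrated with a small single-threaded user-space model. This is only a sketch under stated assumptions: `struct obj`, `obj_hold()`, `obj_put()`, `toy_add_timer()`, `toy_del_timer()`, `toy_fire()` and `toy_stop()` are hypothetical stand-ins, not kernel APIs, and no real concurrency is modelled. The rule it shows is the one visible in the quoted `if (... && del_timer(...)) __sock_put(sk);` lines: arming a timer takes a reference on the object, and whoever takes the timer out of the pending state (the deleter when the delete returns nonzero, or the handler when it runs) drops that reference exactly once.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical refcounted object with one embedded timer (user-space model). */
struct obj {
    int  refcnt;     /* number of live references */
    bool pending;    /* timer is queued and has not fired yet */
    bool destroyed;  /* set when the last reference is dropped */
};

static void obj_hold(struct obj *o) { o->refcnt++; }

static void obj_put(struct obj *o)
{
    assert(o->refcnt > 0);
    if (--o->refcnt == 0)
        o->destroyed = true;   /* a real object would be freed here */
}

/* Arming the timer takes a reference on behalf of the pending timer. */
static void toy_add_timer(struct obj *o)
{
    assert(!o->pending);
    o->pending = true;
    obj_hold(o);
}

/* Models a del_timer()-style call: returns 1 only if it removed a pending timer. */
static int toy_del_timer(struct obj *o)
{
    if (!o->pending)
        return 0;
    o->pending = false;
    return 1;
}

/* Models the timer firing; the handler drops the timer's reference. */
static void toy_fire(struct obj *o)
{
    if (!o->pending)
        return;
    o->pending = false;
    /* ... handler work would go here ... */
    obj_put(o);
}

/* Caller-side stop: drop the timer's reference only if we dequeued it. */
static void toy_stop(struct obj *o)
{
    if (toy_del_timer(o))
        obj_put(o);
}

/* Runs both orderings; returns 1 if neither ordering leaks or double-drops. */
static int demo_refcount_orderings(void)
{
    struct obj a = { .refcnt = 1 }, b = { .refcnt = 1 };

    toy_add_timer(&a);      /* refcnt 2 */
    toy_stop(&a);           /* delete wins: refcnt 1 */
    toy_fire(&a);           /* nothing pending: no change */
    if (a.refcnt != 1 || a.destroyed)
        return 0;
    obj_put(&a);            /* owner's reference goes away */

    toy_add_timer(&b);      /* refcnt 2 */
    toy_fire(&b);           /* handler wins: refcnt 1 */
    toy_stop(&b);           /* delete returns 0: no double put */
    if (b.refcnt != 1 || b.destroyed)
        return 0;
    obj_put(&b);

    return a.destroyed && b.destroyed;
}
```

In both orderings the object survives until its owner drops the last reference, which is why an asynchronous delete is sufficient in this model.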
Alexey

From owner-netdev@oss.sgi.com Tue May 30 10:09:38 2000
Received: by oss.sgi.com id ; Tue, 30 May 2000 10:09:08 -0700
Received: from minus.inr.ac.ru ([193.233.7.97]:34062 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Tue, 30 May 2000 09:08:39 -0700
Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id VAA12627; Tue, 30 May 2000 21:07:48 +0400
From: kuznet@ms2.inr.ac.ru
Message-Id: <200005301707.VAA12627@ms2.inr.ac.ru>
Subject: Re: [timers] net/core/*
To: andrewm@uow.edu.au (Andrew Morton)
Date: Tue, 30 May 2000 21:07:47 +0400 (MSK DST)
Cc: netdev@oss.sgi.com
In-Reply-To: <3933BD65.4E106F8@uow.edu.au> from "Andrew Morton" at May 30, 0 11:08:53 pm
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Content-Length: 586
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing

Hello!

> ** Why bother deleting the timer in the timer handler? It's gone!

It is plain mud. Delete it.

> OK, we have one path by which __dst_free() can be called from process context.
>
> I'm dazed and confused.

I do not understand, what confused you. __dst_free can be called
from any context.

> I used del_timer_async(). Added REVIEWME.

It does reference counting. Andrew, did you read my mails or you did not?

> whitehole_close()
>
> Racy. Use del_timer_sync().

Accepted!

To resume: del_timer() in dst.c and this one are OK. All the rest is crap.
Alexey

From owner-netdev@oss.sgi.com Tue May 30 10:09:38 2000
Received: by oss.sgi.com id ; Tue, 30 May 2000 10:09:08 -0700
Received: from minus.inr.ac.ru ([193.233.7.97]:37390 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Tue, 30 May 2000 09:19:42 -0700
Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id VAA12731; Tue, 30 May 2000 21:18:57 +0400
From: kuznet@ms2.inr.ac.ru
Message-Id: <200005301718.VAA12731@ms2.inr.ac.ru>
Subject: Re: [timers] net/sched/*
To: andrewm@uow.edu.au (Andrew Morton)
Date: Tue, 30 May 2000 21:18:57 +0400 (MSK DST)
Cc: netdev@oss.sgi.com
In-Reply-To: <3933CC51.6B05CE93@uow.edu.au> from "Andrew Morton" at May 31, 0 00:12:33 am
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Content-Length: 468
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing

Hello!

> +/* REVIEWME: called from process context */

It is not essential. It is static timer.

> +/*
> + * REVIEWME: Called from process context. I think del_timer_sync is safe here
> + */
> +    del_timer_async(&q->wd_timer);
> +    del_timer_async(&q->delay_timer);

Alas. It is called under spinlock and cannot be synchronous.

This and all the rest must use (existing) reference counts.
It will be cleaned later, this code needs some cleaning in any case.
Alexey

From owner-netdev@oss.sgi.com Tue May 30 10:09:38 2000
Received: by oss.sgi.com id ; Tue, 30 May 2000 10:09:09 -0700
Received: from minus.inr.ac.ru ([193.233.7.97]:40718 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Tue, 30 May 2000 09:30:14 -0700
Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id VAA12781; Tue, 30 May 2000 21:29:20 +0400
From: kuznet@ms2.inr.ac.ru
Message-Id: <200005301729.VAA12781@ms2.inr.ac.ru>
Subject: Re: tx_timeout and timer serialisation
To: rusty@linuxcare.com.au (Rusty Russell)
Date: Tue, 30 May 2000 21:29:20 +0400 (MSK DST)
Cc: netdev@oss.sgi.com
In-Reply-To: <20000530043810.48AAA8189@halfway> from "Rusty Russell" at May 30, 0 02:08:10 pm
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Content-Length: 637
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing

Hello!

> Get rid of the braindead timer_exit() macro, and inside
> run_timer_list, do:
>
> #ifdef CONFIG_SMP
>     if (timer->function(timer->data)) {
>         timer->running = 0;
>         mb();
>     }
> #else
>     timer->function(timer->data);
> #endif

I apologize. Certainly it could look like.

    if (timer->flags & TIMER_SELF_DESTRUCTABLE) {
        timer->function(timer->data);
        timer->running = 0;
        mb();
    } else {
        timer->function(timer->data);
    }

But both ways are still wrong. We still need macro telling,
when a self-destructable timer is running even if we need not
any synchronization. So that timer_exit() is required in any case.
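The return-value scheme quoted above from Rusty can be modelled in user space as follows. This is a minimal sketch, not kernel code: `struct toy_timer`, `toy_run_timer()` and both handlers are hypothetical names, the handler receives the timer itself rather than an `unsigned long data` cookie, and the memory barrier is only mentioned in a comment. The point it illustrates is that the dispatcher touches the timer after the handler returns only when the handler reports that the timer still exists.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model of a timer whose handler reports whether it survived. */
struct toy_timer {
    struct toy_timer *(*function)(struct toy_timer *self);
    int running;   /* set by the dispatcher just before the handler is called */
    int fired;     /* count of handler invocations, for the demo only */
};

/* Handler for a long-lived timer: returns itself, so ->running may be cleared. */
static struct toy_timer *persistent_handler(struct toy_timer *t)
{
    t->fired++;
    return t;
}

/* Handler for a self-destroying timer: frees itself and returns NULL. */
static struct toy_timer *self_destroying_handler(struct toy_timer *t)
{
    free(t);
    return NULL;
}

/* Dispatch step: dereference the timer afterwards only if it is still alive. */
static int toy_run_timer(struct toy_timer *t)
{
    struct toy_timer *alive;

    t->running = 1;
    alive = t->function(t);
    if (alive) {
        alive->running = 0;   /* the quoted kernel code follows this with mb() */
        return 1;
    }
    return 0;                 /* timer freed itself: t must not be touched */
}

/* Returns 1 if both kinds of timer are dispatched as expected. */
static int demo_dispatch(void)
{
    struct toy_timer keep = { persistent_handler, 0, 0 };
    struct toy_timer *once = malloc(sizeof(*once));

    if (!once)
        return 0;
    once->function = self_destroying_handler;
    once->running = 0;
    once->fired = 0;

    if (toy_run_timer(&keep) != 1 || keep.running != 0 || keep.fired != 1)
        return 0;
    if (toy_run_timer(once) != 0)   /* 'once' is dangling from here on */
        return 0;
    return 1;
}
```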
Alexey

From owner-netdev@oss.sgi.com Tue May 30 10:09:38 2000
Received: by oss.sgi.com id ; Tue, 30 May 2000 10:09:09 -0700
Received: from ertpg14e1.nortelnetworks.com ([47.234.0.35]:58810 "EHLO ertpg14e1.nortelnetworks.com") by oss.sgi.com with ESMTP id ; Tue, 30 May 2000 06:33:58 -0700
Received: from zsngd101.asiapac.nortel.com (actually znsgd101) by ertpg14e1.nortelnetworks.com; Tue, 30 May 2000 10:09:53 -0400
Received: from zctwb003.asiapac.nortel.com ([47.152.32.111]) by zsngd101.asiapac.nortel.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21) id LCNJ7HHC; Tue, 30 May 2000 22:09:47 +0800
Received: from pwold011.asiapac.nortel.com ([47.181.193.45]) by zctwb003.asiapac.nortel.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21) id LT9PT87J; Wed, 31 May 2000 00:09:54 +1000
Received: from uow.edu.au (IDENT:akpm@[47.181.194.44]) by pwold011.asiapac.nortel.com (8.9.3/8.9.3) with ESMTP id AAA11103; Wed, 31 May 2000 00:09:40 +1000
Message-ID: <3933CC51.6B05CE93@uow.edu.au>
Date: Wed, 31 May 2000 00:12:33 +1000
X-Sybari-Space: 00000000 00000000 00000000
From: Andrew Morton
X-Mailer: Mozilla 4.7 [en] (X11; I; Linux 2.2.14-15mdk i586)
X-Accept-Language: en
MIME-Version: 1.0
To: Alexey Kuznetsov
CC: "netdev@oss.sgi.com"
Subject: [timers] net/sched/*
Content-Type: multipart/mixed; boundary="------------EE5857C6AFA754B9B7B3AD54"
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;netdev-outgoing

This is a multi-part message in MIME format.
--------------EE5857C6AFA754B9B7B3AD54
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

I got a bit lost here. There are a few del_timer()'s immediately
preceding MOD_DEC_USE_COUNT which look risky.
--------------EE5857C6AFA754B9B7B3AD54 Content-Type: text/plain; charset=us-ascii; name="net_sched.patch" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="net_sched.patch" --- linux-2.4.0test1-ac5/net/sched/estimator.c Thu Jun 10 07:45:37 1999 +++ linux-akpm/net/sched/estimator.c Tue May 30 23:32:36 2000 @@ -191,7 +191,8 @@ killed++; } if (killed && elist[idx].list == NULL) - del_timer(&elist[idx].timer); + del_timer_async(&elist[idx].timer); +/* REVIEWME: called from process context */ } } --- linux-2.4.0test1-ac5/net/sched/sch_api.c Wed Apr 26 22:51:21 2000 +++ linux-akpm/net/sched/sch_api.c Tue May 30 23:57:37 2000 @@ -1111,7 +1111,7 @@ static void psched_tick(unsigned long); static struct timer_list psched_timer = - { NULL, NULL, 0, 0L, psched_tick }; + { { NULL, NULL }, 0, 0L, psched_tick }; static void psched_tick(unsigned long dummy) { --- linux-2.4.0test1-ac5/net/sched/sch_cbq.c Wed Apr 26 22:52:03 2000 +++ linux-akpm/net/sched/sch_cbq.c Tue May 30 23:40:52 2000 @@ -553,7 +553,7 @@ cl->penalized = sched; cl->cpriority = TC_CBQ_MAXPRIO; q->pmask |= (1<delay_timer) && + if (del_timer_async(&q->delay_timer) && (long)(q->delay_timer.expires - sched) > 0) q->delay_timer.expires = sched; add_timer(&q->delay_timer); @@ -1055,7 +1055,7 @@ sch->stats.overlimits++; if (q->wd_expires && !netif_queue_stopped(sch->dev)) { long delay = PSCHED_US2JIFFIE(q->wd_expires); - del_timer(&q->wd_timer); + del_timer_async(&q->wd_timer); if (delay <= 0) delay = 1; q->wd_timer.expires = jiffies + delay; @@ -1263,8 +1263,11 @@ q->pmask = 0; q->tx_class = NULL; q->tx_borrowed = NULL; - del_timer(&q->wd_timer); - del_timer(&q->delay_timer); +/* + * REVIEWME: Called from process context. 
I think del_timer_sync is safe here + */ + del_timer_async(&q->wd_timer); + del_timer_async(&q->delay_timer); q->toplevel = TC_CBQ_MAXLEVEL; PSCHED_GET_TIME(q->now); q->now_rt = q->now; --- linux-2.4.0test1-ac5/net/sched/sch_csz.c Tue Aug 24 03:01:02 1999 +++ linux-akpm/net/sched/sch_csz.c Tue May 30 23:43:25 2000 @@ -708,7 +708,7 @@ */ if (q->wd_expires) { unsigned long delay = PSCHED_US2JIFFIE(q->wd_expires); - del_timer(&q->wd_timer); + del_timer_async(&q->wd_timer); if (delay == 0) delay = 1; q->wd_timer.expires = jiffies + delay; @@ -741,7 +741,8 @@ #ifdef CSZ_PLUS_TBF PSCHED_GET_TIME(&q->t_tbf); q->tokens = q->depth; - del_timer(&q->wd_timer); +/* REVIEWME: called from process context */ + del_timer_async(&q->wd_timer); #endif sch->q.qlen = 0; } --- linux-2.4.0test1-ac5/net/sched/sch_generic.c Mon May 15 21:24:15 2000 +++ linux-akpm/net/sched/sch_generic.c Tue May 30 23:49:07 2000 @@ -190,7 +190,7 @@ static void dev_watchdog_down(struct net_device *dev) { spin_lock_bh(&dev->xmit_lock); - if (del_timer(&dev->watchdog_timer)) + if (del_timer_async(&dev->watchdog_timer)) __dev_put(dev); spin_unlock_bh(&dev->xmit_lock); } --- linux-2.4.0test1-ac5/net/sched/sch_sfq.c Tue Aug 24 03:01:02 1999 +++ linux-akpm/net/sched/sch_sfq.c Tue May 30 23:53:54 2000 @@ -391,7 +391,8 @@ q->quantum = ctl->quantum ? 
: psched_mtu(sch->dev); q->perturb_period = ctl->perturb_period*HZ; - del_timer(&q->perturb_timer); +/* REVIEWME: del_timer_sync() is safe here, and we're called from process context */ + del_timer_async(&q->perturb_timer); if (q->perturb_period) { q->perturb_timer.expires = jiffies + q->perturb_period; add_timer(&q->perturb_timer); @@ -435,7 +436,8 @@ static void sfq_destroy(struct Qdisc *sch) { struct sfq_sched_data *q = (struct sfq_sched_data *)sch->data; - del_timer(&q->perturb_timer); +/* REVIEWME: suggest del_timer_sync() */ + del_timer_async(&q->perturb_timer); MOD_DEC_USE_COUNT; } --- linux-2.4.0test1-ac5/net/sched/sch_tbf.c Mon May 15 21:24:15 2000 +++ linux-akpm/net/sched/sch_tbf.c Tue May 30 23:55:47 2000 @@ -265,7 +265,8 @@ q->tokens = q->buffer; q->ptokens = q->mtu; sch->flags &= ~TCQ_F_THROTTLED; - del_timer(&q->wd_timer); +/* REVIEWME: suggest del_timer_sync() */ + del_timer_async(&q->wd_timer); } static int tbf_change(struct Qdisc* sch, struct rtattr *opt) @@ -350,7 +351,8 @@ { struct tbf_sched_data *q = (struct tbf_sched_data *)sch->data; - del_timer(&q->wd_timer); +/* REVIEWME: suggest del_timer_sync() */ + del_timer_async(&q->wd_timer); if (q->P_tab) qdisc_put_rtab(q->P_tab); --------------EE5857C6AFA754B9B7B3AD54 Content-Type: text/plain; charset=us-ascii; name="sched.txt" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="sched.txt" estimator.c =========== qdisc_kill_estimator() Called from sched/police.c: Called from tcf_police_destroy() Called from tcf_police_release() (include/net/pkt_sched.h) Called from cls_fw.c:fw_destroy() Called from sch_atm.c:destroy_filters() Called from atm_tc_put() Called from atm_tc_change() Called from cls_api.c:tc_ctl_tfilter() Called from process context Called from sch_api.c: Called from atm_tc_destroy() Called from sch_cbq.c: Called from sch_dsmark.c: Called from sch_generic.c: Called from sch_ingress.c: Called from cls_route.c: Called from cls_tcindex.c: Called from cls_u32.c: Called from 
sched/sch_api.c:
Called from sched/sch_cbq.c:
Called from sched/sch_generic.c:

Potentially racy. Used del_timer_async. Added REVIEWME.

sch_cbq.c
=========

cbq_ovl_delay()
  Called from cbq_under_limit()
  Called from cbq_dequeue_prio()
  Called from cbq_dequeue_1()
  Called from cbq_dequeue()
  Called from sch_dsmark.c:dsmark_dequeue(), perhaps.

  I'm lost. del_timer_async.

cbq_dequeue()
  Lost again. del_timer_async.

cbq_reset()
  Called from qdisc_reset()
  Called from dev_deactivate()
  Called from process context.

  Used del_timer_async. Added REVIEWME.

sch_csz.c
=========

csz_dequeue()
  I can't work out the call context for the dequeue methods.
  Make it del_timer_async().

csz_reset()
  I think this should be del_timer_sync(). Added REVIEWME.

sch_generic.c
=============

dev_watchdog_down()
  Called from dev_deactivate() under, effectively, dev->xmit_lock.
  Called from process context.

  But dev_watchdog() grabs dev->xmit_lock. It seems that
  refcounting makes this one safe.

sch_sfq.c
=========

sfq_change()
  Called from sfq_init()
  Called from process context, I assume. The timer is
  probably not running at this time.

sfq_destroy()
  This is racy (module unload). Suggest del_timer_sync()

sch_tbf.c
=========

tbf_reset()
  I think this is called from process context. Suggest del_timer_sync().

tbf_destroy()
  del_timer_async() just before MOD_DEC_USE_COUNT. Risky.
--------------EE5857C6AFA754B9B7B3AD54-- From owner-netdev@oss.sgi.com Tue May 30 11:11:57 2000 Received: by oss.sgi.com id ; Tue, 30 May 2000 11:11:37 -0700 Received: from panic.ohr.gatech.edu ([130.207.47.194]:43525 "EHLO havoc.gtf.org") by oss.sgi.com with ESMTP id ; Tue, 30 May 2000 11:11:14 -0700 Received: from mandrakesoft.com (adsl-77-228-135.atl.bellsouth.net [216.77.228.135]) by havoc.gtf.org (8.9.3/8.9.3) with ESMTP id OAA07525; Tue, 30 May 2000 14:28:41 -0400 Message-ID: <39340859.CA072AB6@mandrakesoft.com> Date: Tue, 30 May 2000 14:28:41 -0400 From: Jeff Garzik Organization: MandrakeSoft X-Mailer: Mozilla 4.73 [en] (X11; I; Linux 2.2.15-6mdksmp i686) X-Accept-Language: en MIME-Version: 1.0 To: kuznet@ms2.inr.ac.ru CC: Rusty Russell , netdev@oss.sgi.com Subject: Re: tx_timeout and timer serialisation References: <200005301729.VAA12781@ms2.inr.ac.ru> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing kuznet@ms2.inr.ac.ru wrote: > But both ways are still wrong. We still need macro telling, > when a self-destructable timer is running even if we need not > any synchronization. So that timer_exit() is required in any case. It seems like any solution requiring timer_exit() will always be racy, because of the [very small] window between timer_exit() code finishing, and the timer function code being disposable... Jeff -- Jeff Garzik | Liberty is always dangerous, but Building 1024 | it is the safest thing we have. MandrakeSoft, Inc. 
| -- Harry Emerson Fosdick From owner-netdev@oss.sgi.com Tue May 30 11:13:37 2000 Received: by oss.sgi.com id ; Tue, 30 May 2000 11:13:27 -0700 Received: from minus.inr.ac.ru ([193.233.7.97]:12815 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Tue, 30 May 2000 11:13:05 -0700 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id WAA13129; Tue, 30 May 2000 22:45:01 +0400 From: kuznet@ms2.inr.ac.ru Message-Id: <200005301845.WAA13129@ms2.inr.ac.ru> Subject: Re: tx_timeout and timer serialisation To: jgarzik@mandrakesoft.com (Jeff Garzik) Date: Tue, 30 May 2000 22:45:01 +0400 (MSK DST) Cc: rusty@linuxcare.com.au, netdev@oss.sgi.com In-Reply-To: <39340859.CA072AB6@mandrakesoft.com> from "Jeff Garzik" at May 30, 0 02:28:41 pm X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Content-Length: 1004 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hello! > It seems like any solution requiring timer_exit() will always be racy, > because of the [very small] window between timer_exit() code finishing, > and the timer function code being disposeable... timer_exit() was used not to resolve this race, but to plug race window between releasing timer list spinlock and entry to timer handler. The moment, when timer handler finds convenient to call timer_exit() is not essential from this viewpoint. F.e it could call it on the most entry upon grabbing some spinlock on controlled object. No attempt to resolve "module unload" race was even made. Moreover, I even think that this race could be resolved separately. Even some big synchronizer called from module.c is acceptable. Case of "self-modifying" code does not deserve too much of attention. Actually, TIMER_SELF_DESTRUCTABLE flag or Andrew's approach solve _all_ of the problems, provided timer->running remains public for "asynchronous" timers and module.c synchronizes to TIMER_BH. 
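[Editorial sketch, not part of the original message.] The role of a public timer->running flag can be illustrated with a small userspace model. This is only a sketch under loud assumptions: every model_* name below is hypothetical and none of it is kernel code. It uses C11 atomics in place of the timer-list spinlock, and it ignores handlers that re-arm themselves. The runner sets "running" before it tries to claim the timer, so a synchronous delete that clears "pending" and then waits for "running" to drop never returns while the handler can still be entered.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; none of these names exist in the kernel. */
struct model_timer {
    atomic_bool pending;   /* queued, handler not yet claimed */
    atomic_bool running;   /* a runner is claiming or executing the handler */
    void (*fn)(struct model_timer *);
    int fired;             /* how many times the handler ran */
};

static void model_init_timer(struct model_timer *t,
                             void (*fn)(struct model_timer *))
{
    atomic_init(&t->pending, false);
    atomic_init(&t->running, false);
    t->fn = fn;
    t->fired = 0;
}

static void model_add_timer(struct model_timer *t)
{
    atomic_store(&t->pending, true);
}

/* Example handler: just counts invocations. */
static void model_count_fire(struct model_timer *t)
{
    t->fired++;
}

/* What the timer-running context would do for one expired timer. */
static void model_run_timer(struct model_timer *t)
{
    assert(t->fn != NULL);
    /* Announce ourselves first, so there is no window in which the
     * timer is neither pending nor marked running. */
    atomic_store(&t->running, true);
    if (atomic_exchange(&t->pending, false))
        t->fn(t);
    atomic_store(&t->running, false);   /* plays the role of timer_exit() */
}

/* Asynchronous delete: dequeue only. Returns 1 if the timer was pending.
 * The handler may still be executing when this returns. */
static int model_del_timer_async(struct model_timer *t)
{
    return atomic_exchange(&t->pending, false) ? 1 : 0;
}

/* Synchronous delete: dequeue, then wait until no handler is executing.
 * Calling this from inside the handler would spin forever, which is the
 * "deadlocky" property mentioned in the thread. */
static int model_del_timer_sync(struct model_timer *t)
{
    int was_pending = model_del_timer_async(t);

    while (atomic_load(&t->running))
        ;   /* busy-wait; a real implementation would relax the CPU */
    return was_pending;
}
```

In this model, code that frees the timer's data (or unloads the handler's code) would have to use model_del_timer_sync(); model_del_timer_async() only guarantees that the timer is no longer queued.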
Alexey From owner-netdev@oss.sgi.com Tue May 30 19:36:58 2000 Received: by oss.sgi.com id ; Tue, 30 May 2000 19:36:39 -0700 Received: from smtprch1.nortelnetworks.com ([192.135.215.14]:9354 "EHLO smtprch1.nortel.com") by oss.sgi.com with ESMTP id ; Tue, 30 May 2000 19:36:07 -0700 Received: from zrchb213.us.nortel.com (actually zrchb213) by smtprch1.nortel.com; Tue, 30 May 2000 21:29:39 -0500 Received: from zctwb003.asiapac.nortel.com ([47.152.32.111]) by zrchb213.us.nortel.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21) id LW2B0SVH; Tue, 30 May 2000 21:28:42 -0500 Received: from pwold011.asiapac.nortel.com ([47.181.193.45]) by zctwb003.asiapac.nortel.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21) id LT9PT9R8; Wed, 31 May 2000 12:28:53 +1000 Received: from uow.edu.au (IDENT:akpm@localhost [127.0.0.1]) by pwold011.asiapac.nortel.com (8.9.3/8.9.3) with ESMTP id MAA18926; Wed, 31 May 2000 12:28:39 +1000 Message-ID: <393478D7.7D131923@uow.edu.au> Date: Wed, 31 May 2000 02:28:39 +0000 X-Sybari-Space: 00000000 00000000 00000000 From: Andrew Morton X-Mailer: Mozilla 4.61 [en] (X11; I; Linux 2.3.99-pre5 i686) X-Accept-Language: en MIME-Version: 1.0 To: kuznet@ms2.inr.ac.ru CC: netdev@oss.sgi.com Subject: Re: [timers] net/ipv4 References: <3933B0B2.50AB5EA1@uow.edu.au> from "Andrew Morton" at May 30, 0 10:14:42 pm <200005301659.UAA12445@ms2.inr.ac.ru> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Alexey, I'm obviously missing something. If you want to say "Andrew, TCP is safe" then that's cool. I'll just buzz off and stare unhappily at the SCSI code. But I just need one more shot at it. 
:) kuznet@ms2.inr.ac.ru wrote: > > Even not counting that del_timer_sync is deadlocky, > it should not be ever used for static timers, static struct timer_list timer; int foo; handler() { foo = 1; } mainline() { del_timer_async(&timer); foo = 2; } That is a race. So why do you say del_timer_sync() is not needed for static timers? > > Of course. I repeat again and again, TCP use _reference_ _counting_. Bear with me Alexey, I'm Australian. Where are the refcounts held? Let's look at igmp.c: static __inline__ void igmp_stop_timer(struct ip_mc_list *im) { // igmp_timer_expire() could be running right now spin_lock_bh(&im->lock); // and now if (del_timer(&im->timer)) atomic_dec(&im->refcnt); im->tm_running=0; im->reporter = 0; // and now. im->unsolicit_count = 0; spin_unlock_bh(&im->lock); } static void igmp_timer_expire(unsigned long data) { ... // Race here. im->reporter = 1; ip_ma_put(im); } I assume there is SOMETHING at a higher layer which prevents this occurring. But what? From owner-netdev@oss.sgi.com Tue May 30 21:55:08 2000 Received: by oss.sgi.com id ; Tue, 30 May 2000 21:54:48 -0700 Received: from pizda.ninka.net ([216.101.162.242]:34179 "EHLO pizda.ninka.net") by oss.sgi.com with ESMTP id ; Tue, 30 May 2000 21:54:43 -0700 Received: (from davem@localhost) by pizda.ninka.net (8.9.3/8.9.3) id VAA18372; Tue, 30 May 2000 21:47:03 -0700 Date: Tue, 30 May 2000 21:47:03 -0700 Message-Id: <200005310447.VAA18372@pizda.ninka.net> X-Authentication-Warning: pizda.ninka.net: davem set sender to davem@redhat.com using -f From: "David S. 
Miller" To: andrewm@uow.edu.au CC: kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com In-reply-to: <393478D7.7D131923@uow.edu.au> (message from Andrew Morton on Wed, 31 May 2000 02:28:39 +0000) Subject: Re: [timers] net/ipv4 References: <3933B0B2.50AB5EA1@uow.edu.au> from "Andrew Morton" at May 30, 0 10:14:42 pm <200005301659.UAA12445@ms2.inr.ac.ru> <393478D7.7D131923@uow.edu.au> Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Date: Wed, 31 May 2000 02:28:39 +0000 From: Andrew Morton > > Of course. I repeat again and again, TCP use _reference_ _counting_. Bear with me Alexey, I'm Australian. Where are the refcounts held? For TCP, they are held in the socket. See all the sock_get/sock_put calls around adding/removing the TCP retransmit timer. Only the last sock_put really frees up the socket in the end, and this can be a timer :-) Later, David S. Miller davem@redhat.com From owner-netdev@oss.sgi.com Wed May 31 07:40:05 2000 Received: by oss.sgi.com id ; Wed, 31 May 2000 07:39:46 -0700 Received: from sirppi.helsinki.fi ([128.214.205.27]:26631 "EHLO sirppi.helsinki.fi") by oss.sgi.com with ESMTP id ; Wed, 31 May 2000 07:39:38 -0700 Received: from localhost (amlaukka@localhost) by sirppi.helsinki.fi (8.10.1/8.10.1) with ESMTP id e4VEess24449; Wed, 31 May 2000 17:40:55 +0300 (EET DST) X-Authentication-Warning: sirppi.helsinki.fi: amlaukka owned process doing -bs Date: Wed, 31 May 2000 17:40:53 +0300 (EET DST) From: Aki M Laukkanen To: netdev@oss.sgi.com cc: iwtcp@cs.helsinki.fi, jmanner@cs.helsinki.fi Subject: recent TCP changes adversive on slow links Message-ID: MIME-Version: 1.0 Content-Type: MULTIPART/MIXED; BOUNDARY="0-1472595759-959784053=:26843" Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing This message is in MIME format. The first part should be readable text, while the remaining parts are likely unreadable without MIME-aware tools. 
Send mail to mime@docserver.cac.washington.edu for more info. --0-1472595759-959784053=:26843 Content-Type: TEXT/PLAIN; charset=US-ASCII Hi, it seems the changes to tcp_write_space() in 2.3.99-pre5 or thereabouts had some adverse effects on slow links. The accompanying tcpdump is from a 9600 bps connection with a rather large transmission delay. Take a look with tracedump and the burstiness at the sender end becomes apparent. The reason for this seems to be that the incoming acks are so delayed that they cannot provide feedback for TCP about when to wake up the sender. The following patch will make the old behaviour come back. I assume this change was made to prevent over-scheduling and thus the patch as it is might not be acceptable. Maybe a variant which tests against (sk->sndbuf - WATERMARK) could be? Aside from the rather crude-looking tcpdumps, the transmission speed with ttcp (and similar tools) is not much worse with the current behaviour. However, real-world apps might not be able to shuffle new packets to the stack at will so the burst will be gentler. --- linux-2.4.0-test1-ac6.bak/net/ipv4/tcp.c Mon Apr 24 23:59:57 2000 +++ linux-2.4.0-test1-ac6/net/ipv4/tcp.c Wed May 31 17:22:41 2000 @@ -546,7 +546,7 @@ struct socket *sock; read_lock(&sk->callback_lock); - if ((sock = sk->socket) != NULL && atomic_read(&sk->wmem_alloc) == 0) { + if ((sock = sk->socket) != NULL) { if (test_bit(SOCK_NOSPACE, &sock->flags)) { if (sk->sleep && waitqueue_active(sk->sleep)) { clear_bit(SOCK_NOSPACE, &sock->flags); -- D.
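[Editorial sketch, not part of the original message.] The three wakeup policies discussed above can be compared side by side in a small userspace model. The removed line in the patch wakes the writer only when wmem_alloc has dropped to zero; the patched line wakes it regardless of wmem_alloc; the watermark variant is only hinted at in the message, so its exact form here (wake once at least "watermark" bytes of the send buffer are free) is an assumption, as are all the names below.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical policy names; none of these exist in the kernel. */
enum wake_policy {
    WAKE_WHEN_EMPTY,    /* removed line in the patch: wmem_alloc == 0 */
    WAKE_ALWAYS,        /* patched line: no test on wmem_alloc at all */
    WAKE_AT_WATERMARK   /* assumed reading of (sndbuf - WATERMARK) */
};

/* Decide whether a blocked writer should be woken, given the bytes still
 * queued for transmission (wmem_alloc) and the send buffer size (sndbuf). */
static bool should_wake_writer(enum wake_policy policy,
                               unsigned int wmem_alloc,
                               unsigned int sndbuf,
                               unsigned int watermark)
{
    assert(watermark <= sndbuf);   /* a larger watermark makes no sense */

    switch (policy) {
    case WAKE_WHEN_EMPTY:
        return wmem_alloc == 0;
    case WAKE_ALWAYS:
        return true;
    case WAKE_AT_WATERMARK:
        /* Wake once at least 'watermark' bytes of buffer are free. */
        return wmem_alloc <= sndbuf - watermark;
    }
    return false;
}
```

On a slow link the first policy lets the whole send buffer drain before the sender refills it, which would produce the bursts described; the watermark form refills earlier without waking the writer for every ack.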
--0-1472595759-959784053=:26843 Content-Type: APPLICATION/octet-stream; name="tcpdump.gz" Content-Transfer-Encoding: BASE64 Content-ID: Content-Description: Content-Disposition: attachment; filename="tcpdump.gz" H4sICDIeNTkCA3RjcGR1bXAAjNwJdI7X1gfwTJJqihYXbcwSM5EIEYIYE6Sm hBDEVIJEguTNbGpSEWNRTc2UizY0rVlQQww1z1ONLRWEUrNSvuf/5J6d9+z1 ddl3ra57173r/tbZ7/tsnvf89zlnctcts7NxsFH/amv884Hxz5qPGvmUXFLE xtf4z/innflvfjZ+jtVz3rexsTP+sXVwcnByu1KvBf5/S+3G3b43yvgfHOzK Odi9Z/zPoyviv7e1tzctywynf7Ns4ZnWVP9kw2u5tOS42yUnWFuRF+Epa5dh eRmOV4Hl9T9rD1tXS8NLmfDhuNvNFxn/pa25ptqwYPjNcTJrrWkLw7ZmgeF2 7N+M2y11w8b4zFzdqteoWat2nbr16rvDHCsy6zdUZr183ewZ0qt3aJ++/cL6 Dxg4aPBnQ2Aealjs32q1/txS4ML0TVNm5Bu4MD7wKiZYl3sNtS7/4bSuLnCM 8hp4eDb0atTYu4lP02a+5ucnMhu4KHPoXN00yhsaPmz4iIjIkVHRo0aPWWuY LndshbW614DZLE6ZI8uhVhjdDePd6/IoYa6r3Ljbdk60ruFwjPKat2jp16p1 m7bt2vsHdIA5UWR62qta7crqZuFTUvAxwpx3vYiw1gYuMJu2oVo98RnC+MUw BOt6rta1/CGtKxVO4ZNW8FXAfCEyG+Yrc8l83Sx8Sgo+RphVfyoqrNWjBEyf xlRrJ1VrT8N497q8rqnv9ewqWtdCOIVPWkysJS4+AWa6yGx0WtXqV003e7CG hTl5tbRfPe1hNilOtQ7B8wJj62pJbzU+oNYVWobWtQlOXdawMG+JTO9tyvyk tW4OYg37g2E+6mcnrfU5TG8bqnU8aoVRI8xOsK4m2ep7fTCN1nUCjg9rWJhB ItNnuaq19F7ddGMNC/NAfUdhrQ3zYTa6SLV+g96AYePuKFhX00y1rk1+tK47 cPqwhoXpLjKbTVXmug66yRsWplf194W1el2D6XWYal2nah1sGO9el+949b3m 1lPGGHs4w1jDwpwlMpvHqFofH9fNYNawMBdXKC6stdFpmA1XUK1H8OcAjFOG 8e51tQhX64rMpXVVgFOHNaz5rFSUmC37KLPul7o5kDVsNvo1S9qvjQ/A9Myk Wv9ArTBqrJH0ll8X9b1eKU3ragynCWtYmD1FZqs2qtYaMbrpyhoWZs5yab96 b4PpEU21vsWfeTDuL5f0Vmtvta7h92ldXeGEsoaFWX6FxGxTR5l7n+mmB2tY mH4LpP3aJBtmg4HKjPpY1Rq7QNJbbSupdX27ltY1Ak44a1iYy0Rmu1LqWSnt qpvW72EdOwV+CnNnhLRffZZjne5NqVa8N52A8SBC0lvtnVSt48fQulLh1GYN C7NCpMh8pczrLXRzAGvYHw0zuIq9sNammTDr16VaA1ErjAzDePe6/B+odZUv fN4WwfFmDQtzi8gMuKG+17y9umn9HpaYlJwCc2U5J2Gtzaaav52KUK1D8XcZ jPPlJL+/OpxXtaY0o3VthtObNSxMh48lZsfDyjy/XDcbsIaF6XtH2q++42HW eUG1jle1jroj6a1OO9W6evaldZ2AM5Q1LMzFIjNwvfpebdfoZnPWsD+av1Gk /do8BuusfYJqnYe/t2G8vC7prU9XqVpnV6J13YVTizUszJq/SczOC5T5aJpu 9mcN+5Nh5n0t7dcW4TBr7aFa16NWGJ9kSnqry0y1Lnd63mIc4DRmDQuzg8js 
mkq/nTrqZiRrWJj/DJX2a8s+WGfNBVTrEbyjwPAMl/RWt3iq9S6tqwKcXqxh YQ4Umd0jlHm3u266s4aFGR7mLKzVr4v5fjKdar2las00jHevK6i/Wtew72ld 3nCGsIaFmSsyg4Pot5OHbvqyhl1nmHalbYS1tmqDdVYfqkzjBcp4H4PRoLSN YF09/FWtK47QurrBqckaFmaYyOzZTJkrfXQzjDUsTI+b0n5t7Q3TrTfV+rGq tf9NSW+FuKt1tS583kbAacQaFuZUkdmrmjKvvtLNCNawMH+5JO3XNnVgutKz Et0Q754wXl+S9Fbvsup5q16C1pUGx/pdrnOXrt1g1r4sMUOdVa2v3+om25ps bj4rp6X92rYSzGq0rxP9qap19mlJb4W+pfemEFrXYjhsezMG5i6R2eexMuen 6ibbmuy43jBvTpb2a7tSMKvQsxIdjvdsGGUzJL3VN099ry/p3T9mCxzrd7mx 4ypXgdlOZPa7pGrdkq6bbGsyEmZabwdhre2dYFb+k2qdoGrdaBjvXlfYcbWu 4MLn7SQctr3ZBuYNkdk/l95PVusm25pMhFmz23vSWl/BrHSAap2P3xQwQg3j 3esasFl9r6Vojz4mH04r1rAwJ4vMgVmq1hJrdZNnCTB3B3wgrNX/AcyKW6nW DarWp4bx7nUNWqLWVf+/yogtAofnETCrdZCYg+coc20D3eRZwgbsI56T9mvA DZgVZlOtR/H7CcaLc5Le+iydfn81onVVhDOKNSxMt/MSc0gy7WE11k2eJcAM PSbt1w7nzT9DKbuKzlO1ZhyT9NbQaLWucPpNEtsEDs8jYG4WmeGD6f3kmW7y LAFm1dLSjKPjYZguococZY/fijBCSkvyiGEh9PuLcqbYYDh+rGFhponM4Z1U rVV76ibPEmAe9JRmHJ12mllCJ6q1oqr1tackjxhB+evvZMRGw+F5BMxaDSVm BOWvO8/rJs8SNhpmRJp0zzRwPcyPK1OtvvhdDGNxmmR/M5LyVxcHWtd0ONGs YWEeEZkjXdSz0qS8blq/h3UPCu4Bc8oG6Tvip6uwznLvU61hqtacDZL3uagS qtYkek+PzYbD8wiYeSIzmvLXXqm6ybMEmM7ifu28AGaZx1Qr3pvuwmgp6q1o yl/LXqN1bYHDtjc7wxwpMkflq+/V+ahuWr+HVa3m6gZzpbO0X7vMxDr/8wfV OlHV+quzpLdGX1O1PllG6zoFh+cRMIt+IDHHUP56appu8ixhk2G+2iWdIeia amZ/O6jWBdjvgFFrtyTvj6H89dguWtc9OGx7cyzMniIzdpv6Xt3W6WY71rAw p8ySzhB0i8c6S/1ItW5Qtf48S5L3W7JVrdPfU4alCByeR8C8JzLjKH8d8Vg3 eZYAs32GtF+7R8AsmUG1HsPeDoyEDElvxVP+WpueN0slODyPgLlKZCZMpd9O 3XUzljUszMcTpP0a1B/r/CiZar2taq05UdJbieNVras707p84PA8AmYPkZlE +esfPrrJs4TN2AMsIf37NTgI5ofBlF/bYh8LRngJyd+FyZS//vYhras7HJ5H wPxKZKb0od9OTXWzLWtYmCsdpZlkD3/zt1N7qtVF1XrRUZIfju1C8zkbaV2R cHgeAdPRSWKOo/y14lLd5FkCzJZvpP3asxnM4hWp1kbYs4MR/UbSW+Mpf21e +LxNgsPzCJiLROYEyl9DBuhmDGtYmEePSfs1xB1msY+o1s6q1rfHJL01sZJ6 3jrSu79lCRzrdzlMOcCsd1xifl5K1fr8tm7yLGEL8n/x+3CvajCdac9+9DDs T8LoK3p3TaX89Sv6O9qyFQ7PI2BOkZmUv86N1E22Ndkdpn2KtF97l4X5/mWq 9XNVq1eKpLfSHqjvtcImWtdpONbvcphygDlIZH5xQ9Xa4E/d5FmC+Z0ESjPJ 
UGeYRenPldF4b0qDkREoyQ8nUf7apfB5uweH5xEwt4jMdMpfnUvoJtuarAqz bFtpJhn6FuZ7hbOEG1WtgW0l+eHknep79aYZ5DhHOB1Yw8JMEpkZ61Wt73+t mzxL2Irs76W0X/s8huk0nmo9jn1nGMkvJb01hfLXVc60rspweB4B83uROZXy 19Uf6ybPEmBefSjt1755MB1HW8/CFdRa6i9Jb02bSb+/fGldTeEksIaF2Upk Tk9Vtd6cr5s8S4AZsVvar/0uwSxCme4YO+yxw1iwW9JbMyh/HUT7w3FBcHge AfOgyJxJ+WuFfrrJswSYzlul/Rp2HKYDfSdjyqtam22V9NaX/en3F2VCcSPh BLCGhTlcZM4Kolrb6CbPEnKwVxQlzST758K0/4/1LFzMSRinoyT54WzKX0O3 0LrS4fA8AqZttMScQ/lrzn7d5FkCzJstpPtNAzabc/X023NMF1VrmZaSvaGv KH+dk0TrWgonnjUszLYic2419az09dJN6/cwTCjA3NpYmkkOzMI6ba5TrcOR ncDIbyzJD78uq2q10Ax3XA4cnkfA/MRbYmZS/nqyqG7yLAFm2folhLUOWgLz 7UnrWbiCWtsbhmBdlL+WLHzezsBh25s9YVpE5jeP1ffqflo3rd/DMKGwDX8O b5f26+A5WOcbmusYsxA5EYwl2yW9NS9P1TqS+iDuPhyeR8A8LjLnU/561KKb PEuA6SDeH/4sHeY/C6nWTarWJqK93AWUv/rXUEa8Exy2vVkd5jCRuTBXfa8d cnTzU9awMDdnSft1SLI5l2CxnoWLrQjjTpaktxZtVrWmP6B1VYHD8wiYZdZI zMWUv94I0U2eJWw3zOktpJnk0GiYr4ZRrXdQK4xDLST54RLKX93o84pvCofn ETD/EZlL59Bvpz66mcIaFmaShzSTDB+Mdf5NZzBi7JH/wfjeQ5IfLktXtVam 30rxwXB4HgHzgsj8lvLX61V1k2cJMP1rSWcIhoXAfOlpPQtXUGtKLUnev5zy 196F32sUHJ5HmJ+fyFwxmH47+elmIGtYmDdeSvt1eCes8wX9PRbTGFknjHJ/ S3rrvyE0n7OY1pUOh+cRMANE5krKXzMddZNnCTsMs0i2tF9HtIT5rHA+rCtq heGTLemtVZS/Nil83pbB4XkEzHCRuZry1+P7dTOZNSzMRSul/Rphnn99esp6 Fi42GsaxlZLe+q6Get6iKtC6tsGxfpfDlAPM1yLzezr/+ui4bvIsAea4xdJ+ jTTPvz7Zaz0LV1DrD4slvZVF+Wuxwu/1LByeR8C8IjLXUP76YJ1usq3JUJi7 oqQzPyPN86+P51Gti5Bhw3gSJZnPWfNcfa+/h9G6/oRj/S6HKQeY1aIl5lo6 //pdsG7yLOFnvA+7SjPJKPP866OpVOtm1AqjnaskP/yB8teO9LwlvAeH5xEw x4jMbMpfU2N0k21N1oZZrLw0k4w2z7/+1cd6Fi42G4ZveUl++OMB9b32oRn4 hKpwurGGhTlCZP5E51+LjNVNniXAjLkn7ddo8/zrQ8rYYu6qWlfdk/TWOspf K9FZpIRmcHgeAfOsyFxP+evygbrJswSYl29I+3WUef71gTfl1w6YTYBR4qak tzZkqu+1Ps0hJPSAU4U1LMzmInMjnX/tZ6ubPEvYaZit50n7dbR5/vXPWtaz cLFbYMTMk/TWJspf+6XQuqLh8DwC5jKRuZny17m7dZNnCTDrRkn7dYx5/vW+ HdXqjTkMGAOiJL21JZx+f9WhdU2G05U1LMzpInMrnX8tV1c3eZYAM3SgNJOM Mc+/5j+hWrupWqcOlOSHOZS/nqZcM+FbODyPgLlVZG6j/HVjtm7yLAHm017S TDLWPP96l85gxEZg5gRGrd6S/HA75a/ONN+UsB1OZdawu/D3WJ60tyzmWdU7 
O2ldeMe5B+NoXkEfBBQYAf8zftDXtaOOet5u0p5fwlk4/J4VmM/+kvTWDnof DqB3r4Rz5tpc9ftgYDZ4JDF/pvfhGXN08/9bZ51r0n6NM8+/3v6GPr/FmK+B MfSapLd20jzi6pq0rgdw+Aw3zG9E5i56Hz4QrJt8/hpmyjlpv8ab51/zpljP whXUmnVO0lu76T6YQd8qI7EoHL5nCvNXkbmH3ocHndNNfh8MzNqrpP2aYJ5/ vTWYaj2JWSIYYask/bqH5hEd6D09sSocPsMNc5rIzKX3YZu+usnnr3fj+esr zSQTzfOvf/SgWvNRK4x5fSX54V66D6YxZS+JvnD4ninMAyJzH70PL/1dN/l9 MDCfB0kzySTz/OvNetazcBYfGDWCJfnhfppHzDlL6+oJh89wwwwWmQfofTjZ Vjf5/DXM2a7STDLZPP96g2alLRVVrbmukvzwF7oPpmttWlc0HL5nCvOhyDxI 78OWL3ST3wcD85Py0kwyxTz/+ttLqrUJZsRgBJSX5IeHaB6xLN2fk5gBh98H AzNWZB6m9+FSRXWTz1/vMcz1J6X9OtY8/3o933oWztIdxu2Tkt46QvfBVDtI 61oOg++ZwixzSmIepffh4Jm6ye+DgZm1Qtqv48zzr9foPcASgXk4GHmiOz6O 0TziUTr3krgdDr8PBma5/0rM4/Q+PL2bbvL5a5i35kn7dbx5/vVq4WzSF6rW ivMlvXWC7oNpTjPIiefg8D1TmIEi8yS9D0/21E1+HwzMAbOk/TrBPP96ZYb1 LJxlEoyZsyS9dYrehw/RWaTEh3D4fTB7zPlriXma5hHdNukmn7/ONcyF7aWZ 5ETz/OvlCdazcJZJME60l+SHZ+g+GPuRykgyHT7DDfOtyDxL98HE7tRNvmcK 07+sNJP83Dz/eonOXVlO4fODkVxWkh+eo3nE63Q2OqkaHH4fDMwsmUnziH5v dJPPX8PcVVyacaSa519/DbSehSuo9VlxSR5xnu6DGUvZVVJzOHyGG6ZbCYl5 ge6DCRigm3zPFOZcR+kMQap5/vWiq/UsnGUrjP2Okrz/Is0jXv6b1hUCh98H A/OpyPyV5hGbDNZNPn+9F7//50r7Nc08/3qBZn/jKqFWGPfnSnrrEt0Hs9mF 1jUKDp/hhunytcS8TPfBDPLXTb5nCjNomrRfvzDPv56jTDzOB/OrMNKmSXrr Cs0jPrpK65oCh98HA3OdyLxK84hBgbrJ74OBeSlV2q+TzPOvZ3+znoUrqLVY mqS3rtF9MF8OonWtgMNnuPfht2ZJaX6Ybp5VPbON1hWJfoVxoqg0l5tsngE9 TefA4yYpw2Ir3T/PMM9WnvrCes4qzhFG21PSz3iKeWbxZAIZW2HsN4zK06Sf x1TzLOCJrmScxjwqjKfjpJ/HNPOM3fHW1vM2BcaSLtLPY7p5du0Y9XC8I2ZF Ybj4S/fpZ5hnwo4WI6MyjAM4Y/Ja+nnMNM9aHb5lPXcRFwSj3BPp5/GleYbp 0AXreYYCw3ef9POYZZ4NOvgTGSMxYwnjs+3Sz2O2eebml+XW+XvcyF8MY85o 6Z7lHPMsy4Fk61w7Lh2GS2vpvt1X5hmR/VFk5Chjk490P2yuefZiH/15HH8G PQOji4d0n+lr80zDXh/rfDNu6UHkEDuln0emeVYgtyTt9TlhbhDGV9ukn0em OYO/h85EJ1RRRvFs6efxjTnbvuuydc4VdwbGnPHSz2OeOTO+szBXDVZGzhTJ Htl1ul/L44YyUlbC4RkUzL9E5m80mz3tgm7yu7FgfjhVYv5Os9n+rXST340F s77IvEH3Wu07opv8LArMwHUS8ybdaxXoTuYqmDz7gZkhMv+gmWi7qrrJ76SC uUpk3qKZaPuZusnvpIK5T2Tm0Tzzkou6yc+AwLwhMm/TfVK+qbrJz2/AzLsg 
Me/QXVCpdN9KymqY/PwGzAoXJeZdmkUOzdFNfhcUzGYyk+aIk9rpJj97ATNU ZObTPU4vRusmPzcB08lGsg9wj+5gCski8zuY/NwEzNYi8z7NAJf5Tjf5HUww B4jMP2l+d3133eRnHmCOE5kP6P6kh690k59XgOlQQ2I+pLuPRj8n83uY/LyC +SyJzL9o9jaqoW7y3Axmb5H5iOZmp/fXTZ6bwUwQmY/p3iLnfN3kmRdM184S 8wndOdTuHzKzYPJzAjD7i8ynNPN6uIpu8rzKfJZE5jOaV91/WDd5XgVzoch8 TvcFOZTRTZ41wcy2SMwXdNfPALq3PWUNTD6fDzNfZL6kWdOJa3ST50Qwi8ZJ zL8pF291QTd5TgSzpsh8Rbl4pc90k2c8MNuLzNeUaScd0U0+Fw+z1UKJ+Q/d sZNJ+5Upa2HymXaYiSLzDc1n/tpWN3k+AzNTZL6lPHrLDN3k2QrMjTKTsuT4 17rJ59FhdsgVmFdt6G6biH1k/gCTz5LDnCAybWkuMviRbvJcBOYSkWlHObDn Fd3kmQbMHSLTnjJc93a6yefADxnmBi9pLjffPAP6M907mBCFfQcYAXWledcC 82zljvnWc1YFxp3X0t9hC80zi9sL50iX4QwODMsT6e+wReZZwG1DyNgG47Bh nPxO+nksNs/Y5TS3nguKrwIjLEn6efwfY3f+53P5vQE8UdmzlagoW9aUXZYI WVqp7PsyqBSiMMOsDMaY7CplyRKGIhX1oT5UiESEyHskax+7ZK35uk7f+5ru +6frH3g+dHrNzOs+5zr3a7btrq3hnX/DTjujzGB1rjbHdsJWZ81gc6KHAuPt /uq8aq7tWn167d+5izce2YqdkktqPebZDtMnu2jUQx8Gxv4zaj3et92gVcz0 D3/BGbceU+sx33ZuPn7v3/P3N16B8dZnaj0W2C7Lykk0xsL4Ht/UGKD22Rfa jsiKF2nMQz8Ixobeav96ke1efNT533PYf4ynaqo9yw9sp+FD3t84HM/pXBgb K6vzkcW2K7Cce3TDz8DYhgzIIbUeSyyDn867UEfkwh4DjDkr1XostWz7Uv7+ HVHKGbWWqPVIt8z4ku9o1MeOwQ83jOnd1J5lumWxF/+HRltnlHlI7Vkus4zz B7x/YQSe09MwTjyg9iyXW3Z40Vga45wxraTas/zQMrkLs+Zp7yObvx29np/V enxkWdcFPKeNWOuMRgvUeqywDOl85npH7EZuHsaeWWo9Vlre833u3ow464yX pqr1+Ngyj3N5X2e09fp24LsrrdSe5SrLEs7h3STRZZwxpbjas/zEMnqzmb+O boC8OYxahdS/lZ9a9u29j2m0c8aIQ2o9PrNM2bu8xyx6ILLgP2JHYIZaj9WW 1Zo1isZ4ZwxPU+uxxjJQ77Snged0LIwr3dR6fG7Zord5j0X0WmdMbqe+O3xh mZ232NuL3oMM9U7kWPKpfyv/Y1mYmfyuUfQ5ZxTIrf6tXGsZk+kn+Y6XB5lp GJdOy4b9fZnOu5BjCuI9BsZLZ/7JCq3+x1j9/0ZD/10zB+8a7M5397iecMLd PpjZrgn5o8gtPAee47ej4nrZv62Wn4+G2VAyb+U5sPZw3wzz0TB7SuZtPAe2 buybYY8G5mjJzMls88ABvhn2aGBWKSXMmyO5mEvOw9lFXG+YYY8GZpRk5uY5 MG8p3wxzyfbfrpk8By6v6pthjwbmfMnMk7Vj97Nvhj0amHVbKmZe5oEzOGuP 6wMz7NHAHCiZ+XgObDXFN8M8MMw0yczPc2Dvz30z7NHAXC6ZtzPLW+aob4Y9 GphPD1TMAszhXmVuOS4KZtijsWdJMgsyh1vyHd8Mc7j2LElmIWZo/9vSN8MM Lcz1klmYGdq1bXwz7NHAPD1YMYsw/7qthm+GPRqYpV9TzDuYf83M65th/hVm 
Y8m8k9nVYcG/M8yuwuwmmUWZXa2TzTfDHg3MYzMU8y7mTo8z4xjXF2bYo4FZ bqZiFmPutPodvhnmTmE2lczizIwO3eCbYWbU/sZJ5t3MjG7f6pthZhRmvGTe w7zn8s98M8x72nvIOsW8l3nPkpzBxPWDGeY9Yc6RzBLMaqYs880wqwlznWYy q3lyuG+GWU2YBySzJHOWFfr4ZpiztN+fRxTzPuYsL2S91/WHGeYs7b9dMu9n RnLsAd8MM5L23y6ZpZiRPNzMN8OMJMxrklma+cZeD/lmmG+EmZxX2JOOlGG+ MX/W344XYYb5RphrJbMss4mzy/lmmE2EuV8yyzGbmDnFN8NsIswrkvkAc4X1 I74Z5gphVpD7fOusrzWNd2HFVMLOI4xiEbXP96X1tabyjqmYZjB2Ydf/ZbWv 9ZX1tabwO/IxXbGPCKO23Of7r/W1JvNOpJhhzljRSe1rrbe+1iTecxMzGbuC MCrKfb711td6s8a/Z7nDc/2E+09+VeuxwfpaacwExnyNfjKM2ANqPb62vlYq 8+wxGc5Yv1itxzfW15rA73PEXMaO3W7kJOQ+37fW10phr3BkQWdcbav2tTZa X2s8v2UwsjL232B0lPt8m6yvNW4qjWbO2FhC7eNstr7WWGYCR3ZFb3zPDePl XWo9vrO+VjJ7hSOHOWPzfLUeW6yvNYaZwJGTsDcGo5Xc59tqfa3R3Nsbme6M jClqPb63vlZSdhrfYKdrL76HXkrta22zvlbCRRoZzrhSTO1r/WB9rfhtNK6g zw9jYEG1r7Xd+lpx7BWOKuSMOnKfb4f1tWI5mxtVCbtQP+PdW+7z/Wh9rVG8 e2bU4844NVGtx07ra43kzsOo7thTgvGq3OfbZTtJMby3btQwZ+SS+3w/2a5P 9IM0pmCHaN8NY3Fetc+323ZoRnBHddQyZ8QeUOux23ZThnE2N+ob7PfAGLdL rcce2/l4g98WH5XhjEpb1XrstV2K17mrOOoqdm/2478lXq3Hz7ajMJT7Z7FF nFGgjdqz3Gf7BEP4rfrYytiLgTG7pfrusN/2CV5LoPG4M841Vt8dfrF9gsG8 eyi2O3ZWfsGO1Q9qPQ7YPsEg3ukTO9wZZTeq9YjYPsFAvgPG4jkdD+O7dWo9 Mmyf4FXuBMUud8bjaWo9Dto+wYBzNL7F/sgB/Lw0Vd8dfrV9ghfZV4s9iIwv jNTWwh5RpDzz0ft4tkyqDifMR8PcKJkVmI+u+aVvhvlomEclsyLz0Wcf8M0w Hw0zRxvFrMR89Csf+2aYj4a5cJhiVmY++iL/TibVgBnmo2FGJLMK89FLc/pm mI+GeV0yH2Q+On2ob4b5aJjFhytmVeajW3zlm2E+GmZdyXyI+egj/X0zzEfD XDNbMR9mProy76hJqgkzzEfDPC2Z1ZiPzj7XN8N8NMx8cyST+ejSlX0zzEfD rCSZ1ZmPntveN8N8NMwOGxWzBvPRN7Hfl1QLZpiPhjlDMmsyH/1lmm+G+WiY n0hmLeajO9f1zTAfDXOXZNZmPvqtQ74Z5qPtd90pxazDfHRx9hCTasMM89Ew S5xWzLrMRxct7pvh7A1mfcl8hPnoWo/7Zjh7g9lRMusxH71qh2+GszeYV4oI 94FG6jMffTrLrAMzzEfDrHmHYjZgPnpITt8MZ28wn5fMhsxHv7rSN8PZG8zB kvko89Hp130znL3BzFZfMRsxH52L96wl1YUZ5qNh1pPMxsxHV5zim+HszX4v SeZjnIv/vs43w9kbzGGS2YRz8U2tfDOcvcGcLplNORcv/bFvhvlomAV7KWYz 5qMbsQeS9AjMMB8Ns5VkPs58dGIl3wxnbzD7S2ZzzsV7DPXNcPYGM1kzORe/ 7zffDPPRMJemKGYL5qOL8LyfVA9mmI+23/OS2ZL56L/2+mY4e4N5ywTFbMW5 
+P4NvhnO3mCWk8wnOBffW8k3w3w0zGfrqz3LQ7bH3J93R8dec2eXJ+Q7N36z PeZ+zL/GFXbG0hxqD/ew7TH35fdY4qrAiOB7DmvUc9gR22OO4i5XXHPs8cFo PV6tx1HbY+6TNX/v4YwL8Wo9jtkec2/uGsSNgJGBv0G11Z7lcdtj7pU1u53q jKJV1Z7lCdtj7sl7c+OWO2NpplqP322PuQf3UOI2YpcPRu1Laj1+tz3m7vy+ dtyvMA7iHXyZWo//2R5zN+51x113xvkYtR4nbY+5az1nxBdxRtwQtYd7yvYJ uvAsEP8gdvFg3Puy2rM8bfsEnfl3O74FjF8xQ7uq1uOM7RN04vcU43s4o/B5 tR5nbZ+gI8/I8dHO2HVCrcc52ydoz/ti46dh/gKj3+dqPc7bPkG7YzTwnC45 dMMY86raw71g+wRtmZeK3+SMu/uqPcs/bJ/gBe7FxR9yRnpttWd50fYJnud3 G+Px+3QpjISqas/yT9sneI533SbcAeM39AoPq/W4ZPsEbXgOTqjqjCqfqPW4 bPsErfnuktDCGRfT1XpcsX2CZ3l/SQJylukwOs5X63HV8p7PcFaZMBLGYfQF qqk9y2s2d32aew0J05zxTEW1Z3nd5q5Pca8h4UNntCml9iz/srnrk+ydJmx2 RtR5tR5/29z1Cc5ME/CcLjuCbPMitR6ZNndtxfvjEq47Y9JstR6ZNndtyfun Eu90xr0DxXpEbrK5awv2cBOrOmN7X7GHG8lmc9fmPAMmtsTu2NEbxsF7xHeH yM02d23Ge7ASezmj/ymxHpHsNndtynlFYowzbjui1iOHzV2b8D7pxOnOuLOt +HxEbrGfl0e5o5G4CPlmGOPa5hPy0U8yH51ruzOSL8MJ89EwF/QS7gSIPMVz 4NsdaF6xf1sLPx8NM0Myn+Y58OCTvhnmo2Fm762Yz/AceLWIb4Y9GphlJfNZ 5qOLNfbNsEcDc4dy/0mkNfPRq/h91eSrMMMejT1/yl0lkTY8B6666JthPhpm Oc3kObBddt8MezQwm0rmc8xHt1nmm2GPBmZH5Q6QyPPMR4/lPUjJ12CGPRqY 0yXzBZ4DL0T5ZpiPhrlKMtvyHJhvom+GPRqYP0pmO+ajv1/jm2GPBuZJ5f6T SHvmo+fxHrvk6zDDHg3M0sq9IpEOzEdvesU3w3w0zMaS2ZH56Bfv9c0wHw2z u2R2Yj46qpxvhj0amNczFbMz89GvZ/1s/gUz7NHArK7cKxLpwnz0ogzfDPPR MFtLZlfmo++/zTfDfDTMgZLZjfnogzt9M+zRwCyg3AES6c589KSs38l/wwx7 NDCbS2YP5qP3H/PNMB8NM0oyezIfXWK6b4b5aJijJbMX89HDZ/tmmI+GOU8y ezMf3W68b4b5aJj3KfeKRPowH72Jc/HkTJhhPhpmB8mMYj66RpxvhvlomG9o JvPR0570zTAfDXOaZPZlPvrH2r4Z5qNhrlDuP4n0Yz56Fs+nY2+CGeajYf4u mf2Zj374I98M89FH1TtVIi8yH516l2+G+eij6p0qkZeYj86b3TfDfDTMBsod IJGXmY/+jPcYjM0GM8xH27MkmQOYj25+xTfDfLQ9S5L5Sta35KN8M8xH23uI ZL7KfPSRj3wzzEcfw8wgj3gujdxq5/SG3A1P/Aq7hDC65FDPYbfZOb0BcwWJ +5yRvFPsW0Ry2jm9PvcZEy84474t6jksl53T6/HeqqR8MI7fMD6IU+uR287p j/De1aQHsAsII2drtR657Zxel98BSWrkjKMt1HNpHjun1+F9DEkdndGvkXpO z2vn9NpZ2aEhME7cMFZvU+uRz87ptYrRSMUuH4yy36r1yG/n9Jp5aCxyxrq1 aj1ut3N6dd71lfRfZ0R/otajgOWjq/H9OGk/jN/xDeYmat+ioOWjH2b2LOkP 
7I7BOFFP7PFHClk++iG+W43O74y81cUef6Sw5aOr8t6i0eWdsTC7OAOKFLF8 9IOcAY1uDON/6AWtVutxh+Wjq3C/YnQnZ3T6SK3HnZaPrsz9itFDnXEmTq1H UctHV2JuZvREGCdxh3ctsccfucvy0RXjaXzgjIbVxB5/pJjloytwBjR6PfbW YPT9W61HcctHl+ee1uj9zqjxp1qPuy0f/QCzzaMvwjiF9+Claj3usXx0Oe4i jbndGSei1Xrca/nosvVplHdG/GvizCNSwvLRZbj3O+Yx7K7BWPlSXtWwfHTp kjQ6wziN36dX1HqUtHx0Kd4xPGaoM7KdU+txn+Wj7+cdw2PSnHH9uFqP+y0f XZLn0jGLsXsGo9MatR6lLB9dgnePjMFz2v8Mfl5eEWcekdKWj753H41fnFEi Spx5RMpYPvoeZr7G/OmMRbXUd4eylo++m3mXZDynL8Jo9KD67lDO8tHF+W6V XAHGWcyAflPr8YDlo4vxu27JTZxRbpVaj/KWj75rGo3Ozji/VK1HBctHF86q x+vI+MKo9LnwPbnIQOajE7l/mrIZTpiPhtlTMgcxHx2Z7JthPhpmgmQOZj56 5jXfDPPRMOdI5mvMRxdN9s0wHw3zi18Vcwjz0e/y7zW2xZJfD/PRMC9L5lDm o5/7xTfDfDTMoocU83Xmo19o6ZthPhpmDcl8g/noc1N8M8xHw3xOMocxHz2x gW+G+WiY7XMK33iKDGc++ifmFVK2wAzz0TBTJXME89HpQ3wzzEfDXKqZzEdv yfTNMB8Nc7NkRjMf3aqyb4b5aJjDqipmDPPRi9nvS9kKM8xHw1wpmSOZj+7X yzfDfDTMHZI5ivnoW/L5ZpiPhnlGMmOZj37sU98M89Ewh7ZVzDjmozek0vwe ZpiPtmdJMuOZj/7ylG+GszeY30lmAvPRGcV8M5y9wTwumYnMR3dZ4Jvh7A3m lJGKmcR89Iys30vbYIb5aJjfS+Zo5qPv+cU3w9kbzJOSOYb56LvG+GY4e4OZ e5RiJjMf/cJ23wxnbzDLLlDMscxHr+BsI+UHmGE+GmZ3yRzHfPTOKN8MZ28w YyVzPOfiUyb7Zjh7g/muZKZwLj6ohG+Gszd7Z5DMCZyLbx3jm2E+GuaKLYqZ ynz0CX7nNmU7zDAfbc+nZE5kPrpSpm+Gszd7PrcqZhrn4rlb+mY4e4NZXjM5 F9/8qW+G+WiYa84r5pvMR6/lfmfKDphhPtr+dkjmJOajF6b7Zjh7g5nvgmJO 5lw8aYZvhrM3OyNI5hTOxeMyfTPMR8Mc/L56Dqto+ehCzOIlT8LZ5Ryy4B3U nmUly0cXzDqHLXFGqwpqz7Ky5aMLZJ3DNjjj6H1qz7KK5aNv589v8gHs8cGI u0vtWT5o+ej8/JZY8iUY55E1WKjWo6rlo/O9xLlBAWfMf0+tx0OWj87bhUZF Z7SYrtbjYctH53mGRlNnzIhS61HN8tG5+U45tgt2+S6gJ3W32rOsZvnoXPxm xtg3nJFUWO3RVbd8dM6yNCY545vDaj1qWD76tqyZ2xJnbN6v1qOm5aNvzZqH fY1dvD+wkz1JrUcty0fn4Bl6bMQZvXqq9aht+ejsp2lcckbVDmrPso7lo28+ 6IxxBWFcRI8/v9qjq2v56GzceRtXCbt0MIpmqPV4xPLRN/EOg3FNnTF/t1qP epaPzuQ3Tcd1dUajbWo96ls++u9FNIbB+BPv4p3VejSwfPRfb9HAc1oHxszn 1Z5lQ8tHX+d9HeOWOuPrJ9Se5aOWj742ksY3zohqovYsG1k++uqrNDJgXLph LNmh1qOx5aOv9KBxGXtnMOpvVuvxmOU9Lz/njPGFnLH3K7UeTWzueon3kY2v 7IzMSWo9mtrc9c86NJrBuHzDeKi52rNsZnPXi7wjZ3w3Z3RrqPYsH7e56x/8 
RuT44dg9g1H0drVn2dzmrhfy05jsjJm3qu8OzW3uep7frh+fDuMK7on7Qq1H C5u7nuXd8eO/dcaACWo9Wtrc9cxhGnhO68EYOVatRyubu57eTeOKMyLRaj2e sLnrqU18PysM4yr+v+RW3x2etLnrSX6XNaWyM87MVX9enrKfl+O8yyClAfLN MHrPyyHko6cyH72S73RpaXDCfDTMIiuU/b5pPAc24W542pv2b+vo56NhPi2Z 03kOHFfSN8N8NMzXJHMGz4Hzjvlm2KOBOU0yZzIfvb6wb4Y9GpgN9irmW8xH d+lKcxLMsEcDc4hkvs1zYJdNvhnmo2FO0UyeAzN/8s2wRwNzhWS+w3z0tVjf DHs0MJ/MVMxZzEc/zF3AtMkwwx4NzCTJfJfnwFl1fTPMR8OcJ5nv8Rz4aXff DHs0ML+SzNnMR78+wTfDHg3MzuVuEcw5zEc/wRlf2hSYYY/GniXJnMt89KDG vhnmo2GulMx5zEcXOuubYT4a5nbJfJ/56PxXfDPs0cA8+5Rizmc+uiT7KWlT YYY9GpiVnlbMBcxHP7vSN8N8NMyWkrmQ+ejNP/tmmI+G2U8yFzEfPW6hb4Y9 mqu266qYHzAf/cjNNKfBDHs0MFdL5mLmo5M+980wHw1zj2QuYT56Yz/fDPPR MC9K5lLmo0sN9s0wH21/34cqZjrz0Td18c0wHw1z0juKuYz56EHsx6RNhxnm o2FukszlzEcfaOObYT4a5jHNZD66YUnfDPPRMHPMUswPmY/Gl27+bYb5aJiz 1ivmR8xHN+M3X9JmwAzz0TB3SuYK5qP3JfhmmI+GeV4yVzIfXfN/vhnmo2EW 3KCYHzMfveon3wzz0TDTTyjmKuaju/Eu2rSZMMMHFOYRyfyE+eizW3wzzEfb s/S7Yn7KfHTrur4Z5qNhlpLMz5iPnpjom2E+GmZ5OXv2tJ3Tj2X199til9De 3+Xs2TN2Tj/6FI2BMK7h+0ol1XPps3ZOP/IojfHOuPmseg5rbef0w9xBTnnf GVuOqfVoY+f030rTWItdQBj1MtR6tLFz+iF+0yBlD4zruH9ggFqP5+yc/it3 kFPOOmNAH/Wc/ryd0zO4gzwhtzM+66ye01+wc3qEvxcmlMEuH4wmVdRzels7 px/gvecTcD6+9tcNY+4htR7t7Jz+C+e5E9o5o98+tR7tLR+9fz2NQc4oImfP Olg+et8qGnhOr8NInKfWo6Plo3/mu9WE+TD+xlyundq36GT56L0zaaxzRpvy 6syjs+Wj97DviVvC/zEOlFR7/F0sH707hsY57NLBiC6q9vi7Wj76J+5gpuaB kYmMywK1Ht0sH72Lfc/UMs6Y9a5aj+6Wj97J95TUR53x91S1Hj0sH/0jvzWc 2h57azDe7aPWo6flo3dwTyt1kO2+FapZ93hxtcffy/LR2yvQSHFGXCG1x9/b 8tE/sO+ZusAZVXKrPf4+lo/exuxR6pfO+GCfWo8oy0dv5Rw2dS9217LdMJa/ qdYjyvLRW3gvb+p5Z7yTrNajr+Wjv+P9YxPzOqNae7Ue/SwfvZl9z4llYdx8 w4j/v8bO9SfrMozjVopQoGEHddVm2cSiuSa9aIUBLtfKV83Dwmw2zA7K4eFo gHKWg8Bz/j2cNAHxAUQOoiDHFDRlTUcHZSnN5damtqEWZax60e+66LrsobV9 /4HP3NffPa77+3zu+w5CO/5P2I8eOaeMCDp7RoyFc9C/ldvZjz6n57Ot0cJo v4TmsYP96LP6O741SRhrLqB5xLIf/YW+U2stI8YDJuNKPppHHPvRZ/RshNVL Z8eI8d46NI949qNP60xtPSmM4bfQ2SGB/ehhdSGtl4Xxzmp0drCwHz2kPpB1 khizTUbtV2geiexHD+i7vbZAcnyJUfbDLGDW7FE/+gV1RV1biDPTjybmGYjZ 
q3500Ye+zJl+NDF/hJh96kdHjfoyZ/rRxLz/GsLsVz/61CZf5kw/mpht16eZ S6aZS/5hjvkyB6b96MVZN4wIX+ZMP5qY4SPoN5vELkn/vf3AMvl/XgD/TpfM Lkmf3tNoixSG145+synskvTe60uiiTHHZEysQee7VHZJevQuF1syOc/EiFmF zndp7JKcWKuMcmHcCELnu53sknS/pgyvMHLmoPPdp+ySdOm7SbZTxPAzGZf7 0DzS2SU5ru9K2q4I404pmkcGuyTH9N4Z2yR5z8R4Nh/NI4Ndkk4/YdiDhDGe geaRyS5Jx5QyQogx12Q8FYDOd7vYJWnXfYg9UhgFs9D5bje7JG3qrto3kbdM jIgpNI8sdkla9Z0Se4owem6jeWSzS3JE71O3W4nhbzJ25aF55LBL0qJ3y9ob heFNRfPIZZfkcIMyhsg7Jsa2OHTezWOXpFnvkrCPC+OXVeh8l88uSVOJMn4l RgDNmZNoHgXskjRmCsMxTxhnf0Lz2MMuiTdWGcvJGybG2n40j0J2SQ7pGW26 DWSacbUTzaOIXZKGt5XxLjEeNBlRH6PzbjG7JAf1jLYjVRhfvozOdyXsktTr GW0HfaejxCh7EZ3v9vJv43V6RtvRJIzREPQcbyl3VLXq2zuGifGQydjcjeZR xh3VAX230zEujLZWNI9y7qj263tcjt+EseEQmoeVO6p9ekbbOZ883UCTMRiG znc27qhq9G5653JhxISis4ONO6rqi8pYLYwVS9HZwc4dVZXe0+vcLIzPJ9E8 HNxRVeoZbSd9p18HmYyQZjQPJ3dUFXpG22kThqsWzcPFHZVHz2g7m4WxshLN w80dlaH3aDlPC+OJADQPg9eLQ88TO78lF4QY/veh68XD68Wu54mdE+Q/zDMZ z9nQ9VLB68Wmbza5/IRRVYiul0peL1Y9T+xaIoyVWWgeVbxeyvU8sesVYsw3 GdmB6PdRzeul9K4y1pG/QIxFs9Hvo5rXy149T+yKE0bHn2geNbxeSvQ8satI GNnn0Tz28XopvqCMWmI8bDLO56J57Of1UqTnZ1195B8QIywdzeMzXi+F6hq4 Lgpj6E10vRzg9bJH78dzTQgjOgrde9Ryp1ugHrt7LjGCTUbXXTSPOu5089Vj dz9N/gAx3jiH5lHPnW6e/m7qflUYmSfRPA5yp5urHrt7vTDqutE8GrjTzdHf hN3xxFhgMn5+Hd17HOJON1s9djd9p25ipIWjew8vd7pZ6o2464QxFYbO2o3c 6e5Wj93dLwwjFJ21m7jT3aUeu/sSMR6hPRC8F2vmTjdT75R136Lf2okxgt59 fPUwd7oZ6rEb/sKIakbzaAn1fQPPeEYY36SjeRzhTneneuwGfafGoybjrxXo rN3KnW6aeuzGBmG0LENn7TbudFPVYzcS6Pd2Yiz+Hc2jjTvdFPXYjWJh/HEL zaOdO91kdXyMemI8ZjKCG9A8OrjTTWpSxoAwGlPQPI5yp5uo71AY9J1WECMx Ft17dHKna1GnzLgtjLGt6N7jGHe6CeoreAKI8bjJeH4xOjsc50433qKMpcI4 ehPNo4s73bgYZYTT793EqL+G5tHNnW7semVsFMbN79A8TnCnu0PfQvJYiLGQ /lZ+hObRw53udnUBPCXC8NuCzg693OluU1fSc5A6RGJYNqKzQx/3hR/ouwme QWHMhfdi/dwXbtVzi54xYiyiPdD3aB4D3BfGaLfsuSOMJ+G92GAkvyOhe9OK AGGUWPxxRrDJeOm/jNb/Z/yr1x2MNDm7zX/H9Whl1LxPnL8BQfk3JAz8AAA= --0-1472595759-959784053=:26843-- From owner-netdev@oss.sgi.com Wed May 31 07:44:45 2000 Received: by oss.sgi.com id ; Wed, 31 May 2000 07:44:26 -0700 Received: from 
ikar.t17.ds.pwr.wroc.pl ([156.17.210.253]:61444 "HELO ikar.t17.ds.pwr.wroc.pl") by oss.sgi.com with SMTP id ; Wed, 31 May 2000 07:44:22 -0700 Received: by ikar.t17.ds.pwr.wroc.pl (Postfix, from userid 1002) id D23CFC8007; Wed, 31 May 2000 16:44:58 +0200 (CEST) Date: Wed, 31 May 2000 16:44:58 +0200 From: Arkadiusz Miskiewicz To: netdev@oss.sgi.com Subject: (fwd) Re: sin_zero question Message-ID: <20000531164458.A1467@ikar.t17.ds.pwr.wroc.pl> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-2 Content-Disposition: inline Content-Transfer-Encoding: 8bit User-Agent: Mutt/1.1.11i X-URL: http://www.misiek.eu.org/ipv6/ X-Operating-System: Linux sunsite 4.0.20 #119 Tue Jan 16 12:21:53 MET 2001 i986 pld Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing ----- Forwarded message from Ulrich Drepper ----- > Any other suggestions ? Get the kernel people to handle this. I have no idea why they write in the rest of the structure. ----- End forwarded message ----- OK, so I'm asking: struct sockaddr_storage ss; memset(&ss, 0, sizeof(ss)); (gdb) print *(struct sockaddr_in *)&ss $4 = {sin_family = 0, sin_port = 0, sin_addr = {s_addr = 0}, sin_zero = "\000\000\000\000\000\000\000"} getpeername(s, (struct sockaddr *)&ss, &length); (gdb) print *(struct sockaddr_in *)&ss $4 = {sin_family = 2, sin_port = 5888, sin_addr = {s_addr = 16820416}, sin_zero = "¹\221\020À¼C¥Á"} and the kernel puts garbage into the rest of ss (sin_zero) instead of leaving it alone. This is a bad thing because we can't memcmp() two structs to check if addresses, families and ports are equal. Can someone fix this?
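[Editorial sketch, not part of the original message.] Until the padding is zeroed by the kernel, a userspace caller can sidestep the problem by comparing only the meaningful fields instead of using memcmp() on the whole structure. The helper name sockaddr_in_equal() is hypothetical (it is not a libc or kernel function), and the sketch covers IPv4 addresses only.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: returns 1 when two IPv4 socket addresses name the
 * same endpoint. The sin_zero padding is deliberately ignored, so garbage
 * left there does not affect the result. */
static int sockaddr_in_equal(const struct sockaddr_in *a,
                             const struct sockaddr_in *b)
{
    assert(a != NULL && b != NULL);

    return a->sin_family == b->sin_family &&
           a->sin_port == b->sin_port &&
           a->sin_addr.s_addr == b->sin_addr.s_addr;
}
```

A caller would cast the sockaddr_storage filled in by getpeername() to struct sockaddr_in only after checking that its family field is AF_INET; other families need their own field-by-field comparison.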
-- Arkadiusz Miśkiewicz http://www.misiek.eu.org/ PLD GNU/Linux [IPv6 enabled] http://www.pld.org.pl/ From owner-netdev@oss.sgi.com Wed May 31 08:57:06 2000 Received: by oss.sgi.com id ; Wed, 31 May 2000 08:56:45 -0700 Received: from smtprch1.nortelnetworks.com ([192.135.215.14]:3779 "EHLO smtprch1.nortel.com") by oss.sgi.com with ESMTP id ; Wed, 31 May 2000 08:56:34 -0700 Received: from zrchb213.us.nortel.com (actually zrchb213) by smtprch1.nortel.com; Wed, 31 May 2000 10:04:35 -0500 Received: from zctwb003.asiapac.nortel.com ([47.152.32.111]) by zrchb213.us.nortel.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21) id LW2B065P; Wed, 31 May 2000 10:04:24 -0500 Received: from pwold011.asiapac.nortel.com ([47.181.193.45]) by zctwb003.asiapac.nortel.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21) id LT9PT0L0; Thu, 1 Jun 2000 01:04:36 +1000 Received: from uow.edu.au (IDENT:akpm@[47.181.194.40]) by pwold011.asiapac.nortel.com (8.9.3/8.9.3) with ESMTP id BAA23279 for ; Thu, 1 Jun 2000 01:04:25 +1000 Message-ID: <39352AAB.F9D2653D@uow.edu.au> Date: Thu, 01 Jun 2000 01:07:23 +1000 X-Sybari-Space: 00000000 00000000 00000000 From: Andrew Morton X-Mailer: Mozilla 4.7 [en] (X11; I; Linux 2.2.14-15mdk i586) X-Accept-Language: en MIME-Version: 1.0 To: netdev@oss.sgi.com Subject: Re: [timers] net/ipv4 References: <3933B0B2.50AB5EA1@uow.edu.au> from "Andrew Morton" at May 30, 0 10:14:42 pm <200005301659.UAA12445@ms2.inr.ac.ru> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing I've revisited net/ipv4/* and I agree it's safe. kuznet@ms2.inr.ac.ru wrote: > > > OK as long as always called from BH context. > > Forget about this argument. I was assuming that the network rx path was called from bh_action() and hence that timers are serialised wrt network rx, but not process context.
I see now that network rx is a softirq and is async wrt timers, which makes things tougher. From owner-netdev@oss.sgi.com Wed May 31 09:26:35 2000 Received: by oss.sgi.com id ; Wed, 31 May 2000 09:26:25 -0700 Received: from laurin.munich.netsurf.de ([194.64.166.1]:4023 "EHLO laurin.munich.netsurf.de") by oss.sgi.com with ESMTP id ; Wed, 31 May 2000 09:26:03 -0700 Received: from fred.muc.de (none@ns1062.munich.netsurf.de [195.180.235.62]) by laurin.munich.netsurf.de (8.9.3/8.9.3) with ESMTP id SAA24611; Wed, 31 May 2000 18:26:50 +0200 (MET DST) Received: from andi by fred.muc.de with local (Exim 2.05 #1) id 12xBKm-0005CM-00; Wed, 31 May 2000 18:27:08 +0200 Date: Wed, 31 May 2000 18:27:08 +0200 From: Andi Kleen To: Arkadiusz Miskiewicz Cc: netdev@oss.sgi.com, alan@lxorguk.ukuu.org.uk Subject: Re: (fwd) Re: sin_zero question Message-ID: <20000531182708.A19818@fred.muc.de> References: <20000531164458.A1467@ikar.t17.ds.pwr.wroc.pl> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: 8bit X-Mailer: Mutt 0.95.4us In-Reply-To: <20000531164458.A1467@ikar.t17.ds.pwr.wroc.pl>; from Arkadiusz Miskiewicz on Wed, May 31, 2000 at 04:47:55PM +0200 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Wed, May 31, 2000 at 04:47:55PM +0200, Arkadiusz Miskiewicz wrote: > > ok, so I'm asking: > > struct sockaddr_storage ss; > memset(&ss, 0, sizeof(ss)); > > (gdb) print *(struct sockaddr_in *)&ss > $4 = {sin_family = 0, sin_port = 0, sin_addr = {s_addr = 0}, > sin_zero = "\000\000\000\000\000\000\000"} > > getpeername(s, (struct sockaddr *)&ss, &length); > > (gdb) print *(struct sockaddr_in *)&ss > $4 = {sin_family = 2, sin_port = 5888, sin_addr = {s_addr = 16820416}, > sin_zero = "?\221\020??C?Á"} > > and kernel put's garbage into rest of ss (sin_zero) instead > leaving it alone. > > This is bad thing because we can't memcmp() two structs > to check if adresses, families and ports are equal. 
> > Can someone fix this ? Here is a patch for 2.2.14/15: Alan, looks like a 2.2.16 candidate. -Andi

--- linux/net/socket.c.SA	Thu May  4 02:16:54 2000
+++ linux/net/socket.c	Wed May 31 18:23:31 2000
@@ -893,6 +893,7 @@
 	int len, err;
 
 	lock_kernel();
+	memset(address, 0, MAX_SOCK_ADDR);
 	sock = sockfd_lookup(fd, &err);
 	if (!sock)
 		goto out;
@@ -920,6 +921,7 @@
 	int len, err;
 
 	lock_kernel();
+	memset(address, 0, MAX_SOCK_ADDR);
 	if ((sock = sockfd_lookup(fd, &err))!=NULL)
 	{
 		err = sock->ops->getname(sock, (struct sockaddr *)address, &len, 1);

-- This is like TV. I don't like TV. From owner-netdev@oss.sgi.com Wed May 31 10:18:26 2000 Received: by oss.sgi.com id ; Wed, 31 May 2000 10:18:16 -0700 Received: from ertpg14e1.nortelnetworks.com ([47.234.0.35]:50656 "EHLO ertpg14e1.nortelnetworks.com") by oss.sgi.com with ESMTP id ; Wed, 31 May 2000 10:17:56 -0700 Received: from zsngd101.asiapac.nortel.com (actually znsgd101) by ertpg14e1.nortelnetworks.com; Wed, 31 May 2000 11:04:44 -0400 Received: from zctwb003.asiapac.nortel.com ([47.152.32.111]) by zsngd101.asiapac.nortel.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21) id L9MTWR5W; Wed, 31 May 2000 23:04:34 +0800 Received: from pwold011.asiapac.nortel.com ([47.181.193.45]) by zctwb003.asiapac.nortel.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21) id LT9PT0MB; Thu, 1 Jun 2000 01:04:44 +1000 Received: from uow.edu.au (IDENT:akpm@[47.181.194.40]) by pwold011.asiapac.nortel.com (8.9.3/8.9.3) with ESMTP id BAA23285 for ; Thu, 1 Jun 2000 01:04:32 +1000 Message-ID: <39352AB3.7C609B8C@uow.edu.au> Date: Thu, 01 Jun 2000 01:07:31 +1000 X-Sybari-Space: 00000000 00000000 00000000 From: Andrew Morton X-Mailer: Mozilla 4.7 [en] (X11; I; Linux 2.2.14-15mdk i586) X-Accept-Language: en MIME-Version: 1.0 To: netdev@oss.sgi.com Subject: Re: [timers] net/core/* References: <3933BD65.4E106F8@uow.edu.au> from "Andrew Morton" at May 30, 0 11:08:53 pm <200005301707.VAA12627@ms2.inr.ac.ru>
Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing kuznet@ms2.inr.ac.ru wrote: > > > I'm dazed and confused. > > I do not understand, what confused you. __dst_free can be called > from any context. I believe there's a very unlikely race in __dst_free():

void __dst_free(struct dst_entry * dst)
{
// dst_run_gc() could now be running, but hasn't hit the spin_trylock() yet.
	spin_lock_bh(&dst_lock);

	/* The first case (dev==NULL) is required, when
	   protocol module is unloaded.
	 */
	if (dst->dev == NULL || !(dst->dev->flags&IFF_UP)) {
		dst->input = dst_discard;
		dst->output = dst_blackhole;
	}
// still hasn't hit the spin_trylock()
	dst->obsolete = 2;
	dst->next = dst_garbage_list;
	dst_garbage_list = dst;
	if (dst_gc_timer_inc > DST_GC_INC) {
		del_timer_async(&dst_gc_timer);
		dst_gc_timer_inc = DST_GC_INC;
		dst_gc_timer_expires = DST_GC_MIN;
		dst_gc_timer.expires = jiffies + dst_gc_timer_expires;
		add_timer(&dst_gc_timer);
	}
	spin_unlock_bh(&dst_lock);
// NOW hits spin_trylock(). It succeeds. dst_gc_timer is added a second time.
}

> > I used del_timer_async(). Added REVIEWME. > > It does reference counting. > > Andrew, did you read my mails or you did not? Avidly. I think I now see what you mean by "reference counting":

static int neigh_del_timer(struct neighbour *n)
{
	if (n->nud_state & NUD_IN_TIMER) {
		if (del_timer_async(&n->timer)) {
			neigh_release(n);
			return 1;
		}
	}
	return 0;
}

If del_timer_async() returns non-zero then we KNOW that the timer was deleted and we KNOW that it hasn't run and we KNOW that it isn't running now. So that's a reference count to the timer and we can safely do the neigh_release(). BTW:

asmlinkage void do_softirq()
{
	...
restart:
	...
	if ((active &= mask) != 0)
		goto retry;
	...
retry:
	goto restart;
}

Doesn't need the double-goto.
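The reference-counting argument above can be modelled outside the kernel. The sketch below is a single-threaded toy, not kernel code: every toy_* name is hypothetical, and toy_del_timer() only imitates the return convention described for del_timer_async() (non-zero only when a pending timer was actually removed). It shows why the caller may drop the timer's reference exactly when the delete reports success, and must not drop it otherwise.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for a kernel timer and a refcounted object. */
struct toy_timer { int pending; };
struct toy_neigh { int refcnt; struct toy_timer timer; };

void toy_neigh_hold(struct toy_neigh *n)
{
    n->refcnt++;
}

/* Drops one reference; returns 1 when that was the last one. */
int toy_neigh_release(struct toy_neigh *n)
{
    assert(n->refcnt > 0);
    return --n->refcnt == 0;
}

/* Arming the timer takes a reference on behalf of the timer. */
void toy_add_timer(struct toy_neigh *n)
{
    toy_neigh_hold(n);
    n->timer.pending = 1;
}

/* Returns non-zero only if a pending timer was removed, i.e. the
 * handler has not run and will not run for this arming. */
int toy_del_timer(struct toy_timer *t)
{
    int was_pending = t->pending;
    t->pending = 0;
    return was_pending;
}

/* Same shape as neigh_del_timer() quoted above: the reference that
 * the timer held is released only when the delete succeeded. */
int toy_neigh_del_timer(struct toy_neigh *n)
{
    if (toy_del_timer(&n->timer)) {
        toy_neigh_release(n);
        return 1;
    }
    return 0;
}
```

A second call to toy_neigh_del_timer() on the same object returns 0 and leaves the count alone, which is the property that keeps the release from happening twice.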
From owner-netdev@oss.sgi.com Wed May 31 10:33:56 2000 Received: by oss.sgi.com id ; Wed, 31 May 2000 10:33:36 -0700 Received: from minus.inr.ac.ru ([193.233.7.97]:28427 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Wed, 31 May 2000 10:33:21 -0700 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id VAA23112; Wed, 31 May 2000 21:34:22 +0400 From: kuznet@ms2.inr.ac.ru Message-Id: <200005311734.VAA23112@ms2.inr.ac.ru> Subject: Re: [timers] net/ipv4 To: andrewm@uow.edu.au (Andrew Morton) Date: Wed, 31 May 2000 21:34:22 +0400 (MSK DST) Cc: netdev@oss.sgi.com In-Reply-To: <393478D7.7D131923@uow.edu.au> from "Andrew Morton" at May 31, 0 02:28:39 am X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Content-Length: 2203 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Hello! > If you want to say "Andrew, TCP is safe" then that's cool. It is supposed to be safe. Please, audit all, but do not spin too long on one place. 8) You have already found bugs in ipv{4,6} defragmenters, big dusty hole in net/sched and bug in igmp (see below).

> handler()
> {
> 	foo = 1;
> }
>
> mainline()
> {
> 	del_timer_async(&timer);
> 	foo = 2;
> }
>
> That is a race. So why do you say del_timer_sync() is not needed for
> static timers?

Because del_timer_sync() is useless in such a case. If "foo" is required to be reliable state, you must spinlock, because mainline is never single "mainline", there are lots of "mainline"s. F.e. look around your REVIEWME at ipmr.c. Otherwise, "foo" is just a hint, like some variables in various garbage collectors, and synchronization is not required at all. The only case, when del_timer_sync() really helps, is when "mainline" is serialized by kernel lock or a sleeping semaphore (plus assumption that timers form single thread, which is correct, but I would not rely on this, otherwise future development will be only more difficult). In linux/net/ it occurs only in some control paths (dev->open etc.).
That's why del_timer_sync() is almost never useful there. But it is very useful in the areas, which rely on kernel lock, inside networking device drivers et al. > Where are the refcounts held? It is sk->refcnt. Each reference to socket is counted, including running timers. Socket is destroyed, when the last reference disappears. Advantage of the scheme --- no points of synchronization (except for timer implementation, but it can be changed f.e. by using separate per-cpu timer pools). This property must be hold. But main advantage is that it is evidently free of any kind of deadlocks. Drawback --- trashing sk->refcnt. If we found a way to use something similar del_timer_sync() to call this before socket destruction, but insensitive to socket lock, we could avoid lots of useless work... But I still do not know how to make this not introducing new synchronization points. > // Race here. Right! spin_lock(&im->lock) is forgotten in timer handler. Good spot. > im->reporter = 1; > ip_ma_put(im); > } Alexey From owner-netdev@oss.sgi.com Wed May 31 10:32:36 2000 Received: by oss.sgi.com id ; Wed, 31 May 2000 10:32:16 -0700 Received: from minus.inr.ac.ru ([193.233.7.97]:49419 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Wed, 31 May 2000 10:32:07 -0700 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id WAA23306; Wed, 31 May 2000 22:13:15 +0400 From: kuznet@ms2.inr.ac.ru Message-Id: <200005311813.WAA23306@ms2.inr.ac.ru> Subject: Re: recent TCP changes adversive on slow links To: amlaukka@cc.helsinki.FI (Aki M Laukkanen) Date: Wed, 31 May 2000 22:13:15 +0400 (MSK DST) Cc: netdev@oss.sgi.com In-Reply-To: from "Aki M Laukkanen" at May 31, 0 07:14:02 pm X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 723 Lines: 22 Hello! > The following patch in will make the old behaviour come back. I assume > this change was made to prevent over-scheduling Yes. 
> and thus the patch as it > is might not be acceptable. Maybe a variant which tests against > (sk->sndbuf - WATERMARK) could be? It is difficult to believe that the problem is here, even if this change kills the effect. Actually, I am even not sure, that this is problem at all, congestion window is respected and burst exist only when looking to tcpdump but not in reality. TCP behaviour is controlled by ACKs, rather than by sndbuf. BTW seems, you have too huge value for tx_queue_len on this link. Set it to 4, it did not work earlier, but it should work in 2.3. Alexey From owner-netdev@oss.sgi.com Wed May 31 11:04:07 2000 Received: by oss.sgi.com id ; Wed, 31 May 2000 11:03:56 -0700 Received: from minus.inr.ac.ru ([193.233.7.97]:57611 "HELO ms2.inr.ac.ru") by oss.sgi.com with SMTP id ; Wed, 31 May 2000 11:03:32 -0700 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id XAA23547; Wed, 31 May 2000 23:03:20 +0400 From: kuznet@ms2.inr.ac.ru Message-Id: <200005311903.XAA23547@ms2.inr.ac.ru> Subject: Re: [timers] net/core/* To: andrewm@uow.EDU.AU (Andrew Morton) Date: Wed, 31 May 2000 23:03:20 +0400 (MSK DST) Cc: netdev@oss.sgi.com In-Reply-To: <39352AB3.7C609B8C@uow.edu.au> from "Andrew Morton" at May 31, 0 10:13:27 pm X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 1186 Lines: 47 Hello! > // NOW hits spin_trylock(). It succeeds. dst_gc_timer is added a > second time. That's why del_timer was made before add_timer, it was not redundant yet. 8) Certainly, add_timer should be converted to mod_timer and del_timer's are deleted. It will be cleaner. > If del_timer_async() returns non-zero then we KNOW that > the timer was deleted and we KNOW that it hasn't run > and we KNOW that it isn't running now. So that's a > reference count to the timer and we can safely do > the neigh_release(). Exactly!
Also, we know that nobody but currently executed code needs or knows anything about this object, so that it will be destroyed by neigh_release(). > BTW: > > asmlinkage void do_softirq() > { > ... > restart: > ... > if ((active &= mask) != 0) > goto retry; > ... > retry: > goto restart; > } > > Doesn't need the double-goto. It is consequence of too careful reading intel and alpha programmer's guides. 8) All of them require to help static branch prediction and not to jump forward in the most frequent path. It looks silly, when done in C, but gcc has no means to do this in different way. Alexey From owner-netdev@oss.sgi.com Wed May 31 11:22:57 2000 Received: by oss.sgi.com id ; Wed, 31 May 2000 11:22:37 -0700 Received: from colin.muc.de ([193.149.48.1]:12551 "HELO colin.muc.de") by oss.sgi.com with SMTP id ; Wed, 31 May 2000 11:22:29 -0700 Received: by colin.muc.de id <140563-2>; Wed, 31 May 2000 21:21:53 +0200 Message-ID: <20000531212143.64448@colin.muc.de> From: Andi Kleen To: kuznet@ms2.inr.ac.ru Cc: Andrew Morton , netdev@oss.sgi.com Subject: Re: [timers] net/core/* References: <39352AB3.7C609B8C@uow.edu.au> <200005311903.XAA23547@ms2.inr.ac.ru> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 0.88e In-Reply-To: <200005311903.XAA23547@ms2.inr.ac.ru>; from kuznet@ms2.inr.ac.ru on Wed, May 31, 2000 at 09:05:06PM +0200 Date: Wed, 31 May 2000 21:21:44 +0200 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 573 Lines: 13 On Wed, May 31, 2000 at 09:05:06PM +0200, kuznet@ms2.inr.ac.ru wrote: > > It is consequence of too careful reading intel and alpha programmer's > guides. 8) All of them require to help static branch prediction > and not to jump forward in the most frequent path. It looks silly, when done > in C, but gcc has no means to do this in different way. 
gcc 3.0 will fix this, with -freorder-blocks (which knows some common heuristics like == NULL is usually not taken already) and __builtin_expect() it'll allow exorcise of about a zillion gotos from the kernel tree. -Andi From owner-netdev@oss.sgi.com Wed May 31 12:20:57 2000 Received: by oss.sgi.com id ; Wed, 31 May 2000 12:20:47 -0700 Received: from panic.ohr.gatech.edu ([130.207.47.194]:271 "EHLO havoc.gtf.org") by oss.sgi.com with ESMTP id ; Wed, 31 May 2000 12:20:24 -0700 Received: from mandrakesoft.com (adsl-77-228-135.atl.bellsouth.net [216.77.228.135]) by havoc.gtf.org (8.9.3/8.9.3) with ESMTP id QAA19174; Wed, 31 May 2000 16:19:24 -0400 Message-ID: <393573CD.DADE1A55@mandrakesoft.com> Date: Wed, 31 May 2000 16:19:25 -0400 From: Jeff Garzik Organization: MandrakeSoft X-Mailer: Mozilla 4.73 [en] (X11; I; Linux 2.2.15-6mdksmp i686) X-Accept-Language: en MIME-Version: 1.0 To: Andi Kleen CC: kuznet@ms2.inr.ac.ru, Andrew Morton , netdev@oss.sgi.com Subject: Re: [timers] net/core/* References: <39352AB3.7C609B8C@uow.edu.au> <200005311903.XAA23547@ms2.inr.ac.ru> <20000531212143.64448@colin.muc.de> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 845 Lines: 20 Andi Kleen wrote: > > On Wed, May 31, 2000 at 09:05:06PM +0200, kuznet@ms2.inr.ac.ru wrote: > > > > It is consequence of too careful reading intel and alpha programmer's > > guides. 8) All of them require to help static branch prediction > > and not to jump forward in the most frequent path. It looks silly, when done > > in C, but gcc has no means to do this in different way. > > gcc 3.0 will fix this, with -freorder-blocks (which knows some common > heuristics like == NULL is usually not taken already) and __builtin_expect() > > it'll allow exorcise of about a zillion gotos from the kernel tree. I wish... :) We'll still have to support gcc 2.7.2... 
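For reference, a sketch of how such a hint can be spelled while still building on compilers that lack it. HINT_UNLIKELY and sum_or_fail() are hypothetical names used only for illustration; the version check (GCC major version 3 or later) is an assumption about where __builtin_expect() becomes available, and older compilers simply get the plain condition.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical wrapper: use the branch hint where the compiler
 * offers it, and fall back to the bare condition elsewhere. */
#if defined(__GNUC__) && __GNUC__ >= 3
#define HINT_UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define HINT_UNLIKELY(x) (x)
#endif

/* Sums n integers into *out. The argument check is marked as the
 * rare path, so no goto is needed to keep the common path straight. */
int sum_or_fail(const int *v, size_t n, int *out)
{
    size_t i;
    int sum = 0;

    if (HINT_UNLIKELY(v == NULL || out == NULL))
        return -1;

    for (i = 0; i < n; i++)
        sum += v[i];
    *out = sum;
    return 0;
}
```

The hint changes only how the compiler may lay out the code; the result of the function is the same with or without it.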
-- Jeff Garzik | Liberty is always dangerous, but Building 1024 | it is the safest thing we have. MandrakeSoft, Inc. | -- Harry Emerson Fosdick From owner-netdev@oss.sgi.com Wed May 31 12:25:27 2000 Received: by oss.sgi.com id ; Wed, 31 May 2000 12:25:07 -0700 Received: from laurin.munich.netsurf.de ([194.64.166.1]:54499 "EHLO laurin.munich.netsurf.de") by oss.sgi.com with ESMTP id ; Wed, 31 May 2000 12:25:00 -0700 Received: from fred.muc.de (none@ns1096.munich.netsurf.de [195.180.235.96]) by laurin.munich.netsurf.de (8.9.3/8.9.3) with ESMTP id WAA25883; Wed, 31 May 2000 22:24:35 +0200 (MET DST) Received: from andi by fred.muc.de with local (Exim 2.05 #1) id 12xF2Y-0005Jf-00; Wed, 31 May 2000 22:24:34 +0200 Date: Wed, 31 May 2000 22:24:34 +0200 From: Andi Kleen To: Jeff Garzik Cc: Andi Kleen , kuznet@ms2.inr.ac.ru, Andrew Morton , netdev@oss.sgi.com Subject: Re: [timers] net/core/* Message-ID: <20000531222434.A20433@fred.muc.de> References: <39352AB3.7C609B8C@uow.edu.au> <200005311903.XAA23547@ms2.inr.ac.ru> <20000531212143.64448@colin.muc.de> <393573CD.DADE1A55@mandrakesoft.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 0.95.4us In-Reply-To: <393573CD.DADE1A55@mandrakesoft.com>; from Jeff Garzik on Wed, May 31, 2000 at 10:19:37PM +0200 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 681 Lines: 17 On Wed, May 31, 2000 at 10:19:37PM +0200, Jeff Garzik wrote: > > gcc 3.0 will fix this, with -freorder-blocks (which knows some common > > heuristics like == NULL is usually not taken already) and __builtin_expect() > > > > it'll allow exorcise of about a zillion gotos from the kernel tree. > > I wish... :) We'll still have to support gcc 2.7.2... Not with the highest efficiency though. gcc 2.7.2 kernels will be most likely slower than gcc 3.0 kernels (3.0 has a much better scheduler at least on x86).
Adding a few more cycles difference probably do not hurt -- especially when it makes the code easier to read and maintain. -Andi -- This is like TV. I don't like TV. From owner-netdev@oss.sgi.com Wed May 31 13:59:06 2000 Received: by oss.sgi.com id ; Wed, 31 May 2000 13:58:46 -0700 Received: from panic.ohr.gatech.edu ([130.207.47.194]:27664 "EHLO havoc.gtf.org") by oss.sgi.com with ESMTP id ; Wed, 31 May 2000 13:58:27 -0700 Received: from mandrakesoft.com (adsl-77-228-135.atl.bellsouth.net [216.77.228.135]) by havoc.gtf.org (8.9.3/8.9.3) with ESMTP id RAA25863; Wed, 31 May 2000 17:57:58 -0400 Message-ID: <39358AE7.55E892A@mandrakesoft.com> Date: Wed, 31 May 2000 17:57:59 -0400 From: Jeff Garzik Organization: MandrakeSoft X-Mailer: Mozilla 4.73 [en] (X11; I; Linux 2.2.15-6mdksmp i686) X-Accept-Language: en MIME-Version: 1.0 To: Anton Blanchard CC: linux-kernel@vger.rutgers.edu, netdev@oss.sgi.com Subject: Re: [PATCH] eepro100 device name <-> pci bus/slot/func mapping References: <20000531114830.J1417@linuxcare.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 1100 Lines: 38 Anton Blanchard wrote: > --- linux/drivers/net/eepro100.c Tue May 23 13:11:19 2000 > +++ linux_work/drivers/net/eepro100.c Wed May 31 11:20:47 2000 > @@ -838,6 +844,10 @@ > > pdev->driver_data = dev; > > +#ifndef USE_IO > + dev->mem_start = pci_resource_start(pdev, 0); > + dev->mem_end = dev->mem_start + pci_resource_len(pdev, 0); Use pci_resource_end, avoid the unnecessary addition. > +#endif > dev->base_addr = ioaddr; > dev->irq = irq; Why is the dev->base_addr assignment not conditional on USE_IO as well? Or be more specific, 1) What are the semantics of mem_{start,end} versus base_addr? And, 2) Why does ifconfig truncate a valid 32-bit address, when dev->base_addr equals something like 0xF1234567? 
I noticed this but never got around to looking into the reason. In any case, you point out something that needs to be fixed in many drivers... Jeff -- Jeff Garzik | Liberty is always dangerous, but Building 1024 | it is the safest thing we have. MandrakeSoft, Inc. | -- Harry Emerson Fosdick From owner-netdev@oss.sgi.com Wed May 31 14:16:06 2000 Received: by oss.sgi.com id ; Wed, 31 May 2000 14:15:56 -0700 Received: from ren.mcnc.org ([152.45.4.110]:45073 "EHLO ren.mcnc.org") by oss.sgi.com with ESMTP id ; Wed, 31 May 2000 14:15:45 -0700 Received: from anr.mcnc.org (localhost.localdomain [127.0.0.1]) by ren.mcnc.org (8.9.3/8.9.3) with ESMTP id SAA18882 for ; Wed, 31 May 2000 18:15:44 -0400 Message-ID: <39358F0D.17627685@anr.mcnc.org> Date: Wed, 31 May 2000 18:15:41 -0400 From: Phoemphun Oothongsap Organization: Advanced Networking Group X-Mailer: Mozilla 4.61 [en] (X11; I; Linux 2.2.12-20 i686) X-Accept-Language: en MIME-Version: 1.0 To: netdev@oss.sgi.com Subject: sendto problem Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 653 Lines: 33 Hi I am using Red Hat 6.2. I have a problem with using the "sendto" call. I try to send a udp packet to three different machines. The packets are sent back to back. The first packet can get through but the second packet gets Connection refused; the third packet is ok. The result alternates between connection refused and a successful packet. This is a code snippet:

    :
    :
while(1) {
    for(i=1;i<=3;i++) {
        ret = sendto(sock,(char *)&packet, length,0,
                     (struct sockaddr*)&neighborAddr[i], addrlen);
    }
    :
    :
}
:
:

Does anybody have an idea how to fix this problem?
Regards, Phoemphun Oothongsap From owner-netdev@oss.sgi.com Wed May 31 14:26:54 2000 Received: by oss.sgi.com id ; Wed, 31 May 2000 14:26:36 -0700 Received: from m204-1-p04.warwick.net ([208.242.204.9]:25349 "EHLO circuit.moureaux.com") by oss.sgi.com with ESMTP id ; Wed, 31 May 2000 14:26:21 -0700 Received: from circuit.moureaux.com (IDENT:statux@circuit.moureaux.com [192.168.0.1]) by circuit.moureaux.com (8.9.3/8.9.3) with SMTP id SAA01530; Wed, 31 May 2000 18:28:33 -0400 Date: Wed, 31 May 2000 18:28:33 -0400 (EDT) From: Statux X-Sender: statux@circuit.moureaux.com To: Phoemphun Oothongsap cc: netdev@oss.sgi.com Subject: Re: sendto problem In-Reply-To: <39358F0D.17627685@anr.mcnc.org> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Content-Length: 88 Lines: 4 Well udp packets aren't guaranteed to be delivered or anything.. that's my 2 cents :) From owner-netdev@oss.sgi.com Wed May 31 14:27:06 2000 Received: by oss.sgi.com id ; Wed, 31 May 2000 14:26:53 -0700 Received: from [209.245.157.172] ([209.245.157.172]:11140 "EHLO anton") by oss.sgi.com with ESMTP id ; Wed, 31 May 2000 14:26:31 -0700 Received: from anton by anton with local (Exim 3.12 #1 (Debian)) id 12xGwC-0006Yx-00; Wed, 31 May 2000 15:26:08 -0700 Date: Wed, 31 May 2000 15:26:08 -0700 From: Anton Blanchard To: Jeff Garzik Cc: linux-kernel@vger.rutgers.edu, netdev@oss.sgi.com Subject: Re: [PATCH] eepro100 device name <-> pci bus/slot/func mapping Message-ID: <20000531152608.A24217@linuxcare.com> References: <20000531114830.J1417@linuxcare.com> <39358AE7.55E892A@mandrakesoft.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii User-Agent: Mutt/1.0.1i In-Reply-To: <39358AE7.55E892A@mandrakesoft.com>; from jgarzik@mandrakesoft.com on Wed, May 31, 2000 at 05:57:59PM -0400 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing 
Content-Length: 848 Lines: 32 > Use pci_resource_end, avoid the unnecessary addition. Good point. > Why is the dev->base_addr assignment not conditional on USE_IO as well? OK, to be consistent it should not be. > Or be more specific, > 1) What are the semantics of mem_{start,end} versus base_addr? And, > 2) Why does ifconfig truncate a valid 32-bit address, when > dev->base_addr equals something like 0xF1234567? I noticed this but > never got around to looking into the reason. I do not know the intended semantics, but someone decided that a short was enough to store the base address:

struct ifmap
{
	unsigned long mem_start;
	unsigned long mem_end;
	unsigned short base_addr;
	unsigned char irq;
	unsigned char dma;
	unsigned char port;
	/* 3 bytes spare */
};

So we have to use mem_start/end for this. Anton From owner-netdev@oss.sgi.com Wed May 31 16:26:57 2000 Received: by oss.sgi.com id ; Wed, 31 May 2000 16:26:37 -0700 Received: from mail.cyberus.ca ([209.195.95.1]:48846 "EHLO cyberus.ca") by oss.sgi.com with ESMTP id ; Wed, 31 May 2000 16:26:29 -0700 Received: from shell.cyberus.ca (shell [209.195.95.7]) by cyberus.ca (8.9.3/8.9.3/Cyberus Online Inc.) with ESMTP id UAA28939; Wed, 31 May 2000 20:26:28 -0400 (EDT) Received: from localhost (hadi@localhost) by shell.cyberus.ca (8.9.1b+Sun/8.9.3) with ESMTP id UAA10412; Wed, 31 May 2000 20:26:28 -0400 (EDT) Date: Wed, 31 May 2000 20:26:28 -0400 (EDT) From: jamal To: Ben Greear cc: netdev@oss.sgi.com Subject: 802.1q Was (Re: Plans for 2.5 / 2.6 ??? In-Reply-To: <3933777C.E562388C@candelatech.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Ben, Your architecture of maintaining a device per VLAN does not scale; (as you might have heard from your numerous attempts to change device lookups). What is the specific reason that you insist on mapping a VLAN to a device?
Have you thought of using a VLAN lookup table instead? cheers, jamal I am only asking because i think that sooner than later we need to have 802.1p/q in the kernel and your current scheme is problematic. BTW, it seems there is another 802.1p/q project at sourceforge; From owner-netdev@oss.sgi.com Wed May 31 17:20:48 2000 Received: by oss.sgi.com id ; Wed, 31 May 2000 17:20:28 -0700 Received: from dhcp41.toaster.net ([199.108.84.41]:51464 "EHLO schmee.sfgoth.com") by oss.sgi.com with ESMTP id ; Wed, 31 May 2000 17:20:13 -0700 Received: (from mitch@localhost) by schmee.sfgoth.com (8.9.3/8.9.3) id SAA15522; Wed, 31 May 2000 18:19:55 -0700 (PDT) Date: Wed, 31 May 2000 18:19:55 -0700 From: Mitchell Blank Jr To: jamal Cc: Ben Greear , netdev@oss.sgi.com Subject: Re: 802.1q Was (Re: Plans for 2.5 / 2.6 ??? Message-ID: <20000531181955.D7402@sfgoth.com> References: <3933777C.E562388C@candelatech.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0i In-Reply-To: ; from hadi@cyberus.ca on Wed, May 31, 2000 at 08:26:28PM -0400 Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing > Your architecture of maintaining a device per VLAN does not scale; > (as you might have heard from your numerous attempts to change device > lookups). Is it just impossible to make this scale in 2.5? There are other things which could require large numbers of network devices (like large-scale PPPoA/PPPoE termination), it would be nice to support them. > What is the specific reason that you insist on mapping a VLAN to a device? > Have you thought of using a VLAN lookup table instead? How would you implement IP filtering on each VLAN then? 
-Mitch From owner-netdev@oss.sgi.com Wed May 31 17:34:18 2000 Received: by oss.sgi.com id ; Wed, 31 May 2000 17:33:58 -0700 Received: from panic.ohr.gatech.edu ([130.207.47.194]:530 "EHLO havoc.gtf.org") by oss.sgi.com with ESMTP id ; Wed, 31 May 2000 17:33:44 -0700 Received: from mandrakesoft.com (adsl-77-228-135.atl.bellsouth.net [216.77.228.135]) by havoc.gtf.org (8.9.3/8.9.3) with ESMTP id VAA03115; Wed, 31 May 2000 21:33:20 -0400 Message-ID: <3935BD60.BB1F922D@mandrakesoft.com> Date: Wed, 31 May 2000 21:33:20 -0400 From: Jeff Garzik Organization: MandrakeSoft X-Mailer: Mozilla 4.73 [en] (X11; I; Linux 2.2.15-6mdksmp i686) X-Accept-Language: en MIME-Version: 1.0 To: Anton Blanchard CC: linux-kernel@vger.rutgers.edu, netdev@oss.sgi.com, Alan Cox Subject: Re: [PATCH] eepro100 device name <-> pci bus/slot/func mapping References: <20000531114830.J1417@linuxcare.com> <39358AE7.55E892A@mandrakesoft.com> <20000531152608.A24217@linuxcare.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing Anton Blanchard wrote: > > > > Use pci_resource_end, avoid the unnecessary addition. > > Good point. > > > Why is the dev->base_addr assignment not conditional on USE_IO as well? > > OK, to be consistent it should not be. > > > Or be more specific, > > 1) What are the semantics of mem_{start,end} versus base_addr? And, > > 2) Why does ifconfig truncate a valid 32-bit address, when > > dev->base_addr equals something like 0xF1234567? I noticed this but > > never got around to looking into the reason. > > I do not know the intended semantics, but someone decided that a short was > enough to store the base address: > > struct ifmap > { > unsigned long mem_start; > unsigned long mem_end; > unsigned short base_addr; > unsigned char irq; > unsigned char dma; > unsigned char port; > /* 3 bytes spare */ > }; > > So we have to use mem_start/end for this. Well crap. 
All of the drivers use base_addr regardless of PIO or MMIO, and none use mem_start/end AFAICS. It should be no trouble to simply assign mem_start/end for reporting purposes and ignore them otherwise, but having 'base_addr' be a short here is a real PITA, since it doesn't reflect real-life usage. IMHO we should ditch mem_start/end, OR ditch base_addr... Jeff -- Jeff Garzik | Liberty is always dangerous, but Building 1024 | it is the safest thing we have. MandrakeSoft, Inc. | -- Harry Emerson Fosdick From owner-netdev@oss.sgi.com Wed May 31 18:33:48 2000 Received: by oss.sgi.com id ; Wed, 31 May 2000 18:33:28 -0700 Received: from saw.sw.com.sg ([203.120.9.98]:42368 "HELO saw.sw.com.sg") by oss.sgi.com with SMTP id ; Wed, 31 May 2000 18:33:03 -0700 Received: (qmail 12197 invoked by uid 577); 1 Jun 2000 02:32:59 -0000 Message-ID: <20000601103259.A12145@saw.sw.com.sg> Date: Thu, 1 Jun 2000 10:32:59 +0800 From: Andrey Savochkin To: Phoemphun Oothongsap Cc: netdev@oss.sgi.com Subject: Re: sendto problem References: <39358F0D.17627685@anr.mcnc.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 0.93.2i In-Reply-To: <39358F0D.17627685@anr.mcnc.org>; from "Phoemphun Oothongsap" on Wed, May 31, 2000 at 06:15:41PM Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing On Wed, May 31, 2000 at 06:15:41PM -0400, Phoemphun Oothongsap wrote: > I am using Red Hat 6.2. I have a problem with using "sendto" command. > I try to send a udp packet to different three machines. The packets are > sent back to back. The first packet can get through but the second > packet gets the Connection refused, the third packet is ok. I get the The system receives something like ICMP Unreachable message for the previous packet and reported it on the next system call. > alternated > result between connection refused and successful packet. Best regards Andrey V. 
Savochkin From owner-netdev@oss.sgi.com Wed May 31 20:02:08 2000 Received: by oss.sgi.com id ; Wed, 31 May 2000 20:01:58 -0700 Received: from cx97923-a.phnx3.az.home.com ([24.9.112.194]:54022 "EHLO grok.yi.org") by oss.sgi.com with ESMTP id ; Wed, 31 May 2000 20:01:40 -0700 Received: from candelatech.com (IDENT:greear@localhost [127.0.0.1]) by grok.yi.org (8.9.3/8.9.3) with ESMTP id VAA26688; Wed, 31 May 2000 21:37:44 -0700 Message-ID: <3935E898.83674015@candelatech.com> Date: Wed, 31 May 2000 21:37:44 -0700 From: Ben Greear Organization: Candela Technologies X-Mailer: Mozilla 4.72 [en] (X11; U; Linux 2.2.14-5.0 i586) X-Accept-Language: en MIME-Version: 1.0 To: jamal CC: netdev@oss.sgi.com Subject: Re: 802.1q Was (Re: Plans for 2.5 / 2.6 ??? References: Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;netdev-outgoing jamal wrote: > > Ben, > > Your architecture of maintaining a device per VLAN does not scale; > (as you might have heard from your numerous attempts to change device > lookups). I have no intention of changing device lookups. I did write my own method of naming them, but that's quite trivial. > What is the specific reason that you insist on mapping a VLAN to a device? > Have you thought of using a VLAN lookup table instead? I do use a vlan lookup table, did you think I do a linear search through the vlans for incoming pkts or something??? As far as I recall, every lookup involved in any critical path is constant time. That is one of the reasons I believe my implementation will scale better than the other project... (could be wrong...) A VLAN should look just like an ethernet port. I **WANT** it to look exactly like a device, so why not actually make it one?? I want to use tcpdump on it, I want to route, firewall, bridge and do everything else that a device would do.
How can that be done if the VLAN is not a device, and if it could be done, why would it be any more efficient? Doesn't Frame Relay and ATM PVCs have the exact same problem? I keep hearing that it doesn't scale, but I think that is a problem that should be fixed elsewhere, if its true. I can see no logical reason why adding devices should have a linear slowdown on critical paths. If one is too squeamish to deal with an ifconfig -a that takes forever, I'm sure one could write ifconfig to be faster, but it doesn't really matter, because the people who want lots of interfaces, want lots of interfaces. Ever configured 100 FrameRelay pvcs on a Cisco? It sucks to print out the config, but it works just fine. > > cheers, > jamal > I am only asking because i think that sooner than later we need to have > 802.1p/q in the kernel and your current scheme is problematic. I haven't looked at the higher protocols, so if you can point out a real deficiency that multiple interfaces cause, and can suggest any other way of doing multiple interfaces without doing multiple interfaces, please let me know! > BTW, it seems there is another 802.1p/q project at sourceforge; Yes, there is a link from my page to theirs. I was under the impression that they use multiple interfaces too, just that they put them in a linked list (slaved off of an ethernet device), instead of in an indexed table like I do. This would imply to me that their solution really wouldn't be too great if they put 100 VLANs on a single NIC. They may have changed their implementation though... Ben -- Ben Greear (greearb@candelatech.com) http://www.candelatech.com Author of ScryMUD: scry.wanfear.com 4444 (Released under GPL) http://scry.wanfear.com http://scry.wanfear.com/~greear
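The indexed-table lookup described above might look like the following sketch. It is an illustration only, not code from either VLAN project: all toy_* names are hypothetical, and it assumes one table per physical NIC, indexed by the 12-bit VLAN ID taken from the low bits of the 802.1Q tag control field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_VLAN_IDS 4096        /* a 12-bit VLAN ID gives 4096 slots */
#define TOY_VID_MASK 0x0FFF      /* low 12 bits of the tag control field */

/* Hypothetical per-VLAN device record. */
struct toy_vlan_dev {
    uint16_t vid;
    const char *name;
};

/* Hypothetical per-NIC table: one slot per possible VLAN ID. */
struct toy_vlan_group {
    struct toy_vlan_dev *by_vid[TOY_VLAN_IDS];
};

/* Registers a VLAN device; returns -1 if the ID is out of range
 * or the slot is already taken. */
int toy_vlan_register(struct toy_vlan_group *g, struct toy_vlan_dev *d)
{
    assert(g != NULL && d != NULL);
    if (d->vid >= TOY_VLAN_IDS || g->by_vid[d->vid] != NULL)
        return -1;
    g->by_vid[d->vid] = d;
    return 0;
}

/* Constant-time receive-path lookup: mask off the priority bits and
 * index the table. Returns NULL when no device owns that VLAN ID. */
struct toy_vlan_dev *toy_vlan_lookup(const struct toy_vlan_group *g,
                                     uint16_t tci)
{
    return g->by_vid[tci & TOY_VID_MASK];
}
```

The cost of a lookup does not depend on how many VLANs are configured on the NIC, which is the contrast being drawn with walking a linked list; the price is a fixed table of 4096 pointers per NIC.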